Delimited Extract (Table Extract Method)

From Grooper Wiki
Revision as of 15:22, 1 June 2021 by Dgreenwood (talk | contribs)

Delimited Extract is one of the Extract Method options for Data Table objects in a Data Model. This method extracts tabular data from a delimiter-separated text file, such as a CSV file.

This is the fastest, simplest and most effective method of extracting data from character delimited files, such as comma delimited CSV files or TXT files delimited by commas or other characters.

FYI

Delimited Extract is new to version 2021. In older versions, this functionality was achieved with the CSV Extract table extraction method. Delimited Extract expands the CSV Extract functionality to extract tabular data from TXT files that delimiter characters besides a comma, such as semicolons or pipe characters.

About

A delimited text file is, effectively, a table of information saved to a simple text format. Rows in the table are easy to represent in a text file. Each row is a new line in the file. Columns are little trickier. They are represented by a "delimiter" character. This character is used to define where the table columns are in the text file.

For example, CSV (comma-separated values) files use a comma (,) as a delimiter. When opened in a spreadsheet program, such as Microsoft Excel or Google Sheets, these files look like standard spreadsheets. So, a CSV file with a table with some generic information collected for individuals might look something like the table below.

First Name Last Name Email SSN Favorite Animal
Aaron Aaronson aaron@example.com 987-65-4321 aardvark
Billy Billiamson 999-88-7777 billy goat
Cathy Catherton cat@example.com 900-00-1234 dog

But a delimited file is just text data. There are no lines dividing the rows and columns. Instead, the vertical line boundaries between each column would be replaced with the delimiter character (a , for CSV files) and each row would be a new line in the file. So, it would look like the text below.

First Name,Last Name,Email,SSN,Favorite Animal
Aaron,Aaronson,aaron@example.com,987-65-4321,aardvark
Billy,Billiamson,,999-88-7777,billy goat
Cathy,Catherton,cat@example.com,900-00-1234,dog

TXT files can also be formatted as delimiter-separated tables. They may use a comma (,) or a different separator such as a semicolon (;) or a "pipe" character (|). For example, a "pipe-delimited" TXT file version of the data above would look like the text below.

First Name|Last Name|Email|SSN|Favorite Animal
Aaron|Aaronson|aaron@example.com|987-65-4321|aardvark
Billy|Billiamson||999-88-7777|billy goat
Cathy,Catherton|cat@example.com|900-00-1234|dog

The only difference is one file is a CSV file, using a comma (,) to distinguish the columns and the other is a TXT file, using the pipe character (|).


The Delimited Extract table extraction method will parse these kinds of files and return their data to a Data Table object in a Data Model.

  1. For example, here we have a Data Table named "People Data" configured to extract tabular data from a CSV file.
  2. You can see the attached file to this Batch Folder is a CSV file named "PEOPLE_DATA_COMMA.csv".
  3. In the Document Viewer, we can see this is a simple CSV file with the data described above.
    • Each column ("First Name", "Last Name", "Email", "SSN", and "Favorite Animal") is separated by a single , character.
  4. The Data Table's Extract Method property is set to Delimited Extract.
  5. The Delimited Extract sub-properties are configured according to the extracted file's type and delimited format.
    • These properties dictate what the imported document's filetype is (i.e. CSV or TXT), what delimiter character is used, how the data is mapped to the Data Table object's Data Columns.
    • These settings will be discussed in more depth, in the How To section of this article.
  6. Upon extraction, the Data Table returns the CSV file's data.

Currently, Grooper does not support "tabs" as a delimiter character. At the time of writing this article, the Delimited Extract method cannot extract data from a tab-delimited TXT file. Alternate Extract Methods, such as the Row Match method, must be used instead.

How To

<tabs style="margin:20px"> <tab name="Prereqs" style="margin:20px">

Prereqs

If you are testing Delimited Extract in Grooper Design Studio and are not returning any results or receiving an error message, read this first.

This will address the two most common issues when testing the Delimited Extract method.