Delimited Extract (Table Extract Method)

Delimited Extract is one of the Extract Method options for Data Table objects in a Data Model. This method extracts tabular data from a delimiter-separated text file, such as a CSV file.

This is the fastest, simplest and most effective method of extracting data from character delimited files, such as comma delimited CSV files or TXT files delimited by commas or other characters.

FYI

Delimited Extract is new to version 2021. In older versions, this functionality was achieved with the CSV Extract table extraction method. Delimited Extract expands the CSV Extract functionality to extract tabular data from TXT files that delimiter characters besides a comma, such as semicolons or pipe characters.

About

A delimited text file is, effectively, a table of information saved to a simple text format. Rows in the table are easy to represent in a text file. Each row is a new line in the file. Columns are little trickier. They are represented by a "delimiter" character. This character is used to define where the table columns are in the text file.

For example, CSV (comma-separated values) files use a comma (,) as a delimiter. When opened in a spreadsheet program, such as Microsoft Excel or Google Sheets, these files look like standard spreadsheets. So, a CSV file with a table with some generic information collected for individuals might look something like the table below.

First Name	Last Name	Email	SSN	Favorite Animal
Aaron	Aaronson	aaron@example.com	987-65-4321	aardvark
Billy	Billiams		999-88-7777	billy goat
Cathy	Catherton	cat@example.com	900-00-1234	dog

But a delimited file is just text data. There are no lines dividing the rows and columns. Instead, the vertical line boundaries between each column would be replaced with the delimiter character (a , for CSV files) and each row would be a new line in the file. So, it would look like the text below.

First Name,Last Name,Email,SSN,Favorite Animal
Aaron,Aaronson,aaron@example.com,987-65-4321,aardvark
Billy,Billiams,,999-88-7777,billy goat
Cathy,Catherton,cat@example.com,900-00-1234,dog

TXT files can also be formatted as delimiter-separated tables. They may use a comma (,) or a different separator such as a semicolon (;) or a "pipe" character (|). For example, a "pipe-delimited" TXT file version of the data above would look like the text below.

First Name|Last Name|Email|SSN|Favorite Animal
Aaron|Aaronson|aaron@example.com|987-65-4321|aardvark
Billy|Billiams||999-88-7777|billy goat
Cathy,Catherton|cat@example.com|900-00-1234|dog

The only difference is one file is a CSV file, using a comma (,) to distinguish the columns and the other is a TXT file, using the pipe character (|).

The Delimited Extract table extraction method will parse these kinds of files and return their data to a Data Table object in a Data Model.

For example, here we have a Data Table named "People Data" configured to extract tabular data from a CSV file.
You can see the attached file to this Batch Folder is a CSV file named "PEOPLE_DATA_COMMA.csv".
In the Document Viewer, we can see this is a simple CSV file with the data described above.
- Each column ("First Name", "Last Name", "Email", "SSN", and "Favorite Animal") is separated by a single , character.
The Data Table's Extract Method property is set to Delimited Extract.
The Delimited Extract sub-properties are configured according to the extracted file's type and delimited format.
- These properties dictate what the imported document's filetype is (i.e. CSV or TXT), what delimiter character is used, how the data is mapped to the Data Table object's Data Columns.
- These settings will be discussed in more depth, in the How To section of this article.
Upon extraction, the Data Table returns the CSV file's data.

⚠	Currently, Grooper does not support "tabs" as a delimiter character. At the time of writing this article, the Delimited Extract method cannot extract data from a tab-delimited TXT file. Alternate Extract Methods, such as the Row Match method, must be used instead.

How To

Prereqs

⚠	If you are testing Delimited Extract in Grooper Design Studio and are not returning any results or receiving an error message, read this first. This will address the most common issues when testing the Delimited Extract method.

There are three critical perquisites in order for the Delimited Extract method to work properly, whether testing its configuration in Grooper Design Studio or collecting the data during the Extract activity of a Batch Process.

The Batch Folder must have an attached character-delimited file of a CSV or TXT format.
- Excel files (XLSX) are capable of utilizing the Delimited Extract method. However, they must be converted to a CSV format first.
- For more information on converting Excel files to CSV files, visit the Microsoft Office Integration article.
If imported sparsely, the Batch Folder's attached CSV or TXT file must be loaded prior to extraction.
The Batch Folder must be classified.

1. Attached File Considerations2. Load the File3. Classify the Document Folder

1. The Batch Folder must have an attached file of a CSV or TXT format.

When you import digital files into Grooper (ie PDFs, Word documents, Excel spreadsheets, TXT files, etc), they are stored as attachments on the created Batch Folder in the Batch. When inspecting a Batch, the attached files are listed at the bottom of the Batch Folder icon.

The documents here have CSV and a character-delimited TXT file attached to the Batch Folders. These documents are suitable for Delimited Extract.
Microsoft Excel files can also take advantage of Delimited Extract. However, they must first be converted to the CSV file format.
Excel files (XLSX) are not natively readable by Grooper. Without converting them to a CSV file (which is natively readable), you will see this message in the document viewer.

From Grooper Design Studio , you can convert a single XLSX file quickly and easily if you're trying to test out your Data Table's Delimited Extract configuration. You can perform the following steps from the Batch Viewer at any node in the Node Tree. If you can get to a Batch Viewer, and select a batch using the node's UI, you can do the following to convert an Excel file to CSV.

Right-click the document folder with the attached Excel file.
Select "Excel Document".
Select "Excel to CSV..."
This will bring up the following "Excel to CSV" configuration window.
Using the Conversion Method property choose how you want to convert the file.
- Convert will convert the attached Excel file, replacing it with a CSV file.
- Burst will convert the file and save the results as one or more (in the case of Excel files with multiple sheets) children Batch Folders.
- SaveNew will convert the file and save it as a new, additional file attached to the Batch Folder.
Press the "Execute" button to convert the Excel file to a CSV file.

Since we chose the Convert option, the attached file is converted, replacing the XLSX file with a CSV file.
CSV files are natively readable by Grooper. Now we can view the file in the Document Viewer and Grooper can process the document.
This is now an extractable file. Upon extraction, the Data Table (using the Delimited Extract method) returns the comma-separated values.

FYI

This is a very "ad hoc" method of converting Excel files to CSV files. You can also perform this conversion using the Execute activity, selecting Excel Document as the Object Type and choosing Excel to CSV as the Command. You will use the Execute activity to automate converting Excel files in a Batch Process. For more information, visit the Microsoft Office Integration article.

Click to return to the top

2. If imported sparsely, the Batch Folder's attached CSV or TXT file must be loaded prior to extraction.

Digital files are imported into Grooper either "sparsely" or "fully".

Fully imported files are fully loaded into the Grooper Repository. Their metadata properties (file name and extension, file dimensions, MIME type information, etc) are stored in the Grooper Database and their content (the full file itself) is copied to the Grooper File Store.
Sparsely imported files are only partially loaded. Their metadata properties are stored in the Grooper Database, but their content is not copied to the Grooper File Store. Instead the content is accessed with a link to the file's location (for example, if you import a file from your Windows hard drive, it would be its Windows folder location).

Typically, a sparsely imported document is described as a "usable" document in Grooper. You can classify it. You can extract data from it. You can apply Activities to it and it will behave as if it were fully loaded. Just instead of using a local copy in the Grooper File Store to process the document, Grooper travels the document link to access the file.

However, this is not the case for Delimited Extract. The attached file must be loaded (copied into the Grooper File Store) in order to extract any data.

A sparsely imported file will be indicated by a gold chain link on the Batch Folder icon.
If you attempt to extract the data using Delimited Extract, nothing will return at this point.

From Grooper Design Studio , you can load file quickly and easily if you're trying to test out your Data Table's Delimited Extract configuration. You can perform the following steps from the Batch Viewer at any node in the Node Tree. If you can get to a Batch Viewer, and select a batch using the node's UI, you can do the following to load a Batch Folder's attached file (be it a CSV file, a TXT file, an XLSX file, a PDF file, or any other file).

Right-click the document folder with the attached file.
Select "File System Link".
Select "Load Content..."
This will bring up the following "Load Content" configuration window.
Press the "Execute" button to load the file into the Grooper File Store.

Upon loading the content, the chain link will change color from grey to silver.
- The silver link indicates the content is loaded but Grooper retains a link to the file's location. This link can be used in the future if the content needs to be reloaded or otherwise accessed from its external location. Alternatively the link can be removed if no longer needed once the content is fully loaded into Grooper.
With the content loaded, the data can now be extracted using Delimited Extract.

FYI

This too is a very "ad hoc" method of loading files. You do the same thing using the Execute activity, selecting File System Link as the Object Type (or CMIS Document Link, FTP Link, Mail Link or one of the other Link options depending how the file was imported from what external storage platform) and choosing Load Content as the Command. You will use the Execute activity to automate foading files in a Batch Process.

Click to return to the top

3. The Batch Folder must be classified.

When building a Batch Process you will always add an Extract activity after a Classify activity or some other operation classifies the Batch Folders in a Batch. This is because the Extract activity looks for the Content Type assigned to the document folder to access the Data Model configured to collect data from the document.

Typically, document folders do not need to be classified when performing unit testing in Grooper Design Studio. You can navigate to a Data Model or its child Data Elements in the Node Tree, select an unclassified Batch Folder object in a Batch, press the "Test Extraction" button and a preview of the extracted data will be presented in the UI.

This is not the case when testing the Delimited Extract method! The Batch Folder must be classified (assigned a Document Type) in order to test extraction.

These documents are not classified. They do not have a Document Type assigned to them. They are generically listed as "Document".
Upon testing the Data Table (using the Delimited Extract method), no data returns.

These documents are classified. They have either the "CSV Report" or "TXT Report" Document Type (added to this Content Model) assigned to them.
As a classified document, upon testing the Data Table (using the Delimited Extract method), the table data is returned.

Click to return to the top

Delimited Extract Configuration

Enable Delimited ExtractChoose the Mapping MethodFilename Pattern and Delimiter Considerations

Enable Delimited Extract

To use Delimited Extract you will choose this as the Extract Method property of a Data Table

Add or select a Data Table in the Node Tree.
- We've selected the Data Table named "People Data". It is a Data Element of the "CSV Report" Document Type's Data Model.
Select the Extract Method property.
Choose Delimited Extract from the list of table extraction methods.

Choose the Mapping Method

With Delimited Extract enabled, arguably the most important consideration is how you configure the Mapping Method property. This property determines how the "columns" in the delimiter-separated file correspond to the Data Columns in your Data Table. This can be one of three options:

Named
Positional
Mapped

Named

The Named option will use column headers in the delimiter-separated file to map columns to the Data Columns in our Data Table. As long as the names match, Grooper will populate the Data Column with the data in the correspondingly named column in the CSV or TXT file.

For example, the Data Table in our example has five Data Columns.

First Name
Last Name
Email
Social
Animal

The CSV files in our batch also have five columns.

However, some of the names are the same and some are different.

Only the "First Name", "Last Name" and "Email" columns match.

The "Social" and "Animal" Data Columns are labeled "SSN" and "Favorite Animal" in the CSV file.

If we were to test extraction of this CSV file using the Named method, only part of the delimiter-separated values will be collected. Those in the columns whose names match will be returned, but the columns whose names do not will not.

The default Mapping Method option is Named
Upon extraction, Grooper will search the first row of the file for the column header labels.
- The column headers must be present on the first row in order for this method to work.
If a column header matches the name of one of the Data Table's child Data Columns, that Data Column will return that column's data in the delimiter-separated file.
- In our case, the names for the "First Name", "Last Name", and "Email" columns match. So that data is returned to the correspondingly named Data Column.
- This is also order independent. If "Email" was the first Data Column instead of the third, it would still return the data from the "Email" column on the document because their names still match.
The "Social" and "Animal" Data Columns are blank because their names do not match.

If we changed the name of our Data Columns to match (both be named "SSN" and "Favorite Animal" respectively), we would get the full table populated.

However, let's completely ignore that and look at the next Mapping Method option, Positional.

FYI

Before we do that, you probably noticed the first line of the table (the one with the column headers) is returned along with the actual table data. This is probably "junk data" you don't ultimately want collected at time of extraction.

This is what the Skip Rows property is for.
Set this to 1 and it will skip the first row in the file.
This will remove that header row from our extracted results, extracting only the second and beyond.

Positional

The document from the previous example is actually a good candidate for the Positional Mapping Method.

This method does not rely on how columns are named but instead how they are ordered. It presumes the first column in the delimiter-separated file is the Data Table's first child Data Column.