Delimited Extract (Table Extract Method): Difference between revisions

From Grooper Wiki
No edit summary
 
(14 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<blockquote style="font-size:14pt">
{{AutoVersion}}
''Delimited Extract'' is one of the '''''Extract Method''''' options for '''Data Table''' objects in a '''Data Model'''.  This method extracts tabular data from a delimiter-separated text file, such as a CSV file.
</blockquote>


This is the fastest, simplest and most effective method of extracting data from character delimited files, such as comma delimited CSV files or TXT files delimited by commas or other characters.
<blockquote>{{#lst:Glossary|Delimited Extract}}</blockquote>


{|cellpadding="10" cellspacing="5"
{|class="download-box"
|-style="background-color:#36b0a7; color:white"
|style="font-size:14pt"|'''FYI'''
|
|
''Delimited Extract'' is new to version '''2021'''.  In older versions, this functionality was achieved with the ''CSV Extract'' table extraction method.  ''Delimited Extract'' expands the ''CSV Extract'' functionality to extract tabular data from TXT files that delimiter characters besides a comma, such as semicolons or pipe characters.
[[File:Asset 22@4x.png]]
|}
 
== About ==
 
A delimited text file is, effectively, a table of information saved to a simple text format.  Rows in the table are easy to represent in a text file.  Each row is a new line in the file.  Columns are little trickier.  They are represented by a "delimiter" character.  This character is used to define where the table columns are in the text file. 
 
For example, CSV (comma-separated values) files use a comma (<code>,</code>) as a delimiter.  When opened in a spreadsheet program, such as Microsoft Excel or Google Sheets, these files look like standard spreadsheets.  So, a CSV file with a table with some generic information collected for individuals might look something like the table below.
 
{|class="wikitable" style="margin:auto"
|First Name||Last Name||Email||SSN||Favorite Animal
|-
|Aaron||Aaronson||aaron@example.com||987-65-4321||aardvark
|-
|Billy||Billiams|| ||999-88-7777||billy goat
|-
|Cathy||Catherton||cat@example.com||900-00-1234||dog
|}
 
But a delimited file is just text data.  There are no lines dividing the rows and columns.  Instead, the vertical line boundaries between each column would be replaced with the delimiter character (a <code>,</code> for CSV files) and each row would be a new line in the file.  So, it would look like the text below.
 
<pre style="margin-left:50px; margin-right:50px">
First Name,Last Name,Email,SSN,Favorite Animal
Aaron,Aaronson,aaron@example.com,987-65-4321,aardvark
Billy,Billiams,,999-88-7777,billy goat
Cathy,Catherton,cat@example.com,900-00-1234,dog
</pre>
 
TXT files can also be formatted as delimiter-separated tables.  They may use a comma (<code>,</code>) or a different separator such as a semicolon (<code>;</code>) or a "pipe" character (<code>|</code>).  For example, a "pipe-delimited" TXT file version of the data above would look like the text below.
 
<pre style="margin-left:50px; margin-right:50px">
First Name|Last Name|Email|SSN|Favorite Animal
Aaron|Aaronson|aaron@example.com|987-65-4321|aardvark
Billy|Billiams||999-88-7777|billy goat
Cathy,Catherton|cat@example.com|900-00-1234|dog
</pre>
 
The only difference is one file is a CSV file, using a comma (<code>,</code>) to distinguish the columns and the other is a TXT file, using the pipe character (<code>|</code>).
 
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
The ''Delimited Extract'' table extraction method will parse these kinds of files and return their data to a '''Data Table''' object in a '''Data Model'''.
 
# For example, here we have a '''Data Table''' named "People Data" configured to extract tabular data from a CSV file.
# You can see the attached file to this '''Batch Folder''' is a CSV file named "PEOPLE_DATA_COMMA.csv".
# In the Document Viewer, we can see this is a simple CSV file with the data described above.
#* Each column ("First Name", "Last Name", "Email", "SSN", and "Favorite Animal") is separated by a single <code>,</code> character.
# The '''Data Table's''' '''''Extract Method''''' property is set to ''Delimited Extract''.
# The ''Delimited Extract'' sub-properties are configured according to the extracted file's type and delimited format.
#* These properties dictate what the imported document's filetype is (i.e. CSV or TXT), what delimiter character is used, how the data is mapped to the '''Data Table''' object's '''Data Columns'''.
#* These settings will be discussed in more depth, in the [[#How To|How To]] section of this article.
# Upon extraction, the '''Data Table''' returns the CSV file's data.
|
|
[[File:Delimited-extract-01.png]]
You may download the ZIP(s) below and upload it into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents.  The second contains one or more Projects with resources used in examples throughout this article.
* [[Media:2025_Wiki_Delimited-Extract_Batch.zip]]
* [[Media:2025_Wiki_Delimited-Extract_Project.zip]]
|}
|}


{|cellpadding="10" cellspacing="5"
== Introduction ==
|-style="background-color:#f89420; color:white"
'''Delimited Extract''' is a Table Extract Method that reads and parses delimited text files (for example CSV, TSV, or pipe-delimited) and maps fields into a [[Data Table|Data Table's]] [[Data Column|Data Columns]]. Instead of reading from page layout or OCR, it ingests a text file attached to the corresponding [[Batch Folder]] and converts each parsed line into a table row.
|style="font-size:22pt"|'''&#9888;'''
|
Currently, Grooper does not support "tabs" as a delimiter character. At the time of writing this article, the ''Delimited Extract'' method cannot extract data from a tab-delimited TXT file.  Alternate '''''Extract Methods''''', such as the ''Row Match'' method, must be used instead.
|}


== How To ==
How it differs from other table extract methods:
* '''[[Tabular Layout]]''' / '''[[Grid Layout]]''' operate on page layout, headers, and geometric alignment; Delimited Extract does not use document layout.
* '''[[Row Match]]''' and '''[[Fixed Width]]''' parse text in the document region; Delimited Extract reads from an external delimited file.
* '''[[Fluid Layout]]''' switches between Tabular Layout and Row Match based on labels; Delimited Extract is purpose-built for delimited files.
* '''[[AI Table Reader]]''' uses a large language model to interpret semi-structured tables; Delimited Extract is rules-based, deterministic parsing of delimited inputs.


== What it is for ==
Use Delimited Extract when your tabular data arrives as a delimited text file (e.g., exports, system feeds, or data interchanges) stored on the Batch Folder. Typical scenarios:
* Accounting exports: <code>InvoiceNo,Date,Amount</code>
* System logs or reports delivered as CSV/TSV
* Partner data feeds using pipe (<code>|</code>) or semicolon delimiters


'''Benefits'''
* Simple, deterministic parsing for consistent, well-formed files
* Flexible column mapping: by header name, by position, or via a mapping lexicon
* Robust quoting and optional whitespace trimming
* Minimal configuration for standard CSV


=== Prereqs ===
'''Drawbacks'''
* Requires a delimited file stored on the Batch Folder; it does not analyze page layout or OCR
* Sensitive to delimiter and quoting configuration—misconfiguration leads to misaligned columns
* No label-based header/footer detection (unlike page-layout methods)


{|cellpadding="10" cellspacing="5"
== How to add and configure Delimited Extract ==
|-style="background-color:#f89420; color:white"
Following are instructions for general setup of the Delimited Extract Table Extract Method.
|style="font-size:22pt"|'''&#9888;'''
{|class="fyi-box"
|
If you are testing ''Delimited Extract'' in '''Grooper Design Studio''' and are not returning ''any'' results or receiving an error message, read this first.
 
This will address the most common issues when testing the ''Delimited Extract'' method.
|}
 
There are three critical perquisites in order for the ''Delimited Extract'' method to work properly, whether testing its configuration in '''Grooper Design Studio''' or collecting the data during the '''Extract''' activity of a '''Batch Process'''.
# The '''Batch Folder''' must have an attached character-delimited file of a CSV or TXT format.
#* Excel files (XLSX) are capable of utilizing the ''Delimited Extract'' method.  However, they must be converted to a CSV format first.
#* For more information on converting Excel files to CSV files, visit the [[Microsoft Office Integration]] article.
# If imported sparsely, the '''Batch Folder's''' attached CSV or TXT file must be loaded prior to extraction.
# The '''Batch Folder''' must be classified.
 
<tabs style="margin:20px">
<tab name="1. Attached File Considerations" style="margin:20px">
 
=== 1. The Batch Folder must have an attached file of a CSV or TXT format. ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
When you import digital files into '''Grooper''' (ie PDFs, Word documents, Excel spreadsheets, TXT files, etc), they are stored as attachments on the created '''Batch Folder''' in the '''Batch'''.  When inspecting a '''Batch''', the attached files are listed at the bottom of the '''Batch Folder''' icon.
 
# The documents here have CSV and a character-delimited TXT file attached to the '''Batch Folders'''.  These documents are suitable for ''Delimited Extract''.
# Microsoft Excel files can ''also'' take advantage of ''Delimited Extract''.  However, they must first be converted to the CSV file format.
# Excel files (XLSX) are not natively readable by '''Grooper'''.  Without converting them to a CSV file (which is natively readable), you will see this message in the document viewer.
|
[[File:Delimited-extract-03.png]]
|-
|-
|valign=top|
From '''Grooper Design Studio''' , you can convert a single XLSX file quickly and easily if you're trying to test out your '''Data Table's''' ''Delimited Extract'' configuration.  You can perform the following steps from the Batch Viewer at any node in the Node Tree.  If you can get to a Batch Viewer, and select a batch using the node's UI, you can do the following to convert an Excel file to CSV.
# Right-click the document folder with the attached Excel file.
# Select "Excel Document".
# Select "Excel to CSV..."
# This will bring up the following "Excel to CSV" configuration window.
# Using the '''''Conversion Method''''' property choose how you want to convert the file.
#* ''Convert'' will convert the attached Excel file, replacing it with a CSV file.
#* ''Burst'' will convert the file and save the results as one or more (in the case of Excel files with multiple sheets) children '''Batch Folders'''.
#* ''SaveNew'' will convert the file and save it as a new, additional file attached to the '''Batch Folder'''.
# Press the "Execute" button to convert the Excel file to a CSV file.
|
|
[[File:Delimited-extract-04.png]]
'''FYI'''
|-
|valign=top|
# Since we chose the ''Convert'' option, the attached file is converted, replacing the XLSX file with a CSV file.
# CSV files are natively readable by '''Grooper'''.  Now we can view the file in the Document Viewer and '''Grooper''' can process the document.
# This is now an extractable file. Upon extraction, the '''Data Table''' (using the ''Delimited Extract'' method) returns the comma-separated values.
|
|
[[File:Delimited-extract-05.png]]
'''Please see the demos below for example setups with screenshots and highlighted instructions.'''
|}
|}


{|cellpadding="10" cellspacing="5"
# Create or open a [[Data Model]] and add a Data Table for your repeating records (for example, '''Line Items''').
|-style="background-color:#36b0a7; color:white"
# Add one or more Data Columns under the Data Table (for example, '''Name''', '''Date''', '''Amount'''). Name the columns exactly as you want them to appear in the output.
|style="font-size:14pt"|'''FYI'''
# Select the Data Table and set its "Extract Method" to '''Delimited Extract'''.
|
# Configure Delimited Extract properties:
This is a very "ad hoc" method of converting Excel files to CSV files.  You can also perform this conversion using the '''Execute''' activity, selecting ''Excel Document'' as the '''''Object Type''''' and choosing ''Excel to CSV'' as the '''''Command'''''. You will use the '''Execute''' activity to automate converting Excel files in a '''Batch Process'''. For more information, visit the [[Microsoft Office Integration]] article.
#* Choose the correct "Delimiter" (for example, <code>,</code>, <code>\t</code>, or <code>|</code>).
|}
#* Decide the "Mapping Method":
#** '''Named''' if the file has a header row whose values match your Data Column names.
#** '''Positional''' if you want to map by column order.
#** '''Mapped''' if you need a lexicon to translate CSV headers (or numeric indices) to Data Column names.
#* If using '''Named''' or '''Mapped''', confirm "Header Row" (1-based) and adjust "Skip Rows" if introductory lines should be ignored before data rows.
#* Enable or disable "Quoted" and "Trim Whitespace" to match your file's conventions.
#* Ensure "Filename Pattern" matches the file stored on the Batch Folder (default matches <code>.csv</code>).
# <li value=5> Attach or drop the delimited file onto the Batch Folder so it is available during extraction.</li>
# Classify the document to a Content Type that uses the Data Model that contains the Delimited Extract configured Data Table.
# Test your setup:
#* Run a Batch through an Extract step or use the Tester tab of the Data Table or Data Model.
#* If values look misaligned, verify delimiter, header row, mapping strategy, and quoting settings.
#* Use diagnostics to review any parsing errors and row counts.


[[#Prereqs|Click to return to the top]]
'''Tips'''
</tab>
* For TSV files set "Delimiter" to <code>\t</code>.
<tab name="2. Load the File" style="margin:20px">
* For pipe-delimited files set "Delimiter" to <code>|</code>.
=== 2. If imported sparsely, the '''Batch Folder's''' attached CSV or TXT file must be loaded prior to extraction. ===
* If headers in the file do not exactly match your Data Column names, use '''Mapped''' and provide a mapping lexicon.
* Use the "Convert to CSV" object command to convert XLSX files to CSV before attempting to use Delimited Extract on them.


Digital files are imported into '''Grooper''' either "sparsely" or "fully".
<div style="padding-left: 1.5em;">
* Fully imported files are fully loaded into the '''Grooper Repository'''.  Their metadata properties (file name and extension, file dimensions, MIME type information, etc) are stored in the '''Grooper Database''' and their content (the full file itself) is copied to the '''Grooper File Store'''.
=== Example: Excel Document - Convert to CSV ===
* Sparsely imported files are only partially loaded.  Their metadata properties are stored in the '''Grooper Database''', but their content is ''not'' copied to the '''Grooper File Store'''. Instead the content is accessed with a link to the file's location (for example, if you import a file from your Windows hard drive, it would be its Windows folder location).
When working with CSV files it is common to come across XLSX files that are native to Microsoft Excel. Grooper cannot natively read XLSX files, so as a result, Delimited Extract will not work with this file type. It is easy enough, however, to convert XSLX files to CSV in Grooper. A simple object command will get the job done.


Typically, a sparsely imported document is described as a "usable" document in '''Grooper'''. You can classify it. You can extract data from it.  You can apply '''Activities''' to it and it will behave as if it were fully loaded. Just instead of using a local copy in the '''Grooper File Store''' to process the document, '''Grooper''' travels the document link to access the file.
# XLSX is a common file type that might be expected to be used for Delimited Extract. Expand the Node Tree and select the provided "Delimited-Extract" Batch, then click the Viewer tab.
# Select the third Batch Folder.
#* You'll notice it is an XLSX file type. The XLSX file type cannot be natively read by Grooper, so it's data will seem incomprehensible.
# Right-click the XLSX document. In the pop-out menu hover over "Excel Document", then click the "Convert to CSV" object command.
# In the "Convert to CSV" window, click the drop-down button to the right of the Save As property. Select "Attachment" from the drop-down menu.
# "Children" and "Files" are also options for this command, so your choice may vary depending on your desired output. Click the "Execute" button when done.
# The XLSX file will be converted to a CSV and its contents are now viewable.


{|cellpadding=10 cellspacing=5
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.8; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmjbv4wda003kxi0i5plj4o1y?embed_v=2&utm_source=embed" loading="lazy" title="Excel Document - Convert to CSV" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
|valign=top style="width:40%"|
However, this is not the case for ''Delimited Extract''. The attached file ''must'' be loaded (copied into the '''Grooper File Store''') in order to extract any data.
 
# A sparsely imported file will be indicated by a gold chain link on the '''Batch Folder''' icon.
# If you attempt to extract the data using ''Delimited Extract'', nothing will return at this point.
|
[[File:Delimited-extract-06.png]]
|-
|valign=top|
From '''Grooper Design Studio''' , you can load file quickly and easily if you're trying to test out your '''Data Table's''' ''Delimited Extract'' configuration. You can perform the following steps from the Batch Viewer at any node in the Node Tree. If you can get to a Batch Viewer, and select a batch using the node's UI, you can do the following to load a '''Batch Folder's''' attached file (be it a CSV file, a TXT file, an XLSX file, a PDF file, or any other file).
 
# Right-click the document folder with the attached file.
# Select "File System Link".
# Select "Load Content..."
# This will bring up the following "Load Content" configuration window.
# Press the "Execute" button to load the file into the '''Grooper File Store'''.
|
[[File:Delimited-extract-07.png]]
|-
|valign=top|
# Upon loading the content, the chain link will change color from grey to silver.
#* The silver link indicates the content is loaded but '''Grooper''' retains a link to the file's location.  This link can be used in the future if the content needs to be reloaded or otherwise accessed from its external location.  Alternatively the link can be removed if no longer needed once the content is fully loaded into '''Grooper'''.
# With the content loaded, the data can now be extracted using ''Delimited Extract''.
|
[[File:Delimited-extract-08.png]]
|}


{|cellpadding="10" cellspacing="5"
=== Example: Mapping Method - Named ===
|-style="background-color:#36b0a7; color:white"
The first option for the Mapping Method property is "Named". This allows the Delimited Extract Table Extract Method to read the header row, typically the first line, of a CSV file and attempt to match it to the names of Data Columns. If the names of the headers in the header row of the CSV match exactly the names of the Data Columns, Grooper will automatically know which entry to insert upon return of the data. This Mapping Method is very easy to configure and works will under the aformentioned circumstances, but breaks easily when there aren't exact matches. Adjustments to the names of Data Columns can be made to accommodate differences, however. A strength, however, of the "Named" Mapping Method is that the positions of the "columns" within the CSV have no bearing on its ability to function. As long as the names match, their order does not matter.
|style="font-size:14pt"|'''FYI'''
|
This too is a very "ad hoc" method of loading files. You do the same thing using the '''Execute''' activity, selecting ''File System Link'' as the '''''Object Type''''' (or ''CMIS Document Link'', ''FTP Link'', ''Mail Link'' or one of the other ''Link'' options depending how the file was imported from what external storage platform) and choosing ''Load Content'' as the '''''Command'''''. You will use the '''Execute''' activity to automate foading files in a '''Batch Process'''.  
|}


[[#Prereqs|Click to return to the top]]
# Expand the Node Tree and select the "Delimited Extract - Named" Data Table from the provided Project. Notice the Extract Method is set to "Delimited Extract".
</tab>
# Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Named".
<tab name="3. Classify the Document Folder" style="margin:20px">
# Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
=== 3. The '''Batch Folder''' must be classified. ===
# Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
# Click the Tester tab.
# Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
# In order to use Delimited Extract for extraction, we need to have a classified Batch Folder. Right-click the first Batch Folder, then click the "Assign Document Type" object command from the pop-out menu.
# Point the Content Type property at the "Delimited Extract - Document Type" Document Type.
# Select the now classified first Batch Folder, then click the "Test Extraction" button.
# Notice there is a partial output. Because the "Social" and "Animal" Data Columns names do not match that of the header row, the Named Mapping Method falls short here.
# Changing the names of the Data Columns is an option, but we'll leave them alone for now.


When building a '''Batch Process''' you will always add an '''Extract''' activity ''after'' a '''Classify''' activity or some other operation classifies the '''Batch Folders''' in a '''Batch'''. This is because the '''Extract''' activity looks for the '''Content Type''' assigned to the document folder to access the '''Data Model''' configured to collect data from the document.
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.8; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmjbv4ywz02bryw0inoki3yu2?embed_v=2&utm_source=embed" loading="lazy" title="Mapping Method - Named" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>


Typically, document folders do not need to be classified when performing unit testing in '''Grooper Design Studio'''. You can navigate to a '''Data Model''' or its child '''Data Elements''' in the Node Tree, select an unclassified '''Batch Folder''' object in a '''Batch''', press the "Test Extraction" button and a preview of the extracted data will be presented in the UI.
=== Example: Mapping Method - Positional ===
The second option for the Mapping Method property is "Positional". Unlike "Named", this Mapping Method does not rely on the header row of the CSV at all. Instead, the assumption is that the "columns" of the CSV file, from left to right, match exactly the "position" of the Data Columns from top to bottom. Like the "Named" Mapping Method, this is very easily configured, however, it breaks easily if the columns of the CSV file do not match the order of the Data Columns exactly. Changing the oder of Data Columns is easily done if you know that the CSV files being processed are consistent.


{|cellpadding=10 cellspacing=5
# Expand the Node Tree and select the "Delimited Extract - Positional" Data Table from the provided Project. Notice the Extract Method property is set to "Delimited Extract".
|valign=top style="width:40%"|
# Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Positional".
This is ''not'' the case when testing the ''Delimited Extract'' method!  The '''Batch Folder''' ''must'' be classified (assigned a '''Document Type''') in order to test extraction.
# Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
# Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
# Click the Tester tab.
# Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
# In order to use Delimited Extract for extraction, we need to have a classified Batch Folder. Right-click the second Batch Folder, then click the "Assign Document Type" object command from the pop-out menu.
# Point the Content Type property at the "Delimited Extract - Document Type" Document Type.
# Select the now classified second Batch Folder, then click the "Test Extraction" button.
# We're getting results, but because the first column in the source is an "ID" column, this ends up pushing our results. This shows a shortcoming of the "Positional" Mapping Method.


# These documents are ''not'' classified. They do not have a '''Document Type''' assigned to them.  They are generically listed as "Document".
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.8; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmjbv545l01kixc0irjha24i2?embed_v=2&utm_source=embed" loading="lazy" title="Mapping Method - Positional" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
# Upon testing the '''Data Table''' (using the ''Delimited Extract'' method), no data returns.
|
[[File:Delimited-extract-09.png]]
|-
|valign=top|
# These documents ''are'' classified.  They have either the "CSV Report" or "TXT Report" '''Document Type''' (added to this '''Content Model''') assigned to them.
# As a classified document, upon testing the '''Data Table''' (using the ''Delimited Extract'' method), the table data ''is'' returned.
|
[[File:Delimited-extract-10.png]]
|}


[[#Prereqs|Click to return to the top]]
=== Example: Mapping Method - Mapped ===
</tab>
The third and final option for the Mapping Method property is "Mapped". This is the most robust of the options available as you use key-value pair entries in a Lexicon, or the Local Entries sub-property of this option, to directly map the names of "columns" of the CSV file to Data Columns in Grooper. The number and position of columns within the CSV file will have no impact on this Mapping Method's ability to function.
</tabs>


=== Delimited Extract Configuration ===
When mapping entries for this Mapping Method the key, or the value to the left of the <code>=</code> sign, is the name of a Data Column in Grooper, while the value, or the value to the right of the <code>=</code> sign, is the name of the column found in the header row of the CSV. For example:


<tabs style="margin:20px">
<pre>
<tab name="Enable Delimited Extract" style="margin:20px">
First Name=First Name
=== Enable Delimited Extract ===
Last Name=Last Name
{|cellpadding=10 cellspacing=5
Email=Email
|valign=top style="width:40%"|
Social=SSN
To use ''Delimited Extract'' you will choose this as the '''''Extract Method''''' property of a '''Data Table'''
Animal=Favorite Animal
</pre>


# Add or select a '''Data Table''' in the Node Tree.
# Expand the Node Tree and select the "Delimited Extract - Mapped" Data Table from the provided Project. Notice the Extract Method property is set to "Delimited Extract".
#* We've selected the '''Data Table''' named "People Data". It is a '''Data Element''' of the "CSV Report" '''Document Type's''' '''Data Model'''.
# Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Mapped".
# Select the '''''Extract Method''''' property.
# Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
# Choose ''Delimited Extract'' from the list of table extraction methods.
# Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
|valign=top|
# Click the drop-down arrow to the left of the Mappings property to expand its sub-properties, then click the ellipsis button to the right of the Local Entries property to open the Local Entries editor.
[[File:Delimited-extract-11.png]]
#* It's worth noting that you could use the Included Lexicons property to point at a Lexicon instead of using Local Entries.
|}
# The entries used for mapping are key-value pairs, where the key represents the name of the Data Column, and the value represents the name of the target column from the CSV file.
# Click the Tester tab.
# Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
# Select the first (classified) Batch Folder in the Batch Viewer, then click the "Test Extraction" button.
# Notice all results are returned accurately.
# Select the second (classified) Batch Folder in the Batch Viewer, then click the "Test Extraction" button.
# Notice all results are returned accurately.


</tab>
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.8; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmjbv57l202dtyw0ibs7j8vd4?embed_v=2&utm_source=embed" loading="lazy" title="Mapping Method - Mapped" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
<tab name="Choose the Mapping Method" style="margin:20px">
</div>
=== Choose the Mapping Method ===


With ''Delimited Extract'' enabled, arguably the most important consideration is how you configure the '''''Mapping Method''''' property. This property determines how the "columns" in the delimiter-separated file correspond to the '''Data Columns''' in your '''Data Table'''. This can be one of three options:
== Properties overview ==
# ''Named''
The following properties are available for Delimited Extract. Property names are shown in quotes as they appear in the UI.
# ''Positional''
# ''Mapped''


=== Named ===
* '''Filename Pattern'''
** ''Definition:'' A regular expression that selects which file on the [[Batch Folder]] to parse (default: <code>\.csv$</code>).
** ''Remarks:'' Only files whose names match this pattern are processed. Use standard .NET regex.
** ''Use cases:''
*** Match TSV: <code>\.tsv$</code>
*** Match files containing “export”: <code>export.*\.csv$</code>


The ''Named'' option will use column headers in the delimiter-separated file to map columns to the '''Data Columns''' in our '''Data Table'''.  As long as the names match,  '''Grooper''' will populate the '''Data Column''' with the data in the correspondingly named column in the CSV or TXT file.
* '''Mapping Method'''
{|cellpadding=10 cellspacing=5
** ''Definition:'' Chooses how CSV columns map to [[Data Column]]s.
|valign=top|
** ''Remarks:'' Options:
For example, the '''Data Table''' in our example has five '''Data Columns'''.
*** '''Named''' — match CSV header names to Data Column names.
* First Name
*** '''Positional''' — map by order: CSV column 1 → first Data Column, and so on.
* Last Name
*** '''Mapped''' — use a lexicon to translate header text or numeric indices to Data Column names.
* Email
** ''Use cases:''
* Social
*** Use '''Named''' when headers match your column names.
* Animal
*** Use '''Positional''' when there is no header row or order is reliable.
|valign=top|
*** Use '''Mapped''' when headers differ, have synonyms, or you want to reference numeric indices.
[[File:Delimited-extract-12.png]]
|-
|valign=top|
The CSV files in our batch also have five columns.
|valign=top|
[[File:Delimited-extract-13.png]]
|-
|valign=top|
However, some of the names are the same and some are different.


Only the "First Name", "Last Name" and "Email" columns match.
* '''Mappings''' (shown when "Mapping Method" = ''Mapped'')
 
** ''Definition:'' An embedded lexicon that defines custom column mappings.
The "Social" and "Animal" '''Data Columns''' are labeled "SSN" and "Favorite Animal" in the CSV file.
** ''Remarks:'' Each entry maps a table column name (key) to a CSV header or a numeric index (value). If the value is a number, it is treated as a zero-based index; otherwise, the CSV header row is searched for that header text.
|
** ''Use cases:''
[[File:Delimited-extract-14.png]]
*** Handle synonyms, inconsistent header names, or index-based mapping.
[[File:Delimited-extract-15.png]]
*** ''Example:''
|}
<pre>
 
Name=Full Name
{|cellpadding=10 cellspacing=5
Date=Transaction Date
|valign=top style="width:40%"|
Total=Amount Due
If we were to test extraction of this CSV file using the ''Named'' method, only ''part'' of the delimiter-separated values will be collected.  Those in the columns whose names match will be returned, but the columns whose names ''do not'' will ''not''.
</pre>
 
# The default '''''Mapping Method''''' option is ''Named''
# Upon extraction, '''Grooper''' will search the first row of the file for the column header labels.
#* The column headers ''must'' be present on the ''first'' row in order for this method to work.
# If a column header matches the name of one of the '''Data Table's''' child '''Data Columns''', that '''Data Column''' will return that column's data in the delimiter-separated file.
#* In our case, the names for the "First Name", "Last Name", and "Email" columns match.  So that data is returned to the correspondingly named '''Data Column'''.
#* This is also order ''independent''.  If "Email" was the first '''Data Column''' instead of the third, it would still return the data from the "Email" column on the document because their names still match.
# The "Social" and "Animal" '''Data Columns''' are blank because their names ''do not'' match.
 
If we changed the name of our '''Data Columns''' to match (both be named "SSN" and "Favorite Animal" respectively), we would get the full table populated.
 
However, let's completely ignore that and look at the next '''''Mapping Method''''' option, ''Positional''.
|
[[File:Delimited-extract-18.png]]
|}


{|cellpadding="10" cellspacing="5"
* '''Header Row''' (1-based; hidden when "Mapping Method" = '''Positional''')
|-style="background-color:#36b0a7; color:white"
** ''Definition:'' The row number that contains the column names for header-based mapping.
|style="font-size:14pt"|'''FYI'''
** ''Remarks:'' If blank, row 1 is used by default. Ignored for Positional mapping.
|
** ''Use cases:''
{|cellpadding=10 cellspacing=5
*** Files with a title row above the header require "Header Row" > 1.
|valign=top style="width:40%"|
Before we do that, you probably noticed the first line of the table (the one with the column headers) is returned along with the actual table data.  This is probably "junk data" you don't ultimately want collected at time of extraction.


# This is what the '''''Skip Rows''''' property is for.
* '''Skip Rows'''
# Set this to ''1'' and it will skip the first row in the file.
** ''Definition:'' Number of rows to skip at the beginning of the file before parsing data.
# This will remove that header row from our extracted results, extracting only the second and beyond.
** ''Remarks:'' Applied after processing the header (if applicable). Default is 0.
|
** ''Use cases:''
[[File:Delimited-extract-19.png]]
*** Ignore banner rows or multi-line descriptions preceding data.
|}
|}


=== Positional ===
* '''Delimiter'''
** ''Definition:'' The character (or string) that separates fields (default: <code>,</code>).
** ''Remarks:'' Examples: comma (<code>,</code>), tab (<code>\t</code>), pipe (<code>|</code>).
** ''Use cases:'' Required for correct parsing; must match the file's delimiter.


The document from the previous example is actually a good candidate for the ''Positional'' '''''Mapping Method'''''.  This method does not rely on how columns are ''named'' but instead how they are ''ordered''.  It presumes the first column in the delimiter-separated file is the '''Data Table's''' first child '''Data Column'''.
* '''Quoted'''
 
** ''Definition:'' Specifies whether fields may be enclosed in double quotes to escape delimiters (default: true).
{|cellpadding=10 cellspacing=5
** ''Remarks:'' Enable for standard CSV where fields like <code>"ACME, Inc."</code> should be treated as one value.
|valign=top style="width:40%"|
** ''Use cases:'' Disable if the file never uses quotes.
Such is the case for the CSV file we looked at earlier.  The first '''Data Column''' corresponds to the first column in the CSV file, the second corresponds to the second, the third the third, and so on. 
 
In this case, it doesn't matter what they're named, just how the '''Data Columns''' are positioned in '''Grooper''' (child order from top to bottom) and how the columns are positioned in the delimiter-separated file (column order from left to right).  If this lines up, you can just marry the '''Data Columns''' with the document's columns, returning the first column's data to the first '''Data Column''' and so on.
|valign=top style="width:30%"|
[[File:Delimited-extract-20.png|center]]
|valign=top style="width:30%"|
[[File:Delimited-extract-21.png|center]]
|}
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
# Here, the '''''Mapping Method''''' property is set to ''Positional''.
# In this case, the full table extracts successfully.
#* Even though the '''Data Column''' names and the column labels in the document do not match, their ''positions'' do.  The first child '''Data Column''' logically matches the data in the left-most column on the document, the next '''Data Column''' matches the next column on the document, and so on.
#* The ''Positional'' '''''Mapping Method''''' simply returns the first column on the document to the first '''Data Column''', the second the second, and so on.
 
{|cellpadding="10" cellspacing="5"
|-style="background-color:#36b0a7; color:white"
|style="font-size:14pt"|'''FYI'''
|
Notice this method doesn't need a row of column labels ''at all''.  If your CSV or delimited TXT file does not have column labels, you can still use the ''Positional'' '''''Mapping Method''''' to collect the data.
|}
 
|
[[File:Delimited-extract-22.png|center]]
|
|-
|valign=top|
The (probably obvious) drawback to this method is the positions must match all the time for every document.  If the columns are out of order, or there are extra columns in the document, this '''''Mapping Method'''' can start to break down.
 
# This document has an "ID" column as its first table column.
# Since this is now the first column, it maps directly to the first '''Data Column'''.
#* But that's not what we want!  Now the column data is shifted.  The first names are in the "Last Name" '''Data Column'''.  The last names are in the "Email" '''Data Column''' and so on.
 
In this case, the last '''''Mapping Method''''' option, ''Mapped'' may be more suitable.
|
[[File:Delimited-extract-23.png|center]]
|}
 
=== Mapped ===
 
The ''Mapped'' '''''Mapping Method''''' gives you the most control, dictating what '''Data Column''' matches which column in the delimited file.  You can use a '''Lexicon''' or a '''''Local Entries''''' text editor and specific syntax to dictate this.
 
You will map the '''Data Column''' to the column label in the delimited file using the following format:
: <code><span style="color:red">{Data Column name}</span>=<span style="color:blue">{table column label name}</span></code>
{|cellpadding=10 cellspacing=5 style="margin:auto"
|style="text-align:center"|
For example, our mappings will look like this:
 
[[File:Delimited-extract-26.png]]
|style="text-align:center"|
'''Data Column''' names
 
[[File:Delimited-extract-24.png]]
|style="text-align:center"|
table column labels
 
[[File:Delimited-extract-25.png]]
|}
 
{|cellpadding="10" cellspacing="5"
|-style="background-color:#f89420; color:white"
|style="font-size:22pt"|'''&#9888;'''
|
Notice the '''Data Column''' named "First Name" was entered as <code>First_Name</code>.
<li><span style="color:white">Spaces and special characters in a '''Data Column's''' name ''must'' be replaced with an underscore (<code>_</code>).</li></span>
 
 
On the other hand, notice the column label "First Name" was entered ''with'' the space as <code>First Name</code>.
<li><span style="color:white">'''DO NOT''' replace spaces and special characters when entering the column label.</li></span>
<li><span style="color:white">What you enter for the column label must match the column label ''exactly''.</li></span>
|}
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
To enter these mappings, do the following.
 
# Change the '''''Mapping Method''''' property to ''Mapped''.
# Expand the '''''Mappings''''' sub-properties.
# To add a local list of mappings, select the '''''Local Entries''''' property and press the ellipsis button at the end.
# This will bring up a list editor window.
# Using the syntax described above, enter a list of '''Data Column''' to table column label mappings.
# Press the "OK" button when finished.
 
{|cellpadding="10" cellspacing="5"
|-style="background-color:#36b0a7; color:white"
|style="font-size:14pt"|'''FYI'''
|
You can also use the '''''Included Lexicons''''' property to point to a '''Lexicon''' object in the Node Tree instead of using a local list.
 
You will use the exact same syntax as described above in both cases.
|}
|
[[File:Delimited-extract-27.png]]
|-
|valign=top|
Now, with these mappings listed out using the ''Mapped'' '''''Mapping Method'''''.  ''Delimited Extract'' will successfully extract the data for ''both'' CSV files sharing the same column labels, but have those columns in different positional order.
 
# For this document...
# ...the listed mappings match up the columns labeled here...
# ...with their corresponding '''Data Columns''', and the table's data is returned.
|
[[File:Delimited-extract-28.png]]
|-
|valign=top|
# This document extracts correctly as well!
# Even though these column labels have different positions (due to there being an additional "ID" column).
# The ''Mapped'' method ignores the unlisted "ID" column's data and the table is returned, mapping the listed columns with the listed '''Data Columns'''.
|
[[File:Delimited-extract-29.png]]
|}


</tab>
* '''Trim Whitespace'''
<tab name="Filename Pattern and Delimiter Considerations" style="margin:20px">
** ''Definition:'' Removes leading and trailing whitespace from parsed fields (default: true).
=== Filename Pattern and Delimiter ===
** ''Remarks:'' Helps normalize values; disable if whitespace is significant.
** ''Use cases:'' Keep enabled for most CSV/TSV data to reduce cleanup.


</tab>
== Configuration patterns and troubleshooting ==
</tabs>
* '''Files not found:''' Confirm "Filename Pattern" matches the file in the Batch Folder.
* '''Wrong columns:''' Verify "Delimiter", "Quoted", and "Trim Whitespace" align with the file's format.
* '''Header mismatches:''' With "Mapping Method" = '''Named''', ensure the file's header text exactly matches your Data Column names; otherwise use '''Mapped'''.
* '''Mapped indices off by one:''' Mapped numeric values are zero-based indices; confirm positions accordingly.
* '''Extra leading rows:''' Increase "Skip Rows" to start at the first data line after the header row.

Latest revision as of 14:36, 18 December 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025 20232021

Delimited Extract is a Table Extract Method that extracts tabular data from a delimiter-separated text file, such as a CSV file.

You may download the ZIP(s) below and upload it into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

Introduction

Delimited Extract is a Table Extract Method that reads and parses delimited text files (for example CSV, TSV, or pipe-delimited) and maps fields into a Data Table's Data Columns. Instead of reading from page layout or OCR, it ingests a text file attached to the corresponding Batch Folder and converts each parsed line into a table row.

How it differs from other table extract methods:

  • Tabular Layout / Grid Layout operate on page layout, headers, and geometric alignment; Delimited Extract does not use document layout.
  • Row Match and Fixed Width parse text in the document region; Delimited Extract reads from an external delimited file.
  • Fluid Layout switches between Tabular Layout and Row Match based on labels; Delimited Extract is purpose-built for delimited files.
  • AI Table Reader uses a large language model to interpret semi-structured tables; Delimited Extract is rules-based, deterministic parsing of delimited inputs.

What it is for

Use Delimited Extract when your tabular data arrives as a delimited text file (e.g., exports, system feeds, or data interchanges) stored on the Batch Folder. Typical scenarios:

  • Accounting exports: InvoiceNo,Date,Amount
  • System logs or reports delivered as CSV/TSV
  • Partner data feeds using pipe (|) or semicolon delimiters

Benefits

  • Simple, deterministic parsing for consistent, well-formed files
  • Flexible column mapping: by header name, by position, or via a mapping lexicon
  • Robust quoting and optional whitespace trimming
  • Minimal configuration for standard CSV

Drawbacks

  • Requires a delimited file stored on the Batch Folder; it does not analyze page layout or OCR
  • Sensitive to delimiter and quoting configuration—misconfiguration leads to misaligned columns
  • No label-based header/footer detection (unlike page-layout methods)

How to add and configure Delimited Extract

Following are instructions for general setup of the Delimited Extract Table Extract Method.

FYI

Please see the demos below for example setups with screenshots and highlighted instructions.

  1. Create or open a Data Model and add a Data Table for your repeating records (for example, Line Items).
  2. Add one or more Data Columns under the Data Table (for example, Name, Date, Amount). Name the columns exactly as you want them to appear in the output.
  3. Select the Data Table and set its "Extract Method" to Delimited Extract.
  4. Configure Delimited Extract properties:
    • Choose the correct "Delimiter" (for example, ,, \t, or |).
    • Decide the "Mapping Method":
      • Named if the file has a header row whose values match your Data Column names.
      • Positional if you want to map by column order.
      • Mapped if you need a lexicon to translate CSV headers (or numeric indices) to Data Column names.
    • If using Named or Mapped, confirm "Header Row" (1-based) and adjust "Skip Rows" if introductory lines should be ignored before data rows.
    • Enable or disable "Quoted" and "Trim Whitespace" to match your file's conventions.
    • Ensure "Filename Pattern" matches the file stored on the Batch Folder (default matches .csv).
  5. Attach or drop the delimited file onto the Batch Folder so it is available during extraction.
  6. Classify the document to a Content Type that uses the Data Model that contains the Delimited Extract configured Data Table.
  7. Test your setup:
    • Run a Batch through an Extract step or use the Tester tab of the Data Table or Data Model.
    • If values look misaligned, verify delimiter, header row, mapping strategy, and quoting settings.
    • Use diagnostics to review any parsing errors and row counts.

Tips

  • For TSV files set "Delimiter" to \t.
  • For pipe-delimited files set "Delimiter" to |.
  • If headers in the file do not exactly match your Data Column names, use Mapped and provide a mapping lexicon.
  • Use the "Convert to CSV" object command to convert XLSX files to CSV before attempting to use Delimited Extract on them.

Example: Excel Document - Convert to CSV

When working with CSV files it is common to come across XLSX files that are native to Microsoft Excel. Grooper cannot natively read XLSX files, so as a result, Delimited Extract will not work with this file type. It is easy enough, however, to convert XSLX files to CSV in Grooper. A simple object command will get the job done.

  1. XLSX is a common file type that might be expected to be used for Delimited Extract. Expand the Node Tree and select the provided "Delimited-Extract" Batch, then click the Viewer tab.
  2. Select the third Batch Folder.
    • You'll notice it is an XLSX file type. The XLSX file type cannot be natively read by Grooper, so it's data will seem incomprehensible.
  3. Right-click the XLSX document. In the pop-out menu hover over "Excel Document", then click the "Convert to CSV" object command.
  4. In the "Convert to CSV" window, click the drop-down button to the right of the Save As property. Select "Attachment" from the drop-down menu.
  5. "Children" and "Files" are also options for this command, so your choice may vary depending on your desired output. Click the "Execute" button when done.
  6. The XLSX file will be converted to a CSV and its contents are now viewable.

Example: Mapping Method - Named

The first option for the Mapping Method property is "Named". This allows the Delimited Extract Table Extract Method to read the header row, typically the first line, of a CSV file and attempt to match it to the names of Data Columns. If the names of the headers in the header row of the CSV match exactly the names of the Data Columns, Grooper will automatically know which entry to insert upon return of the data. This Mapping Method is very easy to configure and works will under the aformentioned circumstances, but breaks easily when there aren't exact matches. Adjustments to the names of Data Columns can be made to accommodate differences, however. A strength, however, of the "Named" Mapping Method is that the positions of the "columns" within the CSV have no bearing on its ability to function. As long as the names match, their order does not matter.

  1. Expand the Node Tree and select the "Delimited Extract - Named" Data Table from the provided Project. Notice the Extract Method is set to "Delimited Extract".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Named".
  3. Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
  4. Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
  5. Click the Tester tab.
  6. Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
  7. In order to use Delimited Extract for extraction, we need to have a classified Batch Folder. Right-click the first Batch Folder, then click the "Assign Document Type" object command from the pop-out menu.
  8. Point the Content Type property at the "Delimited Extract - Document Type" Document Type.
  9. Select the now classified first Batch Folder, then click the "Test Extraction" button.
  10. Notice there is a partial output. Because the "Social" and "Animal" Data Columns names do not match that of the header row, the Named Mapping Method falls short here.
  11. Changing the names of the Data Columns is an option, but we'll leave them alone for now.

Example: Mapping Method - Positional

The second option for the Mapping Method property is "Positional". Unlike "Named", this Mapping Method does not rely on the header row of the CSV at all. Instead, the assumption is that the "columns" of the CSV file, from left to right, match exactly the "position" of the Data Columns from top to bottom. Like the "Named" Mapping Method, this is very easily configured, however, it breaks easily if the columns of the CSV file do not match the order of the Data Columns exactly. Changing the oder of Data Columns is easily done if you know that the CSV files being processed are consistent.

  1. Expand the Node Tree and select the "Delimited Extract - Positional" Data Table from the provided Project. Notice the Extract Method property is set to "Delimited Extract".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Positional".
  3. Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
  4. Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
  5. Click the Tester tab.
  6. Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
  7. In order to use Delimited Extract for extraction, we need to have a classified Batch Folder. Right-click the second Batch Folder, then click the "Assign Document Type" object command from the pop-out menu.
  8. Point the Content Type property at the "Delimited Extract - Document Type" Document Type.
  9. Select the now classified second Batch Folder, then click the "Test Extraction" button.
  10. We're getting results, but because the first column in the source is an "ID" column, this ends up pushing our results. This shows a shortcoming of the "Positional" Mapping Method.

Example: Mapping Method - Mapped

The third and final option for the Mapping Method property is "Mapped". This is the most robust of the options available as you use key-value pair entries in a Lexicon, or the Local Entries sub-property of this option, to directly map the names of "columns" of the CSV file to Data Columns in Grooper. The number and position of columns within the CSV file will have no impact on this Mapping Method's ability to function.

When mapping entries for this Mapping Method the key, or the value to the left of the = sign, is the name of a Data Column in Grooper, while the value, or the value to the right of the = sign, is the name of the column found in the header row of the CSV. For example:

First Name=First Name
Last Name=Last Name
Email=Email
Social=SSN
Animal=Favorite Animal
  1. Expand the Node Tree and select the "Delimited Extract - Mapped" Data Table from the provided Project. Notice the Extract Method property is set to "Delimited Extract".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Delimited Extract" to expand its sub-properties. Notice the Mapping Method property is set to "Mapped".
  3. Because the Header Row is on the first line, the Skip Rows property is set to "1" to skip that row for output. This will prevent the header row from being included in the returned results.
  4. Because our file uses commas to separate its values, we'll use the default Delimiter property setting of a comma.
  5. Click the drop-down arrow to the left of the Mappings property to expand its sub-properties, then click the ellipsis button to the right of the Local Entries property to open the Local Entries editor.
    • It's worth noting that you could use the Included Lexicons property to point at a Lexicon instead of using Local Entries.
  6. The entries used for mapping are key-value pairs, where the key represents the name of the Data Column, and the value represents the name of the target column from the CSV file.
  7. Click the Tester tab.
  8. Click the "Select Batch" button in the Batch Viewer, then be sure to select the "Delimited-Extract" Batch.
  9. Select the first (classified) Batch Folder in the Batch Viewer, then click the "Test Extraction" button.
  10. Notice all results are returned accurately.
  11. Select the second (classified) Batch Folder in the Batch Viewer, then click the "Test Extraction" button.
  12. Notice all results are returned accurately.

Properties overview

The following properties are available for Delimited Extract. Property names are shown in quotes as they appear in the UI.

  • Filename Pattern
    • Definition: A regular expression that selects which file on the Batch Folder to parse (default: \.csv$).
    • Remarks: Only files whose names match this pattern are processed. Use standard .NET regex.
    • Use cases:
      • Match TSV: \.tsv$
      • Match files containing “export”: export.*\.csv$
  • Mapping Method
    • Definition: Chooses how CSV columns map to Data Columns.
    • Remarks: Options:
      • Named — match CSV header names to Data Column names.
      • Positional — map by order: CSV column 1 → first Data Column, and so on.
      • Mapped — use a lexicon to translate header text or numeric indices to Data Column names.
    • Use cases:
      • Use Named when headers match your column names.
      • Use Positional when there is no header row or order is reliable.
      • Use Mapped when headers differ, have synonyms, or you want to reference numeric indices.
  • Mappings (shown when "Mapping Method" = Mapped)
    • Definition: An embedded lexicon that defines custom column mappings.
    • Remarks: Each entry maps a table column name (key) to a CSV header or a numeric index (value). If the value is a number, it is treated as a zero-based index; otherwise, the CSV header row is searched for that header text.
    • Use cases:
      • Handle synonyms, inconsistent header names, or index-based mapping.
      • Example:
Name=Full Name
Date=Transaction Date
Total=Amount Due
  • Header Row (1-based; hidden when "Mapping Method" = Positional)
    • Definition: The row number that contains the column names for header-based mapping.
    • Remarks: If blank, row 1 is used by default. Ignored for Positional mapping.
    • Use cases:
      • Files with a title row above the header require "Header Row" > 1.
  • Skip Rows
    • Definition: Number of rows to skip at the beginning of the file before parsing data.
    • Remarks: Applied after processing the header (if applicable). Default is 0.
    • Use cases:
      • Ignore banner rows or multi-line descriptions preceding data.
  • Delimiter
    • Definition: The character (or string) that separates fields (default: ,).
    • Remarks: Examples: comma (,), tab (\t), pipe (|).
    • Use cases: Required for correct parsing; must match the file's delimiter.
  • Quoted
    • Definition: Specifies whether fields may be enclosed in double quotes to escape delimiters (default: true).
    • Remarks: Enable for standard CSV where fields like "ACME, Inc." should be treated as one value.
    • Use cases: Disable if the file never uses quotes.
  • Trim Whitespace
    • Definition: Removes leading and trailing whitespace from parsed fields (default: true).
    • Remarks: Helps normalize values; disable if whitespace is significant.
    • Use cases: Keep enabled for most CSV/TSV data to reduce cleanup.

Configuration patterns and troubleshooting

  • Files not found: Confirm "Filename Pattern" matches the file in the Batch Folder.
  • Wrong columns: Verify "Delimiter", "Quoted", and "Trim Whitespace" align with the file's format.
  • Header mismatches: With "Mapping Method" = Named, ensure the file's header text exactly matches your Data Column names; otherwise use Mapped.
  • Mapped indices off by one: Mapped numeric values are zero-based indices; confirm positions accordingly.
  • Extra leading rows: Increase "Skip Rows" to start at the first data line after the header row.