2023:Data Table
Data Tables are Data Elements in a Data Model used to extract data from tables in documents. Each column of data is represented by a Data Column object created as a child of the Data Table.
Data is extracted using one of several Table Extraction methods, each of which takes a different approach to modeling a table's structure.
About
Many documents contain data in a table, presented on the page as some kind of grid of information divided into rows and columns. The Data Table object's purpose is to define the processing logic to model and collect tabular data.
A Data Table can be added to your Data Model to extract data from the table's cells. Once added to your Data Model, you will add Data Columns to the Data Table. You can add as many columns as you need to collect data from all (or only some) of the table's columns on the document. Data Columns will exist as children of the Data Table in the node hierarchy.
- Data Columns also allow for additional configuration, such as assigning the Value Type for the extracted data in that column (decimal, string, Boolean, etc.).
- There are several table extraction methods. Some require you to configure the Data Table's Data Columns. Others require no column configuration, or treat it as optional. Please visit the Table Extraction article for more information on tabular data extraction in general.
FYI: Generally, the first Data Column underneath your Data Table corresponds to the table's leftmost column, the second Data Column to the second column, and so on, with the last Data Column corresponding to the rightmost column. However, this is not strictly necessary. You can reorder the Data Columns however you see fit by right-clicking a Data Column in the Node Tree and choosing either "Move Up" or "Move Down". Keyboard shortcuts are also available.
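The parent/child relationship and the "Move Up" / "Move Down" reordering described above can be pictured with a small model. This is only an illustrative sketch: the `DataTable` and `DataColumn` classes, their fields, and the sample column names are hypothetical stand-ins written for this example, not Grooper's actual object model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataColumn:
    """Hypothetical stand-in for a Data Column node."""
    name: str
    value_type: str = "string"  # e.g. "decimal", "string", "boolean"


@dataclass
class DataTable:
    """Hypothetical stand-in for a Data Table; columns are its ordered children."""
    name: str
    columns: List[DataColumn] = field(default_factory=list)

    def _index_of(self, column_name: str) -> int:
        for i, col in enumerate(self.columns):
            if col.name == column_name:
                return i
        raise KeyError(column_name)

    def move_up(self, column_name: str) -> None:
        """Swap the named column with the one above it (no-op at the top)."""
        i = self._index_of(column_name)
        if i > 0:
            self.columns[i - 1], self.columns[i] = self.columns[i], self.columns[i - 1]

    def move_down(self, column_name: str) -> None:
        """Swap the named column with the one below it (no-op at the bottom)."""
        i = self._index_of(column_name)
        if i < len(self.columns) - 1:
            self.columns[i + 1], self.columns[i] = self.columns[i], self.columns[i + 1]


# Example: a three-column table, then move "Quantity" up one position.
table = DataTable("Line Items", [
    DataColumn("Description"),
    DataColumn("Quantity", "decimal"),
    DataColumn("Unit Price", "decimal"),
])
table.move_up("Quantity")
column_order = [c.name for c in table.columns]
```

The point of the sketch is that column order is simply the order of the child nodes, so reordering is a swap of neighbours and does not change what each column extracts.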
Table Extract Methods
There are seven different extraction methods available in Grooper. Using the Data Table's Extract Method property, you will select and configure one of the following:
- Row Match - This uses an extractor to match each row. You could reference a Data Type extractor that returns each whole row in the table to populate the rows in the Data Table.
- Header-Value - This method detects the layout of a table by analyzing results from header extractors and value extractors defined on each Data Column.
- Grid Layout - This method creates a grid from header positions, using extractors to match column and (sometimes optionally) row headers. Once the grid is created ("inferred" from the column and row header positions) it extracts the corresponding text data from the cells within the grid.
- Tabular Layout - This method is an improvement upon the Header-Value method. It also detects a table's layout using a table's column headers and value extractors defined on the Data Column objects. However, in general, there is much less configuration required up front with more ability to fine tune configuration according to your needs. This method also can make efficient use of Label Sets to aid in table extraction.
- Fluid Layout - This method requires Label Sets in order to function. It can be configured in a way to use either the Row Match or the Tabular Layout method based on how a Document Type's labels are collected.
- Delimited Extract - This method allows for efficient extraction of character delimited text files, such as CSV files.
- Fixed Width - This method reads tabular data from "fixed width" formatted text files.
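To make the differences between some of these approaches more concrete, here is a rough sketch of the general ideas behind Row Match, Delimited Extract, and Fixed Width using plain Python. It is not how Grooper implements these methods. The function names, the row pattern, the sample text, and the field offsets are all assumptions made up for illustration.

```python
import csv
import io
import re

# Row Match idea: a single pattern describes an entire row, and every
# line that matches it becomes one row of the table.
# The pattern below is a made-up example (item code, quantity, price).
ROW_PATTERN = re.compile(
    r"^(?P<item>[A-Z]{2}-\d{3})\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$"
)


def row_match(lines):
    """Return one dict per line that matches the whole-row pattern."""
    rows = []
    for line in lines:
        m = ROW_PATTERN.match(line.strip())
        if m:
            rows.append(m.groupdict())
    return rows


def delimited_extract(text, delimiter=","):
    """Delimited idea: split each line on a delimiter; first line is the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def fixed_width(lines, fields):
    """Fixed-width idea: slice each line at known character offsets.

    fields is a list of (name, start, end) tuples.
    """
    return [
        {name: line[start:end].strip() for name, start, end in fields}
        for line in lines
    ]


page_lines = [
    "Item    Qty  Price",
    "AB-101  3    19.99",
    "Subtotal      59.97",
    "CD-202  10   4.50",
]
matched = row_match(page_lines)  # header and subtotal lines do not match

csv_rows = delimited_extract("item,qty,price\nAB-101,3,19.99\n")

fw_rows = fixed_width(
    ["AB-101  3    19.99", "CD-202  10   4.50"],
    [("item", 0, 8), ("qty", 8, 13), ("price", 13, 18)],
)
```

Row Match depends on each row having a recognizable pattern, while the delimited and fixed-width approaches rely on the file format itself to mark where each cell begins and ends.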
Property Details
This section expands on the Grooper documentation for various Data Table properties.
Maximum Display Rows
The Maximum Display Rows property controls how a Data Table's rows are displayed in the Data Viewer when a user executes the Review activity.
Imagine you have a data dense document with a table with several hundred rows. Grooper extracts the table and now it's time to present it to a data entry clerk in Review. It's going to take Grooper a while to load that Data Table. This can be an unnecessary lag point in the user's review experience.
The Maximum Display Rows property allows you to load rows dynamically. Instead of loading them all at once, you can restrict the Data Viewer to loading only, say, 50 at a time. After you scroll to the bottom of the first 50 loaded rows, the next 50 will load, then the next, and so on until you reach the end of the table. This way, the user doesn't have to wait for the entire table to load up front and can start reviewing the extracted data sooner.
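The batching behavior described above can be sketched as a simple generator. This is only a conceptual illustration of loading rows in fixed-size batches. The function name `load_in_batches` and its `max_display_rows` parameter are hypothetical and do not reflect how the Data Viewer is actually implemented.

```python
from typing import Iterator, List, Sequence


def load_in_batches(rows: Sequence[dict], max_display_rows: int = 50) -> Iterator[List[dict]]:
    """Yield rows in batches of at most max_display_rows.

    Each batch is produced only when the caller asks for it, which mimics
    loading the next set of rows when the user scrolls to the bottom.
    """
    if max_display_rows <= 0:
        raise ValueError("max_display_rows must be positive")
    for start in range(0, len(rows), max_display_rows):
        yield list(rows[start:start + max_display_rows])


# Example: a 120-row table displayed 50 rows at a time.
extracted_rows = [{"row": i} for i in range(120)]
batches = load_in_batches(extracted_rows, 50)

first_batch = next(batches)  # shown immediately
batch_sizes = [len(first_batch)] + [len(b) for b in batches]  # remaining batches on demand
```

With 120 rows and a limit of 50, the user sees the first 50 rows right away, and the remaining rows arrive in batches of 50 and then 20 as they scroll.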