2023:Data Table: Difference between revisions

From Grooper Wiki
<onlyinclude>
<blockquote style="font-size:14pt">
'''Data Tables''' are '''[[Data Element]]s''' in a '''[[Data Model]]''' used to extract data from tables in documents.  Each column of data is represented by a '''[[Data Column]]''' object, created as a child of the '''Data Table'''.
</blockquote>


Data is extracted using one of the following [[Table Extraction]] methods:


* ''[[Row Match]]'' - Uses an extractor to match each row.
* ''[[Header-Value]]'' - Uses header values to find the data for each column.
* ''[[Infer Grid]]'' - Creates a grid based on header and row values.
</onlyinclude>


== About ==


Many document types contain data in tables.  A '''Data Table''' can be added to your '''Data Model''' to extract data from the cells of these tables.  Once the '''Data Table''' is added, you add '''Data Columns''' to it, as many as you need.  They are listed underneath the '''Data Table''' in the order you add them.  Generally, the first '''Data Column''' underneath your '''Data Table''' represents the leftmost column and the last represents the rightmost.  However, the order of the '''Data Columns''' does not strictly have to match the order of the columns on the document (the top '''Data Column''' with the first column, the second '''Data Column''' with the second column, and so on).  You can re-order the column structure however you see fit.


{| cellpadding="10" cellspacing="5"
|- style="background-color:#36b0a7; color:white"
|'''FYI'''||You can change the order of columns within your '''Data Table''' by right-clicking a '''Data Column''' in the Node Tree and choosing either "Move Up" or "Move Down".  Keyboard shortcuts are also available: "Move Up" is <code>Ctrl + Up</code> and "Move Down" is <code>Ctrl + Down</code>.
|}
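As an illustration only, the column ordering described above can be modeled as a list whose index 0 is the top '''Data Column''' in the Node Tree. The <code>DataTableModel</code> class and its <code>add_column</code>, <code>move_up</code> and <code>move_down</code> methods are hypothetical names for this sketch; they are not part of Grooper's API.

```python
# Illustrative sketch only: a hypothetical model, not Grooper's object model or API.
from dataclasses import dataclass, field


@dataclass
class DataTableModel:
    """Hypothetical stand-in for a Data Table: an ordered list of Data Column names.

    Index 0 corresponds to the top Data Column in the Node Tree.
    """
    columns: list[str] = field(default_factory=list)

    def add_column(self, name: str) -> None:
        # New Data Columns are appended below the existing ones.
        self.columns.append(name)

    def move_up(self, name: str) -> None:
        # Mirrors "Move Up" (Ctrl + Up): swap with the column above, if there is one.
        i = self.columns.index(name)
        if i > 0:
            self.columns[i - 1], self.columns[i] = self.columns[i], self.columns[i - 1]

    def move_down(self, name: str) -> None:
        # Mirrors "Move Down" (Ctrl + Down): swap with the column below, if there is one.
        i = self.columns.index(name)
        if i < len(self.columns) - 1:
            self.columns[i + 1], self.columns[i] = self.columns[i], self.columns[i + 1]


table = DataTableModel()
for name in ["Quantity", "Description", "Unit Price"]:
    table.add_column(name)

table.move_up("Description")
print(table.columns)  # ['Description', 'Quantity', 'Unit Price']
```

Swapping adjacent entries, rather than re-sorting the whole list, reflects the one-position-at-a-time behavior that the "Move Up" and "Move Down" command names suggest.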


== Table Extract Methods ==
There are three different extraction methods available to a Data Table.


* ''[[Row Match]]'' - This method uses an extractor to match each row.  You might reference a '''Data Type''' extractor that returns each whole row in the table to populate the rows in the '''Data Table'''.
* ''[[Header-Value]]'' - This method detects the layout of a table by analyzing results from header extractors defined on each '''Data Column'''.  Data is then populated according to the value extractors defined on each '''Data Column'''.
* ''[[Infer Grid]]'' - This method creates a grid from header positions, using extractors to match column and row headers.  Once the grid is created ("inferred" from the column and row header positions), it extracts the corresponding text data from the cells within the grid.
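Purely as an illustration of the ideas behind the three methods, the following Python sketch applies each one to toy data. It is not Grooper's implementation and uses no Grooper API: the <code>Word</code> class, the functions <code>row_match</code>, <code>header_value</code> and <code>infer_grid</code>, the coordinates, and the 15-unit alignment tolerance are all hypothetical, and plain regular expressions stand in for Grooper extractors.

```python
# Illustrative sketch only: not Grooper's implementation or API.
# Plain regular expressions stand in for Grooper extractors; all names are hypothetical.
import re
from dataclasses import dataclass


@dataclass
class Word:
    """A toy OCR word: its text, the x of its left edge, and the y of its line."""
    text: str
    x: float
    y: float


def row_match(text: str, row_pattern: str) -> list[list[str]]:
    """Row Match idea: one pattern matches a whole row; its groups become the cells."""
    return [list(m.groups()) for m in re.finditer(row_pattern, text, re.MULTILINE)]


def header_value(words, headers, value_patterns, tolerance=15.0):
    """Header-Value idea: locate each column's header, then keep the words below it
    whose left edge lines up with the header and that match the column's value pattern."""
    columns = {}
    for col, header_text in headers.items():
        header = next(w for w in words if w.text == header_text)
        columns[col] = [
            w.text
            for w in sorted(words, key=lambda w: w.y)
            if w.y > header.y
            and abs(w.x - header.x) <= tolerance
            and re.fullmatch(value_patterns[col], w.text)
        ]
    return columns


def infer_grid(words, column_headers, row_headers):
    """Infer Grid idea: column header x positions and row header y positions imply a
    grid; every other word goes into the cell whose column and row headers are nearest
    (so cell boundaries fall halfway between neighboring headers)."""
    cols = [w for w in words if w.text in column_headers]
    rows = [w for w in words if w.text in row_headers]
    grid = {(r.text, c.text): [] for r in rows for c in cols}
    for w in words:
        if w in cols or w in rows:
            continue  # header words are not cell contents
        col = min(cols, key=lambda c: abs(c.x - w.x))
        row = min(rows, key=lambda r: abs(r.y - w.y))
        grid[(row.text, col.text)].append(w.text)
    return {cell: " ".join(parts) for cell, parts in grid.items()}


# Row Match: a pattern for "quantity, description, price" rows; the header line doesn't match.
rows = row_match("Qty Description Price\n2 Widget 4.50\n10 Gadget 12.00",
                 r"^(\d+)\s+(.+?)\s+(\d+\.\d{2})$")
print(rows)  # [['2', 'Widget', '4.50'], ['10', 'Gadget', '12.00']]

# Header-Value: the same table as positioned words.
invoice = [
    Word("Qty", 0, 0), Word("Description", 40, 0), Word("Price", 140, 0),
    Word("2", 2, 10), Word("Widget", 40, 10), Word("4.50", 142, 10),
    Word("10", 0, 20), Word("Gadget", 41, 20), Word("12.00", 138, 20),
]
by_column = header_value(invoice, {"qty": "Qty", "price": "Price"},
                         {"qty": r"\d+", "price": r"\d+\.\d{2}"})
print(by_column)  # {'qty': ['2', '10'], 'price': ['4.50', '12.00']}

# Infer Grid: a small matrix with column headers (months) and row headers.
report = [
    Word("Jan", 60, 0), Word("Feb", 120, 0),
    Word("Sales", 0, 10), Word("100", 62, 10), Word("120", 118, 10),
    Word("Costs", 0, 20), Word("80", 60, 21), Word("90", 121, 19),
]
cells = infer_grid(report, {"Jan", "Feb"}, {"Sales", "Costs"})
print(cells)
# {('Sales', 'Jan'): '100', ('Sales', 'Feb'): '120', ('Costs', 'Jan'): '80', ('Costs', 'Feb'): '90'}
```

In this sketch, Row Match works purely on the text of each line, while Header-Value and Infer Grid depend on where the headers sit on the page.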

Revision as of 13:20, 23 November 2020
