2.80:Infer Grid (Table Extract Method)

From Grooper Wiki
Revision as of 12:42, 16 December 2019 by Configadmin

Infer Grid uses the positions of row and column headers to infer where a tabular grid would fall around each value in a table, then extracts the value from each cell of that inferred grid.

Infer Grid is one of three methods for extracting data from tables on documents. It is intended for tables that have both row and column headers, and it infers a grid from the header positions. To do this, you assign an "X Axis Extractor" to match the column headers and a "Y Axis Extractor" to match the row headers. A grid is created from the header positions these extractors return. OCR data then populates each cell of the grid according to where it falls on the page. If everything is set up correctly, the inferred grid will match the boundaries of the table on the document.
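To make the idea concrete, here is a minimal sketch of inferring a grid from header positions. It is an illustration only, not Grooper's implementation: the `Word` type, the function names, the page coordinates, and the rule of splitting cells at the midpoint between neighboring headers are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A piece of text plus the center of its bounding box (hypothetical type)."""
    text: str
    cx: float
    cy: float


def edges(centers):
    """Cell boundaries: midpoints between neighboring header centers, open at both ends."""
    mids = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return [float("-inf"), *mids, float("inf")]


def slot(value, bounds):
    """Index of the interval [bounds[i], bounds[i + 1]) that contains value."""
    for i in range(len(bounds) - 1):
        if bounds[i] <= value < bounds[i + 1]:
            return i
    raise ValueError(f"{value} falls outside the grid")


def infer_grid(col_headers, row_headers, words):
    """Assign each word to the (row header, column header) cell it falls in."""
    cols = sorted(col_headers, key=lambda w: w.cx)
    rows = sorted(row_headers, key=lambda w: w.cy)
    x_bounds = edges([c.cx for c in cols])
    y_bounds = edges([r.cy for r in rows])
    cells = {(r.text, c.text): [] for r in rows for c in cols}
    for w in words:
        key = (rows[slot(w.cy, y_bounds)].text, cols[slot(w.cx, x_bounds)].text)
        cells[key].append(w.text)
    return {key: " ".join(texts) for key, texts in cells.items()}


# Made-up page coordinates for a small three-by-three table.
cols = [Word("Revenue", 200, 10), Word("Expenses", 300, 10), Word("Profit", 400, 10)]
rows = [Word("January", 50, 30), Word("February", 50, 50), Word("March", 50, 70)]
values = [["10,000", "11,000", "12,000"],
          ["6,000", "6,000", "6,000"],
          ["4,000", "5,000", "6,000"]]
# Value words sit slightly off the header centers, as OCR output usually does.
words = [Word(values[i][j], cols[j].cx + 3, rows[i].cy - 1)
         for i in range(3) for j in range(3)]

grid = infer_grid(cols, rows, words)
print(grid[("February", "Expenses")])
```

Note that no extractor is written for the values themselves. In this sketch, only the header positions decide which cell a word belongs to.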

How To:  Configure Infer Grid

Consider the following table:

|          | Revenue | Expenses | Profit |
|----------|---------|----------|--------|
| January  | 10,000  | 11,000   | 12,000 |
| February | 6,000   | 6,000    | 6,000  |
| March    | 4,000   | 5,000    | 6,000  |

In the Data Model of a Content Model, create a **Data Table** and add as many **Data Columns** as necessary.  We will have four for this example: Month, Revenue, Expenses, and Profit.


File:1560961419448-949.png


Select the **Data Table** object you created ("Infer Grid Example" for our example).  In the **Data Table** tab select **Infer Grid** from the "Extract Method" dropdown list.


File:1560961660491-617.png


Click the caret next to "Extract Method" to show the configurable properties for **Infer Grid**.


File:1560961832010-849.png


Next, set your "X Axis Extractor" and "Y Axis Extractor".  Whatever the "X Axis Extractor" returns will be used as the column headers.  Whatever the "Y Axis Extractor" returns will be used as the row headers.

  • The X Axis Extractor should match the entire header row at the top of the table and return sub instances for each individual column.  This can be done using regular expressions with named groups or Collation Providers such as Ordered Array.
  • The Y Axis Extractor should match the entire header column on the left side of the table and return sub instances for each individual row.  This can also be done using regular expressions with named groups or Collation Providers such as Ordered Array.

For this example, the extractor named "X Axis" is a Data Type with three Data Formats that find "Revenue", "Expenses", and "Profit" respectively.  At the Data Type level, the collation method was changed to an Ordered Array looking horizontally.  "Y Axis" is also an Ordered Array, but it combines vertically, with formats looking for "January", "February", and "March".
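The named-group approach can be sketched outside Grooper as follows. This uses Python's `re` module, whose `(?P<name>...)` group syntax may differ from the pattern syntax Grooper expects. The pattern and the `column_headers` helper are assumptions for illustration. The point is that one match covers the whole header row while each named group gives the position of one column header.

```python
import re

# One pattern for the whole header row, with one named group per column header.
HEADER_ROW = re.compile(
    r"(?P<Revenue>Revenue)\s+(?P<Expenses>Expenses)\s+(?P<Profit>Profit)"
)


def column_headers(line):
    """Return {column name: (start, end)} for each header found, or {} if no match."""
    match = HEADER_ROW.search(line)
    if match is None:
        return {}
    return {name: match.span(name) for name in match.groupdict()}


spans = column_headers("Revenue Expenses Profit")
print(spans)
```

Each span plays the role of a sub instance: a separate location for one column header inside the single match for the full row.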


File:1560966666978-588.png


If you test extraction now, you will see blank cells under the Month column.


File:1560965426119-601.png


Change the "Header Column" property to the **Data Column** you want to receive row header values.  This will populate this column with the values returned from your Y Axis Extractor.  For our example, we will set it to the Month column.


File:1560964862457-911.png


As you can see from our example, the months our **Y Axis** extractor returned have populated the Month column of the **Data Table**.  Notice that we never wrote any extractors to find the numerical values.  They were all pulled from the OCR data using the grid our Data Table inferred from the header values returned by the X and Y Axis Extractors.
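The effect of the "Header Column" property can be pictured with a short sketch. The `build_rows` function and its `header_column` parameter are hypothetical stand-ins for the behavior described above, not Grooper code. Without a header column the Month cell stays blank. With one, it receives the row header text.

```python
def build_rows(row_headers, columns, cells, header_column=None):
    """Turn inferred cells into one record per row header.

    cells maps (row header, column name) to the text found in that cell.
    header_column, if given, names the column that receives the row header text.
    """
    records = []
    for row in row_headers:
        record = {col: cells.get((row, col), "") for col in columns}
        if header_column is not None:
            record[header_column] = row
        records.append(record)
    return records


columns = ["Month", "Revenue", "Expenses", "Profit"]
cells = {
    ("January", "Revenue"): "10,000",
    ("January", "Expenses"): "11,000",
    ("January", "Profit"): "12,000",
}

blank = build_rows(["January"], columns, cells)
filled = build_rows(["January"], columns, cells, header_column="Month")
print(blank[0]["Month"] == "", filled[0]["Month"])
```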


File:1560965558895-514.png


Tabular OMR: Checkboxes and Table Extraction

As of version 2.72, the "Infer Grid" method supports columns containing OMR data.  This makes it much easier to read checkbox information from tables.  The setup is simple: mark one or more columns as OMR columns.

Let's take the following table as an example.  We're going to keep things extremely simple and make a "Data Table" that has only two columns: one for reading the "Plant" checkbox and one for "Simulator".

File:1560978866459-669.png

Select your **Data Table** in your Data Model. Under the **Extract Method** settings, select **OMR Columns**.


File:1560978775013-210.png


Select the columns you want to read OMR data from.


File:1560979011602-166.png


Press **Test Extraction** to see the result.  Rows where the box was checked now show "True", while rows with blank boxes show "False".


File:1560979203350-805.png


That's pretty much it! As you can see, for the rows on the left side of the table as well as the rows on the right, every filled-in box has been marked "True".
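For readers curious how a checkbox cell might become a "True" or "False" value, here is a minimal sketch of one common OMR technique: measuring how much of the box interior is dark. This is an assumption for illustration and not a description of how Grooper reads OMR data. The `is_checked` function, the 0/1 pixel representation, and the 0.2 threshold are all hypothetical.

```python
def is_checked(pixels, threshold=0.2):
    """Decide whether a checkbox is marked from the pixels inside its border.

    pixels is a list of rows, each a list of 0 (white) or 1 (dark) values.
    The box counts as checked when the dark fraction reaches the threshold.
    """
    total = sum(len(row) for row in pixels)
    if total == 0:
        return False
    dark = sum(sum(row) for row in pixels)
    return dark / total >= threshold


empty_box = [[0] * 5 for _ in range(5)]
crossed_box = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

print(is_checked(empty_box), is_checked(crossed_box))
```

In practice a threshold like this would need tuning, since scan noise and stray marks can add dark pixels to an unchecked box.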