2.80:Infer Grid (Table Extract Method)

From Grooper Wiki

Revision as of 13:48, 27 January 2020

Infer Grid is one of three methods for extracting data from tables on documents. It uses the positions of row and column headers to infer where a tabular grid would lie around each value in a table, then extracts the value from each cell of that inferred grid.

This method extracts information by inferring a grid from the header positions. This is done by assigning an "X Axis Extractor" to match the column headers and a "Y Axis Extractor" to match the row headers. A grid is created from the header positions returned by the two extractors. Furthermore, if table line positions can be obtained from a Line Detection or Line Removal IP Command, only one Axis Extractor is needed. In these cases, the X Axis Extractor can be used to find the column header labels, and the grid will be created using the table lines in the document's layout data. The raw text data obtained from the Recognize activity will populate each cell of the grid according to where it is on the page.
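The idea of inferring a grid from header positions can be illustrated with a short sketch. This is not Grooper's implementation; it is a simplified model that assumes each header is reduced to a single center coordinate, that cell boundaries fall midway between neighboring headers, and that words arrive in reading order. The names `boundaries` and `infer_grid` and the input shapes are hypothetical, chosen only for this illustration.

```python
from bisect import bisect_right


def boundaries(centers):
    """Return the midpoints between consecutive header centers.

    These midpoints act as assumed cell edges (an illustrative rule,
    not Grooper's actual logic).
    """
    ordered = sorted(centers)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def infer_grid(col_headers, row_headers, words):
    """Assign each word to the cell implied by the header positions.

    col_headers: {label: x_center}  (what an X Axis extractor would find)
    row_headers: {label: y_center}  (what a Y Axis extractor would find)
    words:       list of (text, x, y) in reading order
    Returns {(row_label, col_label): cell_text}.
    """
    cols = sorted(col_headers, key=col_headers.get)
    rows = sorted(row_headers, key=row_headers.get)
    x_edges = boundaries(col_headers.values())
    y_edges = boundaries(row_headers.values())

    grid = {}
    for text, x, y in words:
        # bisect_right counts how many edges lie at or before the
        # coordinate, which is the index of the row or column it falls in.
        key = (rows[bisect_right(y_edges, y)], cols[bisect_right(x_edges, x)])
        grid[key] = (grid.get(key, "") + " " + text).strip()
    return grid


col_headers = {"Lender": 100, "Mortgage Broker": 300}
row_headers = {"Name": 50, "Address": 100}
words = [
    ("Acme", 90, 52), ("Bank", 120, 52),
    ("12", 290, 95), ("Main", 310, 95), ("St", 330, 110),
]
grid = infer_grid(col_headers, row_headers, words)
```

Because words are assigned by position rather than by pattern, a value that wraps onto a second line still lands in the same cell as long as it stays within that cell's boundaries.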


Use Cases

Non-Standard Tables

The Infer Grid method excels in many cases where the table structure is not easily understood by the Row Match or Header-Value methods. This is especially true for tables with table lines present. Examine the table below.

Row Match might work, but it would be a heavy lift. Every row follows a different pattern: names on one, addresses on another, phone numbers on another. It would take some creative configuration. You could try to make a row out of the columns, but that would take a series of extractors and would be effort-intensive and complicated to set up.

Header-Value would also have problems. The column header labels ("Lender", "Mortgage Broker", etc.) would be straightforward, but the value extractors would be tricky. A generic text segment extractor might get you close, but the "Address" row, at least, presents problems because it is a two-line value instead of a single line. Again, it could be done, but it would take some effort.

Infer Grid can do this job with a single extractor. All you would need to do is write an extractor to find the "X Axis", that is, all the column header labels in a row.

Since table lines are present, the text falling inside each cell (obtained via the Recognize activity) can be extracted to the corresponding cell in the column.

Furthermore, if table lines are not present, Infer Grid can use both the row and column header labels through the "Y Axis Extractor" and "X Axis Extractor" properties. We can use two extractors, one to return all the Y Axis labels and one to return the X Axis labels, and use their positions to infer the table's structure.


OMR Checkboxes

The Infer Grid method is the easiest way to read checkbox states inside a table. Once the table's structure is found using the axis extractors, you can choose which columns contain checkboxes. Grooper will use layout data obtained from a Box Detection or Box Removal IP Command to determine if the box is filled in or left blank. Refer to the tutorial below for more information on how to configure this use.

Marking the "Farm" and "Simulator" columns as OMR Columns in the Infer Grid Property Panel will return a value of "True" if the box is checked and "False" if it is blank.
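The True/False behavior described above can be modeled with a short sketch. This is a conceptual illustration only, not Grooper's implementation: it assumes the grid cells are already known as rectangles and that box detection has produced a list of box centers with a filled/blank flag. The function name `read_omr_cells`, the input shapes, and the "Name" column are hypothetical.

```python
def read_omr_cells(cells, boxes, omr_columns):
    """Return True/False for each cell in a column marked as an OMR column.

    cells:       {(row, col): (left, top, right, bottom)}
    boxes:       list of (x, y, checked) for detected checkboxes
                 (an assumed, simplified form of layout data)
    omr_columns: set of column labels treated as checkbox columns
    """
    result = {}
    for (row, col), (left, top, right, bottom) in cells.items():
        if col not in omr_columns:
            continue  # non-OMR columns would be read as text instead
        # True only if a checked box lies inside this cell's rectangle.
        result[(row, col)] = any(
            checked and left <= x <= right and top <= y <= bottom
            for x, y, checked in boxes
        )
    return result


cells = {
    ("1", "Farm"): (0, 0, 50, 20),
    ("1", "Simulator"): (50, 0, 100, 20),
    ("1", "Name"): (100, 0, 200, 20),
}
boxes = [(25, 10, True), (75, 10, False)]
states = read_omr_cells(cells, boxes, {"Farm", "Simulator"})
```

In this model a cell with no detected box is reported as False, the same as a blank box; whether Grooper treats those two cases identically is not stated in this article.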

How To

Configure Infer Grid for OMR Checkboxes