Grid Layout (Table Extract Method)

This article is about the current version of Grooper.

Note that some content may still need to be updated.

The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

Introduction

Grid Layout is a table extract method used by Data Tables to read tabular data when both column headers (X axis) and row headers (Y axis) are present in the document. It infers a matrix (grid) by intersecting detected header positions and maps each cell to a corresponding Data Column in the output. Depending on whether lines are present, and/or the layout of the table, the "Y axis" extractor may not be necessary. Grid Layout also allows the reading of OMR boxes within a table.

How it differs from other extract methods:

  • Tabular Layout: Detects a single header row and reads rows beneath it; ideal for line-item tables with clearly labeled columns.
  • Row Match: Matches whole rows using a single row-level extractor (e.g., regex with named groups) and is useful when headers are absent or irregular.
  • Fixed Width: Parses columns in monospaced text using predefined character widths.
  • Delimited Extract: Reads data from CSV or other delimited files with configurable column mappings.
  • Fluid Layout: Automatically alternates between Tabular Layout and Row Match based on Label Set configuration.
  • AI Table Reader: Uses generative AI to infer tables from semi-structured content with prompt-driven extraction.

Grid Layout is purpose-built for cross-tab or matrix-style tables where columns and rows are labeled and the intersection contains values (e.g., months vs. metrics).
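
The header-intersection idea described in this introduction can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the `Span` type, the `infer_grid` function, and the coordinates are hypothetical, and the sketch assumes each detected header already carries the full extent of its column or row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled interval along one axis, in page units (e.g., inches)."""
    name: str
    start: float
    end: float


def infer_grid(x_headers, y_headers):
    """Return {(row_name, column_name): (left, top, right, bottom)}.

    Each cell borrows its horizontal extent from a column header
    and its vertical extent from a row header.
    """
    return {
        (row.name, col.name): (col.start, row.start, col.end, row.end)
        for row in y_headers
        for col in x_headers
    }


# Hypothetical header positions for a months-vs-metrics table.
columns = [Span("Revenue", 2.0, 3.5), Span("Cost", 3.5, 5.0)]
rows = [Span("January", 1.0, 1.25), Span("February", 1.25, 1.5)]

grid = infer_grid(columns, rows)
print(grid[("February", "Cost")])  # (3.5, 1.25, 5.0, 1.5)
```

Every (row, column) pair yields one cell rectangle, which is why a missing or misplaced header on either axis affects a whole row or column of cells.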

What it is for

Ideal use cases

  • Cross-tab reports and matrices (e.g., months on Y axis, metrics on X axis).
  • Financial summaries that use both top headers and left-side row labels.
  • Pivoted tables where the document layout swaps rows and columns.

Benefits

  • Robust mapping via header intersections when both axes are labeled.
  • Supports transposed documents using the "Transpose" property.
  • Per-column control via Grid Layout Options extensions (e.g., read method, column required).
  • Can leverage snap-to-lines to improve alignment in imperfect scans.
  • Supports OMR reads for checkbox-like cells via "Minimum Fill Weight".

Drawbacks

  • Does not support multi-page tables; extraction is limited to a single page region.
  • Requires reliable detection of both axes or quality line detection when the Y axis is not extracted.
  • Header naming must align to Data Column names for consistent mapping (or be transposed accordingly).

How to add and configure Grid Layout

Follow these steps on the Data Model design surface:

FYI

Please see the demos below for example setups with screenshots and highlighted instructions.

  1. Create the Data Table and Columns:
    • Create a Data Table and add the necessary Data Columns representing the table’s fields. Name columns to match header cells (or the transposed mapping).
    • Set the table’s "Extract Method" to Grid Layout.
  2. Configure Grid Layout properties:
    • "X Axis Extractor": Matches the column header row and returns children for each column header cell.
    • "Y Axis Extractor" (optional): Matches the row header column and returns children for each row header cell. If not set, enable snap-to-lines and set "Maximum Row Count".
    • "Header Column" (visible when both X and Y extractors are set): Choose the Data Column that should receive row header text.
      • This would be the column defined by the "Y Axis" extractor.
    • Snap-to-lines settings: "X Axis Snap Limits", "Y Axis Snap Limits", and "Line Snap Margin" to improve alignment to detected line geometry.
    • "Minimum Fill Weight": Enable OMR-style reads for checkbox-like cells.
      • This allows black pixel fill to act as the OMR read, instead of needing a specific box from layout data.
    • "Maximum Row Count": If Y axis is not extracted, set the maximum number of detected rows via line detection.
    • "Transpose": Enable if the document presents data rotated (rows as columns).
  3. Test extraction:
    • Run a Batch through an Extract step or use the Tester tab of the Data Table or Data Model.
    • Review diagnostics to confirm header matches, snap-to-lines behavior, and final row/column cell values.
  4. Troubleshoot:
    • If header mapping fails, adjust extractor patterns to match the entire header row/column and ensure child names align to Data Column names (or enable "Transpose").
    • If rows are misaligned, tune snap limits and margin, or add a Y Axis Extractor.
    • If checkbox reads are inconsistent, set "Minimum Fill Weight" for OMR-based detection.

Example: Grid Layout with lines

Grid Layout is most effective on tables that have detected line information in the layout data of the containing document. This line information is collected during the Recognize activity by an IP Profile that uses Detect or Remove lines. When lines are present in a table, you need only configure the X Axis Extractor, which in nearly all cases will find the headers of the table. Once the headers are established, Grooper can read the lines of the table and infer the grid that forms it.

  1. Expand the Node Tree and select the "Grid Layout - With Lines" Data Table from the provided Project.
    • Notice the Extract Method property is set to "Grid Layout".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Grid Layout" to expand the Extract Method sub-properties. Notice the X Axis Extractor property is set to "Pattern Match".
    • While not the only extract method available here, this would be the most common one to use for this purpose.
  3. Click the ellipsis button to the right of the X Axis Extractor property set to Pattern Match to open the Pattern Match editor.
  4. Click the "Select Batch" button in the Batch Viewer, then be sure to select the provided "Grid-Layout" Batch.
  5. Select the first Batch Folder in the Batch Viewer. Notice the regular expression pattern of the Value Pattern field is using named capture groups to find the headers of the table.
    • The names of these capture groups should match exactly the names of the target Data Columns.
  6. Click the "Data Inspector" button.
  7. Notice there are sub-element data instances for each value returned from a capture group.
    • Because there are sub-elements, these values can correspond directly to their target Data Columns.
  8. Back on the Data Table, click the Tester tab.
  9. Make sure the first Batch Folder is selected in the Batch Viewer, then click the "Test Extraction" button.
  10. Notice all data is returned to the appropriate columns.
    • Because the table headers are found and lines are present in the layout data of the Batch Folder, the grid of the table is inferred and the values are placed appropriately.
    • Please note that the lines are present in the layout data of the Batch Folder because an IP Profile with Detect or Remove lines was applied during the Recognize activity run against this document.
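
The named-capture-group approach used in the steps above can be sketched in Python. The header text and the column names (`OrderDate`, `Region`, `Units`) are made up for illustration and do not come from the sample Project. Note that Python writes named groups as `(?P<name>...)`, while many other regex flavors write `(?<name>...)`, so the pattern syntax in the Pattern Match editor may differ.

```python
import re

# Hypothetical header row; the column names are illustrative only.
header_text = "Order Date    Region    Units"

# One named group per header cell. Each group name plays the role of
# the Data Column name the header should map to.
header_pattern = re.compile(
    r"(?P<OrderDate>Order Date)\s+(?P<Region>Region)\s+(?P<Units>Units)"
)

match = header_pattern.search(header_text)

# Map each group name to the character span of the header it captured.
header_spans = {name: match.span(name) for name in header_pattern.groupindex}
print(header_spans)
```

Each named group yields both a name and a position, which is the pairing the sub-element instances in the Data Inspector represent: a header value tied to a target column.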

Example: Grid Layout without lines

Lines of a table are not strictly necessary for Grid Layout to function. The intersection of the results from the X Axis Extractor and a Y Axis Extractor can be used to infer the grid of the table. While this works in many cases, this example shows how inferring the grid without the table's lines can, in some cases, cause issues.

  1. Expand the Node Tree and select the "Order Date Array" Data Type from the provided Project. Notice the Local Extractor property is set to "Pattern Match".
    • Any extractor type can be used, but this works well for this example.
  2. Notice the Collation property is set to Array. Expand its sub-properties and you'll also notice the Vertical Layout sub-property of the Array collation is enabled.
  3. Click the Tester tab.
  4. Click the "Select Batch" button in the Batch Viewer, and be sure to select the provided "Grid-Layout" Batch.
  5. Select the second Batch Folder in the Batch Viewer. Notice all of the "Order Date" dates are collected into a single array.
    • It's critical that an extractor used for the Y Axis Extractor of Grid Layout return a result for every row.
  6. Click the "Data Inspector" button.
  7. Notice there is a sub-element instance for each value returned, which will represent each row in the table.
  8. Expand the Node Tree and select the "Grid Layout - Without Lines" Data Table within the provided Project. Notice the Extract Method property is set to "Grid Layout".
  9. Click the drop-down arrow to the left of the Extract Method property set to Grid Layout to expand its sub-properties. Notice the X Axis Extractor property is set to "Pattern Match".
  10. Click the ellipsis button to the right of the X Axis Extractor property set to "Pattern Match" to open the Pattern Match editor.
  11. Notice the regular expression pattern is using named capture groups to return the headers of the table.
    • The intersection of the results returned by the X Axis Extractor and the Y Axis Extractor will infer the grid of the table without lines.
  12. Back on the Data Table, notice the Y Axis Extractor is referencing the "Order Date Array" Data Type viewed previously. Notice also the Header Column property is pointed at the "OrderDate" Data Column.
    • While the results of the "OrderDate" column aren't exactly row headers, they will function as the vertical intersection with the table headers of the X Axis Extractor to infer the grid of the table.
  13. Click the Tester tab.
  14. Make sure the second Batch Folder is selected in the Batch Viewer, then click the "Test Extraction" button.
  15. You'll see results are returned, but they're incorrect.
    • The "Order I.D." column is pulling in the first letter from the "Salesperson" column, which is being truncated as a result.
  16. Click the "Data Inspector" button.
  17. Expand one of the row instances then select the "OrderDate" instance. This seems to be returning accurate information.
  18. Select the "Order I.D." instance. You can see that, due to the way the grid was inferred, it's cutting into the data of the next column.
  19. Select the "Salesperson" instance. Again, you can see that the way the grid was inferred is truncating the results.
    • This demonstrates a shortcoming of Grid Layout when working with tables without lines.
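
The kind of truncation seen in this example can be reproduced with a small Python sketch. It assumes, purely for illustration, that column boundaries are taken from where each header starts; how Grooper actually derives boundaries is not described in this article. The sample row is invented, and `slice_by_header_starts` is a hypothetical helper.

```python
def slice_by_header_starts(header, row, names):
    """Cut a text row into cells using each header's start position."""
    starts = [header.index(name) for name in names]
    # Each column runs until the next header starts; the last runs to the end.
    ends = starts[1:] + [max(len(header), len(row))]
    return {
        name: " ".join(row[start:end].split())  # collapse padding
        for name, start, end in zip(names, starts, ends)
    }


names = ["Order Date", "Order I.D.", "Salesperson"]

# Headers start at characters 0, 12 and 25.
header = "Order Date".ljust(12) + "Order I.D.".ljust(13) + "Salesperson"
# The salesperson value starts at character 24, one position left of its header.
row = "01/02/2024".ljust(12) + "10432".ljust(12) + "Alice Smith"

cells = slice_by_header_starts(header, row, names)
print(cells["Order I.D."])   # 10432 A
print(cells["Salesperson"])  # lice Smith
```

Because the value begins slightly to the left of its header, its first letter lands in the neighboring cell. Detected table lines give the grid real boundaries to work from, which is why the with-lines example does not show this problem.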

Example: Grid Layout - OMR boxes

Grid Layout is the only Table Extract Method that allows for the collection of the boolean results of OMR boxes within a table. The Data Table itself is configured as usual; however, each Data Column that holds OMR boxes must also be configured to use the "OMR" Read Method.

  1. Expand the Node Tree and select the "Grid Layout - OMR Boxes" Data Table from the provided Project. Notice the Extract Method property is set to "Grid Layout".
  2. Click the drop-down arrow to the left of the Extract Method property set to "Grid Layout" to expand its sub-properties. Notice the X Axis Extractor property is set to "Pattern Match".
  3. Click the ellipsis button to the right of the X Axis Extractor property set to "Pattern Match" to open the Pattern Match editor.
  4. Click the "Select Batch" button in the Batch Viewer, then be sure to select the provided "Grid-Layout" Batch.
  5. Select the third Batch Folder in the Batch Viewer. Notice the regular expression pattern is using named capture groups to find the headers of the tables.
  6. You may notice this table consists of two side-by-side sub-tables.
  7. For reading OMR boxes, we also need to configure the child Data Columns. Expand the Node Tree and select the "Farm" Data Column. Notice the Read Method sub-property of the Grid Layout Options property is set to OMR.
  8. Select the "Simulator" Data Column. Its Read Method property is also set to "OMR".
  9. Select the "Grid Layout - OMR Boxes" Data Table, then click the Tester tab.
  10. Make sure the third Batch Folder is selected in the Batch Viewer, then click the "Test Extraction" button.
  11. You'll notice all results are returned, including the boolean results of the OMR boxes. The two side-by-side tables are combined into one data set.
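
When "Minimum Fill Weight" is set, this article describes a cell as filled if it contains enough black pixels to fill a square of the specified size. A minimal Python sketch of that rule follows. The function names, the 300 DPI default, and the use of "greater than or equal" as the comparison are assumptions made for illustration, not Grooper's actual behavior.

```python
def is_filled(cell_pixels, fill_weight_pt, dpi=300):
    """Decide whether a cell counts as marked.

    cell_pixels: rows of 0/1 values, where 1 is a black pixel.
    fill_weight_pt: side of the reference square, in points (1/72 inch).
    """
    side_px = fill_weight_pt / 72 * dpi      # convert points to pixels
    required = side_px ** 2                  # area of the reference square
    black = sum(sum(row) for row in cell_pixels)
    return black >= required


def make_cell(size, mark):
    """Build a size x size cell with a mark x mark black block in one corner."""
    return [
        [1 if r < mark and c < mark else 0 for c in range(size)]
        for r in range(size)
    ]


# At 300 DPI, 2pt is about 8.3 pixels, so roughly 69 black pixels are needed.
print(is_filled(make_cell(20, 10), 2))  # True  (100 black pixels)
print(is_filled(make_cell(20, 5), 2))   # False (25 black pixels)
```

A rule like this depends only on how much ink is in the cell, not on a detected box shape, which matches the article's point that black pixel fill can act as the OMR read.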

Properties overview

Below are the properties exposed by Grid Layout along with their definitions and typical uses.

  • X Axis Extractor
    • Definition: An extractor that matches the column headers along the X axis and returns child instances for each column header.
    • Remarks: Should match the entire header row and return named groups or children matching Data Column names (when "Transpose" is false). Can use Pattern Match or a collation provider (e.g., Ordered Array). Robustness to merged or multi-line headers is recommended; test across samples.
    • Purpose/Use case: Establishes the set and order of columns for the output table by detecting the top header row.
  • Y Axis Extractor
    • Definition: Optional extractor that matches row headers along the Y axis and returns child instances for each row label.
    • Remarks: Recommended for tables with explicit row headers (e.g., months or categories). If not set, rows are found via line detection; enable snap-to-lines for Y and set "Maximum Row Count".
    • Purpose/Use case: Detects row labels and helps build the matrix grid; allows mapping row header values into a specific column via "Header Column".
  • Maximum Row Count
    • Definition: Sets the maximum number of rows detected when there is no Y Axis Extractor.
    • Remarks: 0 means unlimited. Ignored if a Y Axis Extractor is configured.
    • Purpose/Use case: Stabilizes row detection in line-detected scenarios, improving predictability and limiting over-detection.
  • X Axis Snap Limits
    • Definition: Logical border limits used to snap detected X axis headers to nearby lines (e.g., 0.25in).
    • Remarks: Enter a measurement to enable; leave blank to disable snap-to-lines. Helps align headers to actual line geometry.
    • Purpose/Use case: Improves header alignment in imperfect scans or visually noisy documents.
  • Y Axis Snap Limits
    • Definition: Logical border limits used to snap detected Y axis headers to nearby lines (e.g., 0.25in).
    • Remarks: Enter a measurement to enable; leave blank to disable snap-to-lines. Supports row header alignment.
    • Purpose/Use case: Ensures row header positions align to line geometry for more accurate grid inference.
  • Line Snap Margin
    • Definition: Additional margin to shrink the snap zone on each edge.
    • Remarks: Use a small value (e.g., 0.05in) to avoid snapping to unintended lines; only applies when snap-to-lines is enabled.
    • Purpose/Use case: Fine-tunes snapping to reduce false matches and improve grid fidelity.
  • Minimum Fill Weight
    • Definition: Threshold for OMR-style cell reads; if blank, reads use layout data from box detection/removal. If a value is set (e.g., 2pt), a cell is considered filled if it contains enough black pixels to fill a square of the specified size.
    • Remarks: Useful for checkbox-like cells or marks.
    • Purpose/Use case: Converts visual marks into boolean or checked-state values using pixel density.
  • Header Column
    • Definition: Selects a Data Column to receive row header values when both axes are extracted.
    • Remarks: Visible only when both "X Axis Extractor" and "Y Axis Extractor" are set.
    • Purpose/Use case: Saves the Y axis labels (e.g., month names) into a chosen output column.
  • Transpose
    • Definition: If true, rows from the source table become columns in the output (and vice versa).
    • Remarks: Use when the document presents a transposed matrix.
    • Purpose/Use case: Correctly maps values from cross-tab or pivoted layouts without changing column definitions.
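
The interaction between the snap limits and "Line Snap Margin" can be pictured with a short Python sketch. It assumes that the margin simply reduces how far an edge may travel to reach a line; that reading of "shrink the snap zone" is an interpretation rather than documented behavior, and `snap_edge` is a hypothetical function.

```python
def snap_edge(edge, lines, snap_limit, margin=0.0):
    """Move an edge to the nearest detected line within reach.

    edge: position of a header edge along one axis (inches).
    lines: positions of detected lines along the same axis (inches).
    snap_limit: maximum distance the edge may move.
    margin: amount by which the snap zone is shrunk (assumed semantics).
    """
    reach = max(snap_limit - margin, 0.0)
    candidates = [line for line in lines if abs(line - edge) <= reach]
    if not candidates:
        return edge  # nothing close enough; leave the edge where it is
    return min(candidates, key=lambda line: abs(line - edge))


detected_lines = [1.0, 2.5, 4.0]

print(snap_edge(2.4, detected_lines, 0.25, 0.05))   # 2.5
print(snap_edge(3.0, detected_lines, 0.25, 0.05))   # 3.0
print(snap_edge(2.28, detected_lines, 0.25, 0.05))  # 2.28
print(snap_edge(2.28, detected_lines, 0.25))        # 2.5
```

The last two calls show the trade-off under this assumption: a larger margin avoids snapping to lines that are only marginally in range, at the cost of leaving some edges unsnapped.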

Configuration tips

  • Align names: When "Transpose" is false, ensure X axis child instance names match Data Column names. When true, ensure Y axis names align appropriately.
  • Use snap-to-lines: Set "X Axis Snap Limits" and "Y Axis Snap Limits" with a small "Line Snap Margin" to stabilize header alignment in scanned documents.
  • OMR cells: Use "Minimum Fill Weight" for checkbox-like cells to convert visual marks reliably.
  • Row headers: Prefer a "Y Axis Extractor" for labeled matrices; if omitted, set "Maximum Row Count" and enable Y snap limits.
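
The effect of "Transpose" on naming can be seen in a tiny Python sketch that swaps rows and columns of a table held as a list of lists. The sample values are invented, and this only illustrates the row/column swap, not how Grooper performs it.

```python
def transpose(table):
    """Swap rows and columns of a rectangular table."""
    return [list(column) for column in zip(*table)]


# Hypothetical source layout: each metric is a row, each month is a column.
source = [
    ["Month", "Jan", "Feb"],
    ["Revenue", "100", "120"],
    ["Cost", "80", "90"],
]

for row in transpose(source):
    print(row)
```

After the swap, the labels that ran down the left side of the source ("Month", "Revenue", "Cost") become the column headers, which is why the names that need to line up with Data Columns change when "Transpose" is enabled.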

Testing and diagnostics

  • Extract the Data Table and review detected headers and rows. Ensure cell values align with the intended row/column intersections.
  • Validate output using the table instance grid. Confirm that row header values populate the "Header Column" (if configured).
  • If needed, iterate on extractor patterns:
    • Header row (X): use named groups matching column names, or an Ordered Array collation with consistent ordering.
    • Row headers (Y): ensure patterns match each row label reliably across pages and samples.

Notes

  • Grid Layout does not support multi-page tables.
  • If multiple table instances are detected on a page, Grid Layout combines all rows into a single output table.
  • Use the Read Method property on Data Columns to set OMR box reading, or to allow the re-OCR of tricky columns.