Tabular Layout (Table Extract Method)

From Grooper Wiki
Revision as of 16:21, 4 November 2025 by Rpatton

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.


This article is about the current version of Grooper.

Note that some content may still need to be updated.

Tabular Layout is a Table Extract Method that models a table's structure and returns its values using two inputs: column header values, taken from the Data Columns' Header Extractor results (or from the labels collected for the Data Columns when a Labeling Behavior is enabled), and the Data Columns' Value Extractor results.

Introduction

The Tabular Layout Table Extract Method is a powerful tool in Grooper for extracting structured tabular data from documents. It automatically detects table headers, rows, and footers using a combination of value extractors and layout analysis. Tabular Layout is ideal for documents where tables are clearly defined, such as invoices, statements, and reports.

Unlike other Table Extract Methods (such as Row Match or Delimited Extract), Tabular Layout uses header and footer labels, supports multi-line and stacked layouts, and provides advanced configuration for handling complex table structures.

When to use

Tabular Layout is best used when:

  • Tables have clearly defined headers and rows.
  • You need to extract data from grid-based tables, including those with merged or stacked cells.
  • Tables may span multiple pages or regions.

Example: Use Tabular Layout to extract line items from an invoice where each row contains "Quantity," "Description," "Unit Price," and "Total," and headers are present.

Drawbacks:

  • Tabular Layout may be less effective for highly irregular tables or lists without clear headers.
  • For simple delimited data (e.g., CSV), Delimited Extract may be more efficient.
  • Requires well-defined header labels or extractors for best results.

What is a table?

A table in document processing is a structured arrangement of data in rows and columns. Its main components are:

  • Headers: The top section that labels each column (e.g., "Quantity", "Description").
  • Rows: Horizontal groupings of related data, each representing a record or item.
  • Columns: Vertical divisions, each capturing a specific type of data (e.g., price, date).
  • Footers: The bottom section, often used for totals or summary information.

Common use cases for tables in documents include:

  • Invoice line items
  • Transaction logs
  • Product lists
  • Financial summaries
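
The four components described above can be pictured as a small data structure. The following is only an illustrative sketch in Python: the Table class, its field names, and the sample invoice values are hypothetical and are not part of Grooper.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical in-memory model of a document table (not a Grooper type)."""
    # Column labels, left to right.
    headers: list[str]
    # One dict per row, keyed by column label.
    rows: list[dict[str, str]] = field(default_factory=list)
    # Summary values from the bottom of the table, e.g. totals.
    footer: dict[str, str] = field(default_factory=dict)

    def column(self, name: str) -> list[str]:
        """Return every row's value for one column, top to bottom."""
        return [row[name] for row in self.rows]


# Made-up invoice line items, used only for illustration.
invoice = Table(
    headers=["Quantity", "Description", "Unit Price", "Total"],
    rows=[
        {"Quantity": "2", "Description": "Widget", "Unit Price": "4.50", "Total": "9.00"},
        {"Quantity": "10", "Description": "Gadget", "Unit Price": "1.25", "Total": "12.50"},
    ],
    footer={"Total": "21.50"},
)
totals = invoice.column("Total")
```

Reading a column is then a matter of collecting one key from every row, which is why each row needs a value (or an empty cell) for every header.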

Basic setup

Grooper must be able to detect the columns and rows of a table to extract data. Tabular Layout does this by identifying the column headers, which indicate where the columns are located on the document. In addition, at least one Data Column must have a Value Extractor that returns a result on every row of the table. These results tell Grooper where the rows are located.
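
The idea of locating columns from headers and rows from a per-row extractor can be sketched in a few lines of Python. This is a simplified illustration only, not Grooper's implementation: it assumes fixed-width plain text as a stand-in for a page layout, and the function names (find_columns, extract_rows), the sample lines, and the choice of "Quantity" as the row-anchoring column are all hypothetical.

```python
import re


def find_columns(header_line, labels):
    """Map each header label to a (start, end) character span on the line."""
    starts = sorted((header_line.index(label), label) for label in labels)
    spans = {}
    for i, (start, label) in enumerate(starts):
        # A column runs until the next header begins; the last one runs to end of line.
        end = starts[i + 1][0] if i + 1 < len(starts) else None
        spans[label] = (start, end)
    return spans


def extract_rows(lines, labels, anchor_label, anchor_pattern):
    """Keep only lines whose anchor column matches the pattern, split by column span."""
    # The header line is the first line containing every label.
    header_idx = next(
        i for i, line in enumerate(lines) if all(label in line for label in labels)
    )
    spans = find_columns(lines[header_idx], labels)
    a_start, a_end = spans[anchor_label]
    rows = []
    for line in lines[header_idx + 1:]:
        # A hit in the anchor column marks this line as a table row.
        if not re.fullmatch(anchor_pattern, line[a_start:a_end].strip()):
            continue
        rows.append({label: line[s:e].strip() for label, (s, e) in spans.items()})
    return rows


# Build aligned sample lines with one format string so the columns line up.
fmt = "{:<10}{:<16}{:<12}{}"
labels = ["Quantity", "Description", "Unit Price", "Total"]
page = [
    "Invoice 1001",
    fmt.format(*labels),
    fmt.format("2", "Widget", "4.50", "9.00"),
    fmt.format("10", "Gadget, large", "1.25", "12.50"),
    fmt.format("", "", "Subtotal", "21.50"),
]
rows = extract_rows(page, labels, anchor_label="Quantity", anchor_pattern=r"\d+")
```

In this sketch the subtotal line is skipped because its "Quantity" span is empty, which mirrors why the row-detecting extractor should return a result on every row of the table and nowhere else.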

Step 1: Create the Data Elements and select the Extract Method

These instructions assume that you already have a Project in Grooper with a Content Model, Document Type, and Data Model created.

  1. Right-click on your Data Model.
  2. Hover over "Add" and select "Data Table..." from the fly-out menu.
  3. When the "Add" window appears, enter a name for your Data Table in the Name property.
  4. When satisfied with the naming, click "Execute" to add the Data Table.
  5. Add the Data Columns as children of the Data Table using one of the following methods:
    • One at a time
      1. Right-click on the Data Table.
      2. Hover over "Add" and select "Data Column..." from the fly-out menu.
      3. When the "Add" window appears, enter a name for your Data Column in the Name property.
      4. When satisfied, click "Execute" to create the Data Column.
      5. Repeat steps 1-4 to add as many Data Columns as you would like.
    • Multiple at once
      1. Right-click on the Data Table.
      2. Hover over "Contents" and click "Add Multiple Items..." from the fly-out menu.
      3. When the "Add Multiple Items" window appears, make sure the Item Type property is set to Data Column.
      4. Click the "..." icon to the right of the Item Names property.
      5. When the Item Names window appears, type the names you want to give the Data Columns in the text box. Press Enter after each name.
      6. When finished, click "OK".
      7. Back on the Add Multiple Items window, click "Execute" to create the Data Columns.
  6. Next, select the Data Table in your Node Tree.
  7. Click the "☰" to the right of the Extract Method property.
  8. Click "Tabular Layout" in the drop-down menu.
  9. Click the save icon at the top of the property grid to save your changes to the Data Table.