Tabular Layout (Table Extract Method)

From Grooper Wiki

Revision as of 09:49, 4 November 2025

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.


This article is about the current version of Grooper.

Note that some content may still need to be updated.

Tabular Layout is a Table Extract Method that models a table's structure and returns its values. It does so using column header values, determined by the Data Columns' Header Extractor results (or by the labels collected for the Data Columns when a Labeling Behavior is enabled), together with the Data Columns' Value Extractor results.

Introduction

The Tabular Layout Table Extract Method extracts structured tabular data from documents in Grooper. It detects table headers, rows, and footers automatically by combining value extractors with layout analysis. It is best suited to documents whose tables are clearly defined, such as invoices, statements, and reports.

Unlike other Table Extract Methods (such as Row Match or Delimited Extract), Tabular Layout leverages header and footer labels, supports multi-line and stacked layouts, and offers configuration options for handling complex table structures.
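The general idea of modeling a table from header positions and value results can be illustrated with a small sketch. This is not Grooper's implementation. It is a simplified Python illustration that assumes every header and value arrives with page coordinates. The `Word` class, the `build_rows` function, and the `row_tolerance` parameter are hypothetical names invented for this example, and the sample data is made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical container: a piece of text with its horizontal extent and vertical position."""
    text: str
    left: float
    right: float
    top: float


def overlap(a: Word, b: Word) -> float:
    """Width of the horizontal overlap between two words (0.0 if they do not overlap)."""
    return max(0.0, min(a.right, b.right) - max(a.left, b.left))


def build_rows(headers: list[Word], values: list[Word], row_tolerance: float = 2.0) -> list[dict[str, str]]:
    """Assign each value to the header it overlaps most, grouping values into rows by vertical position."""
    rows: list[tuple[float, dict[str, str]]] = []
    for value in sorted(values, key=lambda w: (w.top, w.left)):
        header = max(headers, key=lambda h: overlap(h, value))
        if overlap(header, value) == 0:
            continue  # the value does not sit under any header, so skip it
        if rows and abs(rows[-1][0] - value.top) <= row_tolerance:
            rows[-1][1][header.text] = value.text  # same line as the current row
        else:
            rows.append((value.top, {header.text: value.text}))  # start a new row
    return [cells for _, cells in rows]


# Invented sample data: three headers and two lines of values beneath them.
headers = [Word("Quantity", 50, 110, 20), Word("Description", 150, 300, 20), Word("Total", 400, 450, 20)]
values = [
    Word("2", 60, 70, 40), Word("Widget", 150, 200, 40), Word("10.00", 405, 450, 40),
    Word("1", 60, 70, 55), Word("Gadget", 150, 205, 55), Word("4.50", 410, 450, 55),
]
rows = build_rows(headers, values)
```

In this sketch the headers define the column boundaries, and each value is placed in the column whose header it overlaps horizontally. Real documents need more than this (multi-line cells, stacked layouts, footers), which is where Tabular Layout's configuration comes in.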

When to use

Tabular Layout is best used when:

  • Tables have clearly defined headers and rows.
  • You need to extract data from grid-based tables, including those with merged or stacked cells.
  • Tables may span multiple pages or regions.

Example: Use Tabular Layout to extract line items from an invoice where each row contains "Quantity," "Description," "Unit Price," and "Total," and headers are present.

Drawbacks:

  • Tabular Layout may be less effective for highly irregular tables or lists without clear headers.
  • For simple delimited data (e.g., CSV), Delimited Extract may be more efficient.
  • Tabular Layout requires well-defined header labels or Header Extractors for best results.
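To illustrate why simple delimited data does not need layout analysis: in a CSV, the delimiter already encodes the column boundaries, so a plain parser recovers the rows directly. The following is a minimal sketch using Python's standard `csv` module. It is not Grooper's Delimited Extract method, and the sample data is invented for the example.

```python
import csv
import io

# Invented sample: the first line holds the headers, and commas separate the columns.
sample = (
    "Quantity,Description,Unit Price,Total\n"
    "2,Widget,5.00,10.00\n"
    "1,Gadget,4.50,4.50\n"
)

# DictReader uses the first line as field names and yields one mapping per row.
line_items = list(csv.DictReader(io.StringIO(sample)))
```

No positional information is involved here, which is why a layout-based method adds little for this kind of input.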

What is a table?

A table in document processing is a structured arrangement of data in rows and columns. Its main components are:

  • Headers: The top section that labels each column (e.g., "Quantity", "Description").
  • Rows: Horizontal groupings of related data, each representing a record or item.
  • Columns: Vertical divisions, each capturing a specific type of data (e.g., price, date).
  • Footers: The bottom section, often used for totals or summary information.

Common use cases for tables in documents include:

  • Invoice line items
  • Transaction logs
  • Product lists
  • Financial summaries
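
Taking invoice line items as an example, the four table components described above can be represented in a simple data structure. The sketch below is a hypothetical Python model for illustration only. The `Table` class and its fields are assumptions of this example, not Grooper objects, and the values are invented.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical model of a table: headers, rows, and optional footers."""
    headers: list[str]                     # labels for each column
    rows: list[list[str]]                  # one inner list per record, ordered like headers
    footers: dict[str, str] = field(default_factory=dict)  # e.g. totals, keyed by header

    def column(self, header: str) -> list[str]:
        """Return every row's value for the column named by the given header."""
        index = self.headers.index(header)
        return [row[index] for row in self.rows]


# Invented invoice line items.
invoice = Table(
    headers=["Quantity", "Description", "Unit Price", "Total"],
    rows=[
        ["2", "Widget", "5.00", "10.00"],
        ["1", "Gadget", "4.50", "4.50"],
    ],
    footers={"Total": "14.50"},
)
totals = invoice.column("Total")
```

A column is not stored separately in this model. It is derived by reading the same position from every row, which mirrors how a column captures one type of data across all records.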