Tabular Layout (Table Extract Method): Difference between revisions

Revision as of 08:25, 4 November 2025

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Tabular Layout is a Table Extract Method that models a table's structure and returns its values from two inputs: column header values, determined by the Data Columns Header Extractor results (or by labels collected for the Data Columns when a Labeling Behavior is enabled), and Data Column Value Extractor results.

Introduction

The Tabular Layout Table Extract Method extracts structured tabular data from documents in Grooper. It automatically detects table headers, rows, and footers using a combination of value extractors and layout analysis. Tabular Layout is best suited to documents where tables are clearly defined, such as invoices, statements, and reports.

Unlike other Table Extract Methods (such as Row Match or Delimited Extract), Tabular Layout leverages header and footer labels, supports multi-line and stacked layouts, and provides advanced configuration for handling complex table structures.
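
The header-driven idea described above can be illustrated with a small sketch. This is not Grooper's implementation or API; it is a minimal Python illustration that assumes header labels and cell values have already been located as boxes with horizontal extents. The names Box, overlap, build_rows, and row_tolerance are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A piece of text with its horizontal extent and vertical position (hypothetical type)."""
    text: str
    left: float
    right: float
    top: float


def overlap(a: Box, b: Box) -> float:
    """Width of the horizontal overlap between two boxes (0 if they do not overlap)."""
    return max(0.0, min(a.right, b.right) - max(a.left, b.left))


def build_rows(headers, values, row_tolerance=2.0):
    """Assign each value to the header it overlaps most, grouping values into rows by 'top'."""
    rows = []
    for value in sorted(values, key=lambda b: (b.top, b.left)):
        best = max(headers, key=lambda h: overlap(h, value))
        if overlap(best, value) == 0:
            continue  # value sits under no header; ignored in this sketch
        # Start a new row when the vertical position moves beyond the tolerance.
        if not rows or abs(rows[-1]["_top"] - value.top) > row_tolerance:
            rows.append({"_top": value.top})
        rows[-1][best.text] = value.text
    # Drop the bookkeeping key before returning.
    return [{k: v for k, v in row.items() if k != "_top"} for row in rows]


# Assumed sample data: three header labels and two line items from an invoice.
headers = [
    Box("Quantity", 0, 50, 0),
    Box("Description", 60, 200, 0),
    Box("Total", 210, 260, 0),
]
values = [
    Box("2", 10, 20, 20), Box("Widget", 60, 110, 20), Box("9.98", 220, 255, 20),
    Box("1", 10, 20, 40), Box("Gadget", 60, 115, 41), Box("4.50", 220, 255, 40),
]
rows = build_rows(headers, values)
```

In this sketch the header positions define the columns and the value positions define the rows, which is why clearly defined headers matter for this kind of approach.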

When to use

Tabular Layout is best used when:

Tables have clearly defined headers and rows.
You need to extract data from grid-based tables, including those with merged or stacked cells.
Tables may span multiple pages or regions.

Example: Use Tabular Layout to extract line items from an invoice where a header row is present and each row contains "Quantity," "Description," "Unit Price," and "Total."

Drawbacks:

Tabular Layout may be less effective for highly irregular tables or lists without clear headers.
For simple delimited data (e.g., CSV), Delimited Extract may be more efficient.
Tabular Layout requires well-defined header labels or extractors for best results.

@@ Line 1: / Line 1: @@
-{{Migrated}}
+{|class="wip-box"
-{{2024:{{PAGENAME}}}}
+|
+'''WIP'''
+|
+This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.
+This tag will be removed upon draft completion.
+|}
+{{AutoVersion}}
+<blockquote>{{#lst:Glossary|Tabular Layout}}</blockquote>
+== Introduction ==
+The '''Tabular Layout Table Extract Method''' extracts structured tabular data from documents in Grooper. It automatically detects table headers, rows, and footers using a combination of value extractors and layout analysis. Tabular Layout is best suited to documents where tables are clearly defined, such as invoices, statements, and reports.
+Unlike other Table Extract Methods (such as Row Match or Delimited Extract), Tabular Layout leverages header and footer labels, supports multi-line and stacked layouts, and provides advanced configuration for handling complex table structures.
+== When to use ==
+Tabular Layout is best used when:
+* Tables have clearly defined headers and rows.
+* You need to extract data from grid-based tables, including those with merged or stacked cells.
+* Tables may span multiple pages or regions.
+'''Example:''' Use Tabular Layout to extract line items from an invoice where a header row is present and each row contains "Quantity," "Description," "Unit Price," and "Total."
+'''Drawbacks:'''
+* Tabular Layout may be less effective for highly irregular tables or lists without clear headers.
+* For simple delimited data (e.g., CSV), Delimited Extract may be more efficient.
+* Tabular Layout requires well-defined header labels or extractors for best results.