2023:Fluid Layout (Table Extract Method): Difference between revisions

From Grooper Wiki


The '''''Flow Layout''''' configuration will execute the '''''Row Match''''' method for the document, provided the '''Data Table's''' '''''Header''''' label is collected and NO '''Data Column''' labels are collected for the '''Document Type'''.
== About ==
{|class="download-box"
|
[[File:Asset 22@4x.png]]
|
You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains one or more '''Batches''' of sample documents.  The second contains one or more '''Projects''' with resources used in examples throughout this article.
* [[Media:2023 Wiki Fluid-Layout Batch.zip]]
* [[Media:2023 Wiki Fluid-Layout Project.zip]]
|}
{|class="fyi-box"
|
'''FYI'''
|
This article is taken from the larger [[Label Sets]] article.
The '''''Fluid Layout''''' method is dependent on Label Sets to function and must have a '''''Labeling Behavior''''' enabled to execute properly.  For more information on Label Sets and the '''''Labeling Behavior''''', please visit the [[Label Sets]] article.
|}
{{#lst:2023:Labeling Behavior|Fluid Layout}}


== Glossary ==


<u><big>'''Tabular Layout'''</big></u>: {{#lst:Glossary|Tabular Layout}}

Revision as of 16:25, 26 August 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.

The Tabular Layout configuration will execute the Tabular Layout method for the document, provided Data Column labels are collected for the Document Type.

The Flow Layout configuration will execute the Row Match method for the document, provided the Data Table's Header label is collected and NO Data Column labels are collected for the Document Type.
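The selection rule described above can be sketched as a small decision function. This is a hypothetical model for illustration only: `LabelState` and `choose_configuration` are not Grooper names, and because the article does not say what Fluid Layout does when neither condition holds, the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabelState:
    """Hypothetical summary of the labels collected for one Document Type."""
    header_label_collected: bool   # was the Data Table's Header label collected?
    column_labels_collected: int   # how many Data Column labels were collected


def choose_configuration(state: LabelState) -> Optional[str]:
    """Return the method Fluid Layout would run, following the rules in this article."""
    if state.column_labels_collected > 0:
        # Data Column labels collected -> Tabular Layout configuration
        return "Tabular Layout"
    if state.header_label_collected:
        # Header label collected and NO Data Column labels -> Flow Layout (Row Match)
        return "Row Match"
    # Not covered by the article; left undecided in this sketch.
    return None
```

For example, a Document Type whose labels include the table's Header label but no column labels would give `choose_configuration(LabelState(True, 0))`, which resolves to the Row Match method.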

About

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

FYI

This article is taken from the larger Label Sets article.

The Fluid Layout method is dependent on Label Sets to function and must have a Labeling Behavior enabled to execute properly. For more information on Label Sets and the Labeling Behavior, please visit the Label Sets article.

2023:Labeling Behavior

Glossary

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.
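To make the containment concrete, here is a minimal Python sketch of the hierarchy described above. The class names mirror the node names, but the fields (a single root folder holding nested folders and pages) and the `count_pages` helper are assumptions for illustration, not Grooper's object model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchPage:
    number: int  # position of the page within its folder (hypothetical field)


@dataclass
class BatchFolder:
    name: str
    pages: List[BatchPage] = field(default_factory=list)        # pages in this folder
    folders: List["BatchFolder"] = field(default_factory=list)  # nested folders


@dataclass
class Batch:
    name: str
    root: BatchFolder  # assumed single top-level folder containing the documents


def count_pages(folder: BatchFolder) -> int:
    """Count pages in a folder and, recursively, in all of its subfolders."""
    return len(folder.pages) + sum(count_pages(f) for f in folder.folders)
```

In this model each document is a `BatchFolder` under the root, and its pages are `BatchPage` children, so walking the tree visits every page in the Batch.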

Behavior: A "Behavior" is one of several features applied to a Content Type (such as a Document Type). Behaviors affect how certain Activities and Commands are executed, based on how a document (Batch Folder) is classified, so documents are handled differently according to their Document Type. This includes how they are exported (how Export behaves), if and how they are added to a document search index (how the various indexing commands behave), and if and how Label Sets are used (how Classify and Extract behave in the presence of Label Sets).

  • Each Behavior is enabled by adding it to a Content Type. They are configured in the Behaviors editor.
  • Behaviors extend to descendant Content Types, if a descendant Content Type has no Behavior configuration of its own.
    • For example, all Document Types will inherit their parent Content Model's Behaviors.
    • However, if a Document Type has its own Behavior configuration, it will be used instead.
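The inheritance rule in the bullets above amounts to "use the nearest Content Type, walking up the tree, that has its own Behavior configuration." The sketch below is a hypothetical model of that lookup: `ContentType` and `effective_behaviors` are illustrative names, and it treats a node's configuration as all-or-nothing, as the bullets describe, rather than modeling Grooper's Behaviors editor.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    behaviors: List[str] = field(default_factory=list)  # this node's own Behavior configuration


def effective_behaviors(node: ContentType) -> List[str]:
    """Return the Behaviors that apply to a node, inherited from the nearest configured ancestor."""
    current: Optional[ContentType] = node
    while current is not None:
        if current.behaviors:
            # A node's own configuration is used instead of anything it would inherit.
            return current.behaviors
        current = current.parent
    return []  # nothing configured anywhere up the tree
```

So a Document Type with no Behaviors of its own returns its parent Content Model's list, while one with its own configuration returns that configuration unchanged.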

Data Column: Data Columns represent columns in a table extracted from a document. They are added as child nodes of a Data Table. They define the type of data each column holds along with its data extraction properties.

  • Data Columns are frequently referred to simply as "columns".
  • In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".

Data Table: A Data Table is a Data Element specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).

  • The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
  • The table's columns are defined by adding Data Column nodes to the Data Table (as its children).

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Fluid Layout: The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.

Labeling Behavior: A Labeling Behavior extends "label set" functionality to Document Types. This allows you to collect field labels and other labels present on a document and use them in a variety of ways. This includes functionality for classification, field extraction, table extraction, and section extraction.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage and save, and simplifies how node references are made in a Grooper Repository.

Row Match: The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row, and extracts cell data from each column.
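The row-pattern idea behind Row Match can be illustrated with Python's `re` module. This is a simplified stand-in, not Grooper's implementation: the invoice-style pattern is a made-up example, and here named capture groups play the role of Data Columns, whereas in Grooper the row pattern and column extraction are configured on the Data Table and its Data Columns.

```python
import re
from typing import Dict, List

# Hypothetical row pattern: quantity, description, unit price, line total.
# Only lines that fit the whole pattern are treated as table rows.
ROW_PATTERN = re.compile(
    r"^(?P<qty>\d+)\s+(?P<description>.+?)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<total>\d+\.\d{2})$"
)


def row_match(text: str) -> List[Dict[str, str]]:
    """Return one dict per matching line; each named group becomes a cell."""
    rows = []
    for line in text.splitlines():
        m = ROW_PATTERN.match(line.strip())
        if m:
            rows.append(m.groupdict())
    return rows
```

Header and footer lines such as "Qty Description Price Total" or "Subtotal 19.00" do not fit the row pattern and are skipped, which is how a pattern-per-row approach separates rows from surrounding text.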

Table Extract Method: A Table Extract Method defines the settings and logic for a Data Table to perform extraction. It is set by configuring the Extract Method property of the Data Table.

Tabular Layout: The Tabular Layout Table Extract Method uses column header values, determined by the Data Columns' Header Extractor results (or by labels collected for the Data Columns when a Labeling Behavior is enabled), as well as the Data Columns' Value Extractor results to model a table's structure and return its values.
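One way to picture how header positions give a table its structure is to place each value under the header whose horizontal span it falls within. The sketch below is a deliberately simplified, hypothetical illustration (`Box` and `assign_to_columns` are invented names); Grooper's Tabular Layout also weighs Value Extractor results and many settings that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Box:
    """A header or value on the page, reduced to its horizontal extent (hypothetical)."""
    text: str
    left: float
    right: float

    @property
    def center(self) -> float:
        return (self.left + self.right) / 2


def assign_to_columns(headers: Dict[str, Box], values: List[Box]) -> Dict[str, List[str]]:
    """Place each value under the header whose horizontal span contains the value's center."""
    columns: Dict[str, List[str]] = {name: [] for name in headers}
    for value in values:
        for name, header in headers.items():
            if header.left <= value.center <= header.right:
                columns[name].append(value.text)
                break  # a value belongs to at most one column
    return columns
```

Values that sit under no header (for example, stray text between columns) are left out, which mirrors the general idea of headers anchoring where each column's cells may appear.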