2023.1:Footer Rows and Footer Modes (Data Table Functionality)

From Grooper Wiki
Revision as of 16:25, 26 August 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


A "Footer Row" is a row at the bottom of a Data Table that displays sum totals for numerical Data Columns. This can help Data Viewer users validate the data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed. When Tabular Layout's "Capture Footer Row" property creates the Footer Row, the Footer Mode also controls if and how document data is used to capture and validate the footer value.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second (2023.1_Wiki_Footer-Rows-and-Footer-Modes_Project.zip) contains one or more Projects with resources used in examples throughout this article.

About

We will begin by addressing some simple questions you may have.

What is a footer?

A "footer", in terms of table extraction, is a line that indicates you've reached the last row of a table.

What is a footer row?

A "footer row" is a special row at the end of a table that displays totals for some or all numerical columns in the table.

How do I display a footer row in Grooper?

There are two ways, depending on which Table Extract Method the Data Table uses.
  • For all table extract methods, you can enable the Data Table's "Generate Footer Row" property to generate a blank footer row.
    • This option is useful for situations where you want Grooper to simply calculate the sum total of Data Columns.
    • When enabled, a footer row will always be present in the table Grooper extracts, whether a footer row is present on the document itself or not.
    • To generate a computed sum of the Data Column's values, the Data Column's Footer Mode must be set to Calculate.
  • For the Tabular Layout method only, you can collect a footer row from the document itself by enabling the Capture Footer Row property.
    • This extracts a row from the document using either the Footer Detection extractor's result or a Footer label for the Data Table (if one is collected in the Document Type's Label Set).
    • Creating a footer row this way unlocks additional Footer Mode options. You can simply populate the footer row with values collected from the document. You can validate that a collected footer value equals the total of the values collected for its column. There are also modes that calculate a footer value when it is not present on the document (or is otherwise not extracted from it).
    • Enabling this property will override the Generate Footer Row property, if configured.
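The three kinds of Capture Footer Row behavior described above (populate from the document, validate against the column total, calculate when missing) can be sketched as follows. This is a minimal Python illustration, not Grooper's implementation: the mode names `CAPTURE`, `VALIDATE`, and `CALCULATE_IF_MISSING` and the function `resolve_footer` are hypothetical stand-ins, and Grooper's actual Footer Mode choices may be named and behave differently.

```python
from decimal import Decimal
from enum import Enum
from typing import Optional


class FooterMode(Enum):
    # Hypothetical names; Grooper's actual Footer Mode values may differ.
    DISABLED = "disabled"
    CAPTURE = "capture"                            # use the value found on the document
    VALIDATE = "validate"                          # check the captured value against the column total
    CALCULATE_IF_MISSING = "calculate_if_missing"  # fall back to the column total


def resolve_footer(mode: FooterMode,
                   column_values: list[Decimal],
                   captured: Optional[Decimal]) -> tuple[Optional[Decimal], bool]:
    """Return (footer value, is_valid) for one Data Column (illustrative only)."""
    total = sum(column_values, Decimal(0))
    if mode is FooterMode.DISABLED:
        return None, True
    if mode is FooterMode.CAPTURE:
        return captured, True
    if mode is FooterMode.VALIDATE:
        # A missing or mismatched footer value is flagged as invalid for review.
        return captured, captured is not None and captured == total
    # CALCULATE_IF_MISSING: keep the captured value, otherwise use the computed total.
    return (captured, True) if captured is not None else (total, True)


salaries = [Decimal("10000"), Decimal("25000"), Decimal("15000")]
print(resolve_footer(FooterMode.VALIDATE, salaries, Decimal("50000")))  # (Decimal('50000'), True)
```

A mismatch in the validate case is exactly what a reviewer would want surfaced: either a column value was misread, or the total printed on the document is wrong.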

What is a "Footer Mode"?

This property determines how footer values are extracted or calculated for each Data Column. It is located in each individual Data Column's property panel. The property and its configuration options are discussed in more detail in the "how to" sections of this article.

Footer values will ONLY be collected/calculated for Data Columns with numerical Value Types (Decimal, Double, Int16, Int32, Int64).

  • Furthermore, the Footer Mode property will only appear for Data Columns with numerical Value Types selected.
  • You will not see this property if the Data Column is using the default "String" Value Type.
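The visibility rule above amounts to a simple membership check on the Value Type. This is a minimal sketch, assuming the Value Type is identified by its display name; `footer_mode_available` is a hypothetical helper for illustration, not part of Grooper.

```python
# Numerical Value Types listed in this article; only these expose Footer Mode.
NUMERIC_VALUE_TYPES = {"Decimal", "Double", "Int16", "Int32", "Int64"}


def footer_mode_available(value_type: str) -> bool:
    """Hypothetical helper: True if a Data Column with this Value Type shows Footer Mode."""
    return value_type in NUMERIC_VALUE_TYPES


for vt in ("String", "Int32", "Decimal"):
    print(vt, footer_mode_available(vt))
```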

How to use Generate Footer Row

A footer row can be created for any Data Table by enabling the Generate Footer Row property.

While there are multiple Footer Mode options, only two apply for footer rows generated by the Generate Footer Row property:

  • Disabled - This will disable footer calculation for a Data Column
  • Calculate - This will enable footer calculation for a Data Column


Generate Footer Row can create footer rows regardless of whether or not a footer row appears on the document. In the case of the image below:

  • "Table A" has a footer row present on the document totaling the "Salary" column.
  • "Table B" has no such row on the document.

In either case, the Generate Footer Row property can generate the footer row and calculate the totaled value for the column, as seen in the image below.

For this example, the "Salary" Data Column's Footer Mode was set to Calculate, allowing Grooper to add up each of the collected "Salary" column values in both cases.

  • Please note, no document extraction is taking place in this case. Grooper is simply adding up the values in the "Salary" column to arrive at this number (10000 + 25000 + 15000 = 50000).
  • Furthermore, the "Employee ID" Data Column's Footer Mode was set to Disabled. Even though these are numerical values, it does not make sense for us to total them in a footer row.
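The calculation Generate Footer Row performs in this example can be modeled in a few lines of Python. This is an illustrative sketch, not Grooper's code: the `DataColumn` structure, the row dictionaries, the `generate_footer_row` function, and the Employee ID values are hypothetical; only the Salary values come from the example above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class DataColumn:
    name: str
    footer_mode: str  # "Disabled" or "Calculate", the two modes Generate Footer Row uses


def generate_footer_row(columns: list[DataColumn],
                        rows: list[dict[str, Decimal]]) -> dict[str, Optional[Decimal]]:
    """Sum already-extracted values for Calculate columns; leave the rest blank (None).

    Nothing is read from the document here: the footer is built purely
    from row values that were extracted earlier.
    """
    footer: dict[str, Optional[Decimal]] = {}
    for col in columns:
        if col.footer_mode == "Calculate":
            footer[col.name] = sum((row[col.name] for row in rows), Decimal(0))
        else:
            footer[col.name] = None
    return footer


columns = [DataColumn("Employee ID", "Disabled"), DataColumn("Salary", "Calculate")]
rows = [  # Employee IDs are placeholders; Salary values are from the example above
    {"Employee ID": Decimal(1001), "Salary": Decimal(10000)},
    {"Employee ID": Decimal(1002), "Salary": Decimal(25000)},
    {"Employee ID": Decimal(1003), "Salary": Decimal(15000)},
]
print(generate_footer_row(columns, rows))  # {'Employee ID': None, 'Salary': Decimal('50000')}
```

The sketch uses `Decimal` so that currency sums stay exact; that choice is ours and is not a statement about how Grooper stores each Value Type.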

FYI

Why would I ever use this?

Commonly, Generate Footer Row is used to aid data reviewers executing a Grooper Review step. It can be an "at-a-glance" way to verify if the numbers Grooper collects for a column add up correctly.
  • This can help catch:
    • OCR errors for that column.
    • Mistakes on the source document, such as an improperly listed total value.
Less commonly, the footer data may be exported during the Export step.
  • The footer row is collected and stored as a special footer instance in the Data Table's extracted data (saved in each document's "Grooper.DocumentData.json" file).
  • Certain Export Formats, such as XML Metadata and JSON Metadata, will include this footer instance as part of the data in the exported file.
  • Be aware, Data Export will not export the footer instance to a database table.

Enable Generate Footer Row

To create a footer row using Generate Footer Row, you must first enable the property.

  1. Select the Data Table.
  2. Turn the Generate Footer Row property to True.
  3. After saving these changes to the Data Table, you will see a blank footer row in the "Preview" panel.
FYI: Generate Footer Row can be used in conjunction with any Extract Method.

Enable footer calculation for Data Columns

After enabling Generate Footer Row, you must choose which Data Columns will have their values totaled in the footer row. To do this, you will select the Data Column and turn the Footer Mode property to Calculate.

  1. Select the Data Column.
    • In our case, we want to calculate the total of the "Salary" column.
  2. Ensure the Value Type is set to a numerical type.
    • Footer rows will only calculate totals for numerical data. Grooper can add numbers together. It can't add strings together!
  3. Change the Footer Mode to Calculate.
    • BE AWARE: This property will not appear unless the Value Type is numerical. This property will not be visible when the Value Type is String.
    • BE AWARE: Using Generate Footer Row, the only valid choices are Disabled or Calculate.



Next, test the Data Table's extraction to verify the results.

  1. Navigate to the "Tester" tab.
  2. Select a document and click the "Test" button.
  3. Upon extraction, Data Columns whose Footer Mode is Calculate will be totaled in the footer row.



FYI

Generate Footer Row will always create the footer row, whether or not there is one on the document.

  • Nothing is actually being extracted from the document to populate the footer value.
  • Instead, Grooper is just adding up each of the previously extracted values for the Data Column.

How to use Capture Footer Row

FYI

The big difference between Capture Footer Row and Generate Footer Row is this:

  • Capture Footer Row obtains the footer row by extracting data from the document, whereas Generate Footer Row does not (it adds up previously extracted data).


Either way, the Export activity handles the data Grooper collects for the Data Table in the same way.

Enable Capture Footer Row

Select a Footer Mode for Data Columns

Glossary

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Data Column: Data Columns represent columns in a table extracted from a document. They are added as child nodes of a Data Table. They define the type of data each column holds along with its data extraction properties.

  • Data Columns are frequently referred to simply as "columns".
  • In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".

Data Export: Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC-compliant database.

Data Table: A Data Table is a Data Element specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).

  • The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
  • The table's columns are defined by adding Data Column nodes to the Data Table (as its children).

Data Type: Data Types are nodes used to extract text data from a document. Data Types have more capabilities than Value Readers. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. Data Types can also collate results using Collation Providers to combine, sift, and manipulate results further.

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Footer Rows and Footer Modes: A "Footer Row" is a row at the bottom of a Data Table that displays sum totals for numerical Data Columns. This can help Data Viewer users validate the data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed. When Tabular Layout's "Capture Footer Row" property creates the Footer Row, the Footer Mode also controls if and how document data is used to capture and validate the footer value.

OCR: OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects, such as Content Models, Batch Processes, and profile objects, are stored. This makes resources easier to manage and save, and simplifies how node references are made in a Grooper Repository.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Tabular Layout: The Tabular Layout Table Extract Method uses column header values determined by the Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled), as well as the Data Columns' Value Extractor results, to model a table's structure and return its values.