Data Collection Order of Operations

From Grooper Wiki

Revision as of 17:04, 2 October 2025

Put very simply, the Extract activity collects data according to a document's Data Model configuration.

But there are a lot of ways the Extract activity (and others) can populate a Data Model (and its child Data Elements). Several mechanisms ultimately collect data in a Data Model. Furthermore, a somewhat complicated set of logic governs how these mechanisms execute when more than one is used to fully populate a Data Model.

This article seeks to document the different mechanisms that collect and populate data in a document's Data Model and the different orders of operation that can affect data population when multiple mechanisms interact with each other.

Data collection mechanisms

These are the different ways Data Elements collect values in a Data Model.

Extractors
Extractors collect data from a document's text (either OCR or native text obtained by the Recognize activity). Extractors are configured on a Data Model's child Data Elements. These include:
  • Data Field Value Extractors
  • Data Section Extract Methods
  • Data Table Extract Methods
  • Data Column Value Extractors
  • Data Fields and Data Columns are the only Data Elements that actually hold values. Data Section and Data Table extractors execute to logically divide the document such that their child Data Fields and Data Columns can ultimately be populated appropriately.
Expressions
Expressions populate data in Data Fields or Data Columns in one of two primary ways: (1) using system data, environment data, or metadata associated with the document, or (2) calculating a value from the values of other Data Fields or Data Columns in the Data Model. These include:
  • Default Value expressions
  • Calculated Value expressions
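The two kinds of expression can be pictured with a small Python sketch. This is only an illustration of the idea, not Grooper's expression syntax: the function names, the `received` metadata key, and the field names are all hypothetical.

```python
from datetime import date


def default_received_date(metadata: dict) -> str:
    """Way (1): take the value from document metadata or system data.

    Falls back to today's date when the (hypothetical) 'received' key is absent.
    """
    return metadata.get("received", date.today().isoformat())


def calculated_line_total(fields: dict) -> float:
    """Way (2): calculate the value from other fields already collected."""
    return fields["Quantity"] * fields["Unit Price"]


received = default_received_date({"received": "2025-10-02"})
line_total = calculated_line_total({"Quantity": 4, "Unit Price": 2.5})
print(received, line_total)
```

The first function depends only on information about the document, while the second depends on other collected values, which is why its result can go stale when those values change.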
Fill Methods
Fill Methods are data population mechanisms that occur after a Data Model's extractors run. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Fill Method is AI Extract.
Lookup Specifications
Lookup Specifications use one or more values collected in a Data Model to populate other Data Fields or Data Columns using data stored in an external source, such as a database or a response from a web service. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Lookup Specifications are Database Lookup and Web Service Lookup.
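As a rough illustration of the lookup idea, the Python sketch below uses one collected value to fetch related values from an external source. It is not how Grooper implements Database Lookup: the in-memory SQLite table, the `vendors` schema, the field names, and the `database_lookup` helper are all assumptions made for the example.

```python
import sqlite3

# Stand-in for an external database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO vendors VALUES ('V-100', 'Acme Supply', 'Tulsa')")


def database_lookup(fields: dict, connection: sqlite3.Connection) -> dict:
    """Use the collected 'Vendor ID' to populate 'Vendor Name' and 'Vendor City'."""
    row = connection.execute(
        "SELECT name, city FROM vendors WHERE vendor_id = ?",
        (fields["Vendor ID"],),
    ).fetchone()
    if row is None:
        # No match in the external source: leave the collected data unchanged.
        return dict(fields)
    updated = dict(fields)
    updated["Vendor Name"], updated["Vendor City"] = row
    return updated


result = database_lookup({"Vendor ID": "V-100"}, conn)
print(result)
```

The key point is the direction of data flow: a value that was already collected drives the query, and the query result fills in other fields.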
Data Rules
Data Rules are a node type in Grooper used to normalize and manipulate data extracted by a Data Model. Each Data Rule defines a "Data Action", which performs a specialized normalization operation.
Common Data Actions include:
  • "Calculate Value" which calculates a Data Field or Data Column's value using an expression
  • "Copy" which copies or moves data from one Data Element to another
  • "Parse Value" which parses a Data Field or Data Column value using a regular expression and assigns the results to sibling Data Fields/Data Columns.
Data Rules can apply conditional logic using "Trigger" expressions and can have child Data Rules, allowing for complex custom execution flows.
Data Rules execute in one of the following ways:
  • By an Extract activity's "Data Rules" configuration.
  • By a Data Container's "Validation Rule" configuration.
  • By the Apply Rules activity (must run after data is extracted/collected).
  • It is generally regarded as best practice to execute Data Rules with the Apply Rules activity. This cuts down on the confusing order of operations logic detailed in this article.
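The combination of a trigger, an action, and child rules can be sketched in Python as follows. This is a simplified model, not Grooper's Data Rule implementation: the `Rule` class and the field names are hypothetical, and the sketch assumes child rules run only when the parent's trigger passes.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Rule:
    """Hypothetical stand-in for a Data Rule: a trigger, an action, and children."""
    name: str
    action: Callable[[Record], None]
    trigger: Callable[[Record], bool] = lambda record: True
    children: List["Rule"] = field(default_factory=list)

    def execute(self, record: Record) -> None:
        # Assumption: when the trigger fails, neither the action nor children run.
        if not self.trigger(record):
            return
        self.action(record)
        for child in self.children:
            child.execute(record)


def parse_full_name(record: Record) -> None:
    """'Parse Value' style action: split 'Last, First' into sibling fields."""
    match = re.match(r"(?P<last>[^,]+),\s*(?P<first>.+)", record["Full Name"])
    if match:
        record["Last Name"] = match.group("last").strip()
        record["First Name"] = match.group("first").strip()


def copy_last_name(record: Record) -> None:
    """'Copy' style action: copy one field's value into another."""
    record["Sort Key"] = record["Last Name"]


rule = Rule(
    name="Normalize name",
    action=parse_full_name,
    trigger=lambda record: "," in record.get("Full Name", ""),
    children=[Rule(name="Copy sort key", action=copy_last_name)],
)

matched = {"Full Name": "Lovelace, Ada"}
skipped = {"Full Name": "Ada Lovelace"}
rule.execute(matched)
rule.execute(skipped)
print(matched)
print(skipped)
```

Only the first record contains a comma, so only it is parsed and copied; the second record is left untouched because the trigger evaluates to false.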
Human intervention (Review)
When data collection cannot be fully automated, it is up to a user to correctly enter values in the Data Model. Users intervene in Review steps in a Batch Process. They use the "Data Viewer" to validate Extract's results and manually input values into Data Fields and Data Column cells.

Data Collection Events

It's not like data is collected willy-nilly. Each of the various mechanisms described above happens in a logical sequence. However, this sequence can become complicated, as some of these mechanisms can "fire" multiple times when the Extract activity runs.

Understand there are four "events" that happen when a Data Model is extracted:

  • Extraction events
  • Validation events
  • Lookup events
  • Fill events
Extraction events
This event typically occurs once, at the start of the Extract activity. The extraction event executes all the Data Model's extractors. Extractors execute in sequence, starting with the first child Data Element and working down.
  • Default Values are calculated during the extraction event.
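The top-down sequence can be pictured as a walk over the Data Model's tree. The Python sketch below is an illustration only: the `DataElement` class and the element names are hypothetical, and it assumes that a container's children are visited immediately after the container itself (a depth-first walk), which the description above does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """Hypothetical stand-in for a Data Field, Data Section, Data Table, or Data Column."""
    name: str
    children: List["DataElement"] = field(default_factory=list)


def extraction_order(element: DataElement) -> List[str]:
    """List element names in the order their extractors would run,
    assuming a top-down, depth-first walk starting at the first child."""
    order: List[str] = []
    for child in element.children:
        order.append(child.name)
        order.extend(extraction_order(child))
    return order


model = DataElement("Invoice Model", [
    DataElement("Invoice Number"),
    DataElement("Line Items", [DataElement("Quantity"), DataElement("Unit Price")]),
    DataElement("Total"),
])
print(extraction_order(model))
# → ['Invoice Number', 'Line Items', 'Quantity', 'Unit Price', 'Total']
```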
Validation events
Validation events are automatic checks that Grooper performs after extracting data. They help confirm that the information in each Data Field and Data Column cell is correct and complete. Validation events flag Data Field and Data Column cells for further review.
  • Calculated Value expressions are recalculated during validation events.
There are three types of validation events:
  • Validate All – This event checks the entire Data Model at once.
    • When a Data Rule is executed by a Data Container's "Validation Rule" configuration, it is only executed during the Validate All event, not by the Validate Field or Validate Cell events.
  • Validate Field – This event checks each individual Data Field. It is triggered every time a Data Field's value changes, including when the value is changed by:
    • The Data Field's extractor
    • A Fill Method (like AI Extract)
    • A Lookup Specification (such as Database Lookup)
    • A user manually changing or entering the value in the Data Viewer
  • Validate Cell – This event checks each cell in a Data Table. It is triggered every time a Data Column cell's value changes, including when the value is changed by:
    • The Data Column's extractor, or the Data Table's Extract Method if that is the mechanism by which the cell is first populated
    • A Fill Method (like AI Extract)
    • A Lookup Specification (such as Database Lookup)
    • A user manually changing or entering the value in the Data Viewer
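The relationship between value changes and validation events can be sketched as a small event model in Python. This is a simplified illustration, not Grooper's internals: the `DataModelSketch` class, its "empty value gets flagged" check, and the field names are all hypothetical. It shows two behaviors described above: every value change fires Validate Field for that field, and a container's Validation Rule runs only during Validate All.

```python
from typing import Callable, Dict, List, Optional, Set


class DataModelSketch:
    """Hypothetical model of Validate Field and Validate All events."""

    def __init__(self, validation_rule: Optional[Callable[["DataModelSketch"], None]] = None):
        self.values: Dict[str, str] = {}
        self.flagged: Set[str] = set()
        self.log: List[str] = []
        self.validation_rule = validation_rule

    def set_value(self, field_name: str, value: str, source: str) -> None:
        """Any change (extractor, Fill Method, lookup, user) fires Validate Field."""
        self.values[field_name] = value
        self.log.append(f"{source} set {field_name}")
        self.validate_field(field_name)

    def validate_field(self, field_name: str) -> None:
        self.log.append(f"Validate Field: {field_name}")
        self._check(field_name)

    def validate_all(self) -> None:
        """Checks every field, then runs the container's Validation Rule."""
        self.log.append("Validate All")
        for name in list(self.values):
            self._check(name)
        if self.validation_rule is not None:
            self.validation_rule(self)

    def _check(self, field_name: str) -> None:
        # Illustrative check only: an empty value is flagged for review.
        if self.values.get(field_name):
            self.flagged.discard(field_name)
        else:
            self.flagged.add(field_name)


rule_runs: List[str] = []
model = DataModelSketch(validation_rule=lambda m: rule_runs.append("ran"))
model.set_value("Invoice Number", "INV-001", "extractor")
model.set_value("Vendor Name", "", "lookup")
flagged_after_lookup = set(model.flagged)
model.set_value("Vendor Name", "Acme Supply", "user")
runs_before_validate_all = len(rule_runs)
model.validate_all()
print(model.log)
```

In this sketch the Validation Rule has not run after three value changes and runs exactly once when `validate_all` is called. The same pattern would apply to Validate Cell for Data Column cells.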