Data Collection Order of Operations

From Grooper Wiki

Put very simply, the Extract activity collects data according to a document's Data Model configuration.

But there are many ways the Extract activity (and others) can populate a Data Model (and its child Data Elements). Several mechanisms ultimately collect data in a Data Model. Furthermore, a somewhat complicated set of logic governs how these mechanisms execute when more than one is used to fully populate a Data Model.

This article seeks to document the different mechanisms that collect and populate data in a document's Data Model and the different orders of operation that can affect data population when multiple mechanisms interact with each other.

Data collection mechanisms

These are the different ways Data Elements collect values in a Data Model.

Extractors
Extractors collect data from a document's text (either OCR or native text obtained by the Recognize activity). Extractors are configured on a Data Model's child Data Elements. These include:
  • Data Field Value Extractors
  • Data Section Extract Methods
  • Data Table Extract Methods
  • Data Column Value Extractors
Data Fields and Data Columns are the only Data Elements that actually hold values. Data Section and Data Table extractors execute to logically divide the document so that their child Data Fields and Data Columns can ultimately be populated appropriately.
Expressions
Expressions populate data in Data Fields or Data Columns in one of two primary ways: (1) using system data, environment data, or metadata associated with the document, or (2) calculating a value from the values of other Data Fields or Data Columns in the Data Model. These include:
  • Default Value expressions
  • Calculated Value expressions
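The two expression styles can be pictured with a small sketch. This is plain Python used only for illustration, not Grooper's expression syntax; the field names and helper functions are hypothetical.

```python
from datetime import date

# Hypothetical stand-in for the values held by a document's fields.
fields = {"Quantity": 3, "Unit Price": 4.50, "Line Total": None, "Received Date": None}

def default_received_date(today: date) -> str:
    # Style 1: the value comes from system or environment data, not document text.
    return today.isoformat()

def calculated_line_total(values: dict) -> float:
    # Style 2: the value is calculated from other fields in the same model.
    return values["Quantity"] * values["Unit Price"]

fields["Received Date"] = default_received_date(date(2024, 1, 15))
fields["Line Total"] = calculated_line_total(fields)
```

The first function depends only on data from outside the document, while the second depends only on other field values, which is why the two kinds of expression can be evaluated at different points in processing.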
Fill Methods
Fill Methods are data population mechanisms that occur after a Data Model's extractors run. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Fill Method is AI Extract.
Lookup Specifications
Lookup Specifications use one or more values collected in a Data Model to populate other Data Fields or Data Columns using data stored in an external source, such as a database or a response from a web service. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Lookup Specifications are Database Lookup and Web Service Lookup.
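As a minimal sketch of the general idea behind a database lookup, the following Python uses an in-memory SQLite table. It is not how Grooper implements Database Lookup; the `vendors` table, the field names, and the `database_lookup` helper are all assumptions made for illustration.

```python
import sqlite3

def database_lookup(conn, fields, lookup_field, target_fields):
    """Populate target fields from the row matched on the lookup field's value."""
    key = fields.get(lookup_field)
    if key is None or key == "":
        return False  # nothing to look up with
    # Column names here are trusted constants from this sketch, not user input.
    columns = ", ".join(target_fields)
    row = conn.execute(
        f"SELECT {columns} FROM vendors WHERE {lookup_field} = ?", (key,)
    ).fetchone()
    if row is None:
        return False  # no matching record in the external source
    for name, value in zip(target_fields, row):
        fields[name] = value
    return True

# Hypothetical external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT, vendor_name TEXT, city TEXT)")
conn.execute("INSERT INTO vendors VALUES ('V-100', 'Acme Supply', 'Tulsa')")

# One collected value (vendor_id) is used to populate two other fields.
fields = {"vendor_id": "V-100", "vendor_name": None, "city": None}
hit = database_lookup(conn, fields, "vendor_id", ["vendor_name", "city"])
```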
Data Rules
A "Data Rule" is a node type in Grooper that is used to normalize and manipulate data extracted by a Data Model. Each Data Rule defines a "Data Action", which performs a specialized normalization operation.
Common Data Actions include:
  • "Calculate Value" which calculates a Data Field or Data Column's value using an expression
  • "Copy" which copies or moves data from one Data Element to another
  • "Parse Value" which parses a Data Field or Data Column value using a regular expression and assigns the results to sibling Data Fields/Data Columns.
Data Rules can apply conditional logic using "Trigger" expressions and can have child Data Rules, allowing for complex custom execution flows.
Data Rules execute in one of the following ways:
  • By an Extract activity's "Data Rules" configuration.
  • By a Data Container's "Validate Rule" configuration.
  • By the Apply Rules activity (must run after data is extracted/collected).
It is generally regarded as best practice to execute Data Rules with the Apply Rules activity. This cuts down on the confusing order of operations logic detailed in this article.
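The combination of triggers and child rules described above can be sketched as a small tree walk. This is an illustrative Python model, not Grooper's implementation: the `Rule` class, the two actions, and the field names are hypothetical, and the choice to skip child rules when a parent's trigger is false is an assumption of the sketch.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Values = Dict[str, Optional[str]]

@dataclass
class Rule:
    name: str
    action: Callable[[Values], None]
    trigger: Callable[[Values], bool] = lambda values: True
    children: List["Rule"] = field(default_factory=list)

    def run(self, values: Values, log: List[str]) -> None:
        if not self.trigger(values):
            return  # assumption: a false trigger skips this rule and its children
        self.action(values)
        log.append(self.name)
        for child in self.children:
            child.run(values, log)

def parse_full_name(values: Values) -> None:
    # "Parse Value"-style action: split "Last, First" into two sibling fields.
    match = re.fullmatch(r"\s*([^,]+),\s*(.+?)\s*", values.get("Full Name") or "")
    if match:
        values["Last Name"] = match.group(1).strip()
        values["First Name"] = match.group(2)

def copy_first_name(values: Values) -> None:
    # "Copy"-style action: copy one field's value to another field.
    values["Contact"] = values.get("First Name")

root = Rule(
    "parse name",
    parse_full_name,
    trigger=lambda v: bool(v.get("Full Name")),
    children=[Rule("copy contact", copy_first_name)],
)

values: Values = {"Full Name": "Doe, Jane"}
log: List[str] = []
root.run(values, log)
```

Because each rule decides for itself whether to run and then hands control to its children, fairly elaborate flows can be built from simple actions.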
Human intervention (Review)
When data collection cannot be fully automated, it is up to a user to correctly enter values in the Data Model. Users intervene in Review steps in a Batch Process. They use the "Data Viewer" to validate Extract's results and manually input values into Data Fields and Data Column cells.

Data collection events

Data is not collected willy-nilly. Each of the mechanisms described above happens in a logical sequence. However, this sequence can become complicated because some of these mechanisms can "fire" multiple times when the Extract activity runs.

Understand there are four "events" that happen when a Data Model is extracted:

  • Extraction events
  • Validation events
  • Lookup events
  • Fill events
Extraction events
This event typically occurs once, at the start of the Extract activity. The extraction event executes all the Data Model's extractors. Extractors execute in sequence, starting with the first child Data Element going down.
  • Default Value expressions are calculated before the extraction event, when the Data Model is initialized.
  • The only other way an extraction event takes place, besides at the start of Extract, is through the "Run Child Extractors" Fill Method. Run Child Extractors allows users to choose which Data Elements' extractors are executed. This takes place during a fill event.
Validation events
Validation events are automatic checks that Grooper performs after extracting data. They help confirm that the information in each Data Field and Data Column cell is correct and complete. Validation events flag Data Field and Data Column cells for further review.
  • Calculated Value expressions are recalculated during validation events.
There are three types of validation events:
  • Validate All – This event checks the entire Data Model at once. The Validate All event executes:
    • At the very end of the Extract activity
    • Manually when a user executes the "Validate All" command (F9) in a Review task from the Data Viewer
  • Validate Field – This event checks each individual Data Field. This is triggered every time a Data Field's value changes, including when it is changed by:
    • The Data Field's extractor
    • A Fill Method (like AI Extract)
    • A Lookup Specification (such as Database Lookup)
    • A Data Rule specified by a Data Container's "Validate Rule" configuration.
    • A user manually changing or entering the value in the Data Viewer
  • Validate Cell – This event checks each cell in a Data Table. This is triggered every time a Data Column cell's value changes, including when it is changed by:
    • The Data Column's extractor, or the Data Table's Extract Method if that is the mechanism that first populates the cell
    • A Fill Method (like AI Extract)
    • A Lookup Specification (such as Database Lookup)
    • A Data Rule specified by a Data Container's "Validate Rule" configuration.
    • A user manually changing or entering the value in the Data Viewer
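The "fires on every value change" behavior can be sketched as follows. This toy Python model is an assumption-laden illustration, not Grooper's code: the `Model` class and field names are hypothetical, and it assumes a change made by a Calculated Value expression raises its own validation event.

```python
from typing import Callable, Dict, List, Optional

Number = Optional[float]

class Model:
    """Toy model: every value change fires a per-field validation event."""

    def __init__(self) -> None:
        self.values: Dict[str, Number] = {}
        self.calculated: Dict[str, Callable[[Dict[str, Number]], Number]] = {}
        self.events: List[str] = []

    def set_value(self, name: str, value: Number, source: str) -> None:
        if self.values.get(name) == value:
            return  # no change, so no validation event
        self.values[name] = value
        self.validate_field(name, source)

    def validate_field(self, name: str, source: str) -> None:
        self.events.append(f"Validate Field: {name} (changed by {source})")
        # Calculated values are recalculated during validation events.
        for target, expression in self.calculated.items():
            if target != name:
                self.set_value(target, expression(self.values), "calculated value")

def total(values: Dict[str, Number]) -> Number:
    quantity, price = values.get("Quantity"), values.get("Unit Price")
    return None if quantity is None or price is None else quantity * price

model = Model()
model.calculated["Total"] = total
model.set_value("Quantity", 2, "extractor")   # Total cannot be calculated yet
model.set_value("Unit Price", 5.0, "user")    # Total is now recalculated
```

The point of the sketch is that a single change can cascade: the second `set_value` call changes one field directly and a second field through recalculation.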
Fill events
Fill events execute Fill Methods, such as AI Extract. Fill Methods are intended to be secondary extraction operations, occurring after the primary extraction event. However, if no extractors are configured, the Fill Method acts as the de facto data collection method.
A Fill Method can be conditionally executed during the fill event using its "Trigger" property. This lets users set a Boolean code expression that determines whether the Fill Method runs.
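A conditional fill can be pictured with a short Python sketch. It does not use Grooper's expression language; `run_fill_method`, `stub_fill`, and the "PO Number" field are hypothetical, and the stub returns a made-up value in place of a real Fill Method's result.

```python
from typing import Callable, Dict, Optional

Values = Dict[str, Optional[str]]

def run_fill_method(
    values: Values,
    fill: Callable[[Values], Values],
    trigger: Optional[Callable[[Values], bool]] = None,
) -> bool:
    """Run `fill` only when the optional Boolean trigger evaluates to True."""
    if trigger is not None and not trigger(values):
        return False
    values.update(fill(values))
    return True

def stub_fill(values: Values) -> Values:
    # Stand-in for a real Fill Method; returns a fixed, made-up value.
    return {"PO Number": "PO-0001"}

def po_number_is_blank(values: Values) -> bool:
    # Trigger: only fill when the extractors left the field empty.
    return not values.get("PO Number")

doc_a: Values = {"PO Number": None}
doc_b: Values = {"PO Number": "PO-7788"}
ran_a = run_fill_method(doc_a, stub_fill, po_number_is_blank)
ran_b = run_fill_method(doc_b, stub_fill, po_number_is_blank)
```

Here the fill runs for `doc_a`, whose field is empty, and is skipped for `doc_b`, whose extracted value is left untouched.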
Lookup events
Lookup events execute Lookup Specifications to enrich data extracted from a document with data from external systems, like a database. Lookup Specifications can be used to populate Data Fields and Data Column cells with external data, or to validate that Data Field and Data Column cell values match external data.
Lookup events occur at the end of the Extract activity and during document review in the Data Viewer. If and how a Lookup Specification runs is determined by its "Trigger Mode" configuration.
  • Auto - The lookup occurs every time the lookup field's value changes. Target fields are always updated.
  • Conditional - The lookup occurs when the lookup field's value changes only if the target fields are empty.
  • Custom - A code expression is used to determine if the lookup occurs. If the expression evaluates to true, the lookup occurs. If it evaluates to false, it does not.
  • Manual - The lookup must be initiated by a user in the Data Viewer using the "Execute Lookup" (Alt + L) command in the lookup field. Manual lookups will never occur during Extract.
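The four Trigger Modes amount to a decision made each time a lookup field's value changes. The Python below is a hedged sketch of that decision, not Grooper's code: `should_run_lookup` and its parameters are hypothetical, and "Conditional" is interpreted here as "all target fields are empty", which is an assumption.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional

Values = Dict[str, Optional[str]]

class TriggerMode(Enum):
    AUTO = "Auto"
    CONDITIONAL = "Conditional"
    CUSTOM = "Custom"
    MANUAL = "Manual"

def is_blank(value: Optional[str]) -> bool:
    return value is None or value == ""

def should_run_lookup(
    mode: TriggerMode,
    values: Values,
    target_fields: List[str],
    custom_expression: Optional[Callable[[Values], bool]] = None,
    user_initiated: bool = False,
) -> bool:
    """Decide whether a lookup runs after its lookup field's value changes."""
    if mode is TriggerMode.AUTO:
        return True  # always runs; target fields are always updated
    if mode is TriggerMode.CONDITIONAL:
        # Assumption: "empty" means every target field is blank.
        return all(is_blank(values.get(name)) for name in target_fields)
    if mode is TriggerMode.CUSTOM:
        return bool(custom_expression and custom_expression(values))
    # MANUAL: only when a user explicitly executes the lookup.
    return user_initiated

values: Values = {"Vendor ID": "V-100", "Vendor Name": "", "City": None}
targets = ["Vendor Name", "City"]
runs_conditionally = should_run_lookup(TriggerMode.CONDITIONAL, values, targets)
runs_manually_in_extract = should_run_lookup(TriggerMode.MANUAL, values, targets)
```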

Extract order of operations

All of the data collection events above execute during the Extract activity. However, they follow a set order of operations. Understanding this order of operations is often crucial to troubleshooting issues when multiple collection mechanisms are used to fully extract a document.

To help keep things brief, Data Fields and Data Column cells will just be called "fields" here.

  1. Field initialization
    • The document's Data Model is formed from the Content Type's Data Model and Data Elements from any parent Content Types.
    • Fields populated by Default Values are collected. Default values may be overwritten by any subsequent event.
  2. Extraction phase
    • Fields populated by extractors are collected.
    • Validation events occur during the extraction phase every time a field's value changes.
      • Fields populated by Calculated Value expressions are collected.
      • Fields populated by Data Rules using a "Validate Rule" are collected.
  3. Fill phase
    • Fields populated by Fill Methods (AI Extract) are collected.
    • Validation events occur during the fill phase every time a field's value changes.
      • Fields populated by Calculated Value expressions are collected.
      • Fields populated by Data Rules using a "Validate Rule" are collected.
  4. Lookup phase
    • Fields populated by Lookup Specifications are collected.
    • Each Lookup Specification executes in order from first to last, according to their "Trigger Mode". Manual lookups will never execute during the lookup phase.
    • Be aware that if a lookup field is empty (null or blank), the Lookup Specification will not attempt the lookup. Before executing a lookup, Grooper checks the values of all lookup fields. If any are missing, the lookup is skipped. Furthermore, no Miss Disposition is applied when the lookup is skipped.
    • Validation events occur during the lookup phase every time a field's value changes.
      • Fields populated by Calculated Value expressions are collected.
      • Fields populated by Data Rules using a "Validate Rule" are collected.
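The four phases above can be sketched as a single function that records what happens in order. This is a simplified Python model under stated assumptions, not Grooper's implementation: `run_extract`, its parameters, and all field names are hypothetical, Validate Rules and Trigger Modes are omitted, and a recalculated value is assumed to raise its own validation event.

```python
from typing import Callable, Dict, List, Optional, Tuple

Values = Dict[str, Optional[str]]

def is_blank(value: Optional[str]) -> bool:
    return value is None or value == ""

def run_extract(defaults, extractors, fills, lookups, calculated) -> Tuple[Values, List[str]]:
    values: Values = dict(defaults)                  # 1. field initialization
    trace: List[str] = ["initialize"]

    def change(name: str, value: Optional[str], phase: str) -> None:
        if values.get(name) == value:
            return                                   # no change, no validation event
        values[name] = value
        trace.append(f"{phase}: {name}")
        for target, expression in calculated.items():  # validation event
            if target != name:
                change(target, expression(values), "validate")

    for name, extractor in extractors:               # 2. extraction phase
        change(name, extractor(), "extract")
    for fill in fills:                               # 3. fill phase
        for name, value in fill(values).items():
            change(name, value, "fill")
    for lookup_fields, lookup in lookups:            # 4. lookup phase
        if any(is_blank(values.get(f)) for f in lookup_fields):
            trace.append("lookup skipped")           # a lookup field is empty
            continue
        for name, value in lookup(values).items():
            change(name, value, "lookup")
    trace.append("validate all")                     # end of Extract
    return values, trace

values, trace = run_extract(
    defaults={"Status": "New"},
    extractors=[("Invoice No", lambda: "INV-7"), ("Vendor ID", lambda: None)],
    fills=[lambda v: {"Vendor ID": "V-100"} if is_blank(v.get("Vendor ID")) else {}],
    lookups=[
        (["Vendor ID"], lambda v: {"Vendor Name": "Acme Supply"}),
        (["PO Number"], lambda v: {"Buyer": "unknown"}),
    ],
    calculated={"Has Vendor": lambda v: "no" if is_blank(v.get("Vendor ID")) else "yes"},
)
```

In this run the extractor for "Vendor ID" finds nothing, the fill supplies it, the first lookup then succeeds, and the second lookup is skipped because "PO Number" was never populated. Printing `trace` shows the order in which each phase touched each field.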

Data collection during Review

There are five ways a Data Field or Data Column cell's value is collected during a Review task:

  1. A user manually enters it
  2. A Calculated Value expression calculates it
  3. A "Validate Rule" populates it
  4. A Lookup Specification populates it
  5. A user manually re-runs Extract on the entire Data Model

Calculated Values and Validate Rules

Fields are populated by Calculated Value expressions and Data Rules using "Validate Rule" configurations during validation events.

  • Data Field values are validated by the "Validate Field" event whenever the field's value changes.
  • Data Column cell values are validated by the "Validate Cell" event whenever the cell's value changes.

Lookup Specifications

Lookup Specifications execute in real time as users enter data, according to their "Trigger Mode" configuration.