Data Collection Order of Operations
Put very simply, the Extract activity collects data according to a document's Data Model configuration.
However, the Extract activity and a Data Model (and its child Data Elements) can collect and populate data in many ways. These include:
- Extractors
- Extractors collect data from a document's text (either OCR or native text obtained by the Recognize activity). Extractors are configured on a Data Model's child Data Elements. These include:
- Data Field Value Extractors
- Data Section Extract Methods
- Data Table Extract Methods
- Data Column Value Extractors
- Data Fields and Data Columns are the only Data Elements that actually hold values. Data Section and Data Table extractors execute to logically divide the document such that their child Data Fields and Data Columns can ultimately be populated appropriately.
- Expressions
- Expressions populate data in Data Fields or Data Columns in one of two primary ways: (1) using system data, environment data, or metadata associated with the document, or (2) calculating a value from the values of other Data Fields or Data Columns in the Data Model. This includes:
- Default Value expressions
- Calculated Value expressions
- Fill Methods
- Fill Methods are data population mechanisms that occur after a Data Model's extractors run. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Fill Method is AI Extract.
- Lookup Specifications
- Lookup Specifications use one or more values collected in a Data Model to populate other Data Fields or Data Columns using data stored in an external source, such as a database or a response from a web service. They are set on one or more "Data Containers" (Data Models, Data Sections and Data Tables). The most common Lookup Specifications are Database Lookup and Web Service Lookup.
- Data Rules
- Data Rules are a node type in Grooper used to normalize and manipulate data extracted by a Data Model. Each Data Rule defines a "Data Action", which performs a specialized normalization operation.
- Common Data Actions include:
- "Calculate Value" which calculates a Data Field or Data Column's value using an expression
- "Copy" which copies or moves data from one Data Element to another
- "Parse Value" which parses a Data Field or Data Column value using a regular expression and assigns the parsed results to sibling Data Fields/Data Columns.
- Data Rules can apply conditional logic using "Trigger" expressions and can have child Data Rules, allowing for complex custom execution flows.
- Data Rules are executed in one of the following ways:
- By an Extract activity's "Data Rules" configuration.
- By a Data Container's "Validation Rule" configuration.
- By the Apply Rules activity (must run after data is extracted/collected).
- It is generally regarded as best practice to execute Data Rules with the Apply Rules activity. This cuts down on the confusing order of operations logic detailed in this article.
- Human intervention (Review)
- When data collection cannot be fully automated, it is up to a user to correctly enter values in the Data Model. Users intervene in Review steps in a Batch Process. They use the "Data Viewer" to validate Extract's results and manually input values into Data Fields and Data Column cells.
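
To make the Data Rule concepts above more concrete (a "Trigger" condition, a "Data Action", and child rules), here is a minimal Python sketch. It is not Grooper's API or expression syntax: the `Rule` class, the `parse_value` and `calculate_value` helpers, and the field names (`LineText`, `Quantity`, `UnitPrice`, `LineTotal`) are all hypothetical, and a plain dictionary stands in for a Data Model's field values. The sketch also assumes that a rule whose trigger is not met skips its children, which may differ from how Grooper evaluates rule hierarchies.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for collected field values: field name -> text value.
Record = Dict[str, str]
Action = Callable[[Record], None]


@dataclass
class Rule:
    """Illustrative rule: an optional trigger, one action, optional child rules."""
    name: str
    action: Action
    trigger: Optional[Callable[[Record], bool]] = None
    children: List["Rule"] = field(default_factory=list)

    def apply(self, record: Record) -> None:
        # Assumption of this sketch: an unmet trigger skips the rule and its children.
        if self.trigger is not None and not self.trigger(record):
            return
        self.action(record)
        for child in self.children:
            child.apply(record)


def parse_value(source: str, pattern: str) -> Action:
    """Parse one field with a regex; named groups populate sibling fields."""
    compiled = re.compile(pattern)

    def action(record: Record) -> None:
        match = compiled.search(record.get(source, ""))
        if match:
            record.update(match.groupdict())

    return action


def calculate_value(target: str, expression: Callable[[Record], str]) -> Action:
    """Set a field from an expression over the other field values."""
    def action(record: Record) -> None:
        record[target] = expression(record)

    return action


# Child rule: compute a total only when both inputs were parsed successfully.
line_total = Rule(
    name="Calculate line total",
    action=calculate_value(
        "LineTotal",
        lambda r: str(Decimal(r["Quantity"]) * Decimal(r["UnitPrice"])),
    ),
    trigger=lambda r: "Quantity" in r and "UnitPrice" in r,
)

# Parent rule: split text such as "3 @ 2.50" into two sibling fields.
parse_line = Rule(
    name="Parse line text",
    action=parse_value(
        "LineText",
        r"(?P<Quantity>\d+)\s*@\s*(?P<UnitPrice>\d+\.\d{2})",
    ),
    children=[line_total],
)

record = {"LineText": "3 @ 2.50"}
parse_line.apply(record)
print(record)

unparsed = {"LineText": "n/a"}
parse_line.apply(unparsed)  # no regex match, so the child's trigger is not met
print(unparsed)
```

In this sketch the parent rule runs first and its child runs afterward, so the calculated value can depend on the parsed values. Ordering dependencies of this kind are the reason the sequence in which data is collected and rules are executed matters.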