Data Model (Object)
Revision as of 09:16, 28 August 2024
STUB
This article is a stub. It contains minimal information on the topic and should be expanded.
Data Model node objects serve as the top-tier structure defining the taxonomy for Data Elements and are leveraged during the Extract Activity to extract data from Batch Folders. They are a hierarchy of Data Elements that sets the stage for the extraction logic and review of data collected from documents.
About
The Data Model defines the data structure for a Content Type and can live at varying levels of structure, allowing for inheritance if a hierarchy exists. This can be a simple list of data fields or a complex hierarchy of sections, subsections, tables and fields.
The Data Model is leveraged by Grooper to extract data from a Batch. All extraction logic (e.g. referencing a Data Extractor to fill a field, performing a database lookup, or generating a calculated field expression) is set on the Data Model or the Data Elements related to the Data Model. It also provides information to the Data Review activity, setting expectations for field appearance and behavior (e.g. whether a field is required before completing batch validation).
One Data Model can be created for each Content Type: a Content Model, a Content Category, or a Document Type.
Data Models also inherit Data Elements from parent Content Types. For example, if a Content Model's Data Model has a child Data Field named "Date" and a Content Category's Data Model has a child Data Field named "Time", the Content Category's Data Model will actually have both "Date" and "Time" as fields. It has its own child field "Time" and inherits the parent field "Date" as well. The following hierarchy illustrates a typical structure:
- Content Model - HR
  - Data Model - HR
    - Data Fields such as: First Name, Middle Name, Last Name, Employment Status, Status Date
  - Content Category - Benefits
    - Data Model - Benefits (Inherits all data from the Content Model's primary Data Model as well as extracting its own data such as...)
      - Data Fields: Eligible Date
    - Document Type - Health Insurance
      - Data Model - Health Insurance (Inherits all data from the Content Model and parent Content Category as well as extracting its own data such as...)
        - Data Fields: Enrolled Date, Covered Parties
So, a document classified as a "Health Insurance" Document Type would have eight total Data Fields: two from its own Data Model (Enrolled Date and Covered Parties), one from the Data Model of its parent Content Category, "Benefits" (Eligible Date), and five from the Content Model's Data Model (First Name, Middle Name, Last Name, Employment Status, Status Date).
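The inheritance rule described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Grooper's API: the `ContentType` class, its `own_fields` attribute, and the `effective_fields` method are hypothetical names invented for this example. It assumes each Content Type holds the fields of its own Data Model and a reference to its parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    # Data Fields defined directly on this level's Data Model.
    own_fields: List[str] = field(default_factory=list)
    # Parent Content Type, or None for the root Content Model.
    parent: Optional["ContentType"] = None

    def effective_fields(self) -> List[str]:
        """Return inherited fields (root first), followed by this level's own fields."""
        inherited = self.parent.effective_fields() if self.parent else []
        return inherited + self.own_fields


# The HR example from this article.
hr = ContentType(
    "HR",
    ["First Name", "Middle Name", "Last Name", "Employment Status", "Status Date"],
)
benefits = ContentType("Benefits", ["Eligible Date"], parent=hr)
health_insurance = ContentType(
    "Health Insurance", ["Enrolled Date", "Covered Parties"], parent=benefits
)

for field_name in health_insurance.effective_fields():
    print(field_name)
```

Walking up the parent chain in this way yields the five HR fields, then the one Benefits field, then the two Health Insurance fields, matching the count of eight given above.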
Data context can be critical when building the Data Type and Field Class extractors that populate a Data Model. For more information on this topic, visit the Data Context article.
Glossary
Activity: Activity is a property on Batch Process Steps. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify step.
Batch Folder: Batch Folder objects are defined as container objects within a Batch that are used to represent and organize both folders and pages. They can hold other Batch Folders or Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.
- Batch Folders are frequently referred to simply as "documents".
Batch: Batch objects are fundamental in Grooper's architecture as they are the containers of documents that get moved through Grooper's workflow mechanisms known as Batch Processes.
Content Category: Content Category node objects are containers within a Content Model that hold other Content Categories and Document Type objects. They allow for further classification and grouping of Document Types within a taxonomy, aiding in the logical structuring of complex document sets. Besides grouping Document Types together, Content Categories also serve to create new branches in a Data Element hierarchy. In most cases Content Categories are used as organizational buckets to group like Document Types together.
Content Model: Content Model node objects define the taxonomy of document sets in terms of the Document Types they contain. They also house the Data Elements that appear on each Content Category and Document Type within them. Content Models serve as the root of a Content Type hierarchy and are crucial for organizing the different types of documents that Grooper can recognize and process.
Content Type: Content Type refers to objects in Grooper used to classify Batch Folders. These include: Content Models, Content Categories, and Document Types.
Data Context: Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.
Data Element: Data Element refers to the objects in Grooper used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.
Data Extractor: Data Extractor (or just "extractor") refers to all Extractor Types and extractor node objects. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).
Data Field: Data Field node objects are created as child objects of a Data Model. A Data Field is a representation of a single piece of data targeted for extraction on a document.
- Data Fields are frequently referred to simply as "fields".
Data Model: Data Model node objects serve as the top-tier structure defining the taxonomy for Data Elements and are leveraged during the Extract Activity to extract data from Batch Folders. They are a hierarchy of Data Elements that sets the stage for the extraction logic and review of data collected from documents.
Data Type: Data Type objects hold a collection of child, referenced, and locally defined Data Extractors and settings that manage how multiple (even differing) matches from Data Extractors are consolidated (via Collation) into a result set.
Document Type: Document Type objects represent a distinct type of document, like an invoice or contract. Document Types are created as children of a Content Model or a Content Category and are used to classify individual Batch Folders. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to Batch Folders of that specific classification.
Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.
Field Class: Field Class node objects are used to find values based on some natural language context near that value. Values are positively or negatively associated with text-based "features" nearby by training the extractor. During extraction, the extractor collects values based on these training weightings.
- Field Classes are most useful when attempting to find values within the flow of natural language.
- Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "Extractor Objects" like Value Readers or Data Types.
Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.