Object Nomenclature (Concept): Difference between revisions

From Grooper Wiki
// via Wikitext Extension for VSCode

Revision as of 11:40, 14 March 2024

A Grooper environment consists of many interrelated objects.

Mastery of a Grooper environment is greatly enhanced by understanding the many objects that can exist and how they are related.

About

In Grooper, understanding the objects within the platform involves recognizing how various elements can serve similar functions and therefore be grouped together based on their shared functionalities. This concept stems from the recognition that disparate objects often perform analogous tasks, albeit with differing characteristics or representations.

By discerning commonalities in functionality across diverse objects, users can streamline their approach to data processing and analysis within Grooper. Rather than treating each object in isolation, users can categorize them based on their functional similarities, thus simplifying management and enhancing efficiency.

This approach fosters a more holistic understanding of the data ecosystem within Grooper, empowering users to devise more effective strategies for data extraction, classification, and interpretation. By recognizing the underlying functional relationships between objects, users can optimize workflows, improve accuracy, and derive deeper insights from their data.

Batch Objects

In Grooper, "Batch Objects" represent the hierarchical structure of documents being processed and consist of:

  • Batch ...
  • Batch Folder and ...
  • Batch Page objects ...

... each serving a distinct function within this hierarchy but also being fundamentally related.

The relationship between these objects is hierarchical in nature. The Batch object is the top level. It contains:

  • Batch Folders and ...
  • Batch Pages

Batch Folders may contain either further Batch Folders (to represent subfolders or grouped documents) or Batch Pages (to represent individual pages of documents). This structured approach allows Grooper to efficiently manage and process documents at various levels of granularity — from a full batch down to individual pages.
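
The containment described above can be pictured as a small tree. The following is a minimal sketch only: the class names, fields, and the `count_pages` helper are hypothetical stand-ins for illustration and are not Grooper's actual object model or API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BatchPage:
    """Hypothetical stand-in for a single page (the most granular unit)."""
    number: int


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a folder or document; holds folders or pages."""
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)


@dataclass
class Batch:
    """Hypothetical stand-in for the top-level container."""
    name: str
    children: List[Union[BatchFolder, BatchPage]] = field(default_factory=list)


def count_pages(node) -> int:
    """Count every page beneath a Batch or Batch Folder, at any depth."""
    if isinstance(node, BatchPage):
        return 1
    return sum(count_pages(child) for child in node.children)


# A two-page document plus a subfolder holding a one-page document.
invoice = BatchFolder("Invoice 001", [BatchPage(1), BatchPage(2)])
contracts = BatchFolder("Contracts", [BatchFolder("Contract A", [BatchPage(1)])])
batch = Batch("Morning scan", [invoice, contracts])
```

Calling `count_pages` on the batch, on a folder, or on a single page uses the same function, which mirrors the idea of working at any level of granularity.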

Related Objects

Batch

The Batch object is a fundamental construct in Grooper's architecture as it encompasses the documents that are grouped together to be processed through Grooper's workflow mechanisms, following the steps dictated by the related Batch Process.

Batch Folder

A Batch Folder in Grooper is defined as a container object within a Batch that is used to represent and organize both folders and documents. It can hold other Batch Folders or Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.

Batch Page

A Batch Page object in Grooper represents an individual page within a Batch. The Batch Page object is the most granular unit in the hierarchy of "Batch Objects" in Grooper. It is created in one of two ways:

  • Physical pages can be acquired in Grooper by scanning them via the Grooper Desktop application.
  • Digital documents are acquired in Grooper as whole objects and represented as Batch Folders. Applying the Split Pages activity on a Batch Folder that represents a digital document will expose Batch Page objects as direct children.

Batch Pages allow Grooper to process and store information at the page level, which is essential for operations that include Image Processing and recognition of text (see Recognize). They enable the system to manage and process each page independently. This is critical for workflows that require detailed page-specific actions or for Batches composed of documents with different processing requirements per page.

Content Type Objects

In Grooper, the "Content Type Objects" consist of:

  • Content Model ...
  • Content Category and ...
  • Document Type objects.

Each of these objects serves a distinct function within Grooper's content classification, and they are related to each other through hierarchical relationships.

The relationship between these objects is established through a hierarchical inheritance system. Content Categories and Document Types are building blocks within a Content Model. The Content Model is the "tree". Content Categories act as the "branches". Document Types are the "leaves" of the hierarchy.

Data Elements can be defined on each Content Type object and are inherited down the "tree" of the hierarchy.

  • Data Elements defined at the Content Model level are applied to all Content Types within the Content Model.
  • Data Elements defined at the Content Category level are applied to all Content Types that exist within that specific "branch".
  • Data Elements defined on a Document Type will apply to that specific "leaf".
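
The three inheritance rules above can be sketched as a walk from a node up to the root of the tree. This is an illustrative model only: the `ContentType` class, its fields, and the example names ("Accounting", "Invoices", and so on) are assumptions made for the sketch, not Grooper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical node: a Content Model, Content Category, or Document Type."""
    name: str
    data_elements: List[str] = field(default_factory=list)
    parent: Optional["ContentType"] = None

    def inherited_elements(self) -> List[str]:
        """Elements from the root down to this node, parents first."""
        inherited = self.parent.inherited_elements() if self.parent else []
        return inherited + self.data_elements


model = ContentType("Accounting", ["Received Date"])               # the "tree"
category = ContentType("Invoices", ["Invoice Number"], model)      # a "branch"
doc_type = ContentType("Vendor Invoice", ["PO Number"], category)  # a "leaf"
sibling = ContentType("Receipts", ["Store Name"], model)           # another branch
```

Here `doc_type.inherited_elements()` includes the elements defined on the model and on its category, while `sibling` sees only the model-level element plus its own.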

These "Content Type Objects" work together in Grooper to enable sophisticated document processing workflows. With different types of documents properly classified, they can have their data extracted and be handled according to the rules and behaviors defined by their respective Document Types within a Content Model hierarchy.

Related Objects

Content Model

A Content Model defines the taxonomy of document sets in terms of the Document Types it contains. It also defines the Data Elements that appear on each Content Category and Document Type. Content Models serve as the root of a Content Type hierarchy and are crucial for organizing the different types of documents that Grooper can recognize and process.

Content Category

A Content Category is a container within a Content Model that holds other Content Categories and Document Type objects. It allows for further classification and grouping of Document Types within a taxonomy, aiding in the logical structuring of complex document sets. Besides grouping Document Types together, Content Categories also serve to create new branches in a Data Element hierarchy.

Document Type

A Document Type represents a distinct type of document, like an invoice or contract. Document Types are created as children of a Content Model or a Content Category and are used to classify individual documents. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to documents of that specific classification.

Data Element Objects

The "Data Element Objects" within Grooper consist of:

  • Data Field ...
  • Data Section ...
  • Data Table and ...
  • Data Column objects.

Each of these objects has its own function within the data capture and organization framework. These objects are, however, all interconnected within Grooper's data extraction architecture.

The relationship between these "Data Element Objects" is hierarchical and modular.

  • The Data Model acts as the overall blueprint for data extraction.
  • Data Sections structure the document into logical parts. Data Sections can also serve as simple organizational objects within a Data Model to bucket similar Data Elements together.
  • Data Tables are incorporated into the model to handle tabular data. Each Data Table comprises Data Columns which specify the format and rules for columnar data extraction.
  • Finally, Data Fields are the fundamental units of data. Each one represents an individual, non-repeated piece of data within a document. The exception is a Data Field contained within a Data Section that occurs repeatedly within a document.

The Data Model ties these elements together, dictating the inheritance of properties and the flow of data extraction processes.

Related Objects

Data Model

A Data Model serves as the top-tier structure defining the taxonomy for data elements and is leveraged during the Extract activity to extract data from a Batch. It is a hierarchy of Data Elements that sets the stage for the organization, extraction logic, and review behavior of data collected from documents.

Data Field

A Data Field is designed to capture a single piece of information from a document, such as a name or date, which is a fundamental data point required from the content.

Data Section

A Data Section is a grouping mechanism for related Data Fields. Data Sections organize and segment them into logical divisions of a document based on the structure and semantics of the information the documents contain.

Data Table

A Data Table is utilized for extracting repeating data that's formatted in rows and columns, allowing for complex multi-instance data organization that would be present in table-formatted content.

Data Column

A Data Column is a child object of a Data Table, representing individual columns and defining the type of data each column holds along with its data extraction properties.

Extractor Objects

There are three types of "Extractor Objects" in Grooper:

  • Value Reader
  • Data Type
  • Field Class

All three of these objects perform a similar function. They are objects that are configured to return data from documents. However, they differ in their configuration and data extraction purpose.

Extractor Objects are tools to extract data. Ultimately, Data Elements are what collect data. They may use extractor objects to help collect data in a Data Model.

To that end, extractor objects serve three purposes:

  1. To be re-usable units of extraction.
  2. To collate data.
  3. To leverage machine learning algorithms to target data in the flow of text.

Re-Usability

"Extractor Objects" are meant to be referenced either by other "Extractor Objects" or, more importantly, by Data Elements. For example, an individual Data Field can be configured on its own to collect a date value, such as the "Received Date" on an invoice. However, what if another Data Field is collecting a different date format, like the "Due Date" on the same invoice? In this case you would create one "Extractor Object", like a Value Reader, to collect any and all date formats. You could then have each Data Field reference that one Value Reader and further configure each individual Data Field to differentiate their specific date value.

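The re-use idea can be sketched with one shared date pattern that two fields both reference. This is a hedged illustration in plain Python regex, not Grooper configuration: `DATE_READER`, `read_field`, and the label-on-the-same-line rule used to tell the two fields apart are all assumptions made for the example.

```python
import re

# Hypothetical shared "Value Reader": one pattern covering two date formats.
DATE_READER = re.compile(r"\b(?:\d{1,2}/\d{1,2}/\d{4}|\d{4}-\d{2}-\d{2})\b")


def read_field(text: str, label: str):
    """Return the first date the shared reader finds on the line carrying `label`."""
    for line in text.splitlines():
        if label in line:
            match = DATE_READER.search(line)
            if match:
                return match.group(0)
    return None


invoice_text = "Received Date: 03/14/2024\nDue Date: 2024-04-13\n"

# Two "Data Fields" reference the same reader and differ only by their label.
received = read_field(invoice_text, "Received Date")
due = read_field(invoice_text, "Due Date")
```

Adding a third date format would mean changing `DATE_READER` once, and both fields would pick up the change.
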
Data Collation

Another example would be configuring a Data Type to target entire rows of information within a table of data. Several Value Reader "Extractor Objects" could be made as children of the Data Type, each targeting a specific value within the table row. The parent Data Type would then collate the results of its child Value Reader "Extractor Objects" into one result. A Data Table would then reference the Data Type to collect the appropriate rows of information.
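
As a rough analogy for this kind of collation, the sketch below treats three named regex groups as child readers and joins them into one row-level pattern. The names `CHILD_READERS`, `ROW_PATTERN`, and `collate_rows` are hypothetical, and Grooper's own collation methods are configured differently; the point is only that the parent returns one combined result per row.

```python
import re

# Hypothetical child "Value Readers", one per value in a table row.
CHILD_READERS = {
    "quantity": r"(?P<quantity>\d+)",
    "description": r"(?P<description>[A-Za-z][A-Za-z ]*?)",
    "price": r"(?P<price>\d+\.\d{2})",
}

# Hypothetical parent "Data Type": requires the children in order on one line.
ROW_PATTERN = re.compile(r"^\s*" + r"\s+".join(CHILD_READERS.values()) + r"\s*$")


def collate_rows(text: str):
    """Return one dict per line on which every child reader matched in order."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            rows.append(match.groupdict())
    return rows


table_text = "Qty Item Price\n2 Blue Widget 4.50\n10 Gasket 0.75\nTotal due soon\n"
rows = collate_rows(table_text)
```

The header line and the trailing note are skipped because not every child reader matches on them, so only complete rows are returned.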

Machine Learning

Extractor Objects vs Extractor Types

"Extractor Objects" should not be confused with "Extractor Types". There are many places in Grooper where extraction logic can be applied for one purpose or another. In these cases an "Extractor Type" is chosen to define the logic required to return a desired value. In fact, the "Extractor Objects" themselves each leverage specific "Extractor Types" to define their logic.

  • For example, Pattern-Match uses regex to return results.
  • For example, Labeled OMR uses regex and computer vision to return results for checkboxes.
  • Other Extractor Types may use a combination of Extractor Types that work together to return results in specific ways.

However, "Extractor Objects" are used when you need to reference them for their designated strengths (re-usability, collation, or machine learning).

Related Objects

Value Reader

A Value Reader defines a single extraction operation. You set the type of extractor on the Value Reader that matches the specific data you're aiming to capture. For example, you would use the Pattern-Match extractor type to return data using a regular expression. You would use a Value Reader when you need to extract a single result or list of simple results from a document.

Data Type

A Data Type in Grooper holds a collection of extractors and settings that manage how multiple matches from extractors are consolidated into a result set.

For example, if you're extracting a date that could appear in multiple formats within a document, you'd use various child extractors (each capturing a different format). The Data Type would define how to collate those into a referenceable output.

The simplest type of collation (Individual collation) would just return all individual extractors' results as a list of results.
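
The date example can be sketched as several child patterns whose matches are pooled into one list. This is an approximation in plain Python: `CHILD_EXTRACTORS` and `collate_individual` are hypothetical names, and ordering the pooled results by their position in the text is an assumption of this sketch rather than documented Grooper behavior.

```python
import re

# Hypothetical child extractors, each capturing one date format.
CHILD_EXTRACTORS = [
    re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),  # e.g. 3/14/2024
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),      # e.g. 2024-03-14
]


def collate_individual(text: str):
    """Pool every child's matches into one flat list, ordered by position."""
    hits = []
    for extractor in CHILD_EXTRACTORS:
        for match in extractor.finditer(text):
            hits.append((match.start(), match.group(0)))
    return [value for _, value in sorted(hits)]


results = collate_individual("Signed 2024-03-14, received 3/20/2024, due 2024-04-13.")
```

Each value is returned on its own, with no attempt to combine results from different children into a larger structure.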

Data Types are also used for recognizing complex 2D data structures, like address blocks or table rows. Different collation methods would be used in these cases to combine results in different ways.

Field Class

Field Classes are trainable classifiers that distinguish between multiple instances of similar data within a document by understanding the context in which they occur. Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "Extractor Objects" like Value Readers or Data Types.

Field Classes are most useful when attempting to find values within the flow of natural language. This method involves training with positive and negative examples to distinguish the right context. You'd opt for a Field Class when the value you're after is an entire clause within a contract, or a specific value defined within the flow of text.

Connection Objects

Related Objects

CMIS Connection

CMIS Repository

Data Connection

Profile Objects

Related Objects

IP Profile

IP Group

IP Step

OCR Profile

Scanner Profile

Separation Profile

Queue Objects

Related Objects

Processing Queue

Review Queue

Process Objects

In Grooper, Batch Process and Batch Process Step objects are closely related in managing and executing a sequence of steps designed to process a collection of documents known as a Batch.

A Batch Process consists of a series of Batch Process Steps meant to be executed in a particular sequence for a batch of documents. Before a Batch Process can be used in production, it must be "published", which creates a read-only copy in the "Processes" folder of the node tree, making it accessible for production purposes.

In essence, a Batch Process defines the overall workflow for processing documents, but it relies on Batch Process Steps to perform each action required during the process. Each Batch Process Step represents a discrete operation, or "activity", within the broader scope of the Batch Process. Batch Processes and Batch Process Steps work together to ensure that documents are handled in a consistent and controlled manner.
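
The two ideas in this section, steps that run in a fixed order and a published copy that is read-only, can be sketched as follows. Everything here is hypothetical: the class names, the `publish` function, and the string-handling "activities" are placeholders chosen for illustration and do not reflect how Grooper stores or runs a Batch Process.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BatchProcessStep:
    """Hypothetical step: a name plus the activity it runs on a document."""
    name: str
    activity: Callable[[str], str]


@dataclass(frozen=True)
class PublishedProcess:
    """Hypothetical read-only copy; its tuple of steps cannot be edited in place."""
    name: str
    steps: Tuple[BatchProcessStep, ...]

    def run(self, documents: List[str]) -> List[str]:
        results = []
        for document in documents:
            for step in self.steps:  # steps execute in their defined order
                document = step.activity(document)
            results.append(document)
        return results


def publish(name: str, working_steps: List[BatchProcessStep]) -> PublishedProcess:
    """Snapshot the working steps so later edits do not affect production."""
    return PublishedProcess(name, tuple(working_steps))


working = [
    BatchProcessStep("Trim", str.strip),
    BatchProcessStep("Uppercase", str.upper),
]
published = publish("Demo Process", working)
working.append(BatchProcessStep("Reverse", lambda text: text[::-1]))  # not published
output = published.run(["  invoice a ", "contract b"])
```

The step appended after publishing never runs, because the published copy was taken as a snapshot of the working steps at publish time.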

Related Objects

Batch Process

A Batch Process is a crucial component in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each batch of documents is managed systematically and efficiently.

Batch Process Step

A Batch Process Step is a specific action within the sequence defined by a Batch Process. A Batch Process Step plays a critical role in automating and managing the flow of documents through the various stages of processing within Grooper.

Architecture Objects

Related Objects

Root

Project

Filestore

Machine

Miscellaneous Objects

Related Objects

Lexicon

Data Rule

AI Analyst

Resource File

Object Library

Control Sheet