Object Nomenclature (Concept): Difference between revisions

From Grooper Wiki
Line 15: Line 15:
== Batch Objects ==
== Batch Objects ==
In '''Grooper''', "Batch Objects" represent the hierarchical structure of documents being processed.  These objects consist of:
In '''Grooper''', "Batch Objects" represent the hierarchical structure of documents being processed.  These objects consist of:
: [[image:GrooperIcon_Batch.png]] '''[[Batch]]es'''
: {{BatchIcon}} '''[[Batch]]es'''
: [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]s'''
: {{BatchFolderIcon}} '''[[Batch Folder]]s'''
: [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page]]s'''
: {{BatchPageIcon}} '''[[Batch Page]]s'''
Each serves a distinct function within this hierarchy but are also fundamentally related to each other.
Each serves a distinct function within this hierarchy but are also fundamentally related to each other.



Revision as of 11:28, 28 August 2024

A Grooper environment consists of many interrelated objects.

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

About

In Grooper, understanding the objects within the platform involves recognizing how various elements can serve similar functions and therefore be grouped together based on their shared functionalities. This concept stems from the recognition that disparate objects often perform analogous tasks, albeit with differing characteristics or representations.

By discerning commonalities in functionality across diverse objects, users can streamline their approach to data processing and analysis within Grooper. Rather than treating each object in isolation, users can categorize them based on their functional similarities, thus simplifying management and enhancing efficiency.

This approach fosters a more holistic understanding of the data ecosystem within Grooper, empowering users to devise more effective strategies for data extraction, classification, and interpretation. By recognizing the underlying functional relationships between objects, users can optimize workflows, improve accuracy, and derive deeper insights from their data.

High Level Overview

This article is meant to be a high level overview of all the objects in Grooper and how they're related. If you need more specific information on a particular object, please click the hyperlink for that specific object (as listed in the category's "Related Objects" section) to be taken to an article giving more informatoin on that object.

Batch Objects

In Grooper, "Batch Objects" represent the hierarchical structure of documents being processed. These objects consist of:

inventory_2 Batches
folder Batch Folders
contract Batch Pages

Each serves a distinct function within this hierarchy but are also fundamentally related to each other.

The relationship between these objects is hierarchical in nature.

  • The Batch object is the top level
  • Each Batch contains: Batch Folders and Batch Pages
    • Batch Folders may contain further Batch Folders to represent subfolders or grouped documents.
    • Or, Batch Folders will contain Batch Pages to represent individual pages of documents.

This structured approach allows Grooper to efficiently manage and process documents at various levels of granularity — from a full batch down to individual pages.

Related Objects

Batch

inventory_2 Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called settings Batch Processes. Documents and their pages are represented in Batches by a hierarchy of folder Batch Folders and contract Batch Pages.

Batch Folder

The folder Batch Folder is an organizational unit within a inventory_2 Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or contract Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page

contract Batch Page nodes represent individual pages within a inventory_2 Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

They are created in one of two ways:

  • Physical pages can be acquired in Grooper by scanning them via the Grooper Desktop application.
  • Digital documents are acquired in Grooper as whole objects and represented as Batch Folders. Applying the Split Pages activity on a Batch Folder that represents a digital document will expose Batch Page objects as direct children.

Batch Pages allow Grooper to process and store information at the page level, which is essential for operations that include Image Processing and recognition of text (see Recognize). They enable the system to manage and process each page independently. This is critical for workflows that require detailed page-specific actions or for Batches composed of documents with different processing requirements per page.

Organization Objects

"Oganization Objects" refer to objects used to organize nodes in the Grooper node tree. These objects include:

Projects
Folder nodes in the node tree
Local Resources Folders

Organization Objects are used to store resources at one level of the node tree or another. They may be very generic organizational tools (in the case of Folder nodes) or more specialized (in the case of Projects and Local Resources Folders).

  • Folder objects refer to both:
    • The main six folder nodes in the node tree (Batches, Projects, Processes, Queues, File Stores and Machines)
    • Any folder added to the node tree to organize nodes.
  • The Project object is the primary container for design components in Grooper. It stores resources used to process document content (like a Content Model) but not document content itself (like Batch Folders).
  • The Local Resources Folder object is a specialized folder added to Content Types (e.g. Content Models and Document Types). It houses resources utilized for that Content Type and its decedents (like Data Rules and Data Types).

Related Objects

Folder

Folder objects refer to the various kinds of organizational folders in the node tree. Please do not confuse "Folder" with "Batch Folder". These are two different things. A Batch Folder is an integral part of the Batch hierarchy and used to represent a "document" in Grooper. A Folder is just a folder.

Local Resources Folder

A Local Resources Folder is a container object that can only be added to Content Types (e.g. Content Models and Document Types). It is a specialized folder that houses resources used for that Content Type and its Data Model such as Data Rules and Data Types.

Project

package_2 Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as stacks Content Models, settings Batch Processes, profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Content Type Objects

Content Type (Concept)

Data Element Objects

Data Element (Concept)

Extractor Objects

Connection Objects

In Grooper, "Connection Objects" play a vital role in integrating external data sources and repositories. They consist of:

CMIS Connection ...
CMIS Repository and ...
Data Connection objects.

Each of these objects serve a unique purpose while also being related through their collaborative use in connecting and managing data across various platforms and databases.

These Connection Objects are related in their collective ability to bridge Grooper with external data sources and content repositories.

  • The CMIS Connection object serves as the gateway to multiple content management systems.
  • The CMIS Repository object uses this connection to organize and manage document access for those systems.
  • The Data Connection object links Grooper to databases, allowing it to perform data lookups and synchronize with external structured data sources.

Together these Connection Objects enable Grooper to extend its data processing capabilities beyond its local domain and integrate seamlessly with external systems for end-to-end document and data management.

Related Objects

CMIS Connection

cloud CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Repository

settings_system_daydream CMIS Repository nodes provide document access in external storage platforms through a cloud CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

  • CMIS Repositories are create as a child node of a CMIS Connection using the "Import Repository" command.

Data Connection

database Data Connections connect Grooper to Microsoft SQL and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, perform database lookups to validate data Grooper collects and other actions related to database management systems (DBMS).

  • Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
  • Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.

Profile Objects

"Profile Objects" in Grooper serve as pre-configured settings templates used across various stages of document processing, such as scanning, image cleanup, and document separation. These objects, which include:

IP Profile ...
IP Group ...
IP Step ...
OCR Profile ...
Scanner Profile and ...
Separation Profile ...

... have their own individual functions but are also related by defining structured approaches to handling documents within Grooper.

By creating distinct profiles for each aspect of the document processing pipeline, Grooper allows for customization and optimization of each step. This standardizes settings across similar document types or processing requirements, which can contribute to consistency and efficiency in processing tasks. These "Profile Objects" collectively establish a comprehensive, repeatable, and optimized workflow for processing documents from the point of capture to the point of data extraction.

Related Objects

IP Profile

perm_media IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:

  1. Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
  2. Cleaning up an image in-memory during the Recognize activity without altering the image to improve OCR accuracy.
  3. Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode value and more) utilized in data extraction.

IP Group

gallery_thumbnail IP Groups are containers of image IP Steps and/or IP Groups that can be added to perm_media IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:

  1. They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
  2. They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionality execute a sequence of IP Steps.

IP Step

image IP Steps are the basic units of an perm_media IP Profile. They define a single image processing operation, called an IP Command in Grooper.

OCR Profile

library_books OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on contract Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.

Scanner Profile

scanner Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.

Separation Profile

insert_page_break Separation Profiles store settings that determine how contract Batch Pages are separated into folder Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Queue Objects

"Queue Objects" in Grooper are structures designed to manage and distribute tasks within the document processing workflow. There are two main types of queues:

Processing Queue and ...
Review Queue ...

... each with a distinct function but inherently interconnected as they both coordinate the flow of work through Grooper.

The relationship between Processing Queues and Review Queues lies in their roles in managing the workflow and task distribution in Grooper. Both facilitate the progression of document processing from automatic operations to those requiring human intervention.

  • Processing Queues handle the automation side of the operation, ensuring that machine tasks are efficiently allocated across the available resources.
  • Review Queues oversee the user-driven aspects of the workflow, particularly quality control and verification processes that require manual input.

Together, these queues ensure a smooth transition between automated and manual stages of document processing and help maintain order and efficiency within the system.

Related Objects

Processing Queue

memory Processing Queues help automate "machine performed tasks" (Those are Code Activity tasks performed by computer Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, who can only have one activity instance running per machine (This is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.

Review Queue

person_play Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set either on a settings Batch Process or a Review step. Based on a user's membership in Review Queues, this will affect how inventory_2 Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.

Process Objects

"Process Objects" in Grooper, which include...

Batch Process and ...
Batch Process Step ...

... are closely related in managing and executing a sequence of steps designed to process a collection of documents known as a Batch

Note: The icon for a Batch Process Step will change depending on how you add the object to a Batch Process. If you use the "Add" object-command it will give the Batch Process Step the icon used above. If you use the "Add Activity" object command, it will give the Batch Process Step an icon according the the activity chosen.
Below is an example of a Batch Process with several child Batch Process Steps that were added using the "Add Activity" object-command:
Batch Process
Split Pages
Recognize
Separate
Classify
Extract
Review
Export
Dispose Batch

A Batch Process consists of a series of Batch Process Steps meant to be executed in a particular sequence for a batch of documents. Before a Batch Process can be used in production, it must be "published". Publishing a Batch Process will create a read-only copy in the "Processes" folder of the node tree, making it accessible for production purposes.

In essence, a Batch Process defines the overall workflow for processing documents. It relies on Batch Process Steps to perform each action required during the process. Each Batch Process Step represents a discrete operation, or "activity", within the broader scope of the Batch Process. Batches Processes and Batch Process Steps work together to ensure that documents are handled in a consistent and controlled manner.

Related Objects

Batch Process

settings Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a inventory_2 Batch. Each step is comprised of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute edit_document Batch Process Steps which are added as children nodes.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

edit_document Batch Process Steps are specific actions within a settings Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. These Activities will either be a "Code Activity" or "Review" activities. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Architecture Objects

In Grooper, "Architecture Objects" organize and oversee the infrastructure and framework of the Grooper repository. A "Grooper Repository" is a tree structure of nodes representing both configuration and content objects. These objects include the...

Root ...
File Store and ...
Machine objects ...

... each with distinct roles but also working in conjunction to manage resources and information flow within the repository.

The relationship among these "Architecture Objects" is foundational to the operation and scalability of Grooper's document processing capabilities.

  • The Root object provides a base structure.
  • The Filestore offers a storage utility for files and content.
  • The Machine objects represent the hardware resources for performing processing tasks.

Together, they comprise the essential components that underpin the function and manageability of the Grooper ecosystem.

Related Objects

Root

The Grooper database Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.

File Store

hard_drive File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes are saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".

  • Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.

Machine

computer Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

Miscellaneous Objects

The following ojbects are related only in that they don't fit neatly into the groups defined above in this article.

(un)Related Objects

Control Sheet

Control Sheets in Grooper are special pages used to control various aspects of the document scanning process. Control Sheets can serve multiple functions such as:

  • separating and classifying documents
  • changing image settings dynamically
  • create a new folder with specific Content Types
  • trigger other actions that affect how documents are handled as they pass through the scanning equipment

Control sheets are pre-printed with barcodes or other markers that Grooper recognizes and uses to perform specific actions based on the presence of the sheet. For instance, when a control sheet instructs the creation of a new folder it can influence the hierarchy within a batch. This enables the management and organization of documents without manual intervention during the Scan activity.

Overall, Control Sheets are an intelligent way to guide the scanning workflow. Control Sheets can ensure that batches of documents are organized and processed according to predefined rules, thereby automating the structuring of scanned content into logical units within Grooper.

Data Rule

flowsheet Data Rules are used to normalize or otherwise prepare data collected in a data_table Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (folder Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action" which do things like computing a field's value, parse a field into other fields, perform lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity when converting document to another Document Type
    • They can be applied manually in a Data Viewer with the "Run Rule" command.

The execution of a Data Rule takes place during the Apply Rules activity. Data Rules can be applied at different scopes such as each individual type of "Data Element". The rule can be set to execute conditionally based on a Trigger expression. If the Trigger evaluates to true, the Data Rule's True Action is applied, and if false, its False Action is executed. Data Rules can recursively apply logic to the hierarchy of data within a document instance, enabling complex data transformation and normalization operations that reflect the structure of the extracted data.

Overall, Data Rules in Grooper simplify extractors by separating the data normalization logic from the extraction logic, allowing for flexible and powerful post-extraction data processing .

Lexicon

dictionary Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a database Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.

Object Library

extension Object Library nodes are .NET libraries that contain code files for customizing the Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:
  • Adding custom Activities that execute within Batch Processes
  • Creating custom commands available during the Review activity and in the Design page.
  • Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects.
  • Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems.
  • Establish custom Grooper Services that perform automated background tasks at regular intervals

Resource File

Resource Files are nodes you can add to a package_2 Project and store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, there are several areas in Grooper that can reference Resource Files to one end or another, including XML schema files used for Grooper's XML Schema Integration.