Object Nomenclature (Concept): Difference between revisions

From Grooper Wiki
Line 74: Line 74:
==== Local Resources Folder ====
==== Local Resources Folder ====


{{LocalResourcesIcon}} A '''Local Resources Folder''' is a container object that can only be added to '''Content Types''' (e.g. '''Content Models''' and '''Document Types'''). It is a specialized folder that houses resources used for that '''Content Type''' and its '''Data Model''' such as '''Data Rules''' and '''Data Types'''.
{{LocalResourcesFolderIcon}} A '''Local Resources Folder''' is a container object that can only be added to '''Content Types''' (e.g. '''Content Models''' and '''Document Types'''). It is a specialized folder that houses resources used for that '''Content Type''' and its '''Data Model''' such as '''Data Rules''' and '''Data Types'''.


==== Project ====
==== Project ====

Revision as of 08:41, 17 June 2025

A Grooper environment consists of many interrelated objects.

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

About

Understanding Grooper involves recognizing how different objects serve similar functions and therefore be grouped together based on their shared functionalities. Disparate objects often perform analogous tasks, albeit with differing characteristics or representations.

By discerning commonalities in functionality across diverse objects, users can streamline their approach to data processing and analysis within Grooper. Rather than treating each object in isolation, users can categorize them based on their functional similarities, thus simplifying management and enhancing efficiency.

This approach fosters a more holistic understanding of Grooper, empowering users to devise more effective strategies for data extraction, classification, and interpretation. By recognizing the underlying functional relationships between objects, users can optimize workflows, improve accuracy, and derive deeper insights from their data.

High Level Overview

This article is meant to be a high level overview of many objects in Grooper and how they're related.

  • Primarily, this article is focused on "Node Types" but will include other Grooper objects when appropriate.
  • If you need more specific information on a particular object, please click the hyperlink for that specific object (as listed in the category's "Related Objects" section) to be taken to an article giving more information on that object.

Batch Objects

In Grooper, "Batch Objects" represent the hierarchical structure of documents being processed. These objects consist of:

inventory_2 Batches
folder Batch Folders
contract Batch Pages

Each serves a distinct function within this hierarchy but are also fundamentally related to each other.

The relationship between these objects is hierarchical in nature.

  • The Batch object is the top level
  • Each Batch contains: Batch Folders and Batch Pages
    • Batch Folders may contain further Batch Folders to represent subfolders or grouped documents.
    • Or, Batch Folders will contain Batch Pages to represent individual pages of documents.

This structured approach allows Grooper to efficiently manage and process documents at various levels of granularity — from a full batch down to individual pages.

Related Objects

Batch

inventory_2 Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called settings Batch Processes. Documents and their pages are represented in Batches by a hierarchy of folder Batch Folders and contract Batch Pages.

Batch Folder

The folder Batch Folder is an organizational unit within a inventory_2 Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or contract Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page

contract Batch Page nodes represent individual pages within a inventory_2 Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

They are created in one of two ways:

  • Physical pages can be acquired in Grooper by scanning them via the Grooper Desktop application.
  • Digital documents are acquired in Grooper as whole objects and represented as Batch Folders. Applying the Split Pages activity on a Batch Folder that represents a digital document will expose Batch Page objects as direct children.

Batch Pages allow Grooper to process and store information at the page level, which is essential for operations that include Image Processing and recognition of text (see Recognize). They enable the system to manage and process each page independently. This is critical for workflows that require detailed page-specific actions or for Batches composed of documents with different processing requirements per page.

Organization Nodes

"Organization nodes" refer to nodes in the Grooper node tree used to organize other nodes. These nodes include:

package_2 Projects
folder_open Folder nodes in the node tree
Template:LocalResourcesIcon Local Resources Folders

Organization nodes are used to store resources at one level of the node tree or another. They may be very generic organizational tools (in the case of Folder nodes) or more specialized (in the case of Projects and Local Resources Folders).

  • "Folder" refers to both:
    • The main six folder nodes in the node tree (Batches, Projects, Processes, Queues, File Stores and Machines)
    • Any folder node added to the node tree to organize other nodes.
  • Project nodes are the primary containers for design components in Grooper.
    • They store node resources used to process document content (like a Content Model).
    • They do not store document content itself. Projects do not store Batches, Batch Folders, and Batch Pages (Those are stored in the "Batches" branch of the node tree).
  • Local Resources Folder nodes are specialized folders added to Content Types (e.g. Content Models and Document Types). They house node resources utilized for their parent Content Type and its decedents (like Data Rules and Data Types).

Related Objects

Folder

folder_open Folder objects refer to the various kinds of organizational folders in the node tree. Please do not confuse "Folder" with "Batch Folder". These are two different things. A Batch Folder is an integral part of the Batch hierarchy and used to represent a "document" in Grooper. A Folder is just a folder.

Local Resources Folder

folder_data A Local Resources Folder is a container object that can only be added to Content Types (e.g. Content Models and Document Types). It is a specialized folder that houses resources used for that Content Type and its Data Model such as Data Rules and Data Types.

Project

package_2 Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as stacks Content Models, settings Batch Processes, profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Content Types

Content Type (Concept)

Data Elements

Data Element (Concept)

Extractor nodes

Connection nodes

In Grooper, "connection nodes" play a vital role in integrating external data sources and repositories. They consist of:

cloud CMIS Connection ...
settings_system_daydream CMIS Repository and ...
database Data Connection nodes.

Each of these node types serve a unique purpose while also being related through their collaborative use in connecting and managing data across various platforms and databases.

These connections nodes are related in their collective ability to bridge Grooper with external data sources and content repositories.

  • CMIS Connections' serve as the gateway to multiple content management systems.
  • CMIS Repositories' uses the connection established by CMIS Connections to organize and manage document access for those systems.
  • Data Connections link Grooper to databases, allowing it to export data to databases, perform data lookups and synchronize with external structured data sources.

Together these connection nodes enable Grooper to extend its data processing capabilities beyond its local domain and integrate seamlessly with external systems for end-to-end document and data management.

Related Node Types

CMIS Connection

cloud CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Repository

settings_system_daydream CMIS Repository nodes provide document access in external storage platforms through a cloud CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

  • CMIS Repositories are create as a child node of a CMIS Connection using the "Import Repository" command.

Data Connection

database Data Connections connect Grooper to Microsoft SQL and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, perform database lookups to validate data Grooper collects and other actions related to database management systems (DBMS).

  • Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
  • Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.

Profile Objects

"Profile Objects" in Grooper serve as pre-configured settings templates used across various stages of document processing, such as scanning, image cleanup, and document separation. These objects, which include:

perm_media IP Profile ...
gallery_thumbnail IP Group ...
image IP Step ...
library_books OCR Profile ...
scanner Scanner Profile and ...
insert_page_break Separation Profile ...

... have their own individual functions but are also related by defining structured approaches to handling documents within Grooper.

By creating distinct profiles for each aspect of the document processing pipeline, Grooper allows for customization and optimization of each step. This standardizes settings across similar document types or processing requirements, which can contribute to consistency and efficiency in processing tasks. These "Profile Objects" collectively establish a comprehensive, repeatable, and optimized workflow for processing documents from the point of capture to the point of data extraction.

Related Objects

IP Profile

perm_media IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:

  1. Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
  2. Cleaning up an image in-memory during the Recognize activity without altering the image to improve OCR accuracy.
  3. Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode value and more) utilized in data extraction.

IP Group

gallery_thumbnail IP Groups are containers of image IP Steps and/or IP Groups that can be added to perm_media IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:

  1. They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
  2. They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionality execute a sequence of IP Steps.

IP Step

image IP Steps are the basic units of an perm_media IP Profile. They define a single image processing operation, called an IP Command in Grooper.

OCR Profile

library_books OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on contract Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.

Scanner Profile

scanner Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.

Separation Profile

insert_page_break Separation Profiles store settings that determine how contract Batch Pages are separated into folder Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Queue Objects

"Queue Objects" in Grooper are structures designed to manage and distribute tasks within the document processing workflow. There are two main types of queues:

memory Processing Queue and ...
person_play Review Queue ...

... each with a distinct function but inherently interconnected as they both coordinate the flow of work through Grooper.

The relationship between Processing Queues and Review Queues lies in their roles in managing the workflow and task distribution in Grooper. Both facilitate the progression of document processing from automatic operations to those requiring human intervention.

  • Processing Queues handle the automation side of the operation, ensuring that machine tasks are efficiently allocated across the available resources.
  • Review Queues oversee the user-driven aspects of the workflow, particularly quality control and verification processes that require manual input.

Together, these queues ensure a smooth transition between automated and manual stages of document processing and help maintain order and efficiency within the system.

Related Objects

Processing Queue

memory Processing Queues help automate "machine performed tasks" (Those are Code Activity tasks performed by computer Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, who can only have one activity instance running per machine (This is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.

Review Queue

person_play Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set either on a settings Batch Process or a Review step. Based on a user's membership in Review Queues, this will affect how inventory_2 Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.

Process Objects

"Process Objects" in Grooper, which include...

settings Batch Process and ...
edit_document Batch Process Step ...

... are closely related in managing and executing a sequence of steps designed to process a collection of documents known as a Batch

Note: The icon for a Batch Process Step will change depending on how you add the object to a Batch Process. If you use the "Add" object-command it will give the Batch Process Step the icon used above. If you use the "Add Activity" object command, it will give the Batch Process Step an icon according the the activity chosen.
Below is an example of a Batch Process with several child Batch Process Steps that were added using the "Add Activity" object-command:
settings Batch Process
description Split Pages
format_letter_spacing_wide Recognize
insert_page_break Separate
unknown_document Classify
export_notes Extract
person_search Review
output Export
inventory_2 Dispose Batch

A Batch Process consists of a series of Batch Process Steps meant to be executed in a particular sequence for a batch of documents. Before a Batch Process can be used in production, it must be "published". Publishing a Batch Process will create a read-only copy in the "Processes" folder of the node tree, making it accessible for production purposes.

In essence, a Batch Process defines the overall workflow for processing documents. It relies on Batch Process Steps to perform each action required during the process. Each Batch Process Step represents a discrete operation, or "activity", within the broader scope of the Batch Process. Batches Processes and Batch Process Steps work together to ensure that documents are handled in a consistent and controlled manner.

Related Objects

Batch Process

settings Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a inventory_2 Batch. Each step is comprised of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute edit_document Batch Process Steps which are added as children nodes.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

edit_document Batch Process Steps are specific actions within a settings Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. These Activities will either be a "Code Activity" or "Review" activities. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Architecture Objects

In Grooper, "Architecture Objects" organize and oversee the infrastructure and framework of the Grooper repository. A "Grooper Repository" is a tree structure of nodes representing both configuration and content objects. These objects include the...

database Root ...
hard_drive File Store and ...
computer Machine objects ...

... each with distinct roles but also working in conjunction to manage resources and information flow within the repository.

The relationship among these "Architecture Objects" is foundational to the operation and scalability of Grooper's document processing capabilities.

  • The Root object provides a base structure.
  • The Filestore offers a storage utility for files and content.
  • The Machine objects represent the hardware resources for performing processing tasks.

Together, they comprise the essential components that underpin the function and manageability of the Grooper ecosystem.

Related Objects

Root

The Grooper database Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.

File Store

hard_drive File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes are saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".

  • Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.

Machine

computer Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

Miscellaneous Objects

The following ojbects are related only in that they don't fit neatly into the groups defined above in this article.

(un)Related Objects

AI Analyst

An psychology AI Analyst object in Grooper defines a role meant to harness artificial intelligence capabilities, particularly from OpenAI. AI Analyst objects assist with chat sessions and other interactive tasks. This object is set up to act as an AI-driven analyst, which can be configured with specific models such as "gpt-4-1106-preview" to serve as the "brain" of the analyst, performing complex AI functions.

Key properties of an AI Analyst include:

  • Model: Defines the OpenAI model that powers the AI Analyst, impacting its cognitive capabilities.
  • Enable Code Interpreter: Specifies whether to allow the AI Analyst to write and run Python code in a sandboxed environment. This enables the AI Analyst to process diverse data or generate output like data files and graphs.
  • Instructions: Detailed instructions provided to guide the responses and behavior of the AI Analyst during interaction.
  • Knowledge: Sets the scope of knowledge available to the AI Analyst to inform its understanding and responses.
  • Predefined Messages: A list of predetermined messages that the AI Analyst can use during chat sessions.

The AI Analyst object must have connectivity established with OpenAI. This is done by configuring the Options property on the Root of the Grooper repository. The AI Analyst object can be applied to facilitate AI interactions in chat sessions, offering dynamic responses based on the conversational context, the provided knowledge base, and specific instructions set up for the analyst's behavior .

Control Sheet

document_scanner Control Sheets in Grooper are special pages used to control various aspects of the document scanning process. Control Sheets can serve multiple functions such as:

  • separating and classifying documents
  • changing image settings dynamically
  • create a new folder with specific Content Types
  • trigger other actions that affect how documents are handled as they pass through the scanning equipment

Control sheets are pre-printed with barcodes or other markers that Grooper recognizes and uses to perform specific actions based on the presence of the sheet. For instance, when a control sheet instructs the creation of a new folder it can influence the hierarchy within a batch. This enables the management and organization of documents without manual intervention during the Scan activity.

Overall, Control Sheets are an intelligent way to guide the scanning workflow. Control Sheets can ensure that batches of documents are organized and processed according to predefined rules, thereby automating the structuring of scanned content into logical units within Grooper.

Data Rule

flowsheet Data Rules are used to normalize or otherwise prepare data collected in a data_table Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (folder Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action" which do things like computing a field's value, parse a field into other fields, perform lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity when converting document to another Document Type
    • They can be applied manually in a Data Viewer with the "Run Rule" command.

The execution of a Data Rule takes place during the Apply Rules activity. Data Rules can be applied at different scopes such as each individual type of "Data Element". The rule can be set to execute conditionally based on a Trigger expression. If the Trigger evaluates to true, the Data Rule's True Action is applied, and if false, its False Action is executed. Data Rules can recursively apply logic to the hierarchy of data within a document instance, enabling complex data transformation and normalization operations that reflect the structure of the extracted data.

Overall, Data Rules in Grooper simplify extractors by separating the data normalization logic from the extraction logic, allowing for flexible and powerful post-extraction data processing .

Lexicon

dictionary Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a database Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.

Object Library

extension Object Library nodes are .NET libraries that contain code files for customizing the Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:
  • Adding custom Activities that execute within Batch Processes
  • Creating custom commands available during the Review activity and in the Design page.
  • Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects.
  • Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems.
  • Establish custom Grooper Services that perform automated background tasks at regular intervals

Resource File

Resource Files are nodes you can add to a package_2 Project and store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, there are several areas in Grooper that can reference Resource Files to one end or another, including XML schema files used for Grooper's XML Schema Integration.