Object Nomenclature (Concept): Difference between revisions

From Grooper Wiki

Revision as of 09:21, 29 April 2024

A Grooper environment consists of many interrelated objects.

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

About

In Grooper, understanding the objects within the platform involves recognizing how various elements can serve similar functions and therefore be grouped together based on their shared functionalities. This concept stems from the recognition that disparate objects often perform analogous tasks, albeit with differing characteristics or representations.

By discerning commonalities in functionality across diverse objects, users can streamline their approach to data processing and analysis within Grooper. Rather than treating each object in isolation, users can categorize them based on their functional similarities, thus simplifying management and enhancing efficiency.

This approach fosters a more holistic understanding of the data ecosystem within Grooper, empowering users to devise more effective strategies for data extraction, classification, and interpretation. By recognizing the underlying functional relationships between objects, users can optimize workflows, improve accuracy, and derive deeper insights from their data.

High Level Overview

This article is meant to be a high-level overview of all the objects in Grooper and how they're related. If you need more specific information on a particular object, please click the hyperlink for that object (as listed in the category's "Related Objects" section) to be taken to an article giving more information on that object.

Batch Objects

In Grooper, "Batch Objects" represent the hierarchical structure of documents being processed and consist of:

Batch ...
Batch Folder and ...
Batch Page objects ...

... each serving a distinct function within this hierarchy but also being fundamentally related.

The relationship between these objects is hierarchical in nature. The Batch object is the top level. It contains:

  • Batch Folders and ...
  • Batch Pages

Batch Folders may contain either further Batch Folders (to represent subfolders or grouped documents) or Batch Pages (to represent individual pages of documents). This structured approach allows Grooper to efficiently manage and process documents at various levels of granularity — from a full batch down to individual pages.
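
The containment rules described above can be pictured as a small tree. The following is a minimal sketch in Python, not Grooper's actual object model: the class names `Batch`, `BatchFolder`, and `BatchPage` and the helper `count_pages` are hypothetical stand-ins chosen only to illustrate the hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BatchPage:
    """Hypothetical stand-in for a Batch Page: one page of a document."""
    number: int


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder: a document or a plain folder."""
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)


@dataclass
class Batch:
    """Hypothetical stand-in for a Batch: the top level of the hierarchy."""
    name: str
    children: List[Union[BatchFolder, BatchPage]] = field(default_factory=list)


def count_pages(node) -> int:
    """Count every page at or below a node by walking the tree recursively."""
    if isinstance(node, BatchPage):
        return 1
    return sum(count_pages(child) for child in node.children)


# A folder acting as a "document" holds pages directly.
invoice = BatchFolder("Invoice", [BatchPage(1), BatchPage(2)])
# A folder acting as a plain folder holds other folders.
contracts = BatchFolder("Contracts", [BatchFolder("Contract A", [BatchPage(1)])])
batch = Batch("Example Batch", [invoice, contracts])
total_pages = count_pages(batch)
```

Because every level exposes the same `children` list, the same recursive walk works on a whole batch, a single folder, or a single page, which is the "various levels of granularity" idea in miniature.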

Related Objects

Batch

Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Batch Folder

The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page

Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

They are created in one of two ways:

  • Physical pages can be acquired in Grooper by scanning them via the Grooper Desktop application.
  • Digital documents are acquired in Grooper as whole objects and represented as Batch Folders. Applying the Split Pages activity on a Batch Folder that represents a digital document will expose Batch Page objects as direct children.

Batch Pages allow Grooper to process and store information at the page level, which is essential for operations that include Image Processing and recognition of text (see Recognize). They enable the system to manage and process each page independently. This is critical for workflows that require detailed page-specific actions or for Batches composed of documents with different processing requirements per page.

Content Type Objects

Types of Content Types

In Grooper, the "Content Type" nodes consist of:

Content Model ...
Content Category and ...
Document Type nodes.

These nodes create a classification taxonomy in Grooper. They define how documents are classified, what data to collect from a document, how different kinds of documents are related, and even how certain activities like Export should behave based on how a document is classified.

Content Types work together in Grooper to enable sophisticated document processing workflows. Once different types of documents are properly classified, their data can be extracted and they can be handled according to the rules and behaviors defined by the Document Types within a Content Model.

The relationship between these Content Types is established through a hierarchical inheritance system. The Content Model can be seen as the "tree", with Content Categories and Document Types as the building blocks within it. Content Categories act as the "branches". Document Types are the "leaves" of the hierarchy.

Content Types and document classification

Documents are classified by having a Content Type (usually a Document Type) assigned either by the Classify activity, manually by a user, or other mechanisms in Grooper.

The Content Model plays a special role in defining the "Classify Method" used to classify documents. Classify Methods define the logic for

Content Types and data extraction

"Data Elements" represent information written on the document and contain instructions on how to collect it.

Data Elements can be defined for each Content Type by adding a Data Model. Data Elements (including Data Fields, Data Sections and Data Tables) are added to these Data Models. Data Elements are inherited down the "tree" of the Content Type hierarchy.

  • Data Elements defined at the Content Model level are applied to all Content Types within the Content Model and will apply to the whole "tree".
  • Data Elements defined at the Content Category level are applied to all Content Types that exist within that specific "branch".
  • Data Elements defined on a Document Type will apply to that specific "leaf".


  • This is why documents must be "classified" in order to have their data extracted. It is the Content Type that determines which Data Model is used to collect data when the Extract activity runs.
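
The inheritance described above can be sketched as a walk from the root of the tree down to the classified node. This is only an illustrative Python model under stated assumptions: `ContentTypeNode`, its fields, and the example names ("Accounting", "Payables", and so on) are hypothetical and do not reflect Grooper's own API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentTypeNode:
    """Hypothetical model of a Content Model, Content Category, or Document Type."""
    name: str
    data_elements: List[str] = field(default_factory=list)
    parent: Optional["ContentTypeNode"] = None

    def inherited_data_elements(self) -> List[str]:
        """Collect Data Elements from the root of the tree down to this node."""
        inherited = self.parent.inherited_data_elements() if self.parent else []
        return inherited + self.data_elements


# "Tree": applies to everything in the model.
model = ContentTypeNode("Accounting", ["Document Date"])
# "Branch": applies to everything inside this category.
category = ContentTypeNode("Payables", ["Vendor Name"], parent=model)
# "Leaves": each applies only to itself.
invoice = ContentTypeNode("Invoice", ["Invoice Number"], parent=category)
receipt = ContentTypeNode("Receipt", ["Receipt Total"], parent=model)
```

In this sketch, `invoice.inherited_data_elements()` gathers elements from the model, the category, and the Document Type itself, while `receipt` sits outside the "Payables" branch and so never sees "Vendor Name". An unclassified document has no node to start the walk from, which mirrors why classification has to happen before extraction.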

Content Types and "Behaviors"

"Behaviors" are a set of different configurations that affect certain Activities and other areas of Grooper based on how a document is classified. They include:

  • Import Behaviors - Defining how documents and metadata are imported from CMIS Repositories based on their classification.
  • Export Behaviors - Defining how documents and data are exported based on their classification.
  • Labeling Behaviors - Defining how Label Sets are used for documents based on their classification.
  • PDF Data Mapping - Defining several PDF generation capabilities for documents based on their classification.
  • Indexing Behavior - Defining how documents are added to a Grooper search index based on their classification.

Behaviors also respect the Content Type hierarchy.

  • Behaviors defined at the Content Model level are applied to all Content Types within the Content Model, unless a child Content Type has its own Behavior configured. Content Category and Document Type Behavior configurations will override the Content Model configuration.
  • Behaviors defined at the Content Category level are applied to all Content Types within that branch, unless a child Content Type has its own Behavior configured. Child Content Category and Document Type Behavior configurations will override a parent Content Category configuration.
  • Behaviors defined at the Document Type level are applied to that Document Type only. Document Type Behavior configurations will override all parent Content Category and/or Content Model configurations.
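
The override rules above amount to "the nearest configured level wins". A minimal Python sketch of that lookup follows. It assumes a simplified model in which each Behavior is a single named setting, and the names `BehaviorScope` and `effective_behavior` are hypothetical, not part of Grooper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BehaviorScope:
    """Hypothetical model of one Content Type and the Behaviors set on it."""
    name: str
    behaviors: Dict[str, str] = field(default_factory=dict)
    parent: Optional["BehaviorScope"] = None

    def effective_behavior(self, kind: str) -> Optional[str]:
        """Return the nearest configuration of `kind`, walking up toward the root."""
        node: Optional["BehaviorScope"] = self
        while node is not None:
            if kind in node.behaviors:
                return node.behaviors[kind]
            node = node.parent
        return None


model = BehaviorScope("Content Model", {"Export": "file system export"})
category = BehaviorScope("Content Category", parent=model)
# This Document Type configures its own Export Behavior, overriding the model's.
invoice = BehaviorScope("Invoice", {"Export": "database export"}, parent=category)
# This Document Type configures nothing, so it falls back to the model's setting.
receipt = BehaviorScope("Receipt", parent=category)
```

Here `invoice.effective_behavior("Export")` resolves to its own setting, `receipt.effective_behavior("Export")` resolves to the Content Model's, and a Behavior that is configured nowhere in the chain resolves to `None`.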


Related Node Types

Content Model

Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Category

A Content Category is a container for other Content Category or Document Type nodes in a Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.

Document Type

Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

What about Form Types and Page Types?

Technically speaking, Form Types and Page Types are also Content Types, but they aren't typically used in the same way. Form Types and Page Types are created automatically when training example documents for classification. They hold the feature weighting data for documents.

  • Form Types
    • When a Document Type is trained for classification, the training samples are created as Form Types.
    • Form Types are generated automatically when training documents for Lexical classification (and less commonly for Visual classification).
  • Page Types
    • The Page Types are the individual pages of a Form Type. All training weightings are stored on the Page Types for each page of the training document.
    • Page Types are generated automatically when training documents for Lexical classification (and less commonly for Visual classification).


Data Element Objects

Types of Data Elements

The "Data Element" nodes in Grooper consist of:

Data Model ...
Data Field ...
Data Section ...
Data Table and ...
Data Column nodes.

Each of these nodes has its own function within Grooper's data extraction architecture, but they are also closely related to each other.

The relationship between these Data Elements is hierarchical and modular.

  • The Data Model acts as the overall blueprint for data extraction.
  • Data Sections structure the document into logical parts. Data Sections can also serve as simple organizational objects within a Data Model to bucket similar "Data Elements" together.
  • Data Tables are incorporated into the model to handle tabular data. Each Data Table comprises Data Columns which specify the format and rules for columnar data extraction.
  • Finally, Data Fields are the fundamental units of data, representing individual pieces of non-repeated data within a document. The exception to this is when Data Fields are contained within a "multi instance" Data Section that occurs repeatedly within a document.

Related Node Types

Data Model

Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) defines data extraction logic and settings for how data is reviewed in a Data Viewer.

Data Field

A Data Field represents a single value targeted for data extraction on a document. Data Fields are created as child nodes of a Data Model and/or Data Sections.

  • Data Fields are frequently referred to simply as "fields".

Data Section

A Data Section is a container for Data Elements in a Data Model. Data Sections can contain Data Fields, Data Tables, and even other Data Sections as child nodes, adding hierarchy to a Data Model. They serve two main purposes:

  1. They can simply act as organizational buckets for Data Elements in larger Data Models.
  2. By configuring its "Extract Method", a Data Section can subdivide larger and more complex documents into smaller parts to assist in extraction.
    • "Single Instance" sections define a division (or "record") that appears only once on a document.
    • "Multi-Instance" sections define a collection of repeating divisions (or "records").

Data Table

A Data Table is a Data Element specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).

  • The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
  • The table's columns are defined by adding Data Column nodes to the Data Table (as its children).

Data Column

Data Columns represent columns in a table extracted from a document. They are added as child nodes of a Data Table. They define the type of data each column holds along with its data extraction properties.

  • Data Columns are frequently referred to simply as "columns".
  • In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".


Extractor Objects

Connection Objects

In Grooper, "Connection Objects" play a vital role in integrating external data sources and repositories. They consist of:

CMIS Connection ...
CMIS Repository and ...
Data Connection objects.

Each of these objects serves a unique purpose while also being related through their collaborative use in connecting and managing data across various platforms and databases.

These Connection Objects are related in their collective ability to bridge Grooper with external data sources and content repositories.

  • The CMIS Connection object serves as the gateway to multiple content management systems.
  • The CMIS Repository object uses this connection to organize and manage document access for those systems.
  • The Data Connection object links Grooper to databases, allowing it to perform data lookups and synchronize with external structured data sources.

Together these Connection Objects enable Grooper to extend its data processing capabilities beyond its local domain and integrate seamlessly with external systems for end-to-end document and data management.

Related Objects

CMIS Connection

CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Repository

CMIS Repository nodes provide document access in external storage platforms through a CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

  • CMIS Repositories are created as child nodes of a CMIS Connection using the "Import Repository" command.

Data Connection

Data Connections connect Grooper to Microsoft SQL and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, perform database lookups to validate data Grooper collects, and carry out other actions related to database management systems (DBMS).

  • Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
  • Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.

Profile Objects

"Profile Objects" in Grooper serve as pre-configured settings templates used across various stages of document processing, such as scanning, image cleanup, and document separation. These objects, which include:

IP Profile ...
IP Group ...
IP Step ...
OCR Profile ...
Scanner Profile and ...
Separation Profile ...

... have their own individual functions but are also related by defining structured approaches to handling documents within Grooper.

By creating distinct profiles for each aspect of the document processing pipeline, Grooper allows for customization and optimization of each step. This standardizes settings across similar document types or processing requirements, which can contribute to consistency and efficiency in processing tasks. These "Profile Objects" collectively establish a comprehensive, repeatable, and optimized workflow for processing documents from the point of capture to the point of data extraction.

Related Objects

IP Profile

IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:

  1. Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
  2. Cleaning up an image in-memory during the Recognize activity without altering the image to improve OCR accuracy.
  3. Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode value and more) utilized in data extraction.

IP Group

IP Groups are containers of IP Steps and/or IP Groups that can be added to IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:

  1. They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
  2. They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionally execute a sequence of IP Steps.

IP Step

IP Steps are the basic units of an IP Profile. They define a single image processing operation, called an IP Command in Grooper.

OCR Profile

OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.

Scanner Profile

Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.

Separation Profile

Separation Profiles store settings that determine how Batch Pages are separated into Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Queue Objects

"Queue Objects" in Grooper are structures designed to manage and distribute tasks within the document processing workflow. There are two main types of queues:

Processing Queue and ...
Review Queue ...

... each with a distinct function but inherently interconnected as they both coordinate the flow of work through Grooper.

The relationship between Processing Queues and Review Queues lies in their roles in managing the workflow and task distribution in Grooper. Both facilitate the progression of document processing from automatic operations to those requiring human intervention.

  • Processing Queues handle the automation side of the operation, ensuring that machine tasks are efficiently allocated across the available resources.
  • Review Queues oversee the user-driven aspects of the workflow, particularly quality control and verification processes that require manual input.

Together, these queues ensure a smooth transition between automated and manual stages of document processing and help maintain order and efficiency within the system.

Related Objects

Processing Queue

Processing Queues are designed for tasks performed by machines, which include automated steps in the document processing lifecycle. Processing Queues are used to distribute machine tasks among different servers and control the concurrency or processing rate of these tasks.

  • For example, activities such as rendering documents or exporting data can be managed so that only one activity instance runs per machine or so multiple instances are processed concurrently, according to the queue configuration.

Review Queue

Review Queues are designated for human-performed tasks. They organize the Review tasks that require human attention and can distribute these tasks among different groups of users based on the queue's settings. Review Queues can be assigned at the Batch Process level to filter work by an entire process, or to Review activities at the Batch Process Step level to filter tasks at a more granular, step-based level.

Process Objects

"Process Objects" in Grooper, which include...

Batch Process and ...
Batch Process Step ...

... are closely related in managing and executing a sequence of steps designed to process a collection of documents known as a Batch.

Note: The icon for a Batch Process Step will change depending on how you add the object to a Batch Process. If you use the "Add" object-command, it will give the Batch Process Step the icon used above. If you use the "Add Activity" object-command, it will give the Batch Process Step an icon according to the activity chosen.

Below is an example of a Batch Process with several child Batch Process Steps that were added using the "Add Activity" object-command:

  • Batch Process
    • Split Pages
    • Recognize
    • Separate
    • Classify
    • Extract
    • Review
    • Export
    • Dispose Batch

A Batch Process consists of a series of Batch Process Steps meant to be executed in a particular sequence for a batch of documents. Before a Batch Process can be used in production, it must be "published". Publishing a Batch Process will create a read-only copy in the "Processes" folder of the node tree, making it accessible for production purposes.

In essence, a Batch Process defines the overall workflow for processing documents. It relies on Batch Process Steps to perform each action required during the process. Each Batch Process Step represents a discrete operation, or "activity", within the broader scope of the Batch Process. Batch Processes and Batch Process Steps work together to ensure that documents are handled in a consistent and controlled manner.
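
As a rough illustration of the two ideas above (steps run in a fixed order, and publishing produces a read-only copy), here is a small Python sketch. It is not how Grooper implements Batch Processes. `ProcessStep`, `make_step`, `publish`, and `run_process` are hypothetical names, and each "activity" here does nothing except record that it ran.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

BatchState = Dict[str, List[str]]


@dataclass(frozen=True)
class ProcessStep:
    """Hypothetical stand-in for a Batch Process Step wrapping one activity."""
    activity: str
    run: Callable[[BatchState], None]


def make_step(activity: str) -> ProcessStep:
    """Build a step whose only effect is to record that its activity ran."""
    def run(batch: BatchState) -> None:
        batch["history"].append(activity)
    return ProcessStep(activity, run)


def publish(steps: List[ProcessStep]) -> Tuple[ProcessStep, ...]:
    """Return an immutable copy, loosely mirroring a read-only published process."""
    return tuple(steps)


def run_process(steps: Sequence[ProcessStep], batch: BatchState) -> None:
    """Execute each step against the batch, strictly in sequence."""
    for step in steps:
        step.run(batch)


working_copy = [make_step(name) for name in ("Split Pages", "Recognize", "Classify", "Extract")]
published = publish(working_copy)
# Later design changes to the working copy do not affect the published copy.
working_copy.append(make_step("Export"))

batch: BatchState = {"history": []}
run_process(published, batch)
```

After the run, `batch["history"]` lists the four published activities in order and does not include "Export", since that step was added to the working copy only after publishing.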

Related Objects

Batch Process

Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each of these Activities is either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Architecture Objects

In Grooper, "Architecture Objects" organize and oversee the infrastructure and framework of the Grooper repository. A "Grooper Repository" is a tree structure of nodes representing both configuration and content objects. These objects include the...

Root ...
Project ...
File Store and ...
Machine objects ...

... each with distinct roles but also working in conjunction to manage resources and information flow within the repository.

The relationship among these "Architecture Objects" is foundational to the operation and scalability of Grooper's document processing capabilities.

  • The Root object provides a base structure.
  • The Project object defines the processing and design resources.
  • The File Store offers a storage utility for files and content.
  • The Machine objects represent the hardware resources for performing processing tasks.

Together, they comprise the essential components that underpin the function and manageability of the Grooper ecosystem.

Related Objects

Root

The Grooper Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.

Project

Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

File Store

File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes is saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".

  • Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.

Machine

Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

Miscellaneous Objects

The following objects are related only in that they don't fit neatly into the groups defined above in this article.

(un)Related Objects

Control Sheet

Control Sheets in Grooper are special pages used to control various aspects of the document scanning process. Control Sheets can serve multiple functions such as:

  • separating and classifying documents
  • changing image settings dynamically
  • creating a new folder with specific Content Types
  • triggering other actions that affect how documents are handled as they pass through the scanning equipment

Control sheets are pre-printed with barcodes or other markers that Grooper recognizes and uses to perform specific actions based on the presence of the sheet. For instance, when a control sheet instructs the creation of a new folder it can influence the hierarchy within a batch. This enables the management and organization of documents without manual intervention during the Scan activity.

Overall, Control Sheets are an intelligent way to guide the scanning workflow. Control Sheets can ensure that batches of documents are organized and processed according to predefined rules, thereby automating the structuring of scanned content into logical units within Grooper.

Data Rule

Data Rules are used to normalize or otherwise prepare data collected in a Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action", which does things like compute a field's value, parse a field into other fields, perform lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity when converting a document to another Document Type
    • The "Run Rule" command, when run manually in a Data Viewer

The execution of a Data Rule takes place during the Apply Rules activity. Data Rules can be applied at different scopes such as each individual type of "Data Element". The rule can be set to execute conditionally based on a Trigger expression. If the Trigger evaluates to true, the Data Rule's True Action is applied, and if false, its False Action is executed. Data Rules can recursively apply logic to the hierarchy of data within a document instance, enabling complex data transformation and normalization operations that reflect the structure of the extracted data.
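
The Trigger, True Action, and False Action relationship described above can be sketched as a simple conditional dispatch. This Python example is only a conceptual model: `DataRule`, `apply`, and the two example actions are hypothetical, a plain dictionary stands in for extracted data, and real Data Rules are configured in Grooper rather than written this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Record = Dict[str, str]
Action = Callable[[Record], None]


@dataclass
class DataRule:
    """Hypothetical model of a Data Rule with a Trigger and two optional actions."""
    trigger: Callable[[Record], bool]
    true_action: Optional[Action] = None
    false_action: Optional[Action] = None

    def apply(self, record: Record) -> None:
        """Evaluate the Trigger, then run whichever action matches the result."""
        action = self.true_action if self.trigger(record) else self.false_action
        if action is not None:
            action(record)


def strip_dollar_sign(record: Record) -> None:
    """Example True Action: normalize a currency value by removing a leading '$'."""
    record["Total"] = record["Total"].lstrip("$")


def flag_for_review(record: Record) -> None:
    """Example False Action: mark the record so a person can check it."""
    record["Needs Review"] = "yes"


rule = DataRule(
    trigger=lambda record: record.get("Total", "").startswith("$"),
    true_action=strip_dollar_sign,
    false_action=flag_for_review,
)

first = {"Total": "$12.50"}
second = {"Total": "12,50 EUR"}
rule.apply(first)
rule.apply(second)
```

The extractor that produced "Total" does not need to know anything about currency symbols, which is the separation of normalization logic from extraction logic that Data Rules are meant to provide.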

Overall, Data Rules in Grooper simplify extractors by separating the data normalization logic from the extraction logic, allowing for flexible and powerful post-extraction data processing.

Lexicon

Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.
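
Importing entries by reference can be thought of as a recursive union of entry sets. The sketch below is a simplified Python illustration, not Grooper's implementation: the `Lexicon` class shown here and its `all_entries` method are hypothetical, database-backed entries are left out, and the references are assumed not to form a cycle.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Lexicon:
    """Hypothetical model of a Lexicon with local entries and referenced Lexicons."""
    name: str
    entries: Set[str] = field(default_factory=set)
    imports: List["Lexicon"] = field(default_factory=list)

    def all_entries(self) -> Set[str]:
        """Combine local entries with those of every referenced Lexicon.

        Assumes the references do not form a cycle.
        """
        combined = set(self.entries)
        for imported in self.imports:
            combined |= imported.all_entries()
        return combined


states = Lexicon("US States", {"Oklahoma", "Texas"})
provinces = Lexicon("Canadian Provinces", {"Ontario"})
# This Lexicon adds one entry of its own and references the other two.
regions = Lexicon("Regions", {"Puerto Rico"}, imports=[states, provinces])
```

Because the referenced Lexicons are read at lookup time in this sketch, adding an entry to `states` later would also change what `regions.all_entries()` returns.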

Object Library

Object Library nodes are .NET libraries that contain code files for customizing Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:
  • Adding custom Activities that execute within Batch Processes
  • Creating custom commands available during the Review activity and in the Design page
  • Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects
  • Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems
  • Establishing custom Grooper Services that perform automated background tasks at regular intervals

Resource File

Resource Files are nodes you can add to a Project to store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, there are several areas in Grooper that can reference Resource Files to one end or another, including XML schema files used for Grooper's XML Schema Integration.