Glossary

From Grooper Wiki

This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms.

Entries are organized according to the major Grooper entity they belong to. For example, "Classify" is an "Activity", so it is found in the "Activity" section of the Glossary.

Application

Grooper is an intelligent document processing platform that uses an array of sophisticated techniques to automate end-to-end content capture and delivery. From a technical standpoint, Grooper consists of a Grooper Repository and the applications that support the management and execution of configuration assets.

  • A Grooper Repository consists of two things: (1) A series of tables in a SQL database (containing configuration nodes and their properties) and (2) a File Store (containing files associated with nodes in the database).

The Grooper applications are as follows:

  • Grooper - The primary program files for the Grooper platform. This application will need to be installed on any Grooper web server hosting the Grooper UI and processing servers running Activity Processing services to automate task processing.
  • Grooper Command Console - This is an administrative utility that gets installed with the Grooper application.
  • Grooper Web Client - This application installs the Grooper user interface. It will need to be installed on the Grooper web server. The Grooper web server hosts the Grooper web app which is accessed via a URL.
  • Grooper Desktop - This is a lightweight application required to scan documents using the Grooper web app. It runs in the background and helps operate the Scan Viewer in Grooper. It needs to be installed on any workstation connected to a document scanner.

Grooper Command Console

Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.

Grooper Web Client

The Grooper user interface is accessed using a web browser from a URL. The Grooper Web Client is the application that installs the Grooper website on a web server.

Node Types

Grooper.GrooperNode

Nodes are the main configuration objects in Grooper. They are created and accessed in the Node Tree from the Design page. The different types of nodes ("Node Types") serve different functions in Grooper. For example, "Batch" nodes are the primary container for document content. They contain "Batch Folder" nodes which represent documents and "Batch Page" nodes which represent individual pages of documents.

AI Analyst

BE AWARE: AI Analysts are obsolete as of version 2025. See AI Assistant for the new and improved version of AI Analyst. An AI Analyst facilitates the ability to interact with a document as you might with an AI chatbot.

AI Assistant

Grooper.GPT.AIAssistant

AI Assistants are Grooper's conversational AI personas. They answer questions about resources they can access (including content from documents, databases and/or web services). This greatly increases an AI's ability to answer domain-specific questions that require access to these resources.

Batch Objects

Grooper.Core.BatchObject

Batch Objects are the foundational elements of Grooper's document processing system, providing a unified structure for organizing, processing, and reviewing document content within a inventory_2 Batch. Every item within a Batch—whether a document, folder, or page—is represented as a Batch Object (and Batches themselves are Batch Objects too).

Batch

Grooper.Core.Batch

inventory_2 Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called settings Batch Processes. Documents and their pages are represented in Batches by a hierarchy of folder Batch Folders and contract Batch Pages.

Batch Folder

Grooper.Core.BatchFolder

The folder Batch Folder is an organizational unit within a inventory_2 Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or contract Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page

Grooper.Core.BatchPage

contract Batch Page nodes represent individual pages within a inventory_2 Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".
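
As a rough illustration of the hierarchy described in the Batch, Batch Folder and Batch Page entries, the following Python sketch models a Batch containing Batch Folders and Batch Pages. The class names, fields and the page_count helper are hypothetical. They are not Grooper's object model or API and only mirror the parent/child relationships.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BatchPage:
    """Hypothetical stand-in for a single page."""
    page_number: int


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a document or a plain folder."""
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)

    def page_count(self) -> int:
        # Count pages at any depth below this folder.
        total = 0
        for child in self.children:
            if isinstance(child, BatchPage):
                total += 1
            else:
                total += child.page_count()
        return total


@dataclass
class Batch(BatchFolder):
    """Hypothetical stand-in for the top-level container."""


invoice = BatchFolder("Invoice", [BatchPage(1), BatchPage(2)])
contract = BatchFolder("Contract", [BatchPage(1)])
batch = Batch("Example Batch", [invoice, contract])
print(batch.page_count())
```

Modeling the Batch as a kind of folder reflects the statement that Batches themselves are Batch Objects. Everything else about the sketch is an assumption made for illustration.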

Batch Process

Grooper.Core.BatchProcess

settings Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a inventory_2 Batch. Each step is comprised of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute edit_document Batch Process Steps which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

Grooper.Core.BatchProcessStep

edit_document Batch Process Steps are specific actions within a settings Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".
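
The relationship between a Batch Process, its steps and the two kinds of Activity can be sketched as follows. This is a conceptual model only: the Step class, the attended flag, the who_runs helper and the particular step order are assumptions chosen for illustration, not Grooper's API or a recommended process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for a Batch Process Step."""
    activity: str
    attended: bool = False  # True for Review, False for Code Activities

    @property
    def label(self) -> str:
        # Steps are often named after their Activity, e.g. "Recognize step".
        return f"{self.activity} step"


def who_runs(step: Step) -> str:
    # Code Activities are automated; Review is performed by a person.
    return "human operator" if step.attended else "Activity Processing service"


# An example ordering of steps (assumed, not prescribed by the glossary).
process: List[Step] = [
    Step("Recognize"),
    Step("Classify"),
    Step("Extract"),
    Step("Review", attended=True),
    Step("Export"),
]

for step in process:
    print(f"{step.label}: {who_runs(step)}")
```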

CMIS Connection

Grooper.CMIS.CmisConnection

cloud CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes the connection and transfer protocol so the platform can be treated as if it were a CMIS platform.

CMIS Repository

Grooper.CMIS.CmisRepository

settings_system_daydream CMIS Repository nodes provide document access in external storage platforms through a cloud CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

  • CMIS Repositories are created as a child node of a CMIS Connection using the "Import Repository" command.

Content Types

Grooper.Core.ContentType

Content Types are a class of node types used to classify folder Batch Folders. They represent categories of documents (stacks Content Models and collections_bookmark Content Categories) or distinct types of documents (description Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Content Model

Grooper.Core.ContentType

stacks Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the collections_bookmark Content Categories and description Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Category

Grooper.Core.ContentCategory

collections_bookmark A Content Category is a container for other Content Category or description Document Type nodes in a stacks Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.

Document Type

Grooper.Core.DocumentType

description Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a stacks Content Model or a collections_bookmark Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the folder Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's data_table Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).
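
Data Element inheritance through a Content Type hierarchy can be pictured with a small tree walk. The sketch below is a simplified model: the ContentType class, the element names and the "parent elements first" ordering are all assumptions made for illustration, and Grooper's actual inheritance behavior may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Category or Document Type."""
    name: str
    data_elements: List[str] = field(default_factory=list)
    parent: Optional["ContentType"] = None

    def effective_data_elements(self) -> List[str]:
        # Inherited elements come first, then this node's own elements.
        inherited = self.parent.effective_data_elements() if self.parent else []
        return inherited + self.data_elements


model = ContentType("Accounting Model", ["Document Date"])
category = ContentType("Payables", ["Vendor Name"], parent=model)
invoice = ContentType("Invoice", ["Invoice Number", "Total"], parent=category)

print(invoice.effective_data_elements())
```

In this model, a Document Type placed under a Content Category picks up elements from both the category and the Content Model at the root, which is the kind of branching the Content Category entry describes.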

Form Type

Grooper.Core.FormType

two_pager Form Types represent trained variations of a description Document Type. These nodes store machine learning training data for Lexical and Visual document classification methods.

Page Type

Grooper.Core.PageType

article Page Types represent individual pages of a two_pager Form Type. These nodes store page-level machine learning training data for Lexical and Visual document classification methods. Page Types are used by ESP Auto Separation to make document separation decisions based on page classification.

Control Sheet

Grooper.Capture.ControlSheet

document_scanner Control Sheets are printable pages used to automate document separation at scan time. Control Sheets are placed before each new document before loading pages into the scanner. Then, when pages are scanned using the Scan Viewer and Control Sheet Separation is executed, a new folder Batch Folder is created for every Control Sheet scanned. Control Sheets can also be configured to assign the Batch Folder a description Document Type, thus classifying the document at scan time as well.
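
The separation idea can be sketched as a single pass over the scanned stack, starting a new document whenever a control sheet is encountered. The marker string, the separate function and the choice to discard the control sheet itself and any pages before the first one are assumptions for this sketch, not a description of how Control Sheet Separation is implemented.

```python
from typing import List


def separate(scanned: List[str], marker: str = "CONTROL_SHEET") -> List[List[str]]:
    """Start a new document each time a control sheet marker is seen."""
    documents: List[List[str]] = []
    for item in scanned:
        if item == marker:
            documents.append([])  # the control sheet itself is not kept as a page
        elif documents:
            documents[-1].append(item)
        # Pages scanned before the first control sheet are ignored in this sketch.
    return documents


stack = ["CONTROL_SHEET", "p1", "p2", "CONTROL_SHEET", "p3"]
docs = separate(stack)
print(docs)
```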

Data Connection

Grooper.Core.DataConnection

database Data Connections connect Grooper to Microsoft SQL and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, perform database lookups to validate data Grooper collects and other actions related to database management systems (DBMS).

  • Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
  • Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.

Data Elements

Grooper.Core.DataElement

Data Elements are a class of node types used to collect data from a document. These include: data_table Data Models, insert_page_break Data Sections, variables Data Fields, table Data Tables, and view_column Data Columns.

Data Model

Grooper.Core.DataModel

data_table Data Models are leveraged during the Extract activity to collect data from documents (folder Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The configuration of the Data Model and its child Data Elements defines data extraction logic and settings for how data is reviewed in a Data Viewer.

Data Field

Grooper.Core.DataField

variables Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a data_table Data Model and/or insert_page_break Data Sections.

  • Data Fields are frequently referred to simply as "fields".

Data Section

Grooper.Core.DataSection

A insert_page_break Data Section is a container for Data Elements in a data_table Data Model. Data Sections can contain variables Data Fields, table Data Tables, and even Data Sections as child nodes and add hierarchy to a Data Model. They serve two main purposes:

  1. They can simply act as organizational buckets for Data Elements in larger Data Models.
  2. By configuring its "Extract Method", a Data Section can subdivide larger and more complex documents into smaller parts to assist in extraction.
    • "Single Instance" sections define a division (or "record") that appears only once on a document.
    • "Multi-Instance" sections define a collection of repeating divisions (or "records").

Data Table

Grooper.Core.DataTable

A table Data Table is a Data Element specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).

  • The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
  • The table's columns are defined by adding view_column Data Column nodes to the Data Table (as its children).

Data Column

Grooper.Core.DataColumn

view_column Data Columns represent columns in a table extracted from a document. They are added as child nodes of a table Data Table. They define the type of data each column holds along with its data extraction properties.

  • Data Columns are frequently referred to simply as "columns".
  • In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".

Data Field Container and Data Element Container

Grooper.Core.DataFieldContainer
Grooper.Core.DataElementContainer

Data Field Container and Data Element Container are two base types in Grooper from which "container" Data Elements are derived. Container Data Elements (data_table Data Models, insert_page_break Data Sections, table Data Tables) serve an important function in organizing and defining behavior and extraction logic for the variables Data Fields and view_column Data Columns they contain.

  • While "Data Field Container" and "Data Element Container" are distinct classes in the Grooper Object Model, they are closely related. While Grooper scripters/experts should know the difference, for most practical purposes, the terms are used interchangeably (or they're just called "containers" or "container elements"). See Object Model info for more.

Data Rule

Grooper.Core.DataRule

flowsheet Data Rules are used to normalize or otherwise prepare data collected in a data_table Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (folder Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action", which does things like compute a field's value, parse a field into other fields, perform lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity when converting a document to another Document Type
    • A user, manually, in a Data Viewer with the "Run Rule" command
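
The trigger/action/hierarchy pattern in the Data Rule entry can be sketched in a few lines. This is a conceptual model only: the Rule class, the field names and the choice to run child rules only when the parent's trigger is true are assumptions for illustration and do not reflect Grooper's expression syntax or execution semantics.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical stand-in for a Data Rule with a trigger and an action."""
    trigger: Callable[[Record], bool]
    action: Callable[[Record], None]
    children: List["Rule"] = field(default_factory=list)

    def apply(self, record: Record) -> None:
        # Run the action only when the trigger is true, then run child rules.
        if self.trigger(record):
            self.action(record)
            for child in self.children:
                child.apply(record)


def compute_total(record: Record) -> None:
    record["Total"] = record["Subtotal"] + record["Tax"]


def round_total(record: Record) -> None:
    record["Total"] = round(record["Total"], 2)


rule = Rule(
    trigger=lambda r: "Subtotal" in r and "Tax" in r,
    action=compute_total,
    children=[Rule(trigger=lambda r: True, action=round_total)],
)

record = {"Subtotal": 100.0, "Tax": 8.25}
rule.apply(record)
print(record)
```

A record that lacks the "Tax" field would be left untouched, because the parent trigger evaluates to false and neither action runs.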

Extractor Nodes

Grooper.Core.ExtractorNode

Data Type

Grooper.Extract.DataType

pin Data Types are nodes used to extract text data from a document. Data Types have more capabilities than quick_reference_all Value Readers. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. Data Types can also collate results using Collation Providers to combine, sift and manipulate results further.

Value Reader

Grooper.Extract.ValueReader

quick_reference_all Value Reader nodes define a single data extraction operation. For each Value Reader, its Extractor Type determines the logic for returning data from a text-based document or page. For example, you would use the Pattern Match Extractor Type to return data using a regular expression. Value Readers are "primitive" extractors that can be used on their own or in conjunction with Data Types for more complex data extraction and collation.
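
To make the distinction concrete, the sketch below uses Python's re module: one function plays the role of a single pattern-based extraction, and another gathers results from several such extractions, loosely analogous to a Data Type collecting from multiple extractor sources. The function names, the sample text and the de-duplication step are assumptions for illustration. Grooper's Pattern Match extractor and Collation Providers have their own settings that are not modeled here.

```python
import re
from typing import List


def pattern_match(pattern: str, text: str) -> List[str]:
    """Return every non-overlapping match of a regular expression."""
    return [m.group(0) for m in re.finditer(pattern, text)]


def collect(patterns: List[str], text: str) -> List[str]:
    """Gather results from several extractors, dropping duplicates in order."""
    results: List[str] = []
    for pattern in patterns:
        for value in pattern_match(pattern, text):
            if value not in results:
                results.append(value)
    return results


text = "Invoice date: 01/15/2024. Due 2024-02-14."
dates = collect([r"\d{2}/\d{2}/\d{4}", r"\d{4}-\d{2}-\d{2}"], text)
print(dates)
```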

Field Class

Grooper.Extract.FieldClass

input Field Classes are NLP (natural language processing) based extractor nodes. They find values based on some natural language context near that value. Values are positively or negatively associated with text-based "features" nearby by training the extractor. During extraction, the extractor collects values based on these training weightings.

  • Field Classes are most useful when attempting to find values within the flow of natural language.
  • Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "extractor nodes" like quick_reference_all Value Readers or pin Data Types.
  • Advances in large language models (LLMs) have largely made Field Classes obsolete. LLM-based extraction methods in Grooper (such as AI Extract) can achieve similar results with far less setup.

File Store

Grooper.FileStore

hard_drive File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes is saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".

  • Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.

Folder

Grooper.Folder

Batches Folder

Grooper.Core.BatchesFolder

Projects Folder

Grooper.ProjectsFolder

Machines Folder

Grooper.MachinesFolder

Local Resources Folder

Grooper.Core.LocalResourcesFolder

IP Elements

Grooper.IP.IpElement

IP Group

Grooper.IP.IpGroup

gallery_thumbnail IP Groups are containers of image IP Steps and/or IP Groups that can be added to perm_media IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:

  1. They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
  2. They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionally execute a sequence of IP Steps.

IP Profile

Grooper.IP.IpProfile

perm_media IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:

  1. Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
  2. Cleaning up an image in-memory during the Recognize activity without altering the image to improve OCR accuracy.
  3. Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode values and more) utilized in data extraction.

IP Step

Grooper.IP.IpStep

image IP Steps are the basic units of an perm_media IP Profile. They define a single image processing operation, called an IP Command in Grooper.

Lexicon

Grooper.Core.Lexicon

dictionary Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a database Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.

Machine

Grooper.Machine

computer Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

OCR Profile

Grooper.OCR.OcrProfile

library_books OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on contract Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.

Object Library

Grooper.ObjectLibrary

extension Object Library nodes are .NET libraries that contain code files for customizing Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:
  • Adding custom Activities that execute within Batch Processes.
  • Creating custom commands available during the Review activity and in the Design page.
  • Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects.
  • Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems.
  • Establishing custom Grooper Services that perform automated background tasks at regular intervals.

Project

Grooper.Project

package_2 Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as stacks Content Models, settings Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Resource File

Grooper.ResourceFile

Resource Files are nodes you can add to a package_2 Project to store any kind of file. Each Resource File stores one file. While Resource Files can hold any kind of file, several areas in Grooper reference them for specific purposes, such as the XML schema files used for Grooper's XML Schema Integration.

Root

Grooper.GrooperRoot

The Grooper database Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.

Scanner Profile

Grooper.Capture.ScannerProfile

scanner Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.

Separation Profile

Grooper.Capture.SeparationProfile

insert_page_break Separation Profiles store settings that determine how contract Batch Pages are separated into folder Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Work Queue

Grooper.Core.WorkQueue

Processing Queue

Grooper.Core.ThreadPool

memory Processing Queues help automate "machine performed tasks" (Those are Code Activity tasks performed by computer Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, that can only have one activity instance running per machine (this is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.
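
The idea of capping the number of threads available to a compute-heavy activity can be illustrated with a standard thread pool. This is a generic Python sketch, not how Activity Processing services or Processing Queues are implemented: MAX_THREADS, the task function and the sleep that stands in for real work are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

MAX_THREADS = 2  # hypothetical cap for a resource-intensive queue

lock = threading.Lock()
running = 0
peak = 0


def task(page: int) -> int:
    """Pretend to process one page while tracking concurrent executions."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # stand-in for real work such as OCR
    with lock:
        running -= 1
    return page


# The pool never runs more than MAX_THREADS tasks at once.
with ThreadPoolExecutor(max_workers=MAX_THREADS) as queue:
    results = list(queue.map(task, range(8)))

print(results, "peak concurrency:", peak)
```

Setting the cap to 1 in this sketch would serialize the tasks, which is loosely comparable to throttling a step so that only one task runs at a time.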

Review Queue

Grooper.Core.ReviewQueue

person_play Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set either on a settings Batch Process or a Review step. A user's membership in Review Queues affects how inventory_2 Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.

Core Configuration Types

In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. They are designed to be part of a larger configuration.

  • Most commonly, these are found in the property settings on a node in the Grooper node tree.
  • However, they are also configured when configuring commands or even as part of a larger property configuration.


  • Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties, including the major items listed below (and less commonly configured objects too).

Activity

Grooper.Core.BatchProcessingActivity

Grooper Activities define specific document processing operations done to a inventory_2 Batch, folder Batch Folder, or contract Batch Page. In a settings Batch Process, each edit_document Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Attended Activities

Grooper.Core.AttendedActivity

Attended Activities are a type of Activity in Grooper that requires direct user interaction within a settings Batch Process workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or automated processing is simply insufficient). The only current Attended Activity in Grooper is person_search Review.

Review

Grooper.Activities.Review

person_search Review is an Activity that allows user attended review of Grooper's results. This allows human operators to validate processed contract Batch Page and folder Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.

Code Activities

Grooper.Core.CodeActivity

AI Dialogue

BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024. network_intelligence_update AI Dialogue is an Activity that executes a scripted conversation with an psychology AI Analyst and saves the resulting conversation on the document as a JSON file.

Apply Rules

Grooper.Activities.ApplyRules

flowsheet Apply Rules is an Activity that runs flowsheet Data Rules on data that has previously been extracted from documents (folder Batch Folders).

  • The Apply Rules activity will always need to run after an Extract activity runs (An Extract step must come before an Apply Rules step in the order of edit_document Batch Process Steps in a settings Batch Process).

Attach

Grooper.GPT.Attach

file_present Attach is an Activity that physically moves and nests documents within a folder Batch Folder based on attachment markers set by the attach_file_add Mark Attachments activity. It consolidates related documents—such as addenda or supporting documents—under their host documents, updating the inventory_2 Batch hierarchy for downstream processing.

Batch Transfer

Grooper.Activities.BatchTransfer

Batch Transfer is an Activity that

Burst Book

Grooper.Microform.BurstBook

auto_stories Burst Book is an Activity that

Classify

Grooper.Activities.ClassifyFolders

unknown_document Classify is an Activity that "classifies" folder Batch Folders in a inventory_2 Batch by assigning them a description Document Type.

  • Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
  • Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Clip Frames

view_module Clip Frames is a specialized Activity for processing microfiche in Grooper. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Convert Data

switch_access_2 Convert Data is an Activity that converts a document (folder Batch Folder) to another description Document Type using Data Actions to copy and convert Data Elements from the source Document Type to those in the target Document Type. Convert Data is a specialized Activity for use cases requiring a great deal of data transformation before export.

Correct

abc Correct is an Activity that performs spell correction. It can correct a folder Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data or otherwise enhance text data.

Deduplicate

Deduplicate is an Activity that

Detect Frames

view_module Detect Frames is a specialized Activity for processing microfiche in Grooper. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Detect Language

Grooper.GPT.DetectLanguage

travel_explore Detect Language is an Activity that uses a large language model (LLM) to determine the primary language (English, Spanish, French, etc.) of a document. Activities executed downstream, such as export_notes Extract, can use this information to apply language specific logic.

Execute

tv_options_edit_channels Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a settings Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export

output Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract

export_notes Extract is an Activity that retrieves information from folder Batch Folder documents, as defined by Data Elements in a data_table Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Image Processing

wallpaper Image Processing is an Activity that enhances contract Batch Page images and optimizes them for better OCR text recognition and data extraction results.

Initialize Card

view_module Initialize Card is a specialized Activity for processing microfiche in Grooper. It prepares and configures microfiche card images for further processing.

Launch Process

Launch Process is an Activity that

Mark Attachments

Grooper.GPT.MarkAttachments

attach_file_add Mark Attachments is an Activity that analyzes documents (folder Batch Folders) to determine attachment relationships using configurable rules ("Attachment Rules"). It sets attachment markers on documents, indicating whether they should be attached to neighboring Batch Folders. These markers are then used by the Attach activity to group and nest related documents.

Merge

file_save Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.

Recognize

format_letter_spacing_wide Recognize is an Activity that obtains machine-readable text from contract Batch Pages and folder Batch Folders. When properly configured with an library_books OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an perm_media IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Redact

format_ink_highlighter Redact is an Activity that visibly obscures (or "redacts") text information on a page based on results returned from an extractor. Be aware, Redact does not alter the text data. It only alters the image.

Remove Level

account_tree Remove Level is an Activity that

Render

print Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the format_letter_spacing_wide Recognize Activity.

Route

alt_route Route is an Activity that

Send Mail

forward_to_inbox Send Mail is an Activity that automates email notifications from Grooper based on events and conditions set by a settings Batch Process. Optionally, documents in the inventory_2 Batch may be attached to the generated email.

Separate

insert_page_break Separate is an Activity that sorts contract Batch Pages into individual folder Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by unknown_document Classify, export_notes Extract, output Export and other Activities that need to run on the folder (i.e. document) level.

Spawn Batch

inventory_2 Spawn Batch is an Activity that

Split Pages

Multi-page PDF and TIF files come into Grooper as files attached to single folder Batch Folders. Split Pages is an Activity that creates child contract Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.

Split Text

receipt Split Text is an Activity that

Text Transform

insert_text Text Transform is an Activity that

Train Lexicon

book_2 Train Lexicon is an Activity that

Translate

translate Translate is an Activity that

XML Transform

code_blocks XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

Behavior

A "Behavior" is one of several features applied to a Content Type (such as a description Document Type). Behaviors affect how certain Activities and Commands are executed, based on how a document (folder Batch Folder) is classified. Documents are handled differently according to their Document Type. This includes how they are exported (how Export behaves), if and how they are added to a document search index (how the various indexing commands behave), and if and how Label Sets are used (how Classify and Extract behave in the presence of Label Sets).

  • Each Behavior is enabled by adding it to a Content Type. They are configured in the Behaviors editor.
  • Behaviors extend to descendant Content Types, if the descendant Content Type has no Behavior configuration of its own.
    • For example, all Document Types will inherit their parent Content Model's Behaviors.
    • However, if a Document Type has its own Behavior configuration, it will be used instead.
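
The inheritance rule in the list above can be sketched as a walk up the Content Type tree. This is a minimal Python illustration, not Grooper code: the `ContentType` class, its `behaviors` attribute and the `effective_behaviors` function are hypothetical names, and the sketch assumes a node's Behavior configuration is used or inherited as a whole.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model or Document Type node."""
    name: str
    parent: Optional["ContentType"] = None
    # An empty list means "no Behavior configuration of its own".
    behaviors: list = field(default_factory=list)


def effective_behaviors(node):
    """Return the node's own Behaviors, or the nearest ancestor's if it has none."""
    current = node
    while current is not None:
        if current.behaviors:
            return current.behaviors
        current = current.parent
    return []


model = ContentType("Invoices Model", behaviors=["Export Behavior"])
inherits = ContentType("Vendor Invoice", parent=model)
overrides = ContentType("Credit Memo", parent=model, behaviors=["Labeling Behavior"])

print(effective_behaviors(inherits))   # the parent Content Model's configuration
print(effective_behaviors(overrides))  # the Document Type's own configuration
```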

Export Behavior

An Export Behavior defines the parameters for exporting classified folder Batch Folder content from Grooper to other systems. This includes where they are exported to (what content management system, file system, database etc), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML etc), folder pathing, file naming and data mappings (for Data Export and CMIS Export).

Import Behavior

An Import Behavior defines how data is mapped from files in an external content management system to Batch Folders created on import when using CMIS Import.

Indexing Behavior

An Indexing Behavior allows documents (folder Batch Folders) to be indexed via AI Search. Once indexed, users can search for and retrieve documents from the Search Page.

Labeling Behavior

A Labeling Behavior extends "label set" functionality to description Document Types. This allows you to collect field labels and other labels present on a document and use them in a variety of ways. This includes functionality for classification, field extraction, table extraction, and section extraction.

PDF Data Mapping

PDF Data Mapping is a Behavior that enhances PDF files generated by the Merge or Export activities with metadata, bookmarks, annotations and/or different kinds of widgets.

Text Rendering

Text Rendering is a Behavior that causes text documents (e.g. TXT files) to be interpreted and displayed as paginated documents rather than a raw text stream.

  • By default, this renders TXT files to an 8.5 by 11 inch page format, but this can be altered in the Text Rendering settings.

Classify Method

"Classify Methods" define classification logic used by stacks Content Models during the unknown_document Classify activity. Classify Methods organize document content in Grooper by assigning folder Batch Folders a description Document Type.

  • Classify Methods analyze documents (Batch Folders) to determine what kind of document each one is.
  • Each Classify Method analyzes documents according to a different methodology to organize documents accurately. This includes text-based pattern matching, computer vision, machine learning models, label sets and more.
  • Classify Methods are configured by setting and configuring a Content Model's "Classification Method" property.

GPT Embeddings

BE AWARE: GPT Embeddings is obsolete as of version 2025. The LLM Classifier and Search Classifier methods are the new and improved AI-enabled classification methods.

GPT Embeddings is a Classify Method that uses an OpenAI embeddings model and trained document samples to tell one document from another.

Labelset-Based

"Labelset-Based" is a Classify Method that leverages the labels defined via a Labeling Behavior to classify folder Batch Folders.

Lexical

"Lexical" is a Classify Method that classifies folder Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify description Document Types.

LLM Classifier

"LLM Classifier" is a Classify Method that classifies documents (folder Batch Folders) by asking a large language model (LLM) to select its description Document Type from a list.

Rules-Based

"Rules-Based" is a Classify Method that employs "rules" defined on each description Document Type to classify folder Batch Folders. Positive Extractor and Negative Extractor properties are configured for each Document Type to positively or negatively associate a Batch Folder based on predefined criteria.

  • Whereas the Positive and Negative Extractors will impact all Classify Method results, the Rules-Based method classifies using only these properties and nothing else.

Search Classifier

"Search Classifier" is a Classify Method that classifies documents (folder Batch Folders) by finding similar documents in a document search index. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same description Document Type as its closest match in the search index.

Visual

"Visual" is a Classify Method that uses image analysis instead of text data to determine the description Document Type assigned to a folder Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an perm_media IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.

Collation Provider

The Collation property of a pin Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.

AND

AND is a Collation Provider option for pin Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
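
The logical "AND" described here can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: each "extractor" is reduced to a hypothetical regular expression, and the shape of the combined result (a flat list of hits) is an assumption.

```python
import re


def and_collation(text, patterns):
    """Return all hits only if every pattern matches at least once; otherwise nothing."""
    hits_per_extractor = [re.findall(p, text) for p in patterns]
    if all(hits_per_extractor):  # an empty hit list is falsy
        return [hit for hits in hits_per_extractor for hit in hits]
    return []


page = "Invoice No: 10045  Date: 03/14/2024"

# Both extractors hit, so results are returned.
both = and_collation(page, [r"\d{2}/\d{2}/\d{4}", r"Invoice No"])

# The second extractor gets no hit, so nothing is returned at all.
missing = and_collation(page, [r"\d{2}/\d{2}/\d{4}", r"Purchase Order"])
```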

Array

Array is a Collation Provider option for pin Data Type extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.

Combine

Combine is a Collation Provider option for pin Data Type extractors. Combine combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.

Key-Value List

Key-Value List is a Collation Provider option for pin Data Type extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.

Key-Value Pair

Key-Value Pair is a Collation Provider option for pin Data Type extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. Note: Key-Value Pair is an older technique in Grooper. In most cases, the Labeled Value extractor type is preferable to Key-Value Pair collation.

Multi-Column

Multi-Column is a Collation Provider option for pin Data Type extractors. Multi-Column combines multiple columns on a page into a single column for extraction.

Ordered Array

Ordered Array is a Collation Provider option for pin Data Type extractors. Ordered Array finds sequences of values where one result is present for each extractor, in the order they appear, according to a specified horizontal, vertical or text-flow layout.

Pattern-Based

Pattern-Based is a Collation Provider option for pin Data Type extractors. Pattern-Based uses regular expressions to sequence returned results into a final result set.

Split

Split is a Collation Provider option for pin Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.
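
As a rough illustration of using matches as anchor points, the Python sketch below splits a string at every match of a pattern. The function name and sample text are hypothetical, and dropping the anchor text and surrounding whitespace from the output is an assumption made for the example.

```python
import re


def split_at_matches(text, anchor_pattern):
    """Split text at each match of anchor_pattern and drop empty parts."""
    parts = re.split(anchor_pattern, text)
    return [part.strip() for part in parts if part.strip()]


parts = split_at_matches("apples; pears; plums", r";")
print(parts)  # ['apples', 'pears', 'plums']
```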

IP Command

IP Commands specify an image processing (IP) operation (such as image cleanup, format conversion or feature detection) and are used to construct image IP Steps in an IP Profile. IP Commands are configured using an IP Step's Command property.

Barcode Detection

Barcode Detection is an IP Command that detects and reads barcode data. The detected barcode information is stored as part of the page's layout data.

Barcode Removal

Barcode Removal is an IP Command that detects, reads and digitally removes barcodes from an image. The detected barcode information is stored as part of the page's layout data.

Binarize

Binarize is an IP Command that converts a color or grayscale image to a bi-tonal (black and white) image using various thresholding methods.
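
One of the simplest thresholding methods is a fixed global threshold, sketched below in Python on a tiny grayscale grid. This is an illustration of the general technique only. Which thresholding methods Grooper offers is not listed here, and the `binarize` function and the cutoff value of 128 are assumptions.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 grayscale pixel to 0 (black) or 255 (white)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


gray_image = [
    [12, 200, 130],
    [90, 127, 255],
]
bitonal = binarize(gray_image)
print(bitonal)  # [[0, 255, 255], [0, 0, 255]]
```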

Box Detection

Box Detection is an IP Command that detects checkboxes and determines their check state (checked or unchecked). The detected checkbox information is stored as part of the page's layout data.

Box Removal

Box Removal is an IP Command that detects checkboxes, determines their check state (checked or unchecked) and digitally removes them from an image. The detected checkbox information is stored as part of the page's layout data.

Extract Page

Extract Page is an IP Command that removes an image from a carrier image while simultaneously removing any image warping or skewing.

Line Detection

Line Detection is an IP Command that locates horizontal and vertical lines on documents. The detected line locations are stored as part of the page's layout data.

Line Removal

Line Removal is an IP Command that locates and removes horizontal and vertical lines from documents. The detected line locations are stored as part of the page's layout data.

Scratch Removal

Scratch Removal is an IP Command that detects and removes or repairs scratches from film-based images.

Shape Detection

Shape Detection is an IP Command that locates shapes on a document that match one or more sample images. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.

Shape Removal

Shape Removal is an IP Command that detects and removes shapes from documents. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that interfere with OCR and/or can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.

Lookup Specification

A Lookup Specification defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" (data_table Data Models, insert_page_break Data Sections and table Data Tables) using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root Data Model), but cannot be used to populate a field value on a parent of the container element.

CMIS Lookup

CMIS Lookup is a Lookup Specification that performs a lookup against a settings_system_daydream CMIS Repository via a "CMISQL query" (a specialized query language based on SQL database queries).

Database Lookup

Database Lookup is a Lookup Specification that performs a lookup against a database Data Connection via a SQL query.
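
The general pattern of a database lookup (a "lookup field" drives a SQL query whose result populates a "target field") can be sketched with Python's built-in `sqlite3` module. Everything in this example is hypothetical: the `vendors` table, its columns, the field names and the in-memory database stand in for whatever Data Connection and query a real Lookup Specification would use.

```python
import sqlite3

# In-memory database standing in for an external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, vendor_name TEXT)")
conn.execute("INSERT INTO vendors VALUES ('V-100', 'Acme Supply')")


def lookup_vendor_name(connection, vendor_id):
    """Query by the lookup field value; return the target value or None if no row matches."""
    row = connection.execute(
        "SELECT vendor_name FROM vendors WHERE vendor_id = ?", (vendor_id,)
    ).fetchone()
    return row[0] if row else None


# "Vendor ID" is the lookup field, "Vendor Name" is the target field.
fields = {"Vendor ID": "V-100", "Vendor Name": None}
fields["Vendor Name"] = lookup_vendor_name(conn, fields["Vendor ID"])
print(fields)
```

The query is parameterized (`?`) so the extracted value is passed as data rather than spliced into the SQL text.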

GPT Lookup

PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as AI Extract. If absolutely necessary, its functionality could also be replicated with a Web Service Lookup implementation.

GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model.

Web Service Lookup

Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.

XML Lookup

XML Lookup is a Lookup Specification that performs a lookup against an XML file stored as a draft Resource File in the package_2 Project. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.
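
To illustrate selecting XML nodes with XPath and mapping an attribute and an element's text to fields, here is a small Python sketch using the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath). The XML layout, the field names and the `xml_lookup` function are assumptions made for the example, not Grooper's configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup file content.
xml_text = """
<vendors>
  <vendor id="V-100"><name>Acme Supply</name></vendor>
  <vendor id="V-200"><name>Globex</name></vendor>
</vendors>
"""
root = ET.fromstring(xml_text)


def xml_lookup(root, vendor_id):
    """Select the <vendor> node whose id attribute matches and map it to field values."""
    node = root.find(f"./vendor[@id='{vendor_id}']")
    if node is None:
        return None
    return {
        "Vendor ID": node.get("id"),            # from an XML attribute
        "Vendor Name": node.findtext("name"),   # from an element's text
    }


result = xml_lookup(root, "V-200")
print(result)
```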

OCR Engine

An "OCR engine" is the part of OCR software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an OCR Profile's OCR Engine property.

Azure OCR

Azure OCR is an OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.

Section Extract Method

The Extract Method property of a insert_page_break Data Section defines a "Section Extract Method" which specifies how section instances will be identified and extracted.

Clause Detection

Clause Detection is a insert_page_break Data Section Extract Method. It leverages LLM text embedding models to compare supplied samples of text against the text of a document to return what the AI determines is the "chunk" of text that most closely resembles the supplied samples.

Nested Table

Nested Table is a insert_page_break Data Section Extract Method. This method divides a document into sections by extracting table data within those sections. This gives Grooper users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections has the same table (or at least tabular data which can be extracted by a single table Data Table object).

Transaction Detection

Transaction Detection is a insert_page_break Data Section Extract Method. This extraction method produces section instances by detecting repeating patterns of text around the Data Section's child variables Data Fields.

Separation Provider

The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Change in Value Separation

Change in Value Separation is a Separation Provider that creates a new folder every time an extracted value changes from one contract Batch Page to the next.
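
The idea of starting a new folder whenever a value changes between consecutive pages can be sketched as follows. This is a simplified Python illustration with hypothetical names. It assumes one already-extracted value per page and represents folders as lists of page numbers.

```python
def separate_on_value_change(page_values):
    """Group consecutive pages, starting a new folder whenever the value changes."""
    folders = []
    previous = object()  # sentinel that never equals a real value
    for page_number, value in enumerate(page_values, start=1):
        if value != previous:
            folders.append([])
        folders[-1].append(page_number)
        previous = value
    return folders


# One extracted value (for example, an account number) per page.
folders = separate_on_value_change(["A-1", "A-1", "B-7", "B-7", "B-7", "A-1"])
print(folders)  # [[1, 2], [3, 4, 5], [6]]
```

Note that the last page starts a new folder even though "A-1" was seen earlier, because only a change between neighboring pages is considered.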

Control Sheet Separation

Control Sheet Separation is a Separation Provider that uses Grooper document_scanner Control Sheets to separate documents.

EPI Separation

EPI Separation is a Separation Provider that uses embedded page information ("EPI") to separate loose pages into document folders. A Data Extractor is used to find page numbers from the text on a page and Grooper uses this information to separate the pages.

ESP Auto Separation

ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a inventory_2 Batch.

Event-Based Separation

Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

Multi Separator

The Multi Separator Separation Provider performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.
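
The fallback chain can be pictured with the Python sketch below. It is a loose illustration under stated assumptions, not Grooper's logic: each hypothetical provider returns `True` (start a new folder), `False` (continue the current folder) or `None` (it "fails" to decide), and the first provider with a decision wins for that page.

```python
def multi_separate(pages, providers):
    """For each page, use the first provider that returns a decision."""
    folders = []
    for page in pages:
        decision = None
        for provider in providers:
            decision = provider(page)
            if decision is not None:
                break  # this provider decided; skip the remaining ones
        if decision or not folders:
            folders.append([])
        folders[-1].append(page)
    return folders


def barcode_provider(page):
    # Hypothetical: only has an opinion on pages that carry a barcode marker.
    return True if "BARCODE" in page else None


def keyword_provider(page):
    # Hypothetical fallback: a page starting with "Invoice" begins a document.
    return page.startswith("Invoice")


pages = ["BARCODE cover", "Invoice 1", "details", "Invoice 2"]
folders = multi_separate(pages, [barcode_provider, keyword_provider])
print(folders)
```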

Pattern-Based Separation

Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.

Undo Separation

Undo Separation is a Separation Provider. Instead of putting loose contract Batch Pages into folder Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

Service

Grooper.ServiceInstance

Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

Activity Processing

Grooper.Services.ActivityProcessing

Activity Processing is a Grooper Service that executes Activities assigned to edit_document Batch Process Steps in a settings Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

API Services

Grooper.Services.ApiServices

You can perform inventory_2 Batch processing via REST API web calls by installing API Services.

  • As of version 2025, the Grooper Web Services (GWS) web app hosts additional API endpoints. Some of these endpoints overlap with the API Services endpoints. Refer to the GWS documentation for more information on its endpoint offerings. You can locate the GWS documentation for your Grooper install at https://{webserver-name-or-domain-name}/GWS

Grooper Licensing

Grooper.Services.LicenseService

Grooper Licensing is a Grooper Service that distributes licenses to multiple workstations running Grooper applications.

Import Watcher

Grooper.Services.ImportWatcher

An Import Watcher is a Grooper Service that schedules and runs Import Jobs. It uses an Import Provider to query files in a file system or content management system that meet specified criteria according to a defined schedule (every minute, every day, only on Sundays, etc.). These files are imported into Grooper as documents (folder Batch Folders) in a new inventory_2 Batch.

  • Afterward, the imported files can be (and should be) moved, deleted, or modified to prevent repeat imports in the next polling cycle.

Indexing Service

Grooper.GPT.IndexingService

An Indexing Service is a Grooper Service that periodically polls the Grooper database to automate AI Search indexing. It checks to see if any documents in a Grooper Repository are classified as a Document Type that inherits from a Content Type configured with an Indexing Behavior. If there are any, and they need to be added, updated, or deleted to/from the search index, the Indexing Service will submit an "Indexing Job" to be picked up by an Activity Processing service.

Table Extract Method

A Table Extract Method defines the settings and logic for a table Data Table to perform extraction. It is set by configuring the Extract Method property of the Data Table.

Delimited Extract

The Delimited Extract Table Extract Method extracts tabular data from a delimiter-separated text file, such as a CSV file.
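
Reading rows and columns out of delimiter-separated text can be sketched with Python's `csv` module. The sample data and the `extract_rows` function are hypothetical, and the sketch assumes the first line holds the column headers.

```python
import csv
import io

# Hypothetical CSV content: a header line followed by data rows.
csv_text = "Item,Qty,Price\nWidget,2,9.50\nGadget,1,24.00\n"


def extract_rows(text, delimiter=","):
    """Parse delimited text into one dict per row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows = extract_rows(csv_text)
print(rows[0])  # {'Item': 'Widget', 'Qty': '2', 'Price': '9.50'}
```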

Fluid Layout

The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a description Document Type.

Grid Layout

The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.

Row Match

The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extract cell data from each column.

Tabular Layout

The Tabular Layout Table Extract Method uses column header values determined by the view_column Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled) as well as the Data Columns' Value Extractor results to model a table's structure and return its values.

Value Extractor

An Extractor Type (shorthand for Value Extractor Type) is configured for numerous properties on a wide array of Grooper objects. They are used to return "data instances" from documents for one purpose or another. The Extractor Type defines an operation that reads data from the text or visual content of a document and returns one or more results. Each different Extractor Type uses a specialized logic to return results. Extractor Types are consumed by higher-level objects such as Data Elements, extractor nodes, Content Types and more.

Ask AI

Ask AI is an Extractor Type that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper an Extractor Type is referenced. It can complete a wide array of tasks in Grooper with simple text prompts.

Detect Signature

Detect Signature is an Extractor Type that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).
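
The "fill percentage" measurement can be sketched as counting dark pixels inside a rectangle. The Python example below is an illustration only. It assumes a bi-tonal image stored as rows of 0 (white) and 1 (dark), and the function names and the 5% cutoff are hypothetical.

```python
def fill_percentage(image, left, top, width, height):
    """Percentage of dark (1) pixels inside the rectangle."""
    region = [row[left:left + width] for row in image[top:top + height]]
    total = sum(len(row) for row in region)
    filled = sum(sum(row) for row in region)
    return 100.0 * filled / total if total else 0.0


def has_signature(image, zone, minimum_fill=5.0):
    """Treat the zone as signed when its fill percentage reaches the cutoff."""
    return fill_percentage(image, *zone) >= minimum_fill


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
zone = (1, 1, 4, 2)  # left, top, width, height
print(fill_percentage(page, *zone))  # 62.5
```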

Field Match

Field Match is an Extractor Type that matches the value stored in a previously-extracted variables Data Field or view_column Data Column.

Find Barcode

Find Barcode is an Extractor Type that searches for and returns barcode values previously stored in a folder Batch Folder or contract Batch Page's layout data.

Note: Find Barcode differs slightly from Read Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

GPT Complete

GPT Complete is an Extractor Type that leverages OpenAI's GPT models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.

PLEASE NOTE: GPT Complete is a deprecated extractor type. It uses an outdated method to call the OpenAI API. Please use the Ask AI extractor type going forward.

Highlight Zone

Highlight Zone is an Extractor Type that sets a highlight region on a document without performing any actual data extraction. This "extractor" is used to mark areas of interest or importance for Review users or for uncommon scenarios where a data instance location is needed with no actual value.

Label Match

Label Match is an Extractor Type that matches a list of one or more values using matching options defined by a Labeling Behavior. It is similar to List Match but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.

Labeled OMR

Labeled OMR is an Extractor Type used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.

Labeled Value

Labeled Value is an Extractor Type that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).

List Match

List Match is an Extractor Type designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions, but it can be configured to use regular expression syntax.

Ordered OMR

Ordered OMR is an Extractor Type used to return OMR check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

Pattern Match

Pattern Match is an Extractor Type that extracts values from a document that match a specified regular expression, providing data collection following a known format or pattern.
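
As a generic illustration of regular-expression extraction, the Python sketch below returns every date-like value in a string along with its position. The pattern, function name and sample text are hypothetical, and Python's `re` syntax is used here, which may differ in details from the regular expression flavor Grooper uses.

```python
import re

# Matches dates written as MM/DD/YYYY (format only, no calendar validation).
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def pattern_match(text):
    """Return (value, start_index) for every match in the text."""
    return [(m.group(), m.start()) for m in DATE_PATTERN.finditer(text)]


hits = pattern_match("Issued 03/14/2024, due 04/13/2024.")
print(hits)  # [('03/14/2024', 7), ('04/13/2024', 23)]
```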

Query HTML

Query HTML is an Extractor Type specialized for HTML documents. It uses either CSS or XPath selectors to return the inner text or an attribute of an HTML element.

Read Barcode

Read Barcode is an Extractor Type that uses barcode recognition technology to read and extract values from barcodes found in the document content.

Note: Read Barcode differs slightly from Find Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

Read Meta Data

Read Meta Data is an Extractor Type that retrieves metadata values associated with a document. Read Meta Data can return metadata from a folder Batch Folder's attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.

Read Zone

Read Zone is an Extractor Type that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.

Reference

Reference is an Extractor Type used to reference an external extractor node within a Grooper property configuration. This allows users to create reusable extractors and use the more complex pin Data Type and input Field Class extractors throughout Grooper.

Word Match

Word Match is an Extractor Type that extracts individual words or phrases from documents. It is used for n-gram extraction. Each gram may be optionally executed against a dictionary Lexicon to ensure words and phrases only match a set vocabulary.
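
N-gram extraction with an optional vocabulary filter can be sketched as below. This is a simplified Python illustration with hypothetical names. It splits on whitespace, lowercases the text and treats the Lexicon as a plain set of allowed phrases.

```python
def ngrams(text, n):
    """Return every run of n consecutive words as a lowercase phrase."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def word_match(text, n, lexicon=None):
    """Return n-grams, keeping only those in the lexicon when one is given."""
    grams = ngrams(text, n)
    if lexicon is None:
        return grams
    return [gram for gram in grams if gram in lexicon]


text = "Net payment due upon receipt"
bigrams = word_match(text, 2)
filtered = word_match(text, 2, lexicon={"payment due", "upon receipt"})
print(bigrams)   # ['net payment', 'payment due', 'due upon', 'upon receipt']
print(filtered)  # ['payment due', 'upon receipt']
```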

Zonal OMR

Zonal OMR is an Extractor Type that reads one or more OMR checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

BE AWARE: Zonal OMR is outdated compared to Labeled OMR and Ordered OMR. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.

Import and Export Related Types

These are configuration objects in Grooper that relate to importing documents into Grooper and exporting processed content (files and data) out of Grooper.

CMIS Bindings (aka "connection types")

CMIS Bindings are the platform connection types for cloud CMIS Connections. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems.

CMIS Bindings use the CMIS standard as a model to define connectivity. Even when connecting to CMS platforms that are not truly CMIS systems (such as a Windows file system), Grooper normalizes connection to them as if they were. This allows Grooper to use CMIS Import and CMIS Export for all storage platforms.

  • You will commonly hear CMIS Binding referred to as a "CMIS connection type", "connection type", or just "connection", as in an "Exchange connection".

AppXtender

AppXtender is a connection option for cloud CMIS Connections. It allows Grooper to connect to the AppEnhancer (formerly ApplicationXtender) content management system for import and export operations.

Box

Box is a connection option for cloud CMIS Connections. It connects Grooper to the Box content management system for import and export operations.

CMIS

CMIS is a connection option for cloud CMIS Connections. It connects Grooper to a CMIS 1.0 or CMIS 1.1 server for import and export operations. This can be used to connect to CMS platforms that implement the CMIS protocol.

Exchange

Exchange is a connection option for cloud CMIS Connections. It connects Grooper to Microsoft Exchange email servers (including Outlook servers) for import and export operations.

FTP

FTP is a connection option for cloud CMIS Connections. It connects Grooper to FTP directories for import and export operations.

IMAP

IMAP is a connection option for cloud CMIS Connections. It connects Grooper to email messages and folders through an IMAP email server for import and export operations.

NTFS

NTFS is a connection option for CMIS Connections. It connects Grooper to files and folders in the Microsoft Windows NTFS file system for import and export operations.

OneDrive

OneDrive is a connection option for CMIS Connections. It connects Grooper to Microsoft OneDrive cloud services for import and export operations.

SFTP

SFTP is a connection option for CMIS Connections. It connects Grooper to SFTP directories for import and export operations.

SharePoint

SharePoint is a connection option for CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

Content Link

Grooper.Core.ContentLink

Content Links define references to files or folders stored outside of Grooper, such as in a Windows folder or in a CMIS Repository.

  • Content Link has two sub-types: Document Link and Folder Link. There are 9 types of "Document Link" and only 1 type of "Folder Link". Due to this, Document Link is a more common term than "Content Link".

Document Links

Grooper.Core.DocumentLink

CMIS Document Link

Grooper.CMIS.CmisLink

File System Link

Grooper.Core.FileSystemLink

FTP Link

Grooper.Messaging.FtpLink

HTTP Link

Grooper.Messaging.HTTPLink

Mail Link

Grooper.Messaging.MailLink

PST Link

Grooper.Office.PstLink

SFTP Link

Grooper.Messaging.SftpLink

Subfile Link

Grooper.Core.SubfileLink

ZIP Link

Grooper.Messaging.FtpLink

Folder Links

Grooper.Core.FolderLink

CMIS Folder Link

Grooper.CMIS.CmisFolderLink

Export Definition

Export Behaviors are defined by adding and configuring one or more Export Definitions. An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.

CMIS Export

CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

Data Export

Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC compliant database.

Import Provider

Grooper.Core.ImportProvider

Import Providers enable Grooper to import file-based content from numerous sources, including Windows file systems, SFTP file systems, mail servers and various content management systems (CMS). An Import Provider is selected and configured when configuring "Import Jobs". Import Jobs are submitted in one of two ways:

  • By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
  • From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.

In both cases, an Import Provider is selected and configured using the "Provider" property.

CMIS Import

Grooper.CMIS.CmisImportBase

CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

Import Descendants

Grooper.CMIS.ImportDescendants

Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files from a CMIS Repository folder location, including any files in any sub-folders (i.e. all "descendant" files).

Import Query Results

Grooper.CMIS.ImportQueryResults

Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Query Results imports files from a CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

File System Import

Grooper.Core.FileSystemImport

File System Import refers to a Legacy Import Provider used to import documents directly from your Windows File System into Grooper.

HTTP Import

Grooper.Messaging.HTTPImport

HTTP Import is an Import Provider used to import web-based content (web pages and files hosted on an HTTP server). HTTP Import can be used to ingest individual web pages, defined portions of a website or entire websites into Grooper.

Test Batch

Grooper.Core.TestBatchImport

"Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.

  • Looking for information on "production" vs "test" Batches in Grooper? See here.

Misc Configuration

Property

A property is a configurable setting on a Grooper object that affects how the object performs its function.

Alignment

"Alignment" refers to how Grooper highlights text from an AI response on a document in a Document Viewer. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as AI Extract.

Confidence Multiplier and Output Confidence

Some results carry more weight than others. The Confidence Multiplier and Output Confidence properties allow you to manually adjust an extraction result's confidence.

Constrained Wrap

The Constrained Wrap property allows certain Extractor Types and the Labeling Behavior to match values which wrap from one line to the next inside a box (such as a table cell).

Content Type Filter

The Content Type Filter property restricts Activities to specific Content Categories and/or Document Types.

Document Quoting

Document Quoting is a property of the AI Extract Fill Method that limits the text fed to the AI to reduce the number of tokens consumed. Controlling exactly what is given can reduce not only the monetary cost of using the AI, but also the time cost of running the Fill Method.

Import Mode

Import Mode is a configurable property for CMIS Import providers. This controls how file content is loaded into a Grooper Repository during an Import Job. This property is key to setting up a "Sparse" import in Grooper.

Output Extractor Key

The Output Extractor Key property is another weapon in the arsenal of powerful Grooper classification techniques. It allows Data Types to return results normalized in a way more beneficial to document classification.

Paragraph Marking

Paragraph Marking alters the normal text data in a document by placing the carriage return and new line feed pairs at the end of each paragraph, instead of the end of each line. This allows users to break up a document's text flow into segments of paragraphs instead of segments of lines.

Parameters

Parameters is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.
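
The effect of these parameters can be sketched in a few lines of Python. This is an illustrative toy, not Grooper's or any model provider's implementation: the function names and sample scores are hypothetical, and the penalty arithmetic assumes the commonly published form (a flat deduction for presence, a per-occurrence deduction for frequency).

```python
import math

def apply_penalties(logits, generated_counts, presence_penalty, frequency_penalty):
    """Lower the score of tokens that already appear in the generated text."""
    adjusted = []
    for token_id, logit in enumerate(logits):
        count = generated_counts.get(token_id, 0)
        # Presence penalty is a flat deduction; frequency penalty grows with each repeat.
        logit -= presence_penalty * (1 if count > 0 else 0)
        logit -= frequency_penalty * count
        adjusted.append(logit)
    return adjusted

def softmax_with_temperature(logits, temperature):
    """Turn scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}  # renormalize the surviving tokens

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.5)  # more deterministic
warm = softmax_with_temperature(logits, 2.0)  # more diverse
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.8)
penalized = apply_penalties(logits, {0: 3}, presence_penalty=0.5, frequency_penalty=0.2)
```

In this sketch the top token is more probable at temperature 0.5 than at 2.0, the TopP filter drops the least likely token, and the token that was already generated three times ends up with a lower score.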

Permission Sets

Permission Sets is a property that allows you to restrict user access to repositories, pages, and certain activities. This helps prevent unauthorized individuals from editing or deleting information or Batches.

Preprocessing

The Preprocessing grouping of properties consists of settings that adjust how text is formatted and interpreted before any Data Extraction process begins. These properties are crucial for ensuring that the text data is in the most optimal format for subsequent extraction tasks, which could involve complex regular expressions or precise data parsing.

Scope

The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Secondary Types

Secondary Types allow the application of multiple Content Types to a single Batch Folder.

Tab Marking

Tab Marking allows you to insert tab characters into a document's text data.

Vertical Wrap

Vertical Wrap is a property of certain Extractor Types and a Content Type's Labeling Behavior used to provide simplified extraction of vertically wrapped text (typically stacked labels).

Fill Method

Fill Method is a configurable property for Data Models, Data Sections, and Data Tables (aka "container elements" or "containers"). Fill Methods provide various mechanisms for populating these containers' child Data Elements. Fill Methods are secondary extraction operations. They populate descendant Data Elements after normal extraction during the Extract step.

AI Extract

AI Extract is a Fill Method that leverages a Large Language Model (LLM) to return extraction results to Data Elements in a Data Model or Data Section. This mechanism provides powerful AI-based data extraction with minimal setup.

Repository Option

Repository Options are optional features that affect the entire repository. These optional features enable functionality that otherwise does not work without first establishing the connections these options provide. Repository Options are added to a Grooper Repository and configured using the Root node's Options property.

LLM Connector

LLM Connector is a Repository Option that enables large language model (LLM) powered AI features for a Grooper Repository.

AI Search

AI Search enables Grooper's document search and retrieval features in the Search Page. It provides the framework to create document search indexes by Content Type and submit documents to an index. Once indexed, documents can be retrieved by full-text searches in the Search Page with feature-rich querying and filtering capabilities. Once retrieved, users can view documents in the Search Page, download the results, or submit documents for further processing in Grooper.

Other Functionality

AI Generator

AI Generators create custom documents using the results of a Search Page query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file.

EDI Integration

EDI Integration refers to Grooper's ability to process EDI files.

Footer Rows and Footer Modes

A "Footer Row" is a row at the bottom of a Data Table that displays sum totals for numerical Data Columns. This can help Data Viewer users validate data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed and, when Tabular Layout's "Capture Footer Row" creates the Footer Row, if and how document data is used to capture and validate the footer value.
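
The validation idea behind a Footer Row can be illustrated with a short Python sketch. This is not Grooper's implementation; the function name, the row structure, and the sample values are hypothetical and only show how a calculated column total can be checked against a total captured from the document.

```python
from decimal import Decimal

def footer_sum(rows, column):
    """Add up one numeric column across all extracted table rows."""
    # Decimal avoids the rounding surprises of binary floats with currency values.
    return sum((Decimal(row[column]) for row in rows), Decimal("0"))

# Hypothetical extracted rows for a table with an "Amount" column.
rows = [
    {"Description": "Widget", "Amount": "19.99"},
    {"Description": "Gadget", "Amount": "5.01"},
    {"Description": "Shipping", "Amount": "7.50"},
]

calculated = footer_sum(rows, "Amount")
captured_footer = Decimal("32.50")  # hypothetical total printed on the document
is_valid = calculated == captured_footer
```

When the calculated sum and the captured footer value disagree, that is a signal that at least one cell in the column (or the footer itself) was extracted incorrectly.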

XML Schema Integration

XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.

UI Element

A UI Element is a portion of the Grooper interface that allows users to interact with or otherwise receive information about the application.

Data Inspector

The Grooper Data Inspector is a UI Element that can be found anywhere there is a Document Viewer showing extraction results. This UI Element allows a user to inspect the Data Instance hierarchies of an extracted result.

Document Viewer

The Grooper Document Viewer is the portal to your documents. It is the UI that allows you to see a Batch Folder's (or a Batch Page's) image, text content, and more.

Node Tree

The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Overrides

Overrides is a tab that allows overriding the default properties set on a Data Element.

Search Page

The Search Page allows users to leverage AI Search indexes to query indexed documents. Once results are returned, the user can interact with them in several ways, including creating new batches, submitting processing jobs, or even starting a conversation with an AI assistant.

Scan Viewer

The Scan Viewer is a user interface that can be added to the user-attended Review step in a Batch Process. It is used to scan documents into Batches from one or more scanning workstations.

Summary Tabs

Content Models and Content Categories have a Summary tab where you can view "Descendant Node Types", Document Types, and Expressions.

Other

Concepts

There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use these objects and properties depends on understanding the underlying concepts that define what these objects and properties are doing and why.

Activity Processing

Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

CMIS+

CMIS+ is a conceptual term that refers to Grooper's connectivity architecture to external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the CMIS standard. This provides a standardized setup to allow Grooper to interoperate with both CMIS compliant systems and non-CMIS systems. It further provides normalized access to document content and metadata for import (CMIS Import) and export (CMIS Export) operations.

CMIS

CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.

CMIS Query

A CMIS Query (aka CMISQL Query) is Grooper's way of searching for documents in CMIS Repositories and filtering them upon import when using the Import Query Results Import Provider. CMIS queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.
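
As a rough illustration of what such a query looks like, the Python sketch below assembles a CMISQL statement as a string. The helper name is hypothetical, and the type and property names used (cmis:document, cmis:name, cmis:contentStreamMimeType) come from the CMIS standard; which types and properties a particular CMIS Repository exposes in Grooper is an assumption to verify against that repository.

```python
def build_cmis_query(type_id, name_pattern, mime_type=None):
    """Assemble a simple CMISQL SELECT statement as a string."""
    def quote(value):
        # Keep the sketch simple: refuse values that would need escaping.
        if "'" in value or "\\" in value:
            raise ValueError("value contains characters this sketch does not escape")
        return "'" + value + "'"

    # LIKE uses % as a wildcard, as in SQL.
    clauses = ["cmis:name LIKE " + quote(name_pattern)]
    if mime_type is not None:
        clauses.append("cmis:contentStreamMimeType = " + quote(mime_type))
    return "SELECT * FROM " + type_id + " WHERE " + " AND ".join(clauses)

query = build_cmis_query("cmis:document", "Invoice%", "application/pdf")
print(query)
```

The printed statement selects PDF documents whose names start with "Invoice", which mirrors the kind of filter an import might apply.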

CSS Data Viewer Styling

CSS Data Viewer Styling refers to using CSS to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a Data Model's appearance and layout during document review.

Classification

Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity will assign a Content Type to a Batch Folder.

Code Expressions

Code Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Data Context

Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.

Data Extraction

Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.

Data Extractor

Data Extractor (or just "extractor") refers to all Extractor Types and extractor nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Data Instance

A Data Instance is an encapsulation of text data within a document returned by Grooper's extractors. Data instances are the hierarchy of text data created by Grooper's extractors.

Expressions

Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Expressions Cookbook

The "Expressions Cookbook" is a reference list for commonly used Code Expressions in Grooper.

Field Mapping

Field Mapping refers to how logical connections are made between metadata content in Grooper and an external storage platform.

Five Phases of Grooper

The "Five Phases of Grooper" is a conceptual term that seeks to build understanding of how documents are processed through Grooper.

Flow Collation

"Flow Collation" refers to the text-flow based layout option used by various Collation Providers for Data Type extractors.

Fuzzy RegEx

Fuzzy RegEx is Grooper's use of fuzzy logic within Extractor Types that leverage regular expressions to match patterns. Fuzzy RegEx allows extractors to overcome defects in a document's OCR results to accurately return results. It is enabled through the Fuzzy Matching property.
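
The general idea of tolerating OCR defects can be sketched with edit distance in Python. This is only an illustration of fuzzy matching, not Grooper's algorithm: the function names, the fixed-width window search, and the 0.8 similarity threshold are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_find(target, text, min_similarity=0.8):
    """Return (start, matched_text, similarity) for the best window, or None."""
    best = None
    width = len(target)
    for start in range(0, len(text) - width + 1):
        window = text[start:start + width]
        similarity = 1 - edit_distance(target, window) / width
        if similarity >= min_similarity and (best is None or similarity > best[2]):
            best = (start, window, similarity)
    return best

# Hypothetical OCR output where "I" was read as "l" and "e" as "c".
ocr_text = "lnvoice Numbcr: 12345"
hit = fuzzy_find("Invoice Number", ocr_text)
miss = fuzzy_find("Invoice Number", "Purchase Order 998")
```

An exact search for "Invoice Number" would fail on this text, while the fuzzy search still locates the label because only two of its fourteen characters were misread.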

GPT Integration

Grooper's GPT Integration refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.

Grooper Infrastructure

Grooper Infrastructure refers to the computing underpinnings of what makes up a Grooper Repository and the software that allows the Grooper platform to automate tasks and users to interface with it.

Grooper Repository

A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Image Processing

"Image processing", as a general term, refers to software techniques that manipulate and enhance images. Image processing removes imperfections and adjusts images to improve OCR accuracy. In Grooper, images are processed primarily by two Activities:

  • Image Processing - This Activity permanently adjusts the image. It is primarily used to compensate for defects produced by a document scanner (like border artifacts and skewed images). It does so by applying IP Commands in an IP Profile.
  • Recognize - This Activity performs OCR. When an OCR Profile references an IP Profile, the image will be processed temporarily. A temporary image is handed to the OCR engine and discarded once characters are recognized.
  • Grooper also has "computer vision" capabilities that analyze and interpret images. These capabilities are also executed during Grooper's image processing. For example, Grooper's "Line Removal" command will locate lines on an image (computer vision), remove those artifacts to improve OCR results during Recognize (image processing) and store that data for later use in Grooper (computer vision).

LINQ to Grooper Objects

LINQ is a Microsoft .NET component that provides data querying capabilities to the .NET framework. In Grooper, you can use the LINQ syntax in Code Expressions to "LINQ to Grooper Objects". This allows expressions to access information from collections of data, such as from multi-instance Data Sections or Data Tables.

Layout Data

Layout Data refers to visual information that certain Grooper IP Commands collect, such as lines, checkboxes, barcodes, and detected shapes. This data is stored in a "Grooper.Layout.json" file attached to Batch Pages. Layout data is used by certain extractors and other features that rely on the presence of that data to function.

Microfiche Processing

Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.

Microsoft Office Integration

Grooper's Microsoft Office Integration allows the platform to easily convert Microsoft Word and Microsoft Excel files into formats that Grooper can read natively (PDF and CSV).

Mixed Classification

"Mixed Classification" refers to leveraging a Classify Method and "rules" defined on a Document Type to overcome the shortcomings of an individual method.

OCR

OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.

OCR Synthesis

OCR Synthesis refers to a suite of OCR related functionality unique to Grooper. The OCR Synthesis suite will pre-process and re-process raw results from the OCR Engine and synthesize its results into a single, more accurate OCR result.

Object Nomenclature

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

PDF Page Types

PDF pages can be one of several PDF Page Types. "Page types" describe the kind of content in a PDF page. This informs Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize activity, whereas "text only" pages have their native text extracted by Recognize.

Prompt Engineering

"Prompt Engineering" is the process of designing and refining prompts to interact more effectively with large language models (LLMs) like GPT-4. The goal is to guide the model to produce desired outputs by carefully crafting the input queries.

Regular Expression

Regular Expression (or regex) is a standard syntax designed to parse text strings. This is a way of finding information in text. It is the primary method by which Grooper extracts and returns data from documents.
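
A minimal example of pattern-based matching is shown below, using Python's re module purely for illustration. The pattern and sample text are hypothetical, and the regex flavor used inside Grooper may differ in its details from Python's.

```python
import re

# One or two digits, a slash, one or two digits, a slash, then a four-digit year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

text = "Agreement Date: 03/15/2024. Renewal due 3/1/2025."

# finditer walks the text and yields every non-overlapping match.
matches = [m.group(0) for m in DATE_PATTERN.finditer(text)]
years = [m.group(3) for m in DATE_PATTERN.finditer(text)]
print(matches)
```

The pattern describes the shape of the value rather than its exact content, so both dates are found even though they are written with different numbers of digits.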

Separation

Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

TF-IDF

TF-IDF stands for term frequency-inverse document frequency. It is a statistical calculation intended to reflect how important a word is to a document within a document set (or "corpus"). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
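
The calculation can be sketched in Python using the textbook formulation (term frequency multiplied by the logarithm of the inverse document frequency). This is a minimal illustration with a hypothetical three-document corpus; the exact weighting variant Grooper applies is not stated here and may differ.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """corpus: list of non-empty token lists. Returns one {term: weight} dict per document."""
    doc_count = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in corpus:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(doc_count / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized documents.
corpus = [
    "invoice total due invoice number".split(),
    "purchase order number total".split(),
    "remittance advice total paid".split(),
]
scores = tf_idf(corpus)
```

In this corpus "total" appears in every document, so its weight is zero, while "invoice" appears twice in the first document and nowhere else, so it receives the highest weight there. Words that distinguish one kind of document from the others are the ones that score well.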

Table Extraction

"Table Extraction" refers to Grooper's ability to extract data from cells in tables on documents. This is accomplished by configuring the Data Table and its child Data Column elements in a Data Model.

Thread

A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.

Training-Based Approaches to Document Classification

"Training-Based Approaches to Document Classification" refers to Grooper Classify Methods that classify Batch Folders using document examples for each Document Type. The Classify activity then assigns unclassified Batch Folders a Document Type based on how similar they are to the Document Type's training data.

Training Batch

The Training Batch is a special Batch created when training document examples using the Lexical classification method. The Training Batch serves two purposes: (1) It is a Batch that holds all previously trained Batch Folders. Designers can go to this Batch to view these documents and copy and paste them into other Batches if needed. (2) Batch Folders in the Training Batch will be used to re-train the Content Model's classification data when the Rebuild Training command is executed.

UNC Path

UNC Path is a conceptual term that refers to UNC (Universal Naming Convention), a standard used in Microsoft Windows for accessing shared network folders.
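
A UNC path has the form \\server\share\folder\file. The Python sketch below takes one apart using the standard library's PureWindowsPath, which works on any operating system. The server, share, and file names are hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical UNC path to a scanned file on a network share.
path = PureWindowsPath(r"\\fileserver01\scans\inbox\invoice_0001.pdf")

# For a UNC path, the "drive" is the \\server\share prefix.
server, share = path.drive.lstrip("\\").split("\\", 1)

# Everything after the share is an ordinary folder and file hierarchy.
relative_parts = path.parts[1:]
file_name = path.name
```

Here the server is "fileserver01", the share is "scans", and the remaining parts identify the folder and file within that share.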

Waterfall Classification

Waterfall Classification is a classification technique in Grooper that prioritizes training similarity over classification "rules" set by a Document Type's Positive Extractor. This can be helpful in scenarios where Batch Folders get misclassified and simply retraining won't help.

Miscellaneous Features

URL Endpoints for Review

Three different URL endpoints can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID, Batch Process name, Batch Id and more. This allows Grooper users to link directly to a Batch in Review with a URL.

Fine-Tuning for AI Extract

Fine-tuning is the process of further training a large language model (LLM) on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on.
In Grooper, you can easily start fine-tuning a model based on a Data Model that will facilitate better extraction when using AI Extract.

Disambiguation

Repository

A "repository" is a general term in computer science referring to where files and/or data are stored and managed. In Grooper, the term "repository" may refer to:

Base Types

Grooper Object

Grooper.GrooperObject

Connected Object

Grooper.ConnectedObject

Database Row

Grooper.DatabaseRow

Embedded Object

Grooper.EmbeddedObject

Variable Definition

Grooper.Core.VariableDefinition

Variable Definitions define a variable with a computed value that can be called by various code expressions. Variable Definitions are added to Data Models, Data Sections and Data Tables using their "Variables" property.

Used By: Data Model, Data Section, Data Table