Glossary

From Grooper Wiki
This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms.
Entries are organized according to the major Grooper entity they belong to. For example, "Classify" is an "Activity", so it is found in the "Activity" section of the Glossary.
== Application ==
<section begin="Application" />'''Grooper''' is an intelligent document processing platform that uses an array of sophisticated techniques to automate end-to-end content capture and delivery. From a technical standpoint, Grooper consists of a '''[[Grooper Repository]]''' and the applications that support the management and execution of its configuration assets.
:*<li class="fyi-bullet" style="padding-left:20px"> A '''Grooper Repository''' consists of two things: (1) a series of tables in a SQL [https://en.wikipedia.org/wiki/Database database] (containing configuration nodes and their properties) and (2) a '''[[File Store]]''' (containing files associated with nodes in the database).
The Grooper applications are as follows:
* '''Grooper''' - The primary program files for the Grooper platform. This application must be installed on every Grooper web server hosting the Grooper UI and on every processing server running Activity Processing services to automate task processing.
* '''Grooper Command Console''' - This is an administrative utility that gets installed with the Grooper application.
* '''Grooper Web Client''' - This application installs the Grooper user interface. It will need to be installed on the Grooper web server. The Grooper web server hosts the Grooper web app which is accessed via a URL.
* '''Grooper Desktop''' - This is a lightweight application required to scan documents using the Grooper web app. It runs in the background and helps operate the Scan Viewer in Grooper. It needs to be installed on any workstation connected to a document scanner.
<section end="Application" />
<div style="padding-left: 1.5em;">
=== Grooper Command Console ===
<section begin="Grooper Command Console" />'''[[Grooper Command Console]]''' is a [https://en.wikipedia.org/wiki/Command-line_interface command-line interface] that performs system configuration and administration tasks within '''Grooper'''.<section end="Grooper Command Console" />
=== Grooper Web Client ===
<section begin="Web Client" />The Grooper user interface is accessed using a web browser from a URL. The [[Grooper Web Client]] is the application that installs the Grooper website on a web server.<section end="Web Client" />
</div>
== Node Types ==
''{{TypeName|Node}}''
<section begin="Node Type" />Nodes are the main configuration objects in Grooper. They are created and accessed in the [[Node Tree]] from the [[Design]] page. The different types of nodes ("Node Types") serve different functions in Grooper. For example, "Batch" nodes are the primary container for document content. They contain "Batch Folder" nodes which represent documents and "Batch Page" nodes which represent individual pages of documents.<section end="Node Type" />
<div style="padding-left: 1.5em;">
=== AI Analyst ===
<section begin="AI Analyst" />''BE AWARE: AI Analysts are obsolete as of version 2025. See [[AI Assistant]] for the new and improved version of AI Analyst.'' An '''[[AI Analyst]]''' facilitates the ability to interact with a document as you might with an AI [https://en.wikipedia.org/wiki/Chatbot chatbot].<section end="AI Analyst" />
=== AI Assistant ===
''{{TypeName|AI Assistant}}''
<section begin="AI Assistant" />'''[[AI Assistant]]s''' are Grooper's conversational AI personas. They answer questions about the resources they can access (including content from documents, databases and/or web services). This access greatly increases an AI's ability to answer domain-specific questions that depend on those resources.<section end="AI Assistant" />
=== Batch Objects ===
''{{TypeName|Batch Object}}''
<section begin="Batch Object" />Batch Objects are the foundational elements of Grooper's document processing system, providing a unified structure for organizing, processing, and reviewing document content within a {{BatchIcon}} '''[[Batch]]'''. Every item within a Batch (whether a document, folder, or page) is represented as a Batch Object, and Batches themselves are Batch Objects too.<section end="Batch Object" />
<div style="padding-left: 2em;">
==== Batch ====
''{{TypeName|Batch}}''
<section begin="Batch" />{{BatchIcon}} '''[[Batch]]''' nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called {{BatchProcessIcon}} '''[[Batch Process]]es'''. Documents and their pages are represented in '''Batches''' by a hierarchy of {{BatchFolderIcon}} '''[[Batch Folder]]s''' and {{BatchPageIcon}} '''[[Batch Page]]s'''.<section end="Batch" />
==== Batch Folder ====
''{{TypeName|Batch Folder}}''
<section begin="Batch Folder" />The {{BatchFolderIcon}} '''[[Batch Folder]]''' is an organizational unit within a {{BatchIcon}} '''[[Batch]]''', allowing for a structured approach to managing and processing a collection of documents. '''Batch Folder''' nodes serve two purposes in a '''Batch'''. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other '''Batch Folders''' and/or {{BatchPageIcon}} '''[[Batch Page]]''' nodes as children.
* '''Batch Folders''' are frequently referred to simply as "documents" or "folders" depending on how they are used in the '''Batch'''.<section end="Batch Folder" />
==== Batch Page ====
''{{TypeName|Batch Page}}''
<section begin="Batch Page" />{{BatchPageIcon}} '''[[Batch Page]]''' nodes represent individual pages within a {{BatchIcon}} '''[[Batch]]'''. '''Batch Pages''' are created in one of two ways: (1) when images are scanned into a '''Batch''' using the [[Scan Viewer]], or (2) when pages are split from a PDF or TIFF file using the [[Split Pages]] activity.
* '''Batch Pages''' are frequently referred to simply as "pages".<section end="Batch Page" />
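The Batch Object hierarchy can be pictured as a simple tree. The following Python sketch is conceptual only: the classes are hypothetical and are not Grooper's object model or API. It shows a Batch holding Batch Folders (documents), which in turn hold Batch Pages, with every item sharing one common base type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class BatchObject:
    """Common base type: a Batch and everything inside it are Batch Objects."""
    name: str
    children: List["BatchObject"] = field(default_factory=list)

    def walk(self) -> Iterator["BatchObject"]:
        """Yield this object and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


class Batch(BatchObject):
    """Top-level container that moves through a Batch Process."""


class BatchFolder(BatchObject):
    """Usually a document; may also hold other folders and pages."""


class BatchPage(BatchObject):
    """A single page; pages have no children in this sketch."""


batch = Batch("Batch 001", [
    BatchFolder("Invoice", [BatchPage("Page 1"), BatchPage("Page 2")]),
    BatchFolder("Contract", [BatchPage("Page 1")]),
])

documents = [o for o in batch.walk() if isinstance(o, BatchFolder)]
pages = [o for o in batch.walk() if isinstance(o, BatchPage)]
print(len(documents), len(pages))  # 2 3
```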
</div>
=== Batch Process ===
''{{TypeName|Batch Process}}''
<section begin="Batch Process" />{{BatchProcessIcon}} '''[[Batch Process]]''' nodes are crucial components in '''Grooper's''' architecture. A '''Batch Process''' defines the step-by-step processing instructions given to a {{BatchIcon}} '''Batch'''. Each step is either a "[[Activity|Code Activity]]" or a [[Review]] activity. Code Activities are automated by [[Activity Processing]] services. Review activities are executed by human operators in the Grooper user interface.
* '''Batch Processes''' do nothing by themselves. Instead, they execute {{BatchProcessStepIcon}} '''[[Batch Process Step]]s''', which are added as child nodes.
* A '''Batch Process''' is often referred to as simply a "process".<section end="Batch Process" />
=== Batch Process Step ===
''{{TypeName|Batch Process Step}}''
<section begin="Batch Process Step" />{{BatchProcessStepIcon}} '''[[Batch Process Step]]s''' are specific actions within a {{BatchProcessIcon}} '''[[Batch Process]]''' sequence. Each '''Batch Process Step''' performs an "[[Activity]]" specific to some document processing task. Each Activity is either a "Code Activity" or a "[[Review]]" activity. Code Activities are automated by [[Activity Processing]] services. Review activities are executed by human operators in the Grooper user interface.
* '''Batch Process Steps''' are frequently referred to as simply "steps".
* Because a single '''Batch Process Step''' executes a single Activity configuration, they are often referred to by their referenced Activity as well.  For example, a "Recognize step".<section end="Batch Process Step" />
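As a rough illustration of how steps run in order, the Python sketch below treats a process as a list of steps and pauses at the first Review step until a person finishes it. The Step class, the advance function and the in-memory batch dictionary are hypothetical; Grooper's Activity Processing services are not implemented this way.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    kind: str  # "code" (automated) or "review" (performed by a person)
    run: Optional[Callable[[dict], None]] = None  # used only by code steps


def advance(steps: List[Step], batch: dict, start: int = 0) -> int:
    """Run automated steps in order, starting at index `start`.

    Returns the index of the first Review step reached (where a person must
    act), or len(steps) once every step has been completed.
    """
    for i in range(start, len(steps)):
        if steps[i].kind == "review":
            return i
        steps[i].run(batch)
    return len(steps)


process = [
    Step("Split Pages", "code", lambda b: b.setdefault("log", []).append("split")),
    Step("Recognize", "code", lambda b: b["log"].append("ocr")),
    Step("Review", "review"),
    Step("Export", "code", lambda b: b["log"].append("export")),
]

batch = {}
paused_at = advance(process, batch)                # runs Split Pages and Recognize
print(process[paused_at].name, batch["log"])       # Review ['split', 'ocr']
finished = advance(process, batch, paused_at + 1)  # resume after the Review is done
print(finished == len(process), batch["log"])      # True ['split', 'ocr', 'export']
```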
=== CMIS Connection ===
''{{TypeName|CMIS Connection}}''
<section begin="CMIS Connection" />{{CMISConnectionIcon}} '''[[CMIS Connection]]s''' provide a standardized way of connecting to various [https://en.wikipedia.org/wiki/Content_management_system content management systems (CMS)]. '''CMIS Connections''' allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.
* For CMS platforms that support the [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] standard, the '''CMIS Connection''' connects to the CMS using the CMIS standard.
* For platforms that do not, the '''CMIS Connection''' normalizes the platform's [https://en.wikipedia.org/wiki/Comparison_of_file_transfer_protocols connection and transfer protocols] as if it ''were'' a CMIS platform.<section end="CMIS Connection" />
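Normalizing a non-CMIS platform "as if it were CMIS" resembles the adapter pattern. In the hypothetical Python sketch below, a dictionary-backed class stands in for a CMIS-compliant platform and a local-folder adapter stands in for a non-CMIS one. Both expose the same interface, so the import code does not care which is which. None of these names come from Grooper or the CMIS specification.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class Repository(ABC):
    """One interface for every storage platform (the 'normalized' view)."""

    @abstractmethod
    def list_documents(self, folder: str) -> List[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class NativeCmisRepository(Repository):
    """Stands in for a platform that already speaks CMIS (stubbed with a dict)."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = objects

    def list_documents(self, folder: str) -> List[str]:
        prefix = folder.rstrip("/") + "/"
        return sorted(p for p in self._objects if p.startswith(prefix))

    def read(self, path: str) -> bytes:
        return self._objects[path]


class FileSystemAdapter(Repository):
    """Wraps a non-CMIS store (a local folder) so it looks like the others."""

    def __init__(self, root: str):
        self._root = Path(root)

    def list_documents(self, folder: str) -> List[str]:
        base = self._root / folder.strip("/")
        return sorted("/" + p.relative_to(self._root).as_posix()
                      for p in base.iterdir() if p.is_file())

    def read(self, path: str) -> bytes:
        return (self._root / path.strip("/")).read_bytes()


def import_folder(repo: Repository, folder: str) -> Dict[str, bytes]:
    """Caller code is identical no matter which platform is behind `repo`."""
    return {p: repo.read(p) for p in repo.list_documents(folder)}


cmis_docs = import_folder(NativeCmisRepository({"/inbox/b.pdf": b"%PDF-b"}), "/inbox")

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "inbox").mkdir()
    (Path(tmp) / "inbox" / "a.pdf").write_bytes(b"%PDF-a")
    fs_docs = import_folder(FileSystemAdapter(tmp), "/inbox")

print(cmis_docs, fs_docs)
```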
=== CMIS Repository ===
''{{TypeName|CMIS Repository}}''
<section begin="CMIS Repository" />{{CMISRepositoryIcon}} '''[[CMIS Repository]]''' nodes provide document access in external storage platforms through a {{CMISConnectionIcon}} '''[[CMIS Connection]]'''. With a '''CMIS Repository''', users can manage and interact with those documents within Grooper. They are used primarily for import using [[Import Descendants]] and [[Import Query Results]] and for export using [[CMIS Export]].
* '''CMIS Repositories''' are created as child nodes of a '''CMIS Connection''' using the "Import Repository" command.<section end="CMIS Repository" />
=== Content Types ===
''{{TypeName|Content Type}}''
<section begin="Content Type" />[[Content Type]]s are a class of node types used to classify {{BatchFolderIcon}} '''[[Batch Folder]]s'''. They represent categories of documents ({{ContentModelIcon}} '''[[Content Model]]s''' and {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''') or distinct types of documents ({{DocumentTypeIcon}} '''[[Document Type]]s'''). Content Types serve an important role in defining the [[Data Element]]s and [[Behavior]]s that apply to a document.<section end="Content Type" />
<div style="padding-left: 2em;">
==== Content Model ====
''{{TypeName|Content Model}}''
<section begin="Content Model" />{{ContentModelIcon}} '''[[Content Model]]''' nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''' and {{DocumentTypeIcon}} '''[[Document Type]]s''' they contain. '''Content Models''' serve as the root of a '''[[Content Type]]''' hierarchy, which defines [[Data Element]] inheritance and [[Behavior]] inheritance. '''Content Models''' are crucial for organizing documents for data extraction and more.<section end="Content Model" />
==== Content Category ====
''{{TypeName|Content Category}}''
<section begin="Content Category" />{{ContentCategoryIcon}} A '''[[Content Category]]''' is a container for other '''Content Category''' or {{DocumentTypeIcon}} '''Document Type''' nodes in a {{ContentModelIcon}} '''[[Content Model]]'''. '''Content Categories''' are often used simply as organizational buckets for '''Content Models''' with large numbers of '''Document Types'''. However, '''Content Categories''' are also necessary to create branches in a '''Content Model's''' classification taxonomy, allowing for more complex [[Data Element]] inheritance and [[Behavior]] inheritance.<section end="Content Category" />
==== Document Type ====
''{{TypeName|Document Type}}''
<section begin="Document Type" />{{DocumentTypeIcon}} '''[[Document Type]]''' nodes represent a distinct type of document, such as an invoice or a contract. '''Document Types''' are created as child nodes of a {{ContentModelIcon}} '''[[Content Model]]''' or a {{ContentCategoryIcon}} '''[[Content Category]]'''. They serve three primary purposes:
# They are used to classify documents. Documents are considered "classified" when the {{BatchFolderIcon}} '''[[Batch Folder]]''' is assigned a [[Content Type]] (most typically, a '''Document Type''').
# The '''Document Type's''' {{DataModelIcon}} '''Data Model''' defines the '''[[Data Element]]s''' extracted by the [[Extract]] activity (including any Data Elements inherited from parent Content Types).
# The '''Document Type''' defines all "[[Behavior]]s" that apply (whether from the '''Document Type's''' Behavior settings or those inherited from a parent Content Type).<section end="Document Type" />
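Data Element inheritance can be thought of as walking from a Document Type up to its Content Model and collecting elements along the way. The Python sketch below assumes that simple rule (ancestor elements first, duplicates ignored). The class and field names are hypothetical, and real inheritance rules such as overrides are not modeled. Behavior inheritance could be sketched with the same parent walk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    name: str
    parent: Optional["ContentType"] = None
    data_elements: List[str] = field(default_factory=list)

    def lineage(self) -> List["ContentType"]:
        """This node and its ancestors, ordered from the root Content Model down."""
        chain: List["ContentType"] = []
        node: Optional["ContentType"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def effective_data_elements(self) -> List[str]:
        """Inherited elements first, then this node's own, without duplicates."""
        result: List[str] = []
        for node in self.lineage():
            for element in node.data_elements:
                if element not in result:
                    result.append(element)
        return result


model = ContentType("Accounts Payable", data_elements=["Vendor Name"])
category = ContentType("Invoices", model, ["Invoice Number"])
doc_type = ContentType("Utility Invoice", category, ["Account Number"])

print(doc_type.effective_data_elements())
# ['Vendor Name', 'Invoice Number', 'Account Number']
```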
==== Form Type ====
''{{TypeName|Form Type}}''
<section begin="Form Type" />{{FormTypeIcon}} '''[[Form Type]]s''' represent trained variations of a {{DocumentTypeIcon}} '''[[Document Type]]'''.  These nodes store [https://en.wikipedia.org/wiki/Machine_learning machine learning] training data for [[Lexical]] and [[Visual]] document classification methods.<section end="Form Type" />
==== Page Type ====
''{{TypeName|Page Type}}''
<section begin="Page Type" />{{PageTypeIcon}} '''[[Page Type]]s''' represent individual pages of a {{FormTypeIcon}} [[Form Type]]. These nodes store page-level [https://en.wikipedia.org/wiki/Machine_learning machine learning] training data for [[Lexical]] and [[Visual]] document classification methods. Page Types are used by [[ESP Auto Separation]] to make document separation decisions based on page classification.<section end="Page Type" />
</div>
=== Control Sheet ===
''{{TypeName|Control Sheet}}''
<section begin="Control Sheet" />{{ControlSheetIcon}} '''[[Control Sheet]]s''' are printable pages used to automate document separation at scan time. Before pages are loaded into the scanner, a Control Sheet is placed in front of each new document. Then, when pages are scanned using the [[Scan Viewer]] and [[Control Sheet Separation]] is executed, a new {{BatchFolderIcon}} '''[[Batch Folder]]''' is created for every Control Sheet scanned. Control Sheets can also be configured to assign the Batch Folder a {{DocumentTypeIcon}} '''[[Document Type]]''', thus classifying the document at scan time as well.<section end="Control Sheet" />
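A minimal Python sketch of the separation logic described above follows. It assumes each scanned page has already been flagged as a Control Sheet or not (in practice, detecting the sheet is Grooper's job). The function name, the page representation, and the handling of the sheet itself and of pages scanned before any sheet are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Each scanned page is (page_id, sheet_doc_type):
#   sheet_doc_type is None for an ordinary page;
#   for a Control Sheet it is the Document Type to assign, or "" for none.
Page = Tuple[str, Optional[str]]


def separate(pages: List[Page]) -> List[dict]:
    """Start a new folder at each Control Sheet; other pages join the current folder."""
    folders: List[dict] = []
    for page_id, sheet_doc_type in pages:
        if sheet_doc_type is not None:
            # The sheet itself is not kept as a page in this sketch (an assumption).
            folders.append({"document_type": sheet_doc_type or None, "pages": []})
        else:
            if not folders:
                # Pages scanned before any sheet go to an unclassified folder (assumed).
                folders.append({"document_type": None, "pages": []})
            folders[-1]["pages"].append(page_id)
    return folders


scan = [("cs1", "Invoice"), ("p1", None), ("p2", None), ("cs2", ""), ("p3", None)]
for folder in separate(scan):
    print(folder)
# {'document_type': 'Invoice', 'pages': ['p1', 'p2']}
# {'document_type': None, 'pages': ['p3']}
```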
=== Data Connection ===
''{{TypeName|Data Connection}}''
<section begin="Data Connection" />{{DataConnectionIcon}} '''[[Data Connection]]s''' connect Grooper to [https://en.wikipedia.org/wiki/Microsoft_SQL_Server Microsoft SQL] and supported [https://en.wikipedia.org/wiki/Open_Database_Connectivity ODBC] databases. Once configured, '''Data Connections''' can be used to export data extracted from a document to a database, perform database lookups to validate the data Grooper collects, and perform other actions related to [https://en.wikipedia.org/wiki/Database#Database_management_system database management systems (DBMS)].
* Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
* Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.<section end="Data Connection" />
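To illustrate what "export" and "lookup" mean in database terms, the Python sketch below uses the standard-library sqlite3 module with an in-memory database as a stand-in for SQL Server or an ODBC source. The table names, columns and extracted values are hypothetical; in Grooper these operations are configured on nodes rather than written as code.

```python
import sqlite3

# An in-memory SQLite database stands in for a SQL Server or ODBC database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE invoices (invoice_number TEXT, vendor_id TEXT, total REAL)")
conn.execute("INSERT INTO vendors VALUES ('V100', 'Acme Supply')")

# Field values as they might look after extraction (hypothetical data).
extracted = {"Invoice Number": "INV-0042", "Vendor ID": "V100", "Total": 125.50}

# Lookup: validate the extracted vendor against the database.
row = conn.execute(
    "SELECT name FROM vendors WHERE vendor_id = ?", (extracted["Vendor ID"],)
).fetchone()
vendor_is_valid = row is not None

# Export: write the extracted values to a table with a parameterized query.
if vendor_is_valid:
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        (extracted["Invoice Number"], extracted["Vendor ID"], extracted["Total"]),
    )
    conn.commit()

print(row[0], conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0])  # Acme Supply 1
```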
=== Data Elements ===
''{{TypeName|Data Element}}''
<section begin="Data Element" />[[Data Element]]s are a class of node types used to collect data from a document.  These include: {{DataModelIcon}} '''[[Data Model]]s''', {{DataSectionIcon}} '''[[Data Section]]s''', {{DataFieldIcon}} '''[[Data Field]]s''', {{DataTableIcon}} '''[[Data Table]]s''', and {{DataColumnIcon}} '''[[Data Column]]s'''.<section end="Data Element" />
<div style="padding-left: 2em;">
==== Data Model ====
''{{TypeName|Data Model}}''
<section begin="Data Model" />{{DataModelIcon}} '''[[Data Model]]s''' are leveraged during the [[Extract]] activity to collect data from documents ({{BatchFolderIcon}} '''[[Batch Folder]]s'''). '''Data Models''' are the root of a [[Data Element]] hierarchy. The '''Data Model''' and its child Data Elements define a schema for data present on a document. The '''Data Model's''' configuration (and its child Data Elements' configuration) define [[Data Extraction|data extraction]] logic and settings for how data is reviewed in a [[Data Viewer]].<section end="Data Model" />
==== Data Field ====
''{{TypeName|Data Field}}''
<section begin="Data Field" />{{DataFieldIcon}} '''[[Data Field]]s''' represent a single value targeted for [[Data Extraction|data extraction]] on a document. '''Data Fields''' are created as child nodes of a {{DataModelIcon}}  '''[[Data Model]]''' and/or {{DataSectionIcon}} '''[[Data Section]]s'''.
* '''Data Fields''' are frequently referred to simply as "fields".<section end="Data Field" />
==== Data Section ====
''{{TypeName|Data Section}}''
<section begin="Data Section" />A {{DataSectionIcon}} '''[[Data Section]]''' is a container for [[Data Element]]s in a {{DataModelIcon}} '''Data Model'''. '''Data Sections''' can contain {{DataFieldIcon}} '''Data Fields''', {{DataTableIcon}} '''Data Tables''', and even other '''Data Sections''' as child nodes, adding hierarchy to a '''Data Model'''. They serve two main purposes:
# They can simply act as organizational buckets for Data Elements in larger '''Data Models'''.
# By configuring its "Extract Method", a '''Data Section''' can subdivide larger and more complex documents into smaller parts to assist in extraction.
#* "Single Instance" sections define a division (or "record") that appears only once on a document.
#* "Multi-Instance" sections define a collection of repeating divisions (or "records").<section end="Data Section" />
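The benefit of a Multi-Instance section is that extraction runs separately inside each record, so values from one record cannot bleed into another. The Python sketch below shows that idea with regular expressions. The record-start pattern, field patterns and sample text are hypothetical, and this is not how Grooper's section extract methods are implemented.

```python
import re
from typing import Dict, List


def split_records(text: str, record_start: str) -> List[str]:
    """Multi-instance idea: a new record begins wherever `record_start` matches a line."""
    starts = [m.start() for m in re.finditer(record_start, text, re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def extract_fields(record: str, patterns: Dict[str, str]) -> Dict[str, str]:
    """Run each field's pattern inside a single record only."""
    values = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, record)
        values[name] = match.group(1) if match else ""
    return values


document = """Claim Summary
Claimant: Jane Doe
Policy: P-111
Claimant: John Roe
Policy: P-222
"""

records = split_records(document, r"^Claimant:")
fields = {"Claimant": r"Claimant: (.+)", "Policy": r"Policy: (\S+)"}
for record in records:
    print(extract_fields(record, fields))
# {'Claimant': 'Jane Doe', 'Policy': 'P-111'}
# {'Claimant': 'John Roe', 'Policy': 'P-222'}
```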
==== Data Table ====
''{{TypeName|Data Table}}''
<section begin="Data Table" />A {{DataTableIcon}} '''[[Data Table]]''' is a [[Data Element]] specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).
* The '''Data Table''' itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
* The table's columns are defined by adding {{DataColumnIcon}} '''[[Data Column]]''' nodes to the '''Data Table''' (as its children).<section end="Data Table" />
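One simple way to picture a table extraction is a row pattern assembled from per-column patterns. The Python sketch below is only that illustration: the column definitions and sample text are hypothetical, and Grooper's Table Extract Methods are configured rather than coded and may locate rows quite differently.

```python
import re
from typing import Dict, List

# Hypothetical column definitions: each "column" has a name and a regex for one cell.
COLUMNS = {
    "Quantity": r"\d+",
    "Description": r".+?",
    "Amount": r"\d+\.\d{2}",
}


def extract_table(text: str, columns: Dict[str, str]) -> List[Dict[str, str]]:
    """Treat every line that matches all column patterns, in order, as one row."""
    cells = r"[ \t]+".join(f"(?P<{name}>{pattern})" for name, pattern in columns.items())
    row_pattern = re.compile(r"^[ \t]*" + cells + r"[ \t]*$", re.MULTILINE)
    return [m.groupdict() for m in row_pattern.finditer(text)]


invoice_text = """Qty Description Amount
2 Widget, large 19.98
10 Bolt 5.00
Subtotal 24.98
"""

for row in extract_table(invoice_text, COLUMNS):
    print(row)
# {'Quantity': '2', 'Description': 'Widget, large', 'Amount': '19.98'}
# {'Quantity': '10', 'Description': 'Bolt', 'Amount': '5.00'}
```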
==== Data Column ====
''{{TypeName|Data Column}}''
<section begin="Data Column" />{{DataColumnIcon}} '''[[Data Column]]s''' represent columns in a table extracted from a document. They are added as child nodes of a {{DataTableIcon}} '''[[Data Table]]'''. They define the type of data each column holds along with its [[Data Extraction|data extraction]] properties.
* '''Data Columns''' are frequently referred to simply as "columns".
* In the context of reviewing data in a [[Data Viewer]], a single '''Data Column''' instance in a single '''Data Table''' row is most frequently called a "cell".<section end="Data Column" />
==== Data Field Container and Data Element Container ====
''{{TypeName|Data Field Container}}''<br>
''{{TypeName|Data Element Container}}''
<section begin="Data Field Container and Data Element Container" />'''Data Field Container''' and '''Data Element Container''' are two base types in Grooper from which "container" [[Data Element]]s are derived. Container Data Elements ({{IconName|Data Model}} [[Data Model]]s, {{IconName|Data Section}} [[Data Section]]s, and {{IconName|Data Table}} [[Data Table]]s) serve an important function in organizing and defining behavior and extraction logic for the {{IconName|Data Field}} [[Data Field]]s and {{IconName|Data Column}} [[Data Column]]s they contain.
:*<li class=fyi-bullet> "Data Field Container" and "Data Element Container" are distinct classes in the Grooper Object Model, but they are closely related. Grooper scripters and experts should know the difference; for most practical purposes, however, the terms are used interchangeably (or the objects are simply called "containers" or "container elements"). See [[Data Field Container and Data Element Container#Object Model info|Object Model info]] for more.<section end="Data Field Container and Data Element Container" />
</div>
=== Data Rule ===
''{{TypeName|Data Rule}}''
<section begin="Data Rule" />{{DataRuleIcon}} '''[[Data Rule]]s''' are used to normalize or otherwise prepare data collected in a {{DataModelIcon}} '''[[Data Model]]''' for downstream processes.  '''Data Rules''' define data manipulation logic for data extracted from documents ({{BatchFolderIcon}} '''[[Batch Folder]]s''') to ensure data conforms to expected formats or meets certain standards.
* Each '''Data Rule''' executes a "Data Action". Data Actions do things like compute a field's value, parse a field into other fields, perform lookups, and more.
* Data Actions can be conditionally executed based on a '''Data Rule's''' "Trigger" expression.
* A hierarchy of '''Data Rules''' can be created to execute multiple Data Actions and perform complex data transformation tasks.
* '''Data Rules''' can be applied by:
** The [[Apply Rules]] activity (which must run after data is collected by the [[Extract]] activity)
** The Extract activity (rules run after the '''Data Model''' extraction completes)
** The [[Convert Data]] activity, when converting a document to another '''Document Type'''
** Manually in a [[Data Viewer]], using the "Run Rule" command.<section end="Data Rule" />
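The trigger, action and hierarchy concepts above can be sketched in Python as follows. The DataRule class, the two actions, and the choice that a false trigger also skips child rules are all assumptions made for illustration, not Grooper's Data Rule implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class DataRule:
    name: str
    action: Optional[Callable[[Record], None]] = None
    trigger: Callable[[Record], bool] = lambda record: True  # default: always run
    children: List["DataRule"] = field(default_factory=list)

    def apply(self, record: Record) -> None:
        if not self.trigger(record):
            return  # in this sketch, a false trigger also skips the children
        if self.action is not None:
            self.action(record)
        for child in self.children:
            child.apply(record)


def normalize_state(record: Record) -> None:
    record["State"] = record["State"].strip().upper()


def split_name(record: Record) -> None:
    first, _, last = record["Full Name"].partition(" ")
    record["First Name"], record["Last Name"] = first, last


rules = DataRule("Cleanup", children=[
    DataRule("Normalize State", normalize_state),
    DataRule("Split Name", split_name, trigger=lambda r: " " in r["Full Name"]),
])

record = {"Full Name": "Ada Lovelace", "State": " ok "}
rules.apply(record)
print(record)
# {'Full Name': 'Ada Lovelace', 'State': 'OK', 'First Name': 'Ada', 'Last Name': 'Lovelace'}
```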
=== Extractor Nodes ===
''{{TypeName|Extractor Node}}''
<div style="padding-left: 2em;">
==== Data Type ====
''{{TypeName|Data Type}}''
<section begin="Data Type" />{{DataTypeIcon}} '''[[Data Type]]s''' are nodes used to extract text data from a document. '''Data Types''' have more capabilities than {{ValueReaderIcon}} [[Value Reader]]s. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. '''Data Types''' can also collate results using [[Collation Provider]]s to combine, sift and manipulate results further.<section end="Data Type" />
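The gather-then-collate idea can be sketched in Python as below. Two regex extractors stand in for multiple extractor sources, and the collation step keeps one result per distinct value in reading order. All names are hypothetical, and the collation shown is one example only, not a description of any specific Collation Provider.

```python
import re
from typing import Callable, List, Tuple

Result = Tuple[str, int]  # (matched text, character offset)
Extractor = Callable[[str], List[Result]]


def regex_extractor(pattern: str) -> Extractor:
    """A stand-in for any single extractor source."""
    compiled = re.compile(pattern)
    return lambda text: [(m.group(0), m.start()) for m in compiled.finditer(text)]


def first_of_each_value(results: List[Result]) -> List[Result]:
    """Example collation: keep one result per distinct value, in reading order."""
    earliest = {}
    for value, offset in results:
        if value not in earliest or offset < earliest[value]:
            earliest[value] = offset
    return sorted(earliest.items(), key=lambda item: item[1])


def data_type(text: str, sources: List[Extractor],
              collate: Callable[[List[Result]], List[Result]]) -> List[Result]:
    """Gather results from every source, then hand them all to the collation step."""
    results: List[Result] = []
    for source in sources:
        results.extend(source(text))
    return collate(results)


us_date = regex_extractor(r"\b\d{2}/\d{2}/\d{4}\b")
iso_date = regex_extractor(r"\b\d{4}-\d{2}-\d{2}\b")
text = "Signed 01/15/2024, amended 2024-02-01, filed 01/15/2024."
print(data_type(text, [us_date, iso_date], first_of_each_value))
# [('01/15/2024', 7), ('2024-02-01', 27)]
```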
==== Value Reader ====
''{{TypeName|Value Reader}}''
<section begin="Value Reader" />{{IconName|Value Reader}} [[Value Reader]] nodes define a single [[Data Extraction (Concept)|data extraction]] operation. Each Value Reader executes a single [[Value Extractor]] configuration. The Value Extractor determines the logic for returning data from a text-based document or page. (Example: [[Pattern Match]] is a Value Extractor that returns data using regular expressions).
*<li class="fyi-bullet"> Value Readers can be used on their own or in conjunction with {{IconName|Data Type}} [[Data Type]]s for more complex data extraction and collation.<section end="Value Reader" />
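A Value Reader running a regular-expression extractor behaves roughly like the Python function below: one operation, one pattern, a list of matching values. The function name and the rule that a capture group's value is returned instead of the full match are assumptions for this sketch, not a description of Pattern Match's settings.

```python
import re
from typing import List


def pattern_match(text: str, pattern: str) -> List[str]:
    """Return every value matching `pattern`.

    If the pattern has a capture group, the first group is returned instead of
    the whole match, so a label can anchor the value without being returned.
    """
    return [m.group(1) if m.re.groups else m.group(0)
            for m in re.finditer(pattern, text)]


page_text = "Invoice No: INV-1001\nPO Number: 4500012345\nRef: INV-1002"
print(pattern_match(page_text, r"INV-\d{4}"))         # ['INV-1001', 'INV-1002']
print(pattern_match(page_text, r"PO Number: (\d+)"))  # ['4500012345']
```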
==== Field Class ====
''{{TypeName|Field Class}}''
<section begin="Field Class" />{{FieldClassIcon}} '''[[Field Class]]es''' are NLP (natural language processing) based extractor nodes. They find values based on the natural language context near those values. Training the extractor positively or negatively associates values with nearby text-based "features". During [[Data Extraction|extraction]], the extractor collects values based on these trained weightings.
* '''Field Classes''' are most useful when attempting to find values within the flow of natural language.
* '''Field Classes''' ''can'' be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "extractor nodes" like {{ValueReaderIcon}} '''[[Value Reader]]s''' or {{DataTypeIcon}}  '''[[Data Type]]s'''.
*<li class="attn-bullet"> Advances in large language models (LLMs) have largely made '''Field Classes''' obsolete. LLM-based extraction methods in Grooper (such as [[AI Extract]]) can achieve similar results with far less setup.<section end="Field Class" />
</div>
=== File Store ===
''{{TypeName|File Store}}''
<section begin="File Store" />{{FileStoreIcon}} '''[[File Store]]''' nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes is saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".
:*<li class="fyi-bullet" style="padding-left:20px"> Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the '''File Store''' node.<section end="File Store" />
=== Folder ===
''{{TypeName|Folder}}''
<div style="padding-left: 2em;">
==== Batches Folder ====
''{{TypeName|Batches Folder}}''
==== Projects Folder ====
''{{TypeName|Projects Folder}}''
==== Machines Folder ====
''{{TypeName|Machines}}''
==== Local Resources Folder ====
''{{TypeName|Local Resources Folder}}''
</div>
=== IP Elements ===
''{{TypeName|IP Element}}''
<div style="padding-left: 2em;">
==== IP Group ====
''{{TypeName|IP Group}}''
<section begin="IP Group" />{{IPGroupIcon}} '''[[IP Group]]s''' are containers of {{IPStepIcon}} '''[[IP Step]]s''' and/or '''IP Groups''' that can be added to {{IPProfileIcon}} '''[[IP Profile]]s'''. '''IP Groups''' add hierarchy to '''IP Profiles'''. They serve two primary purposes:
# They can be used simply to organize '''[[IP Step]]s''' for '''IP Profiles''' with large numbers of steps.
# They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionally execute a sequence of '''IP Steps'''.<section end="IP Group" />
==== IP Profile ====
''{{TypeName|IP Profile}}''
<section begin="IP Profile" />{{IPProfileIcon}} '''[[IP Profile]]s''' are a step-by-step list of image processing operations ([[IP Command]]s). They are used for several image processing related operations, but primarily for:
# Permanently enhancing an image during the [[Image Processing (Activity)|Image Processing]] activity (usually to get rid of defects in a scanned image, such as skewing or borders).
# Cleaning up an image in memory during the [[Recognize]] activity to improve OCR accuracy, without permanently altering the image.
# Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode values and more) used in data extraction.<section end="IP Profile" />
==== IP Step ====
''{{TypeName|IP Step}}''
<section begin="IP Step" />{{IPStepIcon}} '''[[IP Step]]s''' are the basic units of an {{IPProfileIcon}} '''[[IP Profile]]'''. They define a single image processing operation, called an [[IP Command]] in Grooper.<section end="IP Step" />
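The relationship between IP Profiles, IP Groups and IP Steps, including a group that runs only when a condition is true (similar in spirit to a "Should Execute Expression"), can be sketched in Python as below. Images are modeled as small grayscale pixel grids, and the commands (binarize, invert) and the condition are hypothetical stand-ins, not Grooper IP Commands.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

Image = List[List[int]]  # grayscale pixels: 0 = black, 255 = white


@dataclass
class IPStep:
    name: str
    command: Callable[[Image], Image]  # a single image processing operation


@dataclass
class IPGroup:
    name: str
    items: List[Union[IPStep, "IPGroup"]] = field(default_factory=list)
    should_execute: Callable[[Image], bool] = lambda image: True


def run_profile(items: List[Union[IPStep, IPGroup]], image: Image) -> Image:
    """Apply items in order; a group (and everything in it) runs only if its condition holds."""
    for item in items:
        if isinstance(item, IPGroup):
            if item.should_execute(image):
                image = run_profile(item.items, image)
        else:
            image = item.command(image)
    return image


def binarize(image: Image) -> Image:
    return [[0 if px < 128 else 255 for px in row] for row in image]


def invert(image: Image) -> Image:
    return [[255 - px for px in row] for row in image]


def mostly_black(image: Image) -> bool:
    pixels = [px for row in image for px in row]
    return sum(px == 0 for px in pixels) > len(pixels) / 2


profile = [
    IPStep("Binarize", binarize),
    IPGroup("Fix negative scans", [IPStep("Invert", invert)], should_execute=mostly_black),
]

print(run_profile(profile, [[10, 20, 200], [15, 30, 25]]))  # [[255, 255, 0], [255, 255, 255]]
```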
</div>
=== Lexicon ===
''{{TypeName|Lexicon}}''
<section begin="Lexicon" />{{LexiconIcon}} '''[[Lexicon]]s''' are dictionaries used throughout Grooper to store lists of words, phrases, weightings for [[Fuzzy RegEx]], and more. Users can add entries to a '''Lexicon''', '''Lexicons''' can import entries from other '''Lexicons''' by referencing them, and entries can be dynamically imported from a database using a {{DataConnectionIcon}} '''[[Data Connection]]'''. '''Lexicons''' are commonly used to aid in data extraction, with the "[[List Match]]" and "[[Word Match]]" extractors utilizing them most commonly.<section end="Lexicon" />
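A minimal Python sketch of a Lexicon with referenced (imported) Lexicons, plus a simple whole-word, case-insensitive list match, is shown below. The class, the matching rules and the sample entries are assumptions; Grooper's List Match and Word Match extractors offer options (such as fuzzy matching) that are not modeled here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Lexicon:
    entries: List[str] = field(default_factory=list)
    imports: List["Lexicon"] = field(default_factory=list)  # referenced Lexicons

    def all_entries(self) -> Set[str]:
        """Local entries plus those of every referenced Lexicon (assumes no cycles)."""
        combined = set(self.entries)
        for other in self.imports:
            combined |= other.all_entries()
        return combined


def list_match(text: str, lexicon: Lexicon) -> List[str]:
    """Entries that appear in the text as whole words, ignoring case."""
    return [entry for entry in sorted(lexicon.all_entries())
            if re.search(r"\b" + re.escape(entry) + r"\b", text, re.IGNORECASE)]


states = Lexicon(["Oklahoma", "Texas"])
regions = Lexicon(["Panhandle"], imports=[states])
print(list_match("Shipped from the TEXAS panhandle to Tulsa, Oklahoma.", regions))
# ['Oklahoma', 'Panhandle', 'Texas']
```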
=== Machine ===
''{{TypeName|Machine}}''
<section begin="Machine" />{{MachineIcon}} '''[[Machine]]''' nodes represent servers that have connected to the [[Grooper Repository]]. They are essential for distributing task processing loads across multiple servers. Grooper creates '''Machine''' nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, '''Machine''' nodes can be used to view server information and to manage [[Grooper Service]] instances.<section end="Machine" />
=== OCR Profile ===
''{{TypeName|OCR Profile}}''
<section begin="OCR Profile" />{{OCRProfileIcon}} '''[[OCR Profile]]s''' store configuration settings for optical character recognition ([https://en.wikipedia.org/wiki/Optical_character_recognition OCR]). They are used by the '''[[Recognize]]''' activity to convert images of text on {{BatchPageIcon}} '''[[Batch Page]]s''' into machine-encoded text. '''OCR Profiles''' are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's [[OCR Synthesis]] occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.<section end="OCR Profile" />
=== Object Library ===
''{{TypeName|Object Library}}''
<section begin="Object Library" />{{ObjectLibraryIcon}} '''[[Object Library]]''' nodes are [https://en.wikipedia.org/wiki/.NET_Framework .NET] [https://en.wikipedia.org/wiki/Library_(computing) libraries] that contain code files for customizing Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.
: Examples include:
:* Adding custom [[Activity|Activities]] that execute within '''[[Batch Process]]es'''
:* Creating custom commands available during the [[Review]] activity and in the Design page.
:* Defining custom [https://en.wikipedia.org/wiki/Method_(computer_programming) methods] that can be called from [[Expressions|code expressions]] on '''Data Field''' and '''Batch Process Step''' objects.
:* Creating custom Connection Types for '''[[CMIS Connection]]s''' for import/export operations from/to CMS systems.
:* Establishing custom [[Grooper Service]]s that perform automated background tasks at regular intervals.<section end="Object Library" />
=== Project ===
''{{TypeName|Project}}''
<section begin="Project" />{{ProjectIcon}} '''[[Project]]s''' are the primary containers for configuration nodes within Grooper. The '''Project''' is where processing objects such as {{ContentModelIcon}} '''[[Content Model]]s''', {{BatchProcessIcon}} '''[[Batch Process]]es''', and [[Object Nomenclature (Concept)#Profile_Objects|profile objects]] are stored. This makes resources easier to manage and save, and it simplifies how node references are made in a Grooper Repository.<section end="Project" />
=== Resource File ===
''{{TypeName|Resource File}}''
<section begin="Resource File" />'''[[Resource File]]s''' are nodes you can add to a {{ProjectIcon}} '''[[Project]]''' to store any kind of file. Each '''Resource File''' stores one file. While you can use '''Resource Files''' to store any kind of file in a '''Project''', several areas of Grooper can reference '''Resource Files''' for specific purposes, including XML schema files used for Grooper's [[XML Schema Integration]].<section end="Resource File" />
=== Root ===
''{{TypeName|Root}}''
<section begin="Root" />The Grooper {{GrooperRootIcon}} '''[[Root]]''' node is the topmost element of the [[Grooper Repository]]. All other nodes in a Grooper Repository are its children/descendants. The Grooper '''Root''' also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and [[Repository Option]]s.<section end="Root" />
=== Scanner Profile ===
''{{TypeName|Scanner Profile}}''
<section begin="Scanner Profile" />{{ScannerProfileIcon}} '''[[Scanner Profile]]s''' store configuration settings for operating a document scanner. '''Scanner Profiles''' provide users operating the [[Scan Viewer]] in the [[Review]] activity a quick way to select pre-saved scanner configurations.<section end="Scanner Profile" />
=== Separation Profile ===
''{{TypeName|Separation Profile}}''
<section begin="Separation Profile" />{{SeparationProfileIcon}} '''[[Separation Profile]]s''' store settings that determine how {{BatchPageIcon}} '''[[Batch Page]]'''s are separated into {{BatchFolderIcon}} '''[[Batch Folder]]s'''. '''Separation Profiles''' can be referenced in two ways:
* In a [[Review]] activity's [[Scan Viewer]] settings to control how pages are separated in real time during scanning.
* In a [[Separate]] activity as an alternative to configuring separation settings locally.<section end="Separation Profile" />
=== Work Queue ===
''{{TypeName|Work Queue}}''
<div style="padding-left: 2em;">
==== Processing Queue ====
''{{TypeName|Processing Queue}}''
<section begin="Processing Queue" />{{ProcessingQueueIcon}} '''[[Processing Queue]]s''' help automate "machine-performed tasks" (that is, [[Activity|Code Activity]] tasks performed by {{MachineIcon}} '''[[Machine]]s''' and their [[Activity Processing]] services). '''Processing Queues''' are assigned to '''Batch Process Steps''' to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).
* '''Processing Queues''' are used to dedicate Activity Processing services with a capped number of processing threads to resource-intensive activities, such as [[Recognize]]. That way, these compute-hungry tasks won't consume all available system resources.
* '''Processing Queues''' are also used to manage activities, such as [[Render]], which can only have one activity instance running per machine (this is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
* '''Processing Queues''' are also used to throttle [[Export]] tasks in scenarios where the export destination can only accept one document at a time.<section end="Processing Queue" />
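
The effect of a capped thread count can be sketched as a bounded worker pool. This is a generic Python illustration under stated assumptions, not Grooper's Activity Processing code: `run_capped`, `make_task` and `max_threads` are hypothetical names, and the tasks are stand-ins for activity tasks. Setting `max_threads=1` models the one-at-a-time export throttle described above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_capped(tasks, max_threads):
    """Run zero-argument callables with at most max_threads running at once.

    Hypothetical illustration of a queue's thread cap.
    Returns (results in submission order, peak number of concurrent tasks).
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def wrapped(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return task()
        finally:
            with lock:
                running -= 1

    # The pool size is the cap: extra tasks wait until a worker is free.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(wrapped, tasks))
    return results, peak


def make_task(n):
    """Stand-in for a compute-heavy task (e.g. OCR): sleeps briefly, returns n squared."""
    def work():
        time.sleep(0.01)
        return n * n
    return work
```

For example, `run_capped([make_task(i) for i in range(8)], max_threads=2)` returns the eight squares in order. Because the pool never holds more than `max_threads` workers, the reported peak can never exceed the cap, however many tasks are queued.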
==== Review Queue ====
''{{TypeName|Review Queue}}''
<section begin="Review Queue" />{{ReviewQueueIcon}} '''[[Review Queue]]s''' help organize and filter human-performed [[Review]] activity tasks. User groups are assigned to each '''Review Queue''', and the '''Review Queue''' is then set either on a {{BatchProcessIcon}} '''[[Batch Process]]''' or on a Review step. A user's membership in '''Review Queues''' affects how {{BatchIcon}} '''[[Batch]]es''' are distributed in the Batches page and how Review tasks are distributed in the Tasks page.<section end="Review Queue" />
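
Conceptually, the distribution rule can be modeled as a set intersection between a user's groups and each queue's assigned groups. This is a hypothetical sketch: `ReviewQueue`, `ReviewTask` and `tasks_for_user` are illustrative names, not Grooper types, and Grooper's actual filtering may consider more than group membership.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    name: str
    groups: set = field(default_factory=set)  # user groups assigned to this queue


@dataclass
class ReviewTask:
    batch: str
    queue: ReviewQueue  # queue set on the Batch Process or Review step


def tasks_for_user(user_groups, tasks):
    """Return the tasks whose queue shares at least one group with the user."""
    user_groups = set(user_groups)
    return [task for task in tasks if task.queue.groups & user_groups]
```

A user in both an "Accounts Payable" and a "Human Resources" group would see tasks from both queues. A user in neither group would see none.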
</div>
== Core Configuration Types ==
In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. These objects are designed to be part of a larger configuration.
* These "core configuration types" are found most commonly in the property settings on a node in the Grooper node tree.
* However, they may also be configured when configuring commands or as part of a larger property configuration.
This includes:
* [[#Activity|Activities]]
* [[#Behaviors|Behaviors]]
* [[#Classify Method|Classify Methods]]
* [[#IP Command|IP Commands]]
* [[#OCR Engine|OCR Engines]]
* [[#Repository Option|Repository Options]]
* [[#Separation Provider|Separation Providers]]
* [[#Service|Services]]
*<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.
== Activity ==
''{{TypeName|Activity}}''

<section begin="Activity" />Grooper [[Activity|Activities]] define specific document processing operations done to a {{BatchIcon}} '''[[Batch]]''', {{BatchFolderIcon}} '''[[Batch Folder]]''', or {{BatchPageIcon}} '''[[Batch Page]]'''. In a {{BatchProcessIcon}} '''[[Batch Process]]''', each {{BatchProcessStepIcon}} '''[[Batch Process Step]]''' executes a single Activity (determined by the step's "Activity" property).
:*<li class="fyi-bullet" style="padding-left:20px"> '''Batch Process Steps''' are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".<section end="Activity" />
<div style="padding-left: 1.5em;">
=== Attended Activities ===
''{{TypeName|Attended Activity}}''

Attended Activities are a type of [[Activity]] in Grooper that require direct user interaction within a {{BatchProcessIcon}} [[Batch Process]] workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or where automated processing is simply insufficient). The only current Attended Activity in Grooper is {{IconName|person_search}} [[Review]].
<div style="padding-left: 1.5em;">
==== Review ====
''{{TypeName|Review}}''

<section begin="Review" />{{IconName|person_search}} [[Review]] is an [[Activity]] that allows user-attended review of Grooper's results. This allows human operators to validate processed {{BatchPageIcon}} '''[[Batch Page]]''' and {{BatchFolderIcon}} '''[[Batch Folder]]''' content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification and data extraction, and in operating document scanners.<section end="Review" />
</div>

=== Code Activities ===
''{{TypeName|Code Activity}}''
<div style="padding-left: 1.5em;">
==== AI Dialogue ====
<section begin="AI Dialogue" />''BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024.'' {{AIDialogueIcon}} [[AI Dialogue]] is an [[Activity]] that executes a scripted conversation with an {{AIAnalystIcon}} '''[[AI Analyst]]''' and saves the resulting conversation on the document as a [https://en.wikipedia.org/wiki/JSON JSON] file.<section end="AI Dialogue" />

==== Apply Rules ====
''{{TypeName|Apply Rules}}''

<section begin="Apply Rules" />{{ApplyRulesIcon}} [[Apply Rules]] is an [[Activity]] that runs {{DataRuleIcon}} '''[[Data Rules]]''' on data that has previously been extracted from documents ({{BatchFolderIcon}} '''[[Batch Folder]]s''').
* The Apply Rules activity always needs to run ''after'' an [[Extract]] activity (an Extract step must come before an Apply Rules step in the order of {{BatchProcessStepIcon}} '''[[Batch Process Step]]s''' in a {{BatchProcessIcon}} '''[[Batch Process]]''').<section end="Apply Rules" />
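
The step-order requirement can be expressed as a simple check over a list of step activities. This sketch assumes a hypothetical representation of a '''Batch Process''' as an ordered list of activity names. `check_step_order` is not a Grooper API.

```python
def check_step_order(steps):
    """Raise ValueError if an Apply Rules step appears before any Extract step.

    steps: activity names in Batch Process Step order (hypothetical representation).
    """
    seen_extract = False
    for position, activity in enumerate(steps, start=1):
        if activity == "Extract":
            seen_extract = True
        elif activity == "Apply Rules" and not seen_extract:
            raise ValueError(
                f"Step {position}: Apply Rules must come after an Extract step"
            )
    return True
```

A process such as Split Pages, Recognize, Classify, Extract, Apply Rules, Export passes. Placing Apply Rules ahead of Extract raises an error naming the offending step.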

==== Attach ====
''{{TypeName|Attach}}''

<section begin="Attach" />{{AttachIcon}} [[Attach]] is an [[Activity]] that physically moves and nests documents within a {{BatchFolderIcon}} [[Batch Folder]] based on attachment markers set by the {{MarkAttachmentsIcon}} [[Mark Attachments]] activity. It consolidates related documents (such as addenda or supporting documents) under their host documents, updating the {{BatchIcon}} [[Batch]] hierarchy for downstream processing.<section end="Attach" />

==== Batch Transfer ====
''{{TypeName|Batch Transfer}}''

<section begin="Batch Transfer" />{{BatchTransferIcon}} [[Batch Transfer]] is an [[Activity]] that <section end="Batch Transfer" />

==== Burst Book ====
''{{TypeName|Burst Book}}''

<section begin="Burst Book" />{{BurstBookIcon}} [[Burst Book]] is an [[Activity]] that <section end="Burst Book" />

==== Classify ====
''{{TypeName|Classify}}''

<section begin="Classify" />{{ClassifyIcon}} [[Classify]] is an [[Activity]] that "classifies" {{BatchFolderIcon}} '''[[Batch Folder]]s''' in a {{BatchIcon}} '''[[Batch]]''' by assigning them a {{DocumentTypeIcon}} '''[[Document Type]]'''.
* Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the [[Extract]] activity) and how [[Behavior]]s are applied.
* Classification logic is controlled by a '''Content Model's''' "[[Classify Method]]". These methods include using text patterns, previously trained document examples, and [[Label Sets]] to identify documents.<section end="Classify" />
 
==== Clip Frames ====
<section begin="Clip Frames" />{{ClipFramesIcon}} [[Clip Frames]] is a specialized [[Activity]] for [[Microfiche Processing|processing microfiche in Grooper]]. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.<section end="Clip Frames" />
 
==== Convert Data ====
<section begin="Convert Data" />{{ConvertDataIcon}} '''[[Convert Data]]''' is an [[Activity]] that converts a document ({{BatchFolderIcon}} '''[[Batch Folder]]''') to another {{DocumentTypeIcon}} '''[[Document Type]]''' using Data Actions to copy and convert Data Elements from the source Document Type to those in the target Document Type. Convert Data is a specialized Activity for use cases requiring a great deal of data transformation before export.<section end="Convert Data" />
 
==== Correct ====
<section begin="Correct" />{{IconName|Correct}} [[Correct]] is an [[Activity]] that performs spell correction.  It can correct a {{BatchFolderIcon}} '''[[Batch Folder|Batch Folder's]]''' text content or specific '''[[Data Element]]''' values to resolve OCR errors, deidentify data or otherwise enhance text data.<section end="Correct" />
 
==== Deduplicate ====
<section begin="Deduplicate" />{{DeduplicateIcon}} [[Deduplicate]] is an [[Activity]] that <section end="Deduplicate" />
 
==== Detect Frames ====
<section begin="Detect Frames" />{{DetectFramesIcon}} [[Detect Frames]] is a specialized [[Activity]] for [[Microfiche Processing|processing microfiche in Grooper]]. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further [[Data Extraction|data extraction]] or processing.<section end="Detect Frames" />
 
==== Detect Language ====
''{{TypeName|Detect Language}}''
 
<section begin="Detect Language" />{{IconName|Detect Language}} [[Detect Language]] is an [[Activity]] that uses a large language model (LLM) to determine the primary language (English, Spanish, French, etc.) of a document. Activities executed downstream, such as {{IconName|Extract}} [[Extract]], can use this information to apply language specific logic.<section end="Detect Language" />
 
==== Execute ====
<section begin="Execute" />{{ExecuteIcon}} [[Execute]] is an [[Activity]] that runs one or more specified object commands. This gives access to a variety of Grooper commands in a {{BatchProcessIcon}} '''[[Batch Process]]''' for which there is no Activity, such as the "Sort Children" command for '''Batch Folders''' or the "Expand Attachments" command for email attachments.<section end="Execute" />
 
==== Export ====
<section begin="Export" />{{ExportIcon}} [[Export]] is an [[Activity]] that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.<section end="Export" />
 
==== Extract ====
<section begin="Extract" />{{ExtractIcon}} [[Extract]] is an [[Activity]] that retrieves information from {{BatchFolderIcon}} '''[[Batch Folder]]''' documents, as defined by '''[[Data Element]]s''' in a {{DataModelIcon}} '''[[Data Model]]'''.  This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.<section end="Extract" />
 
==== Image Processing ====
<section begin="Image Processing" />{{ImageProcessingIcon}} [[Image Processing (Activity)|Image Processing]] is an [[Activity]] that enhances {{BatchPageIcon}} '''[[Batch Page]]''' images and optimizes them for better [[OCR]] text recognition and [[Data Extraction|data extraction]] results.<section end="Image Processing" />
 
==== Initialize Card ====
<section begin="Initialize Card" />{{InitializeCardIcon}} [[Initialize Card]] is a specialized [[Activity]] for [[Microfiche Processing|processing microfiche in Grooper]]. It prepares and configures microfiche card images for further processing.<section end="Initialize Card" />
 
==== Launch Process ====
<section begin="Launch Process" />{{LaunchProcessIcon}} [[Launch Process]] is an [[Activity]] that <section end="Launch Process" />
 
==== Mark Attachments ====
<section begin="Mark Attachments" />
''{{TypeName|Mark Attachments}}''
 
{{IconName|Mark Attachments}} [[Mark Attachments]] is an [[Activity]] that analyzes documents ({{BatchFolderIcon}} [[Batch Folder]]s) to determine attachment relationships using configurable rules ("Attachment Rules"). It sets attachment markers on documents—indicating whether they should be attached to neighboring Batch Folders. These markers are then used by the [[Attach]] activity to group and nest related documents.<section end="Mark Attachments" />
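
The two-phase design (mark first, move later) can be sketched as follows. This is a simplified, hypothetical model. `Doc`, `mark_attachments` and `attach` are illustrative names, and the `is_attachment` callback stands in for Grooper's Attachment Rules. The sketch only attaches a document to the nearest ''preceding'' document, whereas real Attachment Rules may consider other neighbors.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    attach_to_previous: bool = False  # marker written by the "mark" phase
    children: list = field(default_factory=list)


def mark_attachments(docs, is_attachment):
    """Phase 1: record a marker on each document; nothing is moved yet."""
    for doc in docs:
        doc.attach_to_previous = is_attachment(doc)
    return docs


def attach(docs):
    """Phase 2: nest each marked document under the nearest preceding top-level document.

    A marked document with no preceding document stays at the top level.
    """
    top_level = []
    for doc in docs:
        if doc.attach_to_previous and top_level:
            top_level[-1].children.append(doc)
        else:
            top_level.append(doc)
    return top_level
```

Separating the phases means the markers can be inspected (or corrected in Review) before any documents are physically moved.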
 
==== Merge ====
<section begin="Merge" />{{MergeIcon}} '''[[Merge]]''' is an [[Activity]] that creates a PDF, TIF, XML or ZIP file from the page and data content of a '''[[Batch Folder]]''' and saves it to that '''Batch Folder'''.<section end="Merge" />
 
==== Recognize ====
<section begin="Recognize" />{{IconName|Recognize}} [[Recognize]] is an [[Activity]] that obtains machine-readable text from {{IconName|Batch Page}} '''[[Batch Page]]s''' and {{IconName|Batch Folder}} '''[[Batch Folder]]s'''. When properly configured with an {{IconName|OCR Profile}} '''[[OCR Profile]]''', Recognize will selectively perform [[OCR]] for images and native-text extraction for digital text in PDFs. Recognize can also reference an {{IconName|IP Profile}} '''[[IP Profile]]''' to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and [[Data Extraction|data extraction]].<section end="Recognize" />
 
==== Redact ====
<section begin="Redact" />{{RedactIcon}} [[Redact]] is an [[Activity]] that visibly obscures (or "redacts") text information on a page based on results returned from an [[Data Extractor|extractor]]. Be aware: Redact does ''not'' alter the text data; it only alters the image.<section end="Redact" />
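
The caveat can be illustrated with a toy model of a page: a grayscale image stored as rows of pixel values, plus its extracted text. This sketch is hypothetical. `redact` and the box format `(left, top, right, bottom)` are assumptions. It only shows that the pixels change while the text string does not.

```python
def redact(image, text, boxes, fill=0):
    """Return (redacted copy of image, text).

    image: list of rows of grayscale pixel values (0 = black).
    boxes: (left, top, right, bottom) rectangles; right and bottom are exclusive.
    The text is returned unchanged, mirroring the caveat above.
    """
    redacted = [row[:] for row in image]  # copy so the original image is untouched
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                redacted[y][x] = fill
    return redacted, text
```

Anything that later reads the page's text data would still see the "redacted" value, which is why the image change alone may not be sufficient for de-identification.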
 
==== Remove Level ====
<section begin="Remove Level" />{{IconName|Remove Level}} [[Remove Level]] is an [[Activity]] that <section end="Remove Level" />

==== Render ====
<section begin="Render" />{{IconName|Render}} [[Render]] is an [[Activity]] that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the {{IconName|Recognize}} [[Recognize]] Activity.<section end="Render" />

==== Route ====
<section begin="Route" />{{IconName|Route}} [[Route]] is an [[Activity]] that <section end="Route" />

==== Send Mail ====
<section begin="Send Mail" />{{SendMailIcon}} [[Send Mail]] is an [[Activity]] that automates email notifications from Grooper based on events and conditions set by a {{BatchProcessIcon}} '''[[Batch Process]]'''. Optionally, documents in the {{BatchIcon}} '''[[Batch]]''' may be attached to the generated email.<section end="Send Mail" />

==== Separate ====
<section begin="Separate" />{{SeparateIcon}} [[Separate]] is an [[Activity]] that sorts {{BatchPageIcon}} '''[[Batch Page]]s''' into individual {{BatchFolderIcon}} '''[[Batch Folder]]s'''. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into '''Batch Folder''' documents, they can be further processed by {{ClassifyIcon}} [[Classify]], {{ExtractIcon}} [[Extract]], {{ExportIcon}} [[Export]] and other Activities that need to run on the folder (i.e. document) level.<section end="Separate" />
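
One common separation strategy, starting a new document whenever a page matches a rule, can be sketched as follows. This is a generic, hypothetical illustration. The `starts_new_document` predicate stands in for whatever separation settings are configured, and not every Grooper separation option works this way.

```python
def separate(pages, starts_new_document):
    """Group loose pages into documents (lists of pages).

    A new document begins at the first page and at every page for which
    starts_new_document(page) is true; other pages join the current document.
    """
    folders = []
    for page in pages:
        if not folders or starts_new_document(page):
            folders.append([page])
        else:
            folders[-1].append(page)
    return folders
```

For example, with a rule that an "INVOICE" heading marks a first page, five loose pages become two documents of two and three pages.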

==== Spawn Batch ====
<section begin="Spawn Batch" />{{IconName|Spawn Batch}} [[Spawn Batch]] is an [[Activity]] that <section end="Spawn Batch" />

==== Split Pages ====
<section begin="Split Pages" />Multi-page PDF and TIF files come into Grooper as files attached to single {{BatchFolderIcon}} '''[[Batch Folder]]s'''. [[Split Pages]] is an [[Activity]] that creates child {{BatchPageIcon}} '''[[Batch Page]]s''' for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.<section end="Split Pages" />

==== Split Text ====
<section begin="Split Text" />{{SplitTextIcon}} [[Split Text]] is an [[Activity]] that <section end="Split Text" />

==== Text Transform ====
<section begin="Text Transform" />{{TextTransformIcon}} [[Text Transform]] is an [[Activity]] that <section end="Text Transform" />

==== Train Lexicon ====
<section begin="Train Lexicon" />{{IconName|Train Lexicon}} [[Train Lexicon]] is an [[Activity]] that <section end="Train Lexicon" />

==== Translate ====
<section begin="Translate" />{{TranslateIcon}} [[Translate]] is an [[Activity]] that <section end="Translate" />

==== XML Transform ====
<section begin="XML Transform" />{{IconName|XML Transform}} [[XML Transform]] is an [[Activity]] that applies [https://en.wikipedia.org/wiki/XSLT XSLT] stylesheets to XML data to modify or reformat the output structure for various purposes.<section end="XML Transform" />
</div>
</div>

== Behavior ==
<section begin="Behavior" />
A "'''[[Behavior]]'''" is one of several features applied to a [[Content Type]] (such as a {{DocumentTypeIcon}} [[Document Type]]). Behaviors affect how certain Activities and Commands are executed, based on how a document ({{BatchFolderIcon}} [[Batch Folder]]) is classified. Documents ''behave'' differently according to their Document Type. This includes how they are exported (how [[Export]] behaves), if and how they are added to a document search index (how the various [[AI Search|indexing]] commands behave), and if and how [[Label Sets]] are used (how [[Classify]] and [[Extract]] behave in the presence of Label Sets).
* Each Behavior is enabled by adding it to a Content Type. Behaviors are configured in the Behaviors editor.
* Behaviors extend to descendant Content Types if a descendant Content Type has no Behavior configuration of its own.
** For example, a Document Type with no Behavior configuration of its own inherits its parent Content Model's Behaviors.
** However, if a Document Type has its own Behavior configuration, it will be used instead.<section end="Behavior" />
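
The inheritance rule reads naturally as a walk up the Content Type tree. This sketch assumes that a node's Behavior configuration is used either as a whole or not at all. `ContentType` and `effective_behaviors` are hypothetical names, not Grooper's object model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    name: str
    parent: Optional["ContentType"] = None
    behaviors: list = field(default_factory=list)  # empty list = no configuration of its own


def effective_behaviors(content_type):
    """Return the nearest non-empty Behavior configuration, walking up through parents."""
    node = content_type
    while node is not None:
        if node.behaviors:
            return node.behaviors
        node = node.parent
    return []
```

A Document Type with an empty configuration resolves to its Content Model's Behaviors. A Document Type with its own configuration resolves to that configuration instead.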
<div style="padding-left: 1.5em;">
=== Export Behavior ===
<section begin="Export Behavior" />An [[Export Behavior]] defines the parameters for exporting classified {{BatchFolderIcon}} '''[[Batch Folder]]''' content from '''Grooper''' to other systems. This includes where they are exported to (what content management system, file system, database etc), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML etc), folder pathing, file naming and data mappings (for [[Data Export]] and [[CMIS Export]]).<section end="Export Behavior" />

=== Import Behavior ===
<section begin="Import Behavior" />An [[Import Behavior]] defines how data is mapped from files in an external content management system to '''Batch Folders''' created on import when using [[CMIS Import]].<section end="Import Behavior" />
=== Indexing Behavior ===
<section begin="Indexing Behavior" />An [[Indexing Behavior]] allows documents ({{BatchFolderIcon}} [[Batch Folder]]s) to be indexed via [[AI Search]]. Once indexed, users can search for and retrieve documents from the [[Search Page]].<section end="Indexing Behavior" />

=== Labeling Behavior ===
<section begin="Labeling Behavior" />A [[Labeling Behavior]] extends "label set" functionality to {{DocumentTypeIcon}} [[Document Type]]s. This allows you to collect field labels and other labels present on a document and use them in a variety of ways. This includes functionality for [[Labeling_Behavior_(Behavior)#How_to_use_Label_Sets_for_classification|classification]], [[Labeling_Behavior_(Behavior)#How_to_use_Label_Sets_for_field_based_extraction|field extraction]], [[Labeling_Behavior_(Behavior)#How_to_use_Label_Sets_for_tabular_extraction|table extraction]], and [[Labeling_Behavior_(Behavior)#How_to_use_Label_Sets_for_section_extraction|section extraction]].<section end="Labeling Behavior" />

=== PDF Data Mapping ===
<section begin="PDF Data Mapping" />[[PDF Data Mapping]] is a [[Behavior]] that enhances PDF files generated by the [[Merge]] or [[Export]] activities with metadata, bookmarks, annotations and/or different kinds of widgets.<section end="PDF Data Mapping" />
=== Text Rendering ===
<section begin="Text Rendering" />[[Text Rendering]] is a [[Behavior]] that causes text documents (e.g. TXT files) to be interpreted and displayed as paginated documents rather than a raw text stream.
:*<li class="fyi-bullet" style="padding-left:20px"> By default, this renders TXT files to an 8.5 by 11 inch page format, but this can be altered in the Text Rendering settings.<section end="Text Rendering" />
</div>


== CMIS Connection Type ==
== Classify Method ==
<section begin="CMIS Connection Type" />
<section begin="Classify Method" />"[[Classify Method]]s" define [[Classification (Concept)|classification]] logic used by {{ContentModelIcon}} [[Content Model]]s during the {{ClassifyIcon}} [[Classify]] activity. Classify Methods organize document content in Grooper by assigning {{BatchFolderIcon}} [[Batch Folder]]s a {{DocumentTypeIcon}} [[Document Type]].
'''''CMIS Connection Type''''', or "binding", establishes the communication protocols used to connect '''Grooper''' with content management systems adhering to the [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] standard.
* Classify Methods analyze documents (Batch Folders) to determine what kind of document each one is.
<section end="CMIS Connection Type" />
* Each Classify Method analyzes documents using a different methodology to organize them accurately. These methodologies include text-based pattern matching, computer vision, machine learning models, [[Labeling Behavior|label sets]] and more.
* Classify Methods are configured by setting and configuring a Content Model's "Classification Method" property.<section end="Classify Method" />
<div style="padding-left: 1.5em;">
=== GPT Embeddings ===
<section begin="GPT Embeddings" />''BE AWARE: GPT Embeddings is obsolete as of version 2025. The [[LLM Classifier]] and [[Search Classifier]] methods are the new and improved AI-enabled classification methods.'' [[GPT Embeddings]] is a [[Classify Method]] that uses an [https://en.wikipedia.org/wiki/OpenAI OpenAI] embeddings model and trained document samples to tell one document from another.<section end="GPT Embeddings" />
 
=== Labelset-Based ===
<section begin="Labelset-Based" />"[[Labelset-Based]]" is a [[Classify Method]] that leverages the labels defined via a [[Labeling Behavior]] to classify {{BatchFolderIcon}} '''[[Batch Folder]]s'''.<section end="Labelset-Based" />


=== AppXtender ===
=== Lexical ===
<section begin="AppXtender" />
<section begin="Lexical" />"[[Lexical]]" is a [[Classify Method]] that classifies {{BatchFolderIcon}} '''[[Batch Folder]]s''' based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify {{DocumentTypeIcon}} '''[[Document Type]]s'''.<section end="Lexical" />
The '''''[[AppXtender (CMIS Connection Type)|AppXtender]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to the [https://en.wikipedia.org/wiki/OpenText#AppEnhancer_(formerly_ApplicationXtender) ApplicationXtender] [https://en.wikipedia.org/wiki/Content_management_system content management system] for import and export operations.
<section end="AppXtender" />
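
For the '''Lexical''' method, the general idea of classifying by word frequencies can be sketched with cosine similarity between word-count vectors. This is a generic technique swapped in for illustration only. The Lexical method's actual feature weighting and scoring are not described here, and `train`, `classify` and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercased word frequencies for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def train(examples):
    """examples: {document_type: [sample texts]} -> {document_type: summed word counts}."""
    return {
        doc_type: sum((word_counts(t) for t in texts), Counter())
        for doc_type, texts in examples.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, model):
    """Return the document type whose trained word profile is most similar to the text."""
    counts = word_counts(text)
    return max(model, key=lambda doc_type: cosine(counts, model[doc_type]))
```

Trained on a few invoice and lease samples, a new text mentioning "amount due" and "invoice" scores closest to the invoice profile.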


=== Box ===
=== LLM Classifier ===
<section begin="Box" />
<section begin="LLM Classifier" />"[[LLM Classifier]]" is a [[Classify Method]] that classifies documents ({{BatchFolderIcon}} [[Batch Folder]]s) by asking a large language model (LLM) to select its {{DocumentTypeIcon}} [[Document Type]] from a list.<section end="LLM Classifier" />
The '''''[[Box (CMIS Connection Type)|Box]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to the [https://en.wikipedia.org/wiki/Box_(company) Box] [https://en.wikipedia.org/wiki/Content_management_system content management system] for import and export operations.
<section end="Box" />
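The core idea of asking an LLM to choose from a fixed list can be sketched as follows. This is a hypothetical illustration, not Grooper's prompt or API. <code>ask_llm</code> stands in for whatever model call is used, and any reply that does not match a listed Document Type is treated as unclassified.

```python
from typing import Callable, List, Optional


def llm_classify(document_text: str, document_types: List[str],
                 ask_llm: Callable[[str], str]) -> Optional[str]:
    """Ask a language model to choose one document type from a fixed list."""
    options = "\n".join(f"- {name}" for name in document_types)
    prompt = (
        "Which of the following document types best describes the document below? "
        "Reply with exactly one name from the list.\n"
        f"{options}\n\nDocument:\n{document_text}"
    )
    reply = ask_llm(prompt).strip()
    # Only accept answers that match a listed type (ignoring case).
    for name in document_types:
        if reply.lower() == name.lower():
            return name
    return None  # unexpected reply: leave the document unclassified


# A fake model so the sketch runs without any external service.
def fake_llm(prompt: str) -> str:
    return "invoice" if "Amount Due" in prompt else "Purchase Order"


print(llm_classify("Invoice #1001\nAmount Due: $250.00", ["Invoice", "Lease"], fake_llm))  # Invoice
print(llm_classify("Oil and gas lease", ["Invoice", "Lease"], fake_llm))  # None
```

Restricting the answer to the listed names keeps a free-form model reply from inventing a Document Type that does not exist in the Content Model.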


=== Exchange ===
=== Rules-Based ===
<section begin="Exchange" />
<section begin="Rules-Based" />"[[Rules-Based]]" is a [[Classify Method]] that employs "rules" defined on each {{DocumentTypeIcon}} '''[[Document Type]]''' to classify {{BatchFolderIcon}} '''[[Batch Folder]]s'''. Positive Extractor and Negative Extractor properties are configured for each '''Document Type''' to positively or negatively associate a '''Batch Folder''' with that '''Document Type''' based on predefined criteria.
The '''''[[Exchange (CMIS Connection Type)|Exchange]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to the [https://en.wikipedia.org/wiki/Microsoft_Exchange_Server Microsoft Exchange Server] [https://en.wikipedia.org/wiki/Message_transfer_agent mail server] for import and export operations.
:*<li class="fyi-bullet" style="padding-left:20px"> Whereas the Positive and Negative Extractors affect the results of ''all'' Classify Methods, the Rules-Based method classifies using ''only'' these properties and nothing else.<section end="Rules-Based" />
<section end="Exchange" />
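Here is a minimal sketch of rules-based logic. It assumes each Document Type's Positive and Negative Extractors can be approximated by single regular expressions, and that the first qualifying type wins. Grooper's extractors are far richer than one regex, and its tie-breaking may differ. The <code>DocumentTypeRule</code> class is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentTypeRule:
    """Hypothetical stand-in for a Document Type's rule settings."""
    name: str
    positive: str                   # must match for the type to qualify
    negative: Optional[str] = None  # disqualifies the type if it matches


def rules_based_classify(text: str, rules: List[DocumentTypeRule]) -> Optional[str]:
    """Return the first type whose positive pattern matches and whose negative pattern does not."""
    for rule in rules:
        if not re.search(rule.positive, text, re.IGNORECASE):
            continue
        if rule.negative and re.search(rule.negative, text, re.IGNORECASE):
            continue
        return rule.name
    return None  # no rule satisfied: the folder stays unclassified


rules = [
    DocumentTypeRule("Invoice", r"\binvoice\b", negative=r"\bcredit memo\b"),
    DocumentTypeRule("Credit Memo", r"\bcredit memo\b"),
]
print(rules_based_classify("Invoice 1001 - Total Due", rules))          # Invoice
print(rules_based_classify("Credit Memo against invoice 1001", rules))  # Credit Memo
```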


=== FTP ===
=== Search Classifier ===
<section begin="FTP" />
<section begin="Search Classifier" />"[[Search Classifier]]" is a [[Classify Method]] that classifies documents ({{BatchFolderIcon}} [[Batch Folder]]s) by finding similar documents in a [[AI Search|document search index]]. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same {{DocumentTypeIcon}} [[Document Type]] as its closest match in the search index.<section end="Search Classifier" />
The '''''[[FTP (CMIS Connection Type)|FTP]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to [https://en.wikipedia.org/wiki/File_Transfer_Protocol FTP] directories for import and export operations.  
<section end="FTP" />
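The closest-match idea can be sketched with plain cosine similarity over precomputed vectors. In this hypothetical example, the "embeddings" are tiny hand-written lists rather than output from a real embeddings model. <code>min_similarity</code> is an illustrative cutoff, not a documented Grooper setting.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_classify(query: Sequence[float],
                    index: List[Tuple[str, Sequence[float]]],
                    min_similarity: float = 0.0) -> Optional[str]:
    """Return the Document Type of the most similar indexed document, if it clears the cutoff."""
    best_type, best_score = None, min_similarity
    for doc_type, vector in index:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type


# Toy 3-dimensional vectors; real embedding models typically produce hundreds or thousands of dimensions.
index = [
    ("Invoice", [0.9, 0.1, 0.0]),
    ("Invoice", [0.8, 0.2, 0.1]),
    ("Lease", [0.1, 0.9, 0.3]),
]
print(search_classify([0.85, 0.15, 0.05], index))  # Invoice
print(search_classify([0.0, 1.0, 0.2], index))     # Lease
```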


=== IMAP ===
=== Visual ===
<section begin="IMAP" />
<section begin="Visual" />"[[Visual]]" is a [[Classify Method]] that uses image analysis instead of text data to determine the {{DocumentTypeIcon}} '''[[Document Type]]''' assigned to a {{BatchFolderIcon}} '''[[Batch Folder]]''' during [[Classification|classification]].  Instead of using text-based extractors, an "Extract Features" [[IP Command]] in an {{IPProfileIcon}} '''[[IP Profile]]''' is used to collect image-based data from a '''Batch Folder's''' image(s).  This image-based data is compared against that of previously trained document examples of each '''Document Type''' to classify the '''Batch Folder'''.<section end="Visual" />
The '''''[[IMAP (CMIS Connection Type)|IMAP]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to email messages and folders through an [https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol IMAP] email server.
</div>
<section end="IMAP" />


=== NTFS ===
== IP Command ==
<section begin="NTFS" />
<section begin="IP Command" />[[IP Command]]s specify an [[Image Processing (Concept)|image processing (IP)]] operation (such as image cleanup, format conversion or feature detection) and are used to construct {{IPStepIcon}} '''[[IP Step]]s''' in an '''[[IP Profile]]'''. IP Commands are configured using an '''IP Step's''' Command property.<section end="IP Command" />
The '''''[[NTFS (CMIS Connection Type)|NTFS]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to files and folders in the [https://en.wikipedia.org/wiki/Microsoft_Windows Microsoft Windows] [https://en.wikipedia.org/wiki/NTFS NTFS] [https://en.wikipedia.org/wiki/File_system file system].
<div style="padding-left: 1.5em;">
<section end="NTFS" />
=== Barcode Detection ===
<section begin="Barcode Detection" />[[Barcode Detection and Barcode Removal|Barcode Detection]] is an [[IP Command]] that detects and reads barcode data. The detected barcode information is stored as part of the page's [[Layout Data|layout data]].<section end="Barcode Detection" />


=== OneDrive ===
=== Barcode Removal ===
<section begin="OneDrive" />
<section begin="Barcode Removal" />[[Barcode Detection and Barcode Removal|Barcode Removal]] is an [[IP Command]] that detects, reads and digitally removes barcodes from an image. The detected barcode information is stored as part of the page's [[Layout Data|layout data]].<section end="Barcode Removal" />
The '''''[[OneDrive (CMIS Connection Type)|OneDrive]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to [https://en.wikipedia.org/wiki/OneDrive Microsoft OneDrive] [https://en.wikipedia.org/wiki/Cloud_computing#Service_models cloud services].
<section end="OneDrive" />


=== SFTP ===
=== Binarize ===
<section begin="SFTP" />
<section begin="Binarize" />[[Binarize]] is an [[IP Command]] that converts a color or grayscale image to a bi-tonal (black and white) image using various thresholding methods.<section end="Binarize" />
The '''''[[SFTP (CMIS Connection Type)|SFTP]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to [https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] directories for import and export operations.  
<section end="SFTP" />
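One widely used thresholding method is Otsu's method, which picks the gray level that best separates dark pixels from light ones. The sketch below applies it to a flat list of 8-bit grayscale values. It illustrates the general technique only and is not necessarily one of the thresholding methods Grooper's Binarize command offers.

```python
from typing import List, Optional


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level (0-255) that maximizes between-class variance (up to a constant factor)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0
    sum_dark = 0
    best_level, best_variance = 0, -1.0
    for level in range(256):
        weight_dark += histogram[level]       # pixels at or below this level
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark    # pixels above this level
        if weight_light == 0:
            break
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(pixels: List[int], threshold: Optional[int] = None) -> List[int]:
    """Map each pixel to 0 (black) or 255 (white); defaults to Otsu's threshold."""
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [0 if value <= threshold else 255 for value in pixels]


row = [10, 12, 15, 200, 210, 220]  # dark text pixels followed by light background
print(otsu_threshold(row))  # 15
print(binarize(row))        # [0, 0, 0, 255, 255, 255]
```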


=== SharePoint ===
=== Box Detection ===
<section begin="SharePoint" />
<section begin="Box Detection" />[[Box Detection and Box Removal|Box Detection]] is an [[IP Command]] that detects checkboxes and determines their check state (checked or unchecked). The detected checkbox information is stored as part of the page's [[Layout Data|layout data]].<section end="Box Detection" />
The '''''[[SharePoint (CMIS Connection Type)|SharePoint]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to [https://en.wikipedia.org/wiki/SharePoint Microsoft SharePoint], providing access to content stored in "document libraries" and "picture libraries".
<section end="SharePoint" />


== Classification Method ==
=== Box Removal ===
<section begin="Classification Method" />
<section begin="Box Removal" />[[Box Detection and Box Removal|Box Removal]] is an [[IP Command]] that detects checkboxes, determines their check state (checked or unchecked) and digitally removes them from an image. The detected checkbox information is stored as part of the page's [[Layout Data (Concept)|layout data]].<section end="Box Removal" />
The '''''[[Classification Method (Property)|Classification Method]]''''' property determines the technique used for document [[Classification (Concept)|classification]] within a [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Model]]''', enabling the sorting of [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.
<section end="Classification Method" />


=== Labelset-Based ===
=== Extract Page ===
<section begin="Labelset-Based" />
<section begin="Extract Page" />[[Extract Page]] is an [[IP Command]] that extracts an image from a carrier image while also removing any image warping or skew.<section end="Extract Page" />
'''''[[Labeling Behavior (Behavior)#About Labelset-Based Classification|Labelset-Based]]''''' is a '''''Classification Method''''' that leverages the labels defined via a '''''[[Labeling Behavior (Behavior)|Labeling Behavior]]''''' to classify [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]'''.
<section end="Labelset-Based" />


=== Lexical ===
=== Line Detection ===
<section begin="Lexical" />
<section begin="Line Detection" />[[Line Detection and Line Removal|Line Detection]] is an [[IP Command]] that locates horizontal and vertical lines on documents. The detected line locations are stored as part of the page's [[Layout Data|layout data]].<section end="Line Detection" />
The '''''[[Lexical (Classification Method)|Lexical]]''''' '''''Classification Method''''' classifies [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' based on their text content, using either pre-configured training or rules. This is achieved through the analysis of word frequencies or defined rules that identify document types.
<section end="Lexical" />


=== Rules-Based ===
=== Line Removal ===
<section begin="Rules-Based" />
<section begin="Line Removal" />[[Line Detection and Line Removal|Line Removal]] is an [[IP Command]] that locates and removes horizontal and vertical lines from documents. The detected line locations are stored as part of the page's [[Layout Data|layout data]].<section end="Line Removal" />
The '''''[[Rules-Based (Classification Method)|Rules-Based]]''''' '''''Classification Method''''' employs "rules" defined on [[image:GrooperIcon_DocumentType.png]] '''[[Document Type (Object)|Document Types]]''' to classify [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]'''. Each '''Document Type's''' ''Positive Extractor'' and ''Negative Extractor'' properties define the criteria a '''Batch Folder''' must (or must not) match to be assigned that type.
<section end="Rules-Based" />


=== Visual ===
=== Scratch Removal ===
<section begin="Visual" />
<section begin="Scratch Removal" />[[Scratch Removal]] is an [[IP Command]] that detects and removes or repairs scratches from film-based images.<section end="Scratch Removal" />
The '''''[[Visual (Classification Method)|Visual]]''''' '''''Classification Method''''' uses image data instead of text data to determine the [[image:GrooperIcon_DocumentType.png]] '''[[Document Type (Object)|Document Type]]''' assigned to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]''' during [[Classification (Concept)|classification]].  Instead of using text-based extractors, an [[image:GrooperIcon_IPProfile.png]] '''[[IP Profile (Object)|IP Profile]]''' is used with an '''''[[Extract Features (IP Command)|Extract Features]]''''' '''''[[IP Command (Property)|IP Command]]''''' to obtain data pertaining to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folder's]]''' image(s). Document samples are trained as examples of a '''Document Type'''.
<section end="Visual" />


== Collation Provider ==
=== Shape Detection ===
<section begin="Collation Provider" />
<section begin="Shape Detection" />[[Shape Detection and Shape Removal|Shape Detection]] is an [[IP Command]] that locates shapes on a document that match one or more sample images. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's [[Layout Data|layout data]].<section end="Shape Detection" />
The '''''[[Collation Provider (Property)|Collation Provider]]''''' property of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' defines the method for converting its raw results into a final result set, governing how lists of matches from the '''Data Type''' are combined and interpreted to produce the output data of the '''Data Type'''.
<section end="Collation Provider" />


=== AND ===
=== Shape Removal ===
<section begin="AND" />
<section begin="Shape Removal" />[[Shape Detection and Shape Removal|Shape Removal]] is an [[IP Command]] that detects and removes shapes from documents. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that interfere with OCR and/or can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's [[Layout Data|layout data]].<section end="Shape Removal" />
The '''''[[AND (Collation Provider)|AND]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
</div>
<section end="AND" />
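The AND logic fits in a few lines. This hypothetical version takes each child extractor's list of hits. If every list is non-empty, it returns all hits together. How Grooper actually assembles the combined result is not modeled here.

```python
from typing import List


def collate_and(extractor_hits: List[List[str]]) -> List[str]:
    """Return every hit, but only if each extractor found at least one."""
    if not extractor_hits or any(not hits for hits in extractor_hits):
        return []
    return [hit for hits in extractor_hits for hit in hits]


print(collate_and([["Invoice"], ["$250.00", "$12.50"]]))  # ['Invoice', '$250.00', '$12.50']
print(collate_and([["Invoice"], []]))                     # []
```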


=== Array ===
== OCR Engine ==
<section begin="Array" />
<section begin="OCR Engine" />An "[[OCR Engine|OCR engine]]" is the part of [[OCR]] software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an '''[[OCR Profile]]'s''' OCR Engine property.<section end="OCR Engine" />
The '''''[[Array (Collation Provider)|Array]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
<div style="padding-left: 1.5em;">
<section end="Array" />


=== Combine ===
=== Azure OCR ===
<section begin="Combine" />
<section begin="Azure OCR" />[[Azure OCR]] is an [[OCR Engine]] option for '''[[OCR Profile|OCR Profiles]]''' that utilizes Microsoft Azure's Read API. Azure's Read engine is AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.<section end="Azure OCR" />
The '''''[[Combine (Collation Provider)|Combine]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
</div>
<section end="Combine" />


=== Key-Value List ===
== Repository Option ==
<section begin="Key-Value List" />
<section begin="Repository Option" />[[Repository Option]]s are optional features that affect the entire repository. These optional features enable functionality that does not work without first establishing the connections these options provide. Repository Options are added to a '''Grooper Repository''' and configured using the {{GrooperRootIcon}} '''[[Root]]''' node's Options property. <section end="Repository Option" />
The '''''[[Key-Value List (Collation Provider)|Key-Value List]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
<div style="padding-left: 1.5em;">
<section end="Key-Value List" />
=== LLM Connector ===
<section begin="LLM Connector" />[[LLM Connector]] is a [[Repository Option]] that enables large language model (LLM) powered AI features for a Grooper Repository.<section end="LLM Connector" />


=== Key-Value Pair ===
=== AI Search ===
<section begin="Key-Value Pair" />
<section begin="AI Search" />[[AI Search and the Search Page|AI Search]] is a [[Repository Option]] that enables Grooper's document search and retrieval features in the Search page.  Once enabled, [[Indexing Behavior]]s can be added to [[Content Type]]s (such as {{IconName|Content Model}} [[Content Model]]s), which will allow users to submit documents to a search index. Once indexed, documents can be retrieved by full text and metadata searches in the [[AI Search and the Search Page|Search Page]].<section end="AI Search" />
The '''''[[Key-Value Pair (Collation Provider)|Key-Value Pair]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.
</div>
<section end="Key-Value Pair" />
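A key-value pair match can be pictured as "find the key, then look for a value beside it." The sketch below assumes each word comes with simple page coordinates. It checks only a horizontal layout, where the value sits to the right of the key on the same line. The <code>Word</code> class, distances and tolerances are hypothetical, and real documents also use vertical and other layouts.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical position of the word's line


def key_value_pair(words: List[Word], key_pattern: str, value_pattern: str,
                   max_gap: float = 2.0, line_tolerance: float = 0.05) -> Optional[str]:
    """Return the nearest value to the right of a key on the same line, if any."""
    for key in words:
        if not re.fullmatch(key_pattern, key.text, re.IGNORECASE):
            continue
        candidates = [
            w for w in words
            if abs(w.y - key.y) <= line_tolerance  # same line as the key
            and 0 < w.x - key.x <= max_gap         # to the right, within reach
            and re.fullmatch(value_pattern, w.text)
        ]
        if candidates:
            return min(candidates, key=lambda w: w.x).text
    return None


page = [
    Word("Invoice", 1.0, 1.0), Word("No:", 1.7, 1.0), Word("1001", 2.2, 1.0),
    Word("Date:", 1.0, 1.5), Word("2024-01-05", 1.6, 1.5),
]
print(key_value_pair(page, r"Date:", r"\d{4}-\d{2}-\d{2}"))  # 2024-01-05
print(key_value_pair(page, r"No:", r"\d+"))                  # 1001
```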


=== Ordered Array ===
== Separation Provider ==
<section begin="Ordered Array" />
<section begin="Separation Provider" />The [[Separation Provider (Property)|Provider]] property of the [[Separate]] [[Activity]] defines the type of [[Separation (Concept)|separation]] to be performed at the designated [[Scope]].<section end="Separation Provider" />
The '''''[[Ordered Array (Collation Provider)|Ordered Array]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' finds sequences of values where one result is present for each extractor, in the order they appear.
<div style="padding-left: 1.5em;">
<section end="Ordered Array" />
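To make "one result per extractor, in order" concrete, the sketch below treats each extractor as a regular expression and walks the text from left to right. It is a simplified, hypothetical model. Grooper extractors return positioned results on a page rather than plain string offsets.

```python
import re
from typing import List, Tuple


def ordered_array(text: str, patterns: List[str]) -> List[Tuple[str, ...]]:
    """Collect sequences containing one match per pattern, in pattern order."""
    if not patterns:
        return []
    compiled = [re.compile(p) for p in patterns]
    sequences: List[Tuple[str, ...]] = []
    position = 0
    while True:
        values, cursor = [], position
        for regex in compiled:
            match = regex.search(text, cursor)  # must appear after the previous value
            if match is None:
                return sequences
            values.append(match.group())
            cursor = match.end()
        if cursor == position:  # every pattern matched empty text: stop to avoid looping forever
            return sequences
        sequences.append(tuple(values))
        position = cursor


text = "John Smith 1980-03-02 555-0100 Jane Doe 1975-11-20 555-0199"
print(ordered_array(text, [r"[A-Z][a-z]+ [A-Z][a-z]+", r"\d{4}-\d{2}-\d{2}", r"\d{3}-\d{4}"]))
# [('John Smith', '1980-03-02', '555-0100'), ('Jane Doe', '1975-11-20', '555-0199')]
```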
=== Change in Value Separation ===
<section begin="Change in Value Separation" />The [[Change in Value Separation]] [[Separation Provider]] creates a new folder and separates every time an extracted value changes from one {{BatchPageIcon}} '''[[Batch Page]]''' to another.<section end="Change in Value Separation" />
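The rule can be sketched as a single pass over the values extracted from each page. In this hypothetical example, the values are given directly as strings and every page is assumed to have one. Handling pages where nothing was extracted is left out.

```python
from typing import List, Sequence


def change_in_value_separation(page_values: Sequence[str]) -> List[List[int]]:
    """Group 1-based page numbers, starting a new document whenever the value changes."""
    documents: List[List[int]] = []
    previous: object = object()  # sentinel that never equals a real value
    for page_number, value in enumerate(page_values, start=1):
        if value != previous:
            documents.append([])
            previous = value
        documents[-1].append(page_number)
    return documents


# Account numbers extracted from six consecutive pages.
print(change_in_value_separation(["A-100", "A-100", "B-200", "B-200", "B-200", "A-100"]))
# [[1, 2], [3, 4, 5], [6]]
```

The last page shows that returning to an earlier value still counts as a change, so it starts a new document.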


=== Pattern-Based ===
=== Control Sheet Separation ===
<section begin="Pattern-Based" />
<section begin="Control Sheet Separation" />[[Control Sheet Separation]] is a [[Separation Provider]] that uses '''Grooper''' {{ControlSheetIcon}} '''Control Sheets''' to separate documents.<section end="Control Sheet Separation" />
The '''''[[Pattern-Based (Collation Provider)|Pattern-Based]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' uses regular expressions to sequence returned results into a final result set.
<section end="Pattern-Based" />


=== Split ===
=== EPI Separation ===
<section begin="Split" />
<section begin="EPI Separation" />The [[EPI Separation (Separation Provider)|EPI Separation]] [[Separation Provider (Property)|Separation Provider]] uses embedded page information ("EPI") to [[Separate (Activity)|Separate]] loose pages into document folders. A [[Data Extractor (Concept)|Data Extractor]] is used to find page numbers from the text on a page and '''Grooper''' uses this information to separate the pages.<section end="EPI Separation" />
The '''''[[Split (Collation Provider)|Split]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' separates a [[Data Instance (Concept)|data instance]] at each match returned by the '''Data Type'''.
<section end="Split" />
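Here is a simplified sketch of EPI-style separation. A hypothetical "Page X of Y" regular expression plays the role of the Data Extractor. A new document starts when a page reports page 1 or its number breaks the sequence. Pages with no detectable number stay with the current document. The rules Grooper actually applies may differ.

```python
import re
from typing import List, Optional

# Hypothetical extractor: captures the current page number from text such as "Page 2 of 5".
PAGE_NUMBER = re.compile(r"page\s+(\d+)\s+of\s+\d+", re.IGNORECASE)


def epi_separation(page_texts: List[str]) -> List[List[int]]:
    """Group 1-based page positions into documents using the page numbers printed on them."""
    documents: List[List[int]] = []
    previous: Optional[int] = None
    for index, text in enumerate(page_texts, start=1):
        match = PAGE_NUMBER.search(text)
        number = int(match.group(1)) if match else None
        restarts = number == 1
        breaks_sequence = number is not None and previous is not None and number != previous + 1
        if not documents or restarts or breaks_sequence:
            documents.append([])
        documents[-1].append(index)
        previous = number
    return documents


pages = ["Page 1 of 2", "Page 2 of 2", "Page 1 of 3", "Page 2 of 3", "Page 3 of 3"]
print(epi_separation(pages))  # [[1, 2], [3, 4, 5]]
```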


== Concept ==
=== ESP Auto Separation ===
<section begin="Concept" />
<section begin="ESP Auto Separation" />[[ESP Auto Separation]] is a [[Separation Provider]] used for document separation.  It is unique in that it ''both'' separates ''and'' classifies documents at the same time.  It uses page-level classification training examples (among other things) to determine where to insert document folders in a {{BatchIcon}} '''[[Batch]]'''.<section end="ESP Auto Separation" />
There are many objects and properties a user can configure in '''Grooper'''. However, knowing how, why, and when to use them depends on understanding the underlying concepts that define what these objects and properties are doing and why.
<section end="Concept" />


=== Activity Processing ===
=== Event-Based Separation ===
<section begin="Activity Processing" />
<section begin="Event-Based Separation" />[[Event-Based Separation (Separation Provider)|Event-Based Separation]] is a [[Separation Provider (Property)|Separation Provider]] that [[Separate (Activity)|Separates]] documents using one or more "Separation Events".  Each Separation Event triggers the creation of a new folder.<section end="Event-Based Separation" />
[[Activity Processing (Concept)|Activity Processing]] is a conceptual term that refers to the execution of a sequence of configured tasks, such as [[Classification (Concept)|classification]], [[Extraction (Concept)|extraction]], or data enhancement on documents, which are performed within a [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Process]]''' to transform raw data from documents into structured and actionable information.
<section end="Activity Processing" />


=== CMIS+ ===
=== Multi Separator ===
<section begin="CMIS+" />
<section begin="Multi Separator" />The [[Multi Separator]] [[Separation Provider]] performs [[Separation|separation]] using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied.  If that fails, the next, and so on.<section end="Multi Separator" />
[[CMIS+ (Concept)|CMIS+]] is a conceptual term that refers to '''Grooper's''' [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS]+ architecture that provides standardized access to document content and [https://en.wikipedia.org/wiki/Metadata metadata] across a variety of external storage platforms.
<section end="CMIS+" />
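The fallback behavior can be sketched as trying each provider in turn. This is a deliberate simplification. Here a provider either separates the whole set of pages or returns <code>None</code>, whereas the Multi Separator can fall back for individual pages or runs of pages. The provider functions are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# A provider takes page texts and returns documents (lists of 1-based page numbers), or None on failure.
Provider = Callable[[Sequence[str]], Optional[List[List[int]]]]


def multi_separator(pages: Sequence[str], providers: List[Provider]) -> Optional[List[List[int]]]:
    """Apply providers in list order and keep the first successful result."""
    for provider in providers:
        result = provider(pages)
        if result is not None:
            return result
    return None


def barcode_provider(pages: Sequence[str]) -> Optional[List[List[int]]]:
    """Hypothetical provider that gives up unless the first page carries a separator barcode."""
    starts = [n for n, text in enumerate(pages, start=1) if "[BARCODE]" in text]
    if not starts or starts[0] != 1:
        return None
    bounds = starts + [len(pages) + 1]
    return [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]


def one_page_per_document(pages: Sequence[str]) -> List[List[int]]:
    """Hypothetical fallback: every page becomes its own document."""
    return [[n] for n in range(1, len(pages) + 1)]


print(multi_separator(["scan a", "scan b"], [barcode_provider, one_page_per_document]))
# [[1], [2]]
```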


=== CMIS ===
=== Pattern-Based Separation ===
<section begin="CMIS" />
<section begin="Pattern-Based Separation" />[[Pattern-Based Separation]] is a [[Separation Provider]] that creates a new document folder every time a value returned by a defined pattern is encountered on a page.<section end="Pattern-Based Separation" />
[[CMIS (Concept)|CMIS]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services Content Management Interoperability Services]: an open standard allowing different [https://en.wikipedia.org/wiki/Content_management_system content management systems] to share information over the [https://en.wikipedia.org/wiki/Internet Internet].
<section end="CMIS" />
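As a sketch, a regular expression can stand in for the defined pattern, where Grooper would use an extractor. Separation then reduces to "start a new folder on every page where the pattern hits." How pages before the first match are handled is an assumption of this example.

```python
import re
from typing import List


def pattern_based_separation(page_texts: List[str], pattern: str) -> List[List[int]]:
    """Start a new document on each page where the pattern is found."""
    regex = re.compile(pattern)
    documents: List[List[int]] = []
    for page_number, text in enumerate(page_texts, start=1):
        # Pages before the first match are grouped into a leading document (an assumption).
        if regex.search(text) or not documents:
            documents.append([])
        documents[-1].append(page_number)
    return documents


pages = ["INVOICE #1 ...", "continued", "INVOICE #2 ...", "INVOICE #3 ...", "terms and conditions"]
print(pattern_based_separation(pages, r"INVOICE #\d+"))  # [[1, 2], [3], [4, 5]]
```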


=== CMIS Query ===
=== Undo Separation ===
<section begin="CMIS Query" />
<section begin="Undo Separation" />[[Undo Separation]] is a [[Separation Provider]]. Instead of putting loose {{BatchPageIcon}} '''[[Batch Page]]s''' into {{BatchFolderIcon}} '''[[Batch Folder]]s''', this Separation Provider removes '''Batch Folders''', leaving only loose pages.<section end="Undo Separation" />
[[CMIS Query (Concept)|CMIS Query]] is a conceptual term that refers to the fact that [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] [https://en.wikipedia.org/wiki/Query#Computing_and_technology Queries] are utilized to search documents in CMIS [https://en.wikipedia.org/wiki/Repository#Archives_and_online_databases Repositories] and to filter documents upon import when using the '''''[[Import Query Results (Import Provider)|Import Query Results]]''''' '''''[[Import Provider (Property)|Import Provider]]'''''.
</div>
<section end="CMIS Query" />


=== CSS Data Viewer Styling ===
== Service ==
<section begin="CSS Data Viewer Styling" />
''{{TypeName|Service Instance}}''
[[CSS Data Viewer Styling (Concept)|CSS Data Viewer Styling]] is a conceptual term that refers to the idea that the '''[[Web Client (Application)|Grooper Web Client's]]''' '''[[Data View (Task View)|Data View]]''' task view of the '''[[Review (Activity)|Review]]''' interface is styled using [https://en.wikipedia.org/wiki/CSS CSS]. This gives you a great deal of control over a [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Model's]]''' appearance and layout during document review.
<section end="CSS Data Viewer Styling" />


=== Classification ===
<section begin="Service" />Grooper '''[[Service]]s''' are various executable programs that run as a [https://en.wikipedia.org/wiki/Windows_service Windows Service] to facilitate Grooper processing. Service instances are installed, configured, started and stopped using [[Grooper Command Console]] (or in older Grooper versions, [[Grooper Config]]).<section end="Service" />
<section begin="Classification" />
<div style="padding-left: 1.5em;">
[[Classification (Concept)|Classification]] is a conceptual term that refers to the process of identifying and organizing documents into categorical types based on their content or layout, often using [https://en.wikipedia.org/wiki/Machine_learning machine learning], rules, or pattern recognition for efficient document management and data extraction workflows. Specifically, the '''''[[Classify (Activity)|Classify]]''''' '''''[[Activity (Property)|Activity]]''''' will assign a '''[[Content Type (Concept)|Content Type]]''' to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folder]]'''.
=== Activity Processing ===
<section end="Classification" />
''{{TypeName|Activity Processing}}''


=== Code Expressions ===
<section begin="Activity Processing Service" />'''[[Activity Processing]]''' is a [[Grooper Service]] that executes '''[[Activity|Activities]]''' assigned to {{BatchProcessStepIcon}} '''[[Batch Process Step]]s''' in a {{BatchProcessIcon}} '''[[Batch Process]]'''. This allows '''Grooper''' to automate '''Batch Steps''' that do not require a human operator.<section end="Activity Processing Service" />
<section begin="Code Expressions" />
[[Code Expressions (Concept)|Code Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) is a conceptual term that refers to snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.Net] code that expand '''Grooper’s''' core functionality.
<section end="Code Expressions" />


=== Combined Methods ===
=== API Services ===
<section begin="Combined Methods" />
''{{TypeName|API Services}}''
[[Combined Methods (Concept)|Combined Methods]] is a conceptual term that refers to the idea that a user can leverage multiple '''''[[Classification Method (Property)|Classification Methods]]''''' to overcome the shortcomings of an individual method.
<section end="Combined Methods" />


=== Content Type ===
<section begin="API Services" />You can perform {{BatchIcon}} '''[[Batch]]''' processing via [https://en.wikipedia.org/wiki/REST REST] [https://en.wikipedia.org/wiki/API API] web calls by installing  [[API Services]].
<section begin="Content Type" />
*<li class="attn-bullet">As of version 2025, the Grooper Web Services (GWS) web app hosts additional API endpoints. Some of these endpoints overlap with the API Services endpoints. Refer to the GWS documentation for more information on its endpoint offerings. You can locate the GWS documentation for your Grooper install at <code><nowiki>https://{webserver-name-or-domain-name}/GWS</nowiki></code> <section end="API Services" />
'''[[Content Type (Concept)|Content Type]]''' is a conceptual term that refers to the grouping of three '''Grooper''' objects: [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Models]]''', [[image:GrooperIcon_ContentCategory.png]] '''[[Content Category (Object)|Content Categories]]''', and [[image:GrooperIcon_DocumentType.png]] '''[[Document Type (Object)|Document Types]]'''.
<section end="Content Type" />


=== Data Context ===
=== Grooper Licensing ===
<section begin="Data Context" />
''{{TypeName|Grooper Licensing}}''
[[Data Context (Concept)|Data Context]] is a conceptual term that refers to the context that gives meaning to data that would otherwise be meaningless.
<section end="Data Context" />


=== Data Element ===
<section begin="Grooper Licensing" />'''[[Grooper Licensing]]''' is a [[Grooper Service]] that distributes licenses to multiple workstations running '''Grooper''' applications.<section end="Grooper Licensing" />
<section begin="Data Element" />
'''[[Data Element (Concept)|Data Element]]''' is a conceptual term that refers to the grouping of five '''Grooper''' objects: [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Models]]''', [[image:GrooperIcon_DataSection.png]] '''[[Data Section (Object)|Data Sections]]''', [[image:GrooperIcon_DataField.png]] '''[[Data Field (Object)|Data Fields]]''', [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Tables]]''', and [[image:GrooperIcon_DataColumn.png]] '''[[Data Column|Data Columns]]'''.
<section end="Data Element" />


=== Data Extractor ===
=== Import Watcher ===
<section begin="Data Extractor" />
''{{TypeName|Import Watcher}}''
[[Data Extractor (Concept)|Data Extractor]] is a conceptual term that refers to the grouping of all [[Data Extractor (Concept)#Extractor_Types|extractor types]] and [[Object Nomenclature#Extractor Objects|extractor objects]].
<section end="Data Extractor" />


=== Data Instance ===
<section begin="Import Watcher Service" />An '''[[Import Watcher]]''' is a [[Grooper Service]] that schedules and runs [[Import Jobs]]. It uses an [[Import Provider]] to query files in a file system or content management system that meet specified criteria according to a defined schedule (every minute, every day, only on Sundays, etc.). These files are imported into Grooper as documents ({{BatchFolderIcon}} [[Batch Folder]]s) in a new {{BatchIcon}} [[Batch]].
<section begin="Data Instance" />
:*<li class="fyi-bullet" style="padding-left:20px"> Afterward, the imported files can be (and should be) moved, deleted, or modified to prevent repeat imports in the next polling cycle.<section end="Import Watcher Service" />
[[Data Instance (Concept)|Data Instance]] is a conceptual term that refers to an encapsulation of text data within a document. Data instances are the hierarchy of text data that '''Grooper's''' extraction mechanisms create.
<section end="Data Instance" />
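One polling cycle of an Import Watcher can be sketched as "find matching files, import them, then move them out of the way." This hypothetical Python version works on a local folder. <code>import_file</code> stands in for whatever creates the Batch and its Batch Folders, and scheduling is left to the caller.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def poll_once(watch_dir: Path, processed_dir: Path, pattern: str,
              import_file: Callable[[Path], None]) -> List[str]:
    """Import each matching file once, then move it so the next cycle skips it."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    imported: List[str] = []
    for path in sorted(watch_dir.glob(pattern)):
        if not path.is_file():
            continue
        import_file(path)                                       # hypothetical hook: create the document
        shutil.move(str(path), str(processed_dir / path.name))  # prevents a repeat import
        imported.append(path.name)
    return imported


# Demo against a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp, "inbox")
    inbox.mkdir()
    (inbox / "scan1.pdf").write_bytes(b"%PDF-1.7")
    (inbox / "notes.txt").write_text("not matched by *.pdf")
    first_cycle = poll_once(inbox, Path(tmp, "processed"), "*.pdf", lambda p: None)
    second_cycle = poll_once(inbox, Path(tmp, "processed"), "*.pdf", lambda p: None)

print(first_cycle, second_cycle)  # ['scan1.pdf'] []
```

In a long-running process, the cycle would typically be repeated on a schedule, for example in a loop with <code>time.sleep(60)</code> between calls.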


=== EDI Integration ===
=== Indexing Service ===
<section begin="EDI Integration" />
''{{TypeName|Indexing Service}}''
[[EDI Integration (Concept)|EDI Integration]] is a conceptual term that refers to '''Grooper's''' ability to process [https://en.wikipedia.org/wiki/Electronic_data_interchange EDI] files.
<section end="EDI Integration" />


=== Expressions ===
<section begin="Indexing Service" />An '''[[Indexing Service]]''' is a [[Grooper Service]] that periodically polls the '''Grooper''' database to automate [[AI Search]] indexing. It checks to see if any documents in a '''Grooper Repository''' are classified as a '''[[Document Type]]''' that inherits from a '''Content Type''' configured with an [[Indexing Behavior]]. If there are any, and they need to be added, updated, or deleted to/from the search index, the '''Indexing Service''' will submit an "Indexing Job" to be picked up by an '''[[Activity Processing]]''' service.<section end="Indexing Service" />
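The decision made on each poll can be pictured as a comparison of two snapshots. In this hypothetical sketch, documents and index entries are plain dictionaries keyed by document ID, and a version number stands in for "has changed." <code>indexable_types</code> stands in for the Document Types that inherit an Indexing Behavior. The real service submits Indexing Jobs rather than returning sets.

```python
from typing import Dict, Set, Tuple


def plan_index_changes(documents: Dict[str, Tuple[str, int]],
                       index: Dict[str, int],
                       indexable_types: Set[str]) -> Dict[str, Set[str]]:
    """Work out which documents to add to, update in, or delete from a search index.

    documents: document ID -> (Document Type name, current version)
    index:     document ID -> version currently stored in the search index
    """
    eligible = {doc_id: version for doc_id, (doc_type, version) in documents.items()
                if doc_type in indexable_types}
    return {
        "add": {d for d in eligible if d not in index},
        "update": {d for d in eligible if d in index and index[d] != eligible[d]},
        "delete": {d for d in index if d not in eligible},
    }


documents = {"doc1": ("Invoice", 2), "doc2": ("Invoice", 1), "doc3": ("Lease", 1), "doc4": ("Memo", 1)}
index = {"doc1": 1, "doc2": 1, "doc5": 3}
print(plan_index_changes(documents, index, {"Invoice", "Lease"}))
# {'add': {'doc3'}, 'update': {'doc1'}, 'delete': {'doc5'}}
```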
<section begin="Expressions" />
</div>
[[Expressions (Concept)|Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) is a conceptual term that refers to snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.Net] code that expand '''Grooper’s''' core functionality.
<section end="Expressions" />


=== Expressions Cookbook ===
== Extraction Related Types ==
<section begin="Expressions Cookbook" />
[[Expressions Cookbook (Concept)|Expressions Cookbook]] is a conceptual term that refers to a reference list for commonly used [https://en.wikipedia.org/wiki/Expression_(computer_science) expressions] in '''Grooper'''.
<section end="Expressions Cookbook" />


=== Field Mapping ===
These are configuration objects in Grooper that relate to extracting data from documents. These objects include specialized items such as "Table Extract Methods" which pertain only to configuring Data Table nodes. These also include more general items such as Value Extractors which are used by various extractor related properties on a variety of node types in Grooper.
<section begin="Field Mapping" />
[[Field Mapping (Concept)|Field Mapping]] is a conceptual term that refers to how logical connections are made between [https://en.wikipedia.org/wiki/Metadata metadata] content in '''Grooper''' and an external storage platform.
<section end="Field Mapping" />


=== Five Phases of Grooper ===
These "extraction related types" are always found when configuring properties of:
<section begin="Five Phases of Grooper" />
* Extractor Nodes ([[Data Type]], [[Value Reader]] and [[Field Class]])
[[Five Phases of Grooper (Concept)|Five Phases of Grooper]] is a conceptual term that seeks to build understanding of how documents are processed through '''Grooper'''.
* Data Elements ([[Data Model]], [[Data Field]], [[Data Section]], [[Data Table]] and [[Data Column]])
<section end="Five Phases of Grooper" />


=== Flow Collation ===
<section begin="Flow Collation" />
[[Flow Collation (Concept)|Flow Collation]] is a conceptual term used to define a type of layout used in '''''[[Collation Provider (Property)|Collation Providers]]''''' of [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Types]]'''.
<section end="Flow Collation" />


=== Footer Rows and Footer Modes ===
This includes:
<section begin="Footer Rows and Footer Modes" />
* [[#Value Extractor|Value Extractors]]
[[Footer Rows and Footer Modes (Concept)|Footer Rows and Footer Modes]] is a conceptual term that refers to how a "footer row" (enabled by the '''''Generate Footer Row''''' property of a [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''') provides '''Grooper''' users a quick way to validate numerical data in a [[image:GrooperIcon_DataColumn.png]] '''[[Data Column|Data Column]]'''. The '''Data Column's''' '''''Footer Mode''''' property controls if and how a total is determined for numerical values in a '''Data Column'''.
* [[#Collation Provider|Collation Providers]]
<section end="Footer Rows and Footer Modes" />
* [[#Fill Method|Fill Methods]]
* [[#Lookup Specification|Lookup Specifications]]
* [[#Section Extract Method|Section Extract Methods]]
* [[#Table Extract Method|Table Extract Methods]]


=== Fuzzy RegEx ===
*<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". Embedded Object is the base class for a large number of objects that exist as configurable properties.
<section begin="Fuzzy RegEx" />
[[Fuzzy RegEx (Concept)|Fuzzy RegEx]] is a conceptual term that refers to the use of [https://en.wikipedia.org/wiki/Fuzzy_logic fuzzy logic] within [[Data Extractor (Concept)#Extractor_Types|extractor types]] that use regular expressions to match patterns, enabled through the '''''Fuzzy Matching''''' property.
<section end="Fuzzy RegEx" />
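Grooper's own Fuzzy Matching algorithm is not described here. As a rough illustration of the general idea only, the sketch below accepts words that are within a small number of character edits of a literal pattern, which is how OCR errors such as "lnvoice" can still match "Invoice". The `fuzzy_find` helper and its `max_edits` parameter are hypothetical, and edit distance is a stand-in technique, not Grooper's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_find(pattern: str, text: str, max_edits: int = 1) -> list[str]:
    """Hypothetical helper: words in `text` within `max_edits` edits of `pattern`."""
    return [word for word in text.split()
            if levenshtein(pattern.lower(), word.lower()) <= max_edits]


# "lnvoice" (l for I) and "Inv0ice" (0 for o) are each one edit away.
print(fuzzy_find("Invoice", "lnvoice Number Inv0ice Total"))  # ['lnvoice', 'Inv0ice']
```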


=== GPT Integration ===
== Collation Provider ==
<section begin="GPT Integration" />
<section begin="Collation Provider" />The [[Collation Provider|Collation]] property of a {{DataTypeIcon}} '''[[Data Type]]''' defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the '''Data Type's''' extractor(s) are combined and interpreted to produce the '''Data Type's''' final output.<section end="Collation Provider" />
[[GPT Integration (Concept)|GPT Integration]] is a conceptual term that refers to the usage of [https://en.wikipedia.org/wiki/OpenAI OpenAI's] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] models within '''Grooper''' to enhance the capabilities of [[Data Extractor (Concept)|data extractors]], [[Classification (Concept)|classification]], and lookups.
<div style="padding-left: 1.5em;">
<section end="GPT Integration" />
=== AND ===
<section begin="AND" />[[AND]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.<section end="AND" />
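The AND behavior described above can be sketched outside Grooper with plain regular expressions standing in for child extractors. This is a minimal sketch under one assumption not stated in the source: when every child hits, all of their hits are returned together as one flat list. The `and_collation` function is hypothetical.

```python
import re


def and_collation(text: str, patterns: list[str]) -> list[str]:
    """Return every hit from every pattern, but only if each pattern hits at least once."""
    hits_per_pattern = [re.findall(p, text) for p in patterns]
    if all(hits_per_pattern):  # an empty list means that "extractor" found nothing
        return [hit for hits in hits_per_pattern for hit in hits]
    return []


doc = "Invoice 10442 dated 03/15/2024"
print(and_collation(doc, [r"\d{5}", r"\d{2}/\d{2}/\d{4}"]))  # ['10442', '03/15/2024']
print(and_collation(doc, [r"\d{5}", r"PO-\d+"]))             # []
```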


=== Grooper Infrastructure ===
=== Array ===
<section begin="Grooper Infrastructure" />
<section begin="Array" />[[Array]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.<section end="Array" />
[[Grooper Infrastructure (Concept)|Grooper Infrastructure]] is a conceptual term that refers to the computing underpinnings of a [[Grooper Repository (Concept)|Grooper repository]] and the [https://en.wikipedia.org/wiki/Software software] used to interface with it.
<section end="Grooper Infrastructure" />


=== Grooper Repository ===
=== Combine ===
<section begin="Grooper Repository" />
<section begin="Combine" />[[Combine]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Combine merges instances from the returned results according to a specified grouping, controlling how extractor results are assembled for output.<section end="Combine" />
[[Grooper Repository (Concept)|Grooper Repository]] is a conceptual term that refers to the environment used to create, configure and execute objects in '''Grooper'''. It provides the framework to "do work" in '''Grooper'''.
<section end="Grooper Repository" />


=== Grooper Service ===
=== Key-Value List ===
<section begin="Grooper Service" />
<section begin="Key-Value List" />[[Key-Value List]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.<section end="Key-Value List" />
[[Grooper Service (Concept)|Grooper Service]] is a conceptual term that refers to the various [https://en.wikipedia.org/wiki/Computer_program executable programs] that run as [https://en.wikipedia.org/wiki/Windows_service Windows services] to facilitate '''Grooper''' processing. Service instances are installed, configured, started and stopped using [[Grooper Config (Application)|Grooper Config]].
<section end="Grooper Service" />


=== Image Processing ===
===Key-Value Pair ===
<section begin="Image Processing Concept" />
<section begin="Key-Value Pair" />[[Key-Value Pair]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. ''Note: Key-Value Pair is an older technique in Grooper. In most cases, the [[Labeled Value]] extractor is preferable to Key-Value Pair collation.''<section end="Key-Value Pair" />
[[Image Processing (Concept)|Image Processing]] is a conceptual term that refers to how '''Grooper''' applies a variety of techniques to enhance scanned documents' quality, improving [[OCR (Concept)|OCR]] accuracy by removing imperfections and adjusting visual characteristics to prepare images for [[Extraction (Concept)|data extraction]] and [[Classification (Concept)|classification]].
<section end="Image Processing Concept" />
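The Key-Value Pair collation above pairs a key with a value in a given layout. The sketch below illustrates only the simplest layout, a value to the right of its key on the same line, using one Python regular expression. The `key_value_pair` function is hypothetical, and Grooper supports other layouts (and, per the note above, generally prefers Labeled Value).

```python
import re


def key_value_pair(text: str, key: str, value: str) -> list[str]:
    """Return values that follow `key` on the same line (a left-to-right layout)."""
    # [^\S\n]* allows spaces or tabs and an optional colon, but never a line break
    pattern = rf"{key}[^\S\n]*:?[^\S\n]*({value})"
    return re.findall(pattern, text)


doc = "Invoice No: 10442\nPO Number: 7781\nInvoice Date: 03/15/2024"
print(key_value_pair(doc, r"Invoice\s+No", r"\d+"))                   # ['10442']
print(key_value_pair("Invoice No:\n10442", r"Invoice\s+No", r"\d+"))  # [] (value on next line)
```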


=== Import Mode and Document Linking ===
=== Multi-Column ===
<section begin="Import Mode and Document Linking" />
<section begin="Multi-Column" />[[Multi-Column]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Multi-Column combines multiple columns on a page into a single column for [[Data Extraction (Concept)|extraction]].<section end="Multi-Column" />
[[Import Mode and Document Linking (Concept)|Import Mode and Document Linking]] is a conceptual term that refers to the use of the '''''Import Mode''''' property. This property determines whether an imported document maintains a link to its original file and whether a copy of the file is made on import.
<section end="Import Mode and Document Linking" />


=== LINQ to Grooper Objects ===
=== Ordered Array ===
<section begin="LINQ to Grooper Objects" />
<section begin="Ordered Array" />[[Ordered Array]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Ordered Array finds sequences of values where one result is present for each extractor, in the order they appear, according to a specified horizontal, vertical or text-flow layout.<section end="Ordered Array" />
[[LINQ to Grooper Objects (Concept)|LINQ to Grooper Objects]] is a conceptual term that refers to the ability of '''Grooper''' to leverage [https://en.wikipedia.org/wiki/Language_Integrated_Query LINQ] [https://en.wikipedia.org/wiki/Syntax syntax] in [[Expressions (Concept)|expressions]].
<section end="LINQ to Grooper Objects" />


=== Layered OCR ===
=== Pattern-Based ===
<section begin="Layered OCR" />
<section begin="Pattern-Based" />[[Pattern-Based]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Pattern-Based uses [https://en.wikipedia.org/wiki/Regular_expression regular expressions] to sequence returned results into a final result set.<section end="Pattern-Based" />
[[Layered OCR (Concept)|Layered OCR]] is a conceptual term that refers to the use of the ''Layered OCR'' setting of the '''''OCR Engine''''' property of an [[image:GrooperIcon_OCRProfile.png]] '''[[OCR Profile (Object)|OCR Profile]]'''. This setting allows secondary '''OCR Profiles''' to run on a single page. The [[OCR (Concept)|OCR]] results from these secondary '''OCR Profiles''' are merged with (or ''layered'' on top of) the primary '''OCR Profile's''' results.
<section end="Layered OCR" />


=== Layout Data ===
=== Split ===
<section begin="Layout Data" />
<section begin="Split" />[[Split]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Split separates a [[Data Instance|data instance]] at each match returned by the '''Data Type'''. The results are used as anchor points to "split" text into one or more smaller parts.<section end="Split" />
[[Layout Data (Concept)|Layout Data]] is a conceptual term that refers to information such as line locations, [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkbox locations and states, [https://en.wikipedia.org/wiki/Barcode barcode] values, and detected shapes captured by certain [[Image Processing (Concept)|image processing]] commands. This data is stored as an attached file on a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]''' or [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page]]''' object and can later be recalled by various functions within '''Grooper''' that rely on the presence of that data to function.
</div>
<section end="Layout Data" />
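The Split collation described earlier uses each match as an anchor point for dividing text. A minimal sketch of that idea in Python, assuming each part starts at its anchor and runs to the next one (text before the first anchor is dropped here; Grooper's exact boundary handling may differ). The `split_at_matches` function is hypothetical.

```python
import re


def split_at_matches(text: str, anchor: str) -> list[str]:
    """Split `text` into parts that each begin at a match of `anchor`."""
    starts = [m.start() for m in re.finditer(anchor, text)]
    bounds = starts + [len(text)]
    # Each part runs from one anchor up to (not including) the next anchor.
    return [text[bounds[i]:bounds[i + 1]].strip() for i in range(len(starts))]


notes = "Item 1 Widget $5.00 Item 2 Gadget $7.50 Item 3 Gizmo $2.25"
print(split_at_matches(notes, r"Item \d+"))
# ['Item 1 Widget $5.00', 'Item 2 Gadget $7.50', 'Item 3 Gizmo $2.25']
```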


=== Microfiche Processing ===
== Fill Method ==
<section begin="Microfiche Processing" />
<section begin="Fill Method" />[[Fill Method]]s provide various mechanisms for populating child [[Data Element]]s of a {{IconName|Data Model}} [[Data Model]], {{IconName|Data Section}} [[Data Section]] or {{IconName|Data Table}} [[Data Table]]. Fill Methods can be added to these nodes using their "Fill Methods" property and editor.
[[Microfiche Processing (Concept)|Microfiche Processing]] is a conceptual term that refers to how '''Grooper''' leverages several '''''[[IP Command (Property)|IP Commands]]''''' to accurately process [https://en.wikipedia.org/wiki/Microform microform] documents.
*<li class="attn-bullet"> Fill Methods are secondary extraction operations. They populate descendant Data Elements '''after''' normal extraction when the {{IconName|Extract}} [[Extract]] activity runs.<section end="Fill Method" />
<section end="Microfiche Processing" />
<div style="padding-left: 1.5em;">
=== AI Extract ===
''{{TypeName|AI Extract}}''


=== Microsoft Office Integration ===
<section begin="AI Extract" />[[AI Extract]] is a [[Fill Method]] that leverages a [https://en.wikipedia.org/wiki/Large_language_model Large Language Model (LLM)] to return extraction results to [[Data Element]]s in a {{DataModelIcon}} [[Data Model]] or {{DataSectionIcon}} [[Data Section]]. This mechanism provides powerful AI-based data extraction with minimal setup.<section end="AI Extract" />
<section begin="Microsoft Office Integration" />
[[Microsoft Office Integration (Concept)|Microsoft Office Integration]] is a conceptual term that refers to '''Grooper's''' ability to convert [https://en.wikipedia.org/wiki/Microsoft_Word Microsoft Word] and [https://en.wikipedia.org/wiki/Microsoft_Excel Microsoft Excel] files into formats that '''Grooper''' can read.
<section end="Microsoft Office Integration" />


=== OCR ===
=== Fill Descendants ===
<section begin="OCR" />
''{{TypeName|Fill Descendants}}''
[[OCR (Concept)|OCR]] is a conceptual term that stands for [https://en.wikipedia.org/wiki/Optical_character_recognition Optical Character Recognition]. It allows text from paper documents to be digitized so that it can be searched or edited by other [https://en.wikipedia.org/wiki/Software software applications]. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.
<section end="OCR" />


=== OCR Synthesis ===
<section begin="Fill Descendants" />[[Fill Descendants]] is a [[Fill Method]] that executes any Fill Methods on child [[Data Element]]s in parallel. This has been shown to dramatically increase efficiency on larger {{IconName|Data Model}} [[Data Model]]s with multiple {{IconName|Data Section}} [[Data Section]]s using [[AI Extract]].<section end="Fill Descendants" />
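Fill Descendants gains its speed by running child Fill Methods in parallel rather than one after another. The sketch below shows why that helps for I/O-bound work such as waiting on an LLM: four simulated calls of about 0.1 seconds each finish in roughly the time of one. The `fill_section` and `fill_descendants` functions are hypothetical stand-ins, and Grooper's actual scheduling is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fill_section(name: str) -> tuple[str, str]:
    """Hypothetical stand-in for one child's Fill Method (e.g. an AI Extract call)."""
    time.sleep(0.1)  # simulate I/O-bound latency such as waiting on an API
    return name, f"filled {name}"


def fill_descendants(children: list[str]) -> dict[str, str]:
    """Run every child's fill method concurrently and gather the results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(children))) as pool:
        return dict(pool.map(fill_section, children))


start = time.perf_counter()
results = fill_descendants(["Header", "Line Items", "Totals", "Remit To"])
print(results)
print(f"elapsed: {time.perf_counter() - start:.2f}s")  # typically near 0.1s, not 0.4s
```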
<section begin="OCR Synthesis" />
[[OCR Synthesis (Concept)|OCR Synthesis]] is a conceptual term that refers to '''Grooper's''' unique method of pre-processing and re-processing raw results from the '''''[[OCR Engine (Property)|OCR Engine]]''''' to get better results out of it.
<section end="OCR Synthesis" />


=== Object Nomenclature ===
=== Run Child Extractors ===
<section begin="Object Nomenclature" />
''{{TypeName|Run Child Extractors}}''
[[Object Nomenclature (Concept)|Object Nomenclature]] is a conceptual term that refers to the idea that mastery of a '''Grooper''' environment is greatly enhanced by understanding the myriad of objects that can exist and how they are related.
<section end="Object Nomenclature" />


=== PDF Page Types ===
<section begin="Run Child Extractors" />[[Run Child Extractors]] is a [[Fill Method]] that executes extraction for a subset of child [[Data Element]]s. This allows you to selectively run extraction logic for one or more Data Elements in a {{IconName|Data Model}} [[Data Model]], {{IconName|Data Section}} [[Data Section]], or {{IconName|Data Table}} [[Data Table]].<section end="Run Child Extractors" />
<section begin="PDF Page Types" />
</div>
[[PDF Page Types (Concept)|PDF Page Types]] is a conceptual term that refers to specific types of [https://en.wikipedia.org/wiki/PDF PDF] pages. Page types describe the kind of content in a PDF page and inform '''Grooper''' how certain '''''[[Activity (Property)|Activities]]''''' should process the page. For example, "single image" pages are [[OCR (Concept)|OCR'd]] by the '''''[[Recognize (Activity)|Recognize]]''''' activity, whereas "text only" pages have their native text extracted.
<section end="PDF Page Types" />


=== Regular Expression ===
== Section Extract Method ==
<section begin="Regular Expression" />
<section begin="Section Extract Method" />The Extract Method property of a {{DataSectionIcon}} '''[[Data Section]]''' defines a "Section Extract Method" which specifies how section instances will be identified and extracted.<section end="Section Extract Method" />
[[Regular Expression (Concept)|Regular Expression]] is a conceptual term that refers to a standard [https://en.wikipedia.org/wiki/Syntax syntax] designed to parse [https://en.wikipedia.org/wiki/String_(computer_science) text strings]. This is a way of finding information in a block of text. It is the primary method by which '''Grooper''' extracts and returns data from documents.
<div style="padding-left: 1.5em;">
<section end="Regular Expression" />
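As a small, generic illustration of the regular expression concept above, the sketch below finds dates in a block of text and reports each match with its character offset. It uses Python's `re` syntax, which overlaps heavily with, but is not identical to, other regex dialects; in Grooper, patterns like this are entered in an extractor's configuration rather than written as code.

```python
import re

# Dates like 03/15/2024; \b keeps the pattern from matching inside longer numbers.
DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")

text = "Invoice 10442 issued 03/15/2024, due 04/14/2024."
for match in DATE.finditer(text):
    month, day, year = match.groups()
    # match.start() is a character offset, loosely analogous to a result's location
    print(match.group(0), "->", year, month, day, "at offset", match.start())
```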
=== Clause Detection ===
<section begin="Clause Detection" />[[Clause Detection]] is a {{DataSectionIcon}} '''[[Data Section]]''' Extract Method. It leverages [https://en.wikipedia.org/wiki/Large_language_model LLM] text embedding models to compare supplied samples of text against the text of a document to return what the AI determines is the "chunk" of text that most closely resembles the supplied samples. <section end="Clause Detection" />


=== Repository ===
=== Nested Table ===
<section begin="Repository" />
<section begin="Nested Table" />[[Nested Table (Section Extract Method)|Nested Table]] is a {{DataSectionIcon}} '''[[Data Section]]''' Extract Method. This method divides a document into sections by extracting table data within those sections. This gives '''Grooper''' users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections have the same table (or at least tabular data which can be extracted by a single {{DataTableIcon}} '''[[Data Table]]''' object).<section end="Nested Table" />
[[Repository (Concept)|Repository]] is a conceptual term that refers to a location where files and/or data are stored and managed.
<section end="Repository" />


=== Separation ===
=== Transaction Detection ===
<section begin="Separation" />
<section begin="Transaction Detection" />[[Transaction Detection]] is a {{DataSectionIcon}} '''[[Data Section]]''' Extract Method.  This [[Data Extraction (Concept)|extraction]] method produces section instances by detecting repeating patterns of text around the '''Data Section's''' child {{DataFieldIcon}} '''[[Data Field]]s'''.<section end="Transaction Detection" />
[[Separation (Concept)|Separation]] is a conceptual term that refers to the process of taking an unorganized [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' of loose [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page|Batch Pages]]''' and organizing them into document folders. This is done so Grooper can later assign a Document Type to each document folder in a process known as Classification.
</div>
<section end="Separation" />


=== TF-IDF ===
== Lookup Specification ==
<section begin="TF-IDF" />
<section begin="Lookup" />A [[Lookup Specification]] defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" ({{DataModelIcon}} '''[[Data Model]]s''', {{DataSectionIcon}} '''[[Data Section]]s''' and {{DataTableIcon}} '''[[Data Table]]s''') using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root '''Data Model'''), but ''cannot'' be used to populate a field value on a parent of the container element.<section end="Lookup" />
[[TF-IDF (Concept)|TF-IDF]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Tf%E2%80%93idf term frequency-inverse document frequency], a numerical statistic intended to reflect how important a word is to a document within a collection (or document set or [https://en.wikipedia.org/wiki/Text_corpus corpus]). It is how '''Grooper''' uses [https://en.wikipedia.org/wiki/Machine_learning machine learning] for training-based document [[Classification (Concept)|classification]] (via the [[Lexical (Classification Method)|Lexical]] method) and data extraction (via the [[image:GrooperIcon_FieldClass.png]] [[Field Class (Object)|Field Class]] extractor).
<div style="padding-left: 1.5em;">
<section end="TF-IDF" />
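The TF-IDF statistic described above can be computed in a few lines. This sketch uses one common textbook weighting, tf(t, d) = count / document length and idf(t) = log(N / df(t)); many variants exist (smoothing, log-scaled tf, normalization), and the weighting Grooper uses internally is not specified here.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each document: tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # df(t): how many documents contain term t at least once
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores


corpus = [
    "invoice total amount due".split(),
    "invoice remit to address".split(),
    "lease agreement term".split(),
]
for doc_scores in tf_idf(corpus):
    print({t: round(s, 3) for t, s in doc_scores.items()})
```

In the first document, "invoice" scores lower than "total" because it also appears in a second document, so it does less to distinguish one document from another.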
=== CMIS Lookup ===
<section begin="CMIS Lookup" />[[CMIS Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{CMISRepositoryIcon}} '''[[CMIS Repository]]''' via a "[[CMIS Query|CMISQL query]]" (a specialized query language based on SQL database queries).<section end="CMIS Lookup" />


=== Table Extraction ===
=== Database Lookup ===
<section begin="Table Extraction" />
<section begin="Database Lookup" />[[Database Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{DataConnectionIcon}} '''[[Data Connection]]''' via a [https://en.wikipedia.org/wiki/SQL SQL query].<section end="Database Lookup" />
[[Table Extraction (Concept)|Table Extraction]] is a conceptual term that refers to '''Grooper's''' functionality to extract data from [https://en.wikipedia.org/wiki/Table_cell cells] in [https://en.wikipedia.org/wiki/Table_(information) tables].  This is accomplished by configuring the [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''' and its child [[image:GrooperIcon_DataColumn.png]] '''[[Data Column|Data Column]]''' '''[[Data Element (Concept)|Data Elements]]''' in a [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Model]]'''.
<section end="Table Extraction" />
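The Database Lookup described earlier uses a lookup field to query a database and writes the returned columns into target fields. The sketch below imitates that flow with an in-memory SQLite table. The table, column, and field names are hypothetical, and the parameterized `?` placeholder is SQLite's syntax, not necessarily how Grooper references field values in its queries.

```python
import sqlite3

# Hypothetical vendor table standing in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT, name TEXT, city TEXT)")
conn.executemany("INSERT INTO vendors VALUES (?, ?, ?)",
                 [("V-100", "Acme Supply", "Tulsa"), ("V-200", "Globex", "Dallas")])

# "Vendor ID" plays the lookup field; the other two are target fields to populate.
fields = {"Vendor ID": "V-200", "Vendor Name": "", "Vendor City": ""}

row = conn.execute("SELECT name, city FROM vendors WHERE vendor_id = ?",
                   (fields["Vendor ID"],)).fetchone()
if row is not None:  # no match leaves the target fields untouched
    fields["Vendor Name"], fields["Vendor City"] = row
print(fields)
```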


=== Test Batch ===
=== GPT Lookup ===
<section begin="Test Batch" />
<section begin="GPT Lookup" />''PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as [[AI Extract]]. If absolutely necessary, its functionality could also be replicated with a [[Web Service Lookup]] implementation.'' [[GPT Lookup]] is a [[Lookup Specification]] that performs a lookup using an [https://en.wikipedia.org/wiki/OpenAI OpenAI] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] model.<section end="GPT Lookup" />
[[Test Batch (Concept)|Test Batch]] is a conceptual term that refers to any [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' created in the '''Test''' folder of the '''Batches''' folder in the [[Node Tree (UI Element)|Node Tree]].
<section end="Test Batch" />


=== Thread ===
=== Lexicon Lookup ===
<section begin="Thread" />
<section begin="Lexicon Lookup" />[[Lexicon Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{LexiconIcon}} '''[[Lexicon]]'''.<section end="Lexicon Lookup" />
[[Thread (Concept)|Thread]] is a conceptual term that refers to the smallest unit of processing that can be performed within an [https://en.wikipedia.org/wiki/Operating_system operating system].
<section end="Thread" />


=== Training-Based Approaches to Document Classification ===
=== Web Service Lookup ===
<section begin="Training-Based Approaches to Document Classification" />
<section begin="Web Service Lookup" />[[Web Service Lookup]] is a [[Lookup Specification]] that looks up external data at an [https://en.wikipedia.org/wiki/API API] endpoint by calling a [https://en.wikipedia.org/wiki/Web_service web service].<section end="Web Service Lookup" />
[[Training-Based Approaches to Document Classification (Concept)|Training-Based Approaches to Document Classification]] is a conceptual term that refers to an approach to document [[Classification (Concept)|classification]] that classifies [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' according to the similarity of unclassified '''[[Batch Folder|Batch Folders]]''' to trained examples of that kind of '''[[Document Type (Object)|Document Type]]'''.
<section end="Training-Based Approaches to Document Classification" />


=== Training Batch ===
=== XML Lookup ===
<section begin="Training Batch" />
<section begin="XML Lookup" />[[XML Lookup]] is a [[Lookup Specification]] that performs a lookup against an XML file stored as a {{ResourceFileIcon}} [[Resource File]] in the {{ProjectIcon}} [[Project]]. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.<section end="XML Lookup" />
[[Training Batch (Concept)|Training Batch]] is a conceptual term that refers to a more convenient way to work with all of the samples a [[image:GrooperIcon_ContentModel.png]] [[Content Model (Object)|Content Model]] has been trained against. You can still look at the '''[[Form Type (Object)|Form Types]]''' underneath each '''[[Content Type (Concept)|Content Type]]''', but the '''Training Set''' shows you all the samples in one place.
</div>
<section end="Training Batch" />
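The XML Lookup above selects XML nodes with XPath and maps attributes or element text to fields. A minimal sketch with Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset (full XPath would need a library such as lxml). The XML content and field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of an XML Resource File.
XML = """<vendors>
  <vendor id="V-100" city="Tulsa">Acme Supply</vendor>
  <vendor id="V-200" city="Dallas">Globex</vendor>
</vendors>"""

root = ET.fromstring(XML)
vendor_id = "V-200"  # value of the lookup field
node = root.find(f".//vendor[@id='{vendor_id}']")  # attribute predicate is supported

target = {}
if node is not None:
    # Element text and an attribute mapped to two target fields
    target = {"Vendor Name": node.text, "Vendor City": node.get("city")}
print(target)
```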


=== UNC Path ===
== Table Extract Method ==
<section begin="UNC Path" />
<section begin="Table Extract Method" />A [[Table Extract Method]] defines the settings and logic for a {{DataTableIcon}} '''[[Data Table]]''' to perform [[Data Extraction|extraction]]. It is set by configuring the Extract Method property of the '''Data Table'''.<section end="Table Extract Method" />
[[UNC Path (Concept)|UNC Path]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Path_(computing)#UNC UNC (Universal Naming Convention)] which is a standard used in [https://en.wikipedia.org/wiki/Microsoft_Windows Microsoft Windows] for accessing [https://en.wikipedia.org/wiki/Shared_resource shared network folders].
<div style="padding-left: 1.5em;">
<section end="UNC Path" />
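A UNC path names a server and a shared folder before the rest of the path. Python's `pathlib.PureWindowsPath` parses that structure on any operating system, which makes the parts easy to see. The server, share, and file names below are hypothetical.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"\\fileserver\scans\2024\batch_001.tif")
print(path.drive)  # \\fileserver\scans  (server name plus share name)
print(path.parts)  # the "\\server\share\" anchor, then each folder and the file
print(path.name)   # batch_001.tif
```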
=== Delimited Extract ===
<section begin="Delimited Extract" />The [[Delimited Extract]] [[Table Extract Method]] extracts tabular data from a [https://en.wikipedia.org/wiki/Delimiter-separated_values delimiter-separated] text file, such as a [https://en.wikipedia.org/wiki/Comma-separated_values CSV file].<section end="Delimited Extract" />
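To illustrate what delimiter-separated parsing involves, the sketch below reads CSV text with Python's standard `csv` module, including a quoted value that itself contains a comma. This is a generic example of the file format, not Grooper's implementation; the sample data is hypothetical.

```python
import csv
import io

# Hypothetical CSV content; the quoted field contains the delimiter itself.
data = 'Item,Qty,Price\nWidget,2,5.00\n"Gadget, large",1,7.50\n'

reader = csv.DictReader(io.StringIO(data))  # first row supplies the column names
rows = list(reader)
for row in rows:
    print(row["Item"], row["Qty"], row["Price"])
```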


=== URL Endpoints for Review ===
=== Fluid Layout ===
<section begin="URL Endpoints for Review" />
<section begin="Fluid Layout" />The [[Fluid Layout]] [[Table Extract Method]] will choose between [[Tabular Layout]] and Flow Layout configurations, depending on how labels are collected for a {{DocumentTypeIcon}} '''[[Document Type]]'''.<section end="Fluid Layout" />
[[URL Endpoints for Review (Concept)|URL Endpoints for Review]] is a conceptual term that refers to three [https://en.wikipedia.org/wiki/URL URL] [https://en.wikipedia.org/wiki/Web_API#Endpoints endpoints] that can be used to open '''''[[Review (Activity)|Review]]''''' tasks in the '''[[Web Client (Application)|Grooper Web Client]]''', given certain information like the '''Grooper''' '''''Repository ID''''', [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Process]]''' name, [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' '''''Id''''' and more.
<section end="URL Endpoints for Review" />


=== Waterfall Classification ===
=== Grid Layout ===
<section begin="Waterfall Classification" />
<section begin="Grid Layout" />The [[Grid Layout]] [[Table Extract Method]] uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.<section end="Grid Layout" />
[[Waterfall Classification (Concept)|Waterfall Classification]] is a conceptual term that refers to a [[Classification (Concept)|classification]] notion in '''Grooper''' that manipulates the '''''Positive Extractor''''' property to prioritize training similarity in order to achieve a middle ground between high specificity and accuracy, and generality with minimal accuracy. This is helpful whenever '''[[Batch Folder|Batch Folders]]''' get misclassified, and simply retraining won't help.
<section end="Waterfall Classification" />


=== XML Schema Integration ===
=== Row Match ===
<section begin="XML Schema Integration" />
<section begin="Row Match" />The [[Row Match (Table Extract Method)|Row Match]] [[Table Extract Method (Property)|Table Extract Method]] uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extracts cell data from each column.<section end="Row Match" />
[[XML Schema Integration (Concept)|XML Schema Integration]] is a conceptual term that refers to '''Grooper's''' ability to interact with [https://en.wikipedia.org/wiki/XML_schema XML schemas] and the configuration required to do so.
<section end="XML Schema Integration" />
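The Row Match method described above recognizes rows by pattern. As a rough sketch of the idea, one Python regular expression below describes an entire row, with each named group acting as a column; lines that do not fit the pattern (the header and subtotal) are skipped. In Grooper this is configured on the Data Table rather than in code, and the pattern and sample text here are hypothetical.

```python
import re

# One pattern describes a whole row; each named group is a column.
ROW = re.compile(r"^(?P<qty>\d+)\s+(?P<description>.+?)\s+\$(?P<amount>\d+\.\d{2})$",
                 re.MULTILINE)

page_text = """QTY  DESCRIPTION        AMOUNT
2    Widget             $10.00
1    Gadget, large      $7.50
Subtotal                $17.50"""

table = [m.groupdict() for m in ROW.finditer(page_text)]
for row in table:
    print(row)
```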


== Export Type ==
=== Tabular Layout ===
<section begin="Export Type" />
<section begin="Tabular Layout" />The [[Tabular Layout]] [[Table Extract Method]] uses column header values, determined by the Header Extractor results of the {{DataColumnIcon}} '''[[Data Column]]s''' (or by labels collected for the '''Data Columns''' when a [[Labeling Behavior]] is enabled), as well as '''Data Column''' Value Extractor results to model a table's structure and return its values.<section end="Tabular Layout" />
<section end="Export Type" />
</div>


=== CMIS Export ===
== Value Extractor ==
<section begin="CMIS Export" />
''{{TypeName|Value Extractor}}''
<section end="CMIS Export" />


=== Data Export ===
<section begin="Value Extractor" />[[Value Extractor]]s define an operation that reads data from the text (and sometimes visual) content of a page or document. There are over 20 unique Value Extractors, each using specialized logic to return results. Value Extractors are consumed by multiple higher-level objects in Grooper (such as [[Data Element]]s, [[Extractor Node]]s, various [[Activity|Activities]] and more) to perform a diverse set of document processing duties.
<section begin="Data Export" />
:*<li class="fyi-bullet">Value Extractors return a list of one or more "[[Data Instance|data instances]]". Data instances contain both the value and its page location, which allows Grooper to highlight results in a Document Viewer.<section end="Value Extractor" />
<section end="Data Export" />
<div style="padding-left: 1.5em;">
=== Ask AI ===
''{{TypeName|Ask AI}}''


== Extractor Type ==
<section begin="Ask AI" />[[Ask AI]] is a [[Value Extractor]] that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper a Value Extractor is referenced. It can complete a wide array of tasks in Grooper with simple text prompts.<section end="Ask AI" />
<section begin="Extractor Type" />
<section end="Extractor Type" />


=== Detect Signature ===
<section begin="Detect Signature" />
''{{TypeName|Detect Signature}}''
<section end="Detect Signature" />
 
<section begin="Detect Signature" />[[Detect Signature]] is a [[Value Extractor]] that can detect whether a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (the percentage of pixels in the region that are filled in).<section end="Detect Signature" />
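The fill-percentage measurement described above is simple arithmetic over a region of pixels. This minimal sketch assumes a binarized image stored as rows of 0 (light) and 1 (dark) and a hypothetical 5% threshold for deciding that a signature is present; Grooper's actual threshold setting and image handling are not shown.

```python
def fill_percentage(image: list[list[int]], left: int, top: int,
                    width: int, height: int) -> float:
    """Percent of dark pixels (1 = dark, 0 = light) inside a rectangle."""
    region = [row[left:left + width] for row in image[top:top + height]]
    total = sum(len(row) for row in region)  # clipped to the image bounds
    if total == 0:
        return 0.0
    filled = sum(sum(row) for row in region)
    return 100.0 * filled / total


def signature_present(image: list[list[int]], rect: tuple[int, int, int, int],
                      threshold: float = 5.0) -> bool:
    """Hypothetical rule: 'present' if the region's fill exceeds `threshold` percent."""
    return fill_percentage(image, *rect) > threshold


# A tiny 4x10 binary image with six dark "ink" pixels on the signature line.
page = [[0] * 10 for _ in range(4)]
for x in range(2, 8):
    page[2][x] = 1
print(fill_percentage(page, 0, 0, 10, 4))      # 15.0 (6 of 40 pixels)
print(signature_present(page, (0, 0, 10, 4)))  # True
```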
 
=== Field Match ===
''{{TypeName|Field Match}}''
 
<section begin="Field Match" />[[Field Match]] is a [[Value Extractor]] that matches the value stored in a previously-extracted {{DataFieldIcon}} '''[[Data Field]]''' or {{DataColumnIcon}} '''[[Data Column]]'''.<section end="Field Match" />


=== Find Barcode ===
<section begin="Find Barcode" />
''{{TypeName|Find Barcode}}''
<section end="Find Barcode" />
 
<section begin="Find Barcode" />[[Find Barcode]] is a [[Value Extractor]] that searches for and returns barcode values previously stored in a {{BatchFolderIcon}} [[Batch Folder]] or {{BatchPageIcon}} [[Batch Page]]'s [[Layout Data (Concept)|layout data]].
*<li class="fyi-bullet">''Note: Find Barcode differs slightly from [[Read Barcode]]. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.''<section end="Find Barcode" />
 
=== GPT Complete ===
''Removed in version 2025''
 
<section begin="GPT Complete" />[[GPT Complete]] is a [[Value Extractor]] that leverages OpenAI's [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.
:<span style="color: red;">PLEASE NOTE</span>: GPT Complete is a deprecated Value Extractor. It uses an outdated method to call the [https://en.wikipedia.org/wiki/OpenAI OpenAI] API. Please use the [[Ask AI]] extractor going forward.<section end="GPT Complete" />


=== Highlight Zone ===
''{{TypeName|Highlight Zone}}''
 
<section begin="Highlight Zone" />[[Highlight Zone]] is a [[Value Extractor]] that sets a highlight region on a document without performing any actual [[Data Extraction (Concept)|data extraction]]. This "extractor" is used to mark areas of interest or importance for '''Review''' users or for uncommon scenarios where a [[Data Instance|data instance]] location is needed with no actual value.<section end="Highlight Zone" />
 
=== Label Match ===
''{{TypeName|Label Match}}''
 
<section begin="Label Match" />[[Label Match]] is a [[Value Extractor]] that matches a list of one or more values using matching options defined by a [[Labeling Behavior]]. It is similar to [[List Match]] but uses shared settings defined in a Labeling Behavior for [[Fuzzy RegEx|Fuzzy Matching]], [[Vertical Wrap]], and [[Constrained Wrap]].<section end="Label Match" />


=== Labeled OMR ===
''{{TypeName|Labeled OMR}}''
 
<section begin="Labeled OMR" />[[Labeled OMR]] is a [[Value Extractor]] used to output [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.<section end="Labeled OMR" />


=== Labeled Value ===
''{{TypeName|Labeled Value}}''
 
<section begin="Labeled Value" />[[Labeled Value]] is a [[Value Extractor]] that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).<section end="Labeled Value" />
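The basic "label, then value" idea can be approximated with a regular expression. This minimal sketch only covers a value to the right of its label on the same line. The helper `labeled_value` and the sample invoice text are hypothetical, and layout options of the Labeled Value extractor (such as values below a label) are not modeled here.

```python
import re


def labeled_value(text, label, value_pattern):
    """Return the first value matching `value_pattern` that follows `label`
    on the same line, or None. Hypothetical helper for illustration only."""
    # Label (escaped so it matches literally), optional colon, then spaces/tabs only,
    # so the value must sit on the same line as the label.
    pattern = re.escape(label) + r":?[ \t]*(" + value_pattern + r")"
    match = re.search(pattern, text)
    return match.group(1) if match else None


invoice = "ACME Corp\nInvoice Number: INV-10442\nInvoice Date: 03/14/2024\n"
print(labeled_value(invoice, "Invoice Number", r"[A-Z]+-\d+"))       # INV-10442
print(labeled_value(invoice, "Invoice Date", r"\d{2}/\d{2}/\d{4}"))  # 03/14/2024
```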


=== List Match ===
''{{TypeName|List Match}}''
 
<section begin="List Match" />[[List Match]] is a [[Value Extractor]] designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require [https://en.wikipedia.org/wiki/Regular_expression regular expressions], but it can be configured to use regular expression syntax.<section end="List Match" />

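The difference between literal list items and regular expression items can be sketched in Python. The function `list_match` and its parameters are hypothetical. Items are escaped with `re.escape` by default, so characters such as `$` and `.` match literally, and they are treated as patterns only when `use_regex=True`. Fuzzy matching and the other List Match options are not modeled.

```python
import re


def list_match(text, items, use_regex=False, ignore_case=True):
    """Return (item, matched_text) pairs for each occurrence of each list item.

    Items are escaped (matched literally) unless use_regex=True, in which case
    each item is used as a regular expression pattern.
    """
    flags = re.IGNORECASE if ignore_case else 0
    results = []
    for item in items:
        pattern = item if use_regex else re.escape(item)
        # Lookarounds stop an item from matching inside a longer word.
        for m in re.finditer(r"(?<!\w)(?:" + pattern + r")(?!\w)", text, flags):
            results.append((item, m.group(0)))
    return results


text = "Payment terms: NET 30. Ship via UPS Ground or FedEx."
print(list_match(text, ["UPS Ground", "FedEx", "DHL"]))
# [('UPS Ground', 'UPS Ground'), ('FedEx', 'FedEx')]
print(list_match(text, [r"NET\s*\d+"], use_regex=True))
# [('NET\\s*\\d+', 'NET 30')]
```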

=== Ordered OMR ===
''{{TypeName|Ordered OMR}}''
 
<section begin="Ordered OMR" />[[Ordered OMR]] is a [[Value Extractor]] used to return [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).<section end="Ordered OMR" />


=== Pattern Match ===
''{{TypeName|Pattern Match}}''
 
<section begin="Pattern Match" />[[Pattern Match]] is a [[Value Extractor]] that extracts values from a document that match a specified [https://en.wikipedia.org/wiki/Regular_expression regular expression], providing data collection following a known format or pattern.<section end="Pattern Match" />
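Conceptually, each non-overlapping match of the pattern becomes one result. A minimal Python sketch, assuming a hypothetical pattern for US-style dates:

```python
import re

# Hypothetical pattern for US-style dates such as 03/14/2024 or 4/13/24.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b")

text = "Issued 03/14/2024, due 4/13/24. Ref 12/345."
# Every non-overlapping match is returned; "12/345" does not fit the pattern.
print(DATE_PATTERN.findall(text))  # ['03/14/2024', '4/13/24']
```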
 
=== Query HTML ===
''{{TypeName|Query HTML}}''
 
<section begin="Query HTML" />[[Query HTML]] is a [[Value Extractor]] specialized for [https://www.w3schools.com/html/ HTML] documents. It uses either [https://www.w3schools.com/css/ CSS] or [https://www.w3schools.com/xml/xpath_intro.asp XPath] selectors to return the inner text or an attribute of an HTML element.<section end="Query HTML" />
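A rough illustration of returning an element's inner text and an attribute with an XPath-style query, using Python's standard `xml.etree.ElementTree`. That module supports only a subset of XPath and no CSS selectors. The HTML fragment is hypothetical and deliberately well-formed; real pages are often not valid XML and would need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# A small, well-formed (XHTML-style) fragment used for illustration.
html = """
<html>
  <body>
    <div class="invoice">
      <span class="total">1,250.00</span>
      <a href="/files/inv-10442.pdf" class="download">Download</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html.strip())

# ElementTree's XPath subset includes [@attr='value'] predicates.
total = root.find(".//span[@class='total']")
link = root.find(".//a[@class='download']")

print(total.text)        # inner text: 1,250.00
print(link.get("href"))  # attribute:  /files/inv-10442.pdf
```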


=== Read Barcode ===
''{{TypeName|Read Barcode}}''
 
<section begin="Read Barcode" />[[Read Barcode]] is a [[Value Extractor]] that uses barcode recognition technology to read and extract values from barcodes found in the document content.
*<li class="fyi-bullet">''Note: Read Barcode differs slightly from [[Find Barcode]]. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's [[Layout Data|layout data]]. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.''<section end="Read Barcode" />
 
=== Read Metadata ===
''{{TypeName|Read Metadata}}''
 
<section begin="Read Meta Data" />[[Read Metadata]] is a [[Value Extractor]] that retrieves metadata values associated with a document. Read Metadata can return metadata from a {{BatchFolderIcon}} '''[[Batch Folder]]'s''' attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.<section end="Read Meta Data" />


=== Read Zone ===
''{{TypeName|Read Zone}}''
 
<section begin="Read Zone" />[[Read Zone]] is a [[Value Extractor]] that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.<section end="Read Zone" />
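The zone logic can be sketched in Python, assuming each recognized word carries a bounding box in page coordinates. The `Word` class, the "entirely inside the zone" rule and the label-anchored offsets are hypothetical simplifications, not Grooper's actual geometry rules.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in page coordinates
    y: float  # top edge in page coordinates
    w: float  # width
    h: float  # height


def read_zone(words, left, top, right, bottom):
    """Return the text of words whose boxes fall entirely inside the zone,
    ordered roughly top-to-bottom, then left-to-right."""
    inside = [
        wd for wd in words
        if wd.x >= left and wd.y >= top
        and wd.x + wd.w <= right and wd.y + wd.h <= bottom
    ]
    inside.sort(key=lambda wd: (round(wd.y, 1), wd.x))
    return " ".join(wd.text for wd in inside)


def read_zone_relative_to(words, label, dx, dy, width, height):
    """Anchor the zone to the first word equal to `label` (hypothetical rule):
    the zone starts (dx, dy) from the label's top-left corner.
    Raises StopIteration if the label is not found."""
    anchor = next(wd for wd in words if wd.text == label)
    left, top = anchor.x + dx, anchor.y + dy
    return read_zone(words, left, top, left + width, top + height)


words = [
    Word("Account:", 1.0, 1.0, 0.8, 0.2),
    Word("4471-02", 2.0, 1.0, 0.7, 0.2),
    Word("Total", 1.0, 3.0, 0.5, 0.2),
    Word("$88.10", 2.0, 3.0, 0.6, 0.2),
]
print(read_zone(words, 1.9, 0.9, 3.0, 1.3))                        # fixed zone: 4471-02
print(read_zone_relative_to(words, "Total", 0.9, -0.1, 1.0, 0.4))  # relative zone: $88.10
```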
 
=== Reference ===
''{{TypeName|Reference}}''
 
<section begin="Reference" />[[Reference]] is a [[Value Extractor]] used to reference an [[Extractor Node]]. This allows users to create re-usable extractors and use the more complex {{DataTypeIcon}} '''[[Data Type]]''' and {{FieldClassIcon}} '''[[Field Class]]''' extractors throughout Grooper.<section end="Reference" />


=== Word Match ===
''{{TypeName|Word Match}}''
 
<section begin="Word Match" />[[Word Match]] is a [[Value Extractor]] that extracts individual words or phrases from documents. It is used for [https://en.wikipedia.org/wiki/N-gram n-gram] extraction. Each gram may be optionally executed against a {{LexiconIcon}} '''[[Lexicon]]''' to ensure words and phrases only match a set vocabulary.<section end="Word Match" />
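N-gram generation with an optional vocabulary filter can be sketched as follows. Here a plain Python set stands in for a Lexicon; the tokenizing regular expression and the function names are assumptions for illustration.

```python
import re


def ngrams(text, n):
    """Split text into words and return every run of n consecutive words."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def word_match(text, n, lexicon=None):
    """Return n-grams, optionally keeping only those found in a vocabulary
    (a plain set standing in for a Lexicon; comparison is case-insensitive)."""
    grams = ngrams(text, n)
    if lexicon is None:
        return grams
    vocab = {term.lower() for term in lexicon}
    return [g for g in grams if g.lower() in vocab]


text = "Net premium due upon policy renewal"
print(ngrams(text, 2))
# ['Net premium', 'premium due', 'due upon', 'upon policy', 'policy renewal']
print(word_match(text, 2, lexicon={"net premium", "policy renewal"}))
# ['Net premium', 'policy renewal']
```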


=== Zonal OMR ===
''{{TypeName|Zonal OMR}}''

<section begin="Zonal OMR" />[[Zonal OMR]] is a [[Value Extractor]] that reads one or more [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

BE AWARE: Zonal OMR is outdated compared to [[Labeled OMR]] and [[Ordered OMR]]. It requires more manual setup than any other OMR extractor. Use it as a last resort, when the other OMR extractor options have been exhausted.<section end="Zonal OMR" />
</div>
 
== Import and Export Related Types ==
 
These are configuration objects in Grooper that relate to importing documents into Grooper, exporting processed content (files and data) out of Grooper, and otherwise accessing document content linked in Grooper to external file systems and content management systems.
 
This includes:
* [[#CMIS Binding|CMIS Bindings (aka "connection types")]]
* [[#Content Link|Content Links]]
* [[#Export Definition|Export Definitions]]
* [[#Import Provider|Import Providers]]
 
''Please Note: [[Import Behavior]] and [[Export Behavior]] are obviously import and export related. Because their parent type is "Behavior", they are found in the [[#Core Configuration Types|Core Configuration Types]] portion of this Glossary.''
 
*<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.
 
== CMIS Binding ==
<section begin="CMIS Binding" />[[CMIS Binding]]s are the platform connection types for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems.
 
CMIS Bindings use the [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] standard as a model to define connectivity. Even when connecting to CMS platforms that are not truly CMIS systems (such as a Windows file system), Grooper normalizes connection to them as if they were. This allows Grooper to use [[CMIS Import]] and [[CMIS Export]] for all storage platforms.
:*<li class="fyi-bullet" style="padding-left:20px"> You will commonly hear CMIS Binding referred to as a "CMIS connection type", "connection type", or just "connection", as in an "Exchange connection".<section end="CMIS Binding" />
<div style="padding-left: 1.5em;">
=== AppXtender ===
<section begin="AppXtender" />[[AppXtender]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It allows Grooper to connect to the [https://en.wikipedia.org/wiki/OpenText#AppEnhancer_(formerly_ApplicationXtender) AppEnhancer (formerly ApplicationXtender)] content management system for import and export operations.<section end="AppXtender" />
 
=== Box ===
<section begin="Box" />[[Box]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to the [https://en.wikipedia.org/wiki/Box_(company) Box] content management system for import and export operations.<section end="Box" />
 
=== CMIS ===
<section begin="CMIS CMIS Binding" />[[CMIS (CMIS Binding)|CMIS]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to a [https://docs.oasis-open.org/cmis/CMIS/v1.0/os/cmis-spec-v1.0.html CMIS 1.0] or [https://docs.oasis-open.org/cmis/CMIS/v1.1/os/CMIS-v1.1-os.html CMIS 1.1] server for import and export operations. This can be used to connect to CMS platforms that implement the CMIS protocol [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services#List_of_implementations such as these].<section end="CMIS CMIS Binding" />
 
=== Exchange ===
<section begin="Exchange" />[[Exchange]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to [https://en.wikipedia.org/wiki/Microsoft_Exchange_Server Microsoft Exchange] email servers (including Outlook servers) for import and export operations.<section end="Exchange" />
 
=== FTP ===
<section begin="FTP" />[[FTP]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to [https://en.wikipedia.org/wiki/File_Transfer_Protocol FTP] directories for import and export operations.<section end="FTP" />
 
=== IMAP ===
<section begin="IMAP" />[[IMAP]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to email messages and folders through an [https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol IMAP] email server for import and export operations.<section end="IMAP" />
 
=== NTFS ===
<section begin="NTFS" />[[NTFS]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to files and folders in the Microsoft Windows [https://en.wikipedia.org/wiki/NTFS NTFS] file system for import and export operations.<section end="NTFS" />


=== OneDrive ===
<section begin="OneDrive" />[[OneDrive]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to [https://en.wikipedia.org/wiki/OneDrive Microsoft OneDrive] cloud services for import and export operations.<section end="OneDrive" />


=== SFTP ===
<section begin="SFTP" />[[SFTP]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to [https://en.wikipedia.org/wiki/SSH_File_Transfer_Protocol SFTP] directories for import and export operations.<section end="SFTP" />


=== SharePoint ===
<section begin="SharePoint" />[[SharePoint]] is a connection option for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. It connects Grooper to [https://en.wikipedia.org/wiki/SharePoint Microsoft SharePoint], providing access to content stored in "document libraries" and "picture libraries" for import and export operations.<section end="SharePoint" />
</div>


== Content Link ==
{{TypeName|Content Link}}


<section begin="Content Link" />[[Content Link]]s define references to files or folders stored outside of Grooper, such as in a Windows folder or in a CMIS Repository.
:*<li class="fyi-bullet" style="padding-left:20px"> Content Link has two sub-types: Document Link and Folder Link. There are 9 types of "Document Link" and only 1 type of "Folder Link". Due to this, Document Link is a more common term than "Content Link".<section end="Content Link" />
<div style="padding-left: 1.5em;">
=== Document Links ===
{{TypeName|Document Link}}
<div style="padding-left: 1.5em;">
==== CMIS Document Link ====
{{TypeName|CMIS Document Link}}
==== File System Link ====
{{TypeName|File System Link}}
==== FTP Link ====
{{TypeName|FTP Link}}
==== HTTP Link ====
{{TypeName|HTTP Link}}
==== Mail Link ====
{{TypeName|Mail Link}}
==== PST Link ====
{{TypeName|PST Link}}
==== SFTP Link ====
{{TypeName|SFTP Link}}
==== Subfile Link ====
{{TypeName|Subfile Link}}
==== ZIP Link ====
{{TypeName|ZIP Link}}
</div>
=== Folder Links ===
{{TypeName|Folder Link}}
<div style="padding-left: 1.5em;">
==== CMIS Folder Link ====
{{TypeName|CMIS Folder Link}}
</div>


== Export Definition ==
<section begin="Export Definition" />[[Export Behavior]]s are defined by adding and configuring one or more '''Export Definitions''' (See [[Export Definition Types]] or the [[Export#Export Definitions|Export Definitions]] section of the Export article). An Export Definition defines export parameters to external systems, such as [https://en.wikipedia.org/wiki/File_system file systems], [https://en.wikipedia.org/wiki/Content_management_system content management repositories], [https://en.wikipedia.org/wiki/Database databases], or [https://en.wikipedia.org/wiki/Message_transfer_agent mail servers].<section end="Export Definition" />
<div style="padding-left: 1.5em;">
=== CMIS Export ===
<section begin="CMIS Export" />[[CMIS Export]] is an [[Export Definition]] available when configuring an [[Export Behavior]].  It exports content over a {{CMISConnectionIcon}} '''[[CMIS Connection]]''', allowing users to export documents and their [https://en.wikipedia.org/wiki/Metadata metadata] to various [https://en.wikipedia.org/wiki/On-premises_software on-premise] and [https://en.wikipedia.org/wiki/Cloud_storage cloud-based storage platforms].<section end="CMIS Export" />


=== Data Export ===
<section begin="Data Export" />[[Data Export]] is an [[Export Definition]] available when configuring an [[Export Behavior]].  It exports extracted document data over a {{DataConnectionIcon}} '''[[Data Connection]]''', allowing users to export data to a [https://en.wikipedia.org/wiki/Microsoft_SQL_Server Microsoft SQL Server] or [https://en.wikipedia.org/wiki/Open_Database_Connectivity ODBC] compliant [https://en.wikipedia.org/wiki/Database database].<section end="Data Export" />
</div>


== Import Provider ==
''{{TypeName|Import Provider}}''


<section begin="Import Provider" />[[Import Provider]]s enable Grooper to import file-based content from numerous sources, including Windows file systems, SFTP file systems, mail servers and various content management systems (CMS). An Import Provider is selected and configured when configuring "'''Import Jobs'''". Import Jobs are submitted in one of two ways:
* '''By a user from the Imports page''': Ad-hoc or "user directed" Import Jobs are submitted from the [[Imports Page]], using the "Submit Import Job" button.
* '''From an Import Watcher service''': Automated or "scheduled" Import Jobs are submitted by an '''[[Import Watcher]]''' service according to its Polling Loop or Specific Times specification.
In both cases, an Import Provider is selected and configured using the "Provider" property.<section end="Import Provider" />
<div style="padding-left: 1.5em;">
=== CMIS Import ===
''{{TypeName|CMIS Import Base}}''
 
<section begin="CMIS Import" />[[CMIS Import]] refers to two [[Import Provider]]s used to import content from {{CMISRepositoryIcon}} '''[[CMIS Repository|CMIS Repositories]]''': [[Import Descendants]] and [[Import Query Results]].  CMIS Imports allow users to import from various on-premise and cloud based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).<section end="CMIS Import" />
 
<div style="padding-left:1.5em">
==== Import Descendants ====
''{{TypeName|Import Descendants}}''
 
<section begin="Import Descendants" />[[Import Descendants]] is one of two [[Import Provider]]s that use {{CMISConnectionIcon}} '''[[CMIS Connection]]s''' to import document content into '''Grooper'''.  Import Descendants imports files from a {{CMISRepositoryIcon}} '''[[CMIS Repository]]''' folder location, including any files in any sub-folders (i.e. all "descendant" files).<section end="Import Descendants" />
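The "all descendant files" behavior is the same idea as a recursive folder walk. The sketch below uses Python's `os.walk` on a local temporary folder purely as an analogy; it does not use a CMIS Connection, and `descendant_files` is a hypothetical helper.

```python
import os
import tempfile


def descendant_files(root_folder):
    """Yield every file path under root_folder, including files in
    sub-folders at any depth (all "descendant" files)."""
    for folder, _subfolders, files in os.walk(root_folder):
        for name in sorted(files):
            yield os.path.join(folder, name)


# Demo: build a small folder tree in a temporary directory and list it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2024", "March"))
    for rel in ("a.pdf",
                os.path.join("2024", "b.tif"),
                os.path.join("2024", "March", "c.pdf")):
        open(os.path.join(root, rel), "w").close()
    found = sorted(os.path.relpath(p, root) for p in descendant_files(root))
    print(found)  # all three files, including the two in sub-folders
```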
 
==== Import Query Results ====
''{{TypeName|Import Query Results}}''
 
<section begin="Import Query Results" />[[Import Query Results]] is one of two [[Import Provider]]s that use {{CMISConnectionIcon}} '''[[CMIS Connection]]s''' to import document content into '''Grooper'''. Import Query Results imports files from a {{CMISRepositoryIcon}} '''[[CMIS Repository]]''' that match a "[[CMIS Query|CMISQL query]]" (a specialized query language based on SQL database queries).<section end="Import Query Results" />
</div>
 
=== File System Import ===
''{{TypeName|File System Import}}''
 
<section begin="File System Import" />[[File System Import]] is a legacy [[Import Provider]] used to import documents directly from the Windows file system into Grooper.<section end="File System Import" />
 
=== HTTP Import ===
''{{TypeName|HTTP Import}}''
 
<section begin="HTTP Import" />[[HTTP Import]] is an [[Import Provider]] used to import web-based content (web pages and files hosted on an HTTP server). HTTP Import can be used to ingest individual web pages, defined portions of a website or entire websites into Grooper.<section end="HTTP Import" />
 
=== Test Batch ===
''{{TypeName|Test Batch}}''
 
<section begin="Test Batch" />"'''[[Test Batch]]'''" is a specialized [[Import Provider]] designed to facilitate the import of content from an existing {{BatchIcon}} [[Batch]] in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.
:*<li class="fyi-bullet">Looking for information on "production" vs "test" Batches in Grooper? [[Batch (Node Type)#Test Batches vs production Batches|See here.]]<section end="Test Batch" />
</div>
 
== Misc Properties and Other Configuration Types ==
<div style="padding-left: 1.5em;">
=== AI Generator/Generators ===
<section begin="AI Generator" />[[AI Generator]]s create custom documents using the results of a [[Search Page]] query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file.
*<li class="fyi-bullet">AI Generators are added and configured using an [[Indexing Behavior]]'s "Generators" property and editor. They are executed from the [[Search Page]] using the "Download" command and "Download Custom" format. <section end="AI Generator" />
 
=== CMISQL Query/CMIS Query ===
''{{TypeName|CMISQL Query}}''
 
<section begin="CMIS Query" />A [[CMISQL Query]] (aka CMIS Query) is Grooper's way of searching for documents in [[CMIS Repository|CMIS Repositories]]. Commonly, CMISQL Queries are used by [[Import Query Results]] to import documents from a CMIS Repository. CMISQL Queries are also used by [[CMIS Lookup]] to lookup data from a CMIS Repository. CMISQL Queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.
*<li class="fyi-bullet"> CMISQL Queries are configured using the "CMIS Query" property found in "Import Query Results" and "CMIS Lookup".<section end="CMIS Query" />
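A sketch of building a CMISQL query string in Python, assuming the string-literal escaping described in the CMIS specification (a backslash before `'` and `\`). `cmis:document`, `cmis:name`, `cmis:objectId` and `IN_FOLDER` come from the CMIS standard; the helper functions, folder ID and name prefix are hypothetical, and Grooper's "CMIS Query" property editor may assemble queries differently.

```python
def cmis_string_literal(value):
    """Quote a value as a CMIS query string literal: backslash-escape
    backslashes and single quotes, then wrap the result in single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_folder_query(folder_id, name_prefix):
    """Hypothetical helper: documents in one folder whose name starts with
    name_prefix. IN_FOLDER matches direct children only; IN_TREE would also
    match documents in sub-folders. '%' is the LIKE wildcard, and any '%' or
    '_' inside name_prefix is not escaped here."""
    return (
        "SELECT cmis:objectId, cmis:name FROM cmis:document "
        f"WHERE IN_FOLDER({cmis_string_literal(folder_id)}) "
        f"AND cmis:name LIKE {cmis_string_literal(name_prefix + '%')}"
    )


print(build_folder_query("workspace://SpacesStore/abc123", "O'Neil"))
# SELECT cmis:objectId, cmis:name FROM cmis:document WHERE IN_FOLDER('workspace://SpacesStore/abc123') AND cmis:name LIKE 'O\'Neil%'
```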
 
=== Paragraph Marker/Paragraph Marking ===
''{{TypeName|Paragraph Marker}}''


<section begin="Paragraph Marking" />[[Paragraph Marking]] is a component of Grooper's [[Text Preprocessor]]. It enables the "Paragraph Marker", which detects paragraph boundaries and marks them by altering the normal [https://en.wikipedia.org/wiki/Carriage_return carriage return] and [https://en.wikipedia.org/wiki/Newline new line feed] pairs at the end of each line. Instead of placing line breaks at the end of each line, the Paragraph Marker places them at the end of each paragraph. This produces a normalized text flow, making it easier to extract values that span lines.
*<li class="fyi-bullet"> "Paragraph Marker" is the embedded object that actually performs paragraph detection and marking in Grooper. "Paragraph Marking" is the property that enables the Paragraph Marker and allows users to configure it.<section end="Paragraph Marking" />

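The effect of paragraph marking on text flow can be sketched in Python. This is not Grooper's paragraph detection: the sketch assumes a blank line marks a paragraph boundary, and the Paragraph Marker's own detection rules are not modeled. The sample text is hypothetical.

```python
import re


def mark_paragraphs(text):
    """Keep a CR/LF only at the end of each paragraph; line breaks inside a
    paragraph become single spaces.

    Assumption for this sketch: paragraphs are separated by a blank line.
    """
    paragraphs, current = [], []
    for line in text.split("\r\n"):
        if line.strip():
            current.append(line.strip())
        elif current:  # a blank line closes the current paragraph
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "".join(p + "\r\n" for p in paragraphs)


raw = ("The insured shall notify the\r\ncompany within 30 days of any\r\nloss.\r\n"
       "\r\n"
       "Coverage begins on the\r\neffective date.\r\n")
marked = mark_paragraphs(raw)

# A pattern that spans an original line break only matches after marking.
print(re.search(r"within (\d+) days of any loss", raw))              # None
print(re.search(r"within (\d+) days of any loss", marked).group(1))  # 30
```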

=== Preprocessing/Text Preprocessor ===
''{{TypeName|Text Preprocessor}}''


<section begin="Text Preprocessor" />Grooper's "[[Text Preprocessor]]" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) to allow regular expression patterns to match (or ignore) structural elements, such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following:
* [[Paragraph Marking]]
* [[Tab Marking]]
* [[Text Preprocessor#Vertical Tab Marking|Vertical Tab Marking]]
* [[Text Preprocessor#Ignore Control Characters|Ignore Control Characters]]
*<li class="fyi-bullet"> "Text Preprocessor" is the embedded object that actually performs these preprocessing operations in Grooper. The Text Preprocessor can be enabled and configured by various items (mostly extractors such as [[Pattern Match]]) using either a "Preprocessing" or "Preprocessing Options" property.<section end="Text Preprocessor" />


=== Permission Set/Permission Sets ===
''{{TypeName|Permission Set}}''
 
<section begin="Permission Sets" />[[Permission Sets]] define security permissions in a Grooper Repository for a user or group. This allows you to restrict user access to specified Grooper pages (such as the [[Design Page]]) and Grooper [[Command]]s.
*<li class="fyi-bullet">"Permission Set" is the embedded object that defines these security permissions. Permission Sets are added to a Grooper Repository and configured using the "Permission Sets" property found on the {{IconName|Root}} Root node.<section end="Permission Sets" />
 
=== Quoting Method/Document Quoting ===
''{{TypeName|Quoting Method}}''
 
<section begin="Quoting Method" />[[Quoting Method]]s provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, allowing users to feed the AI only the necessary context needed to respond or reduce costs by reducing the amount of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of a document's text, data extracted from the document, layout data, or a combination of this data.
*<li class="fyi-bullet"> "Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including [[AI Extract]]) using a "Document Quoting" property.<section end="Quoting Method" />
 
=== Variable Definition ===
''{{TypeName|Variable Definition}}''
 
<section begin="Variable Definition" />'''[[Variable Definition]]s''' define a variable with a computed value that can be called by various code expressions. Variable Definitions are added to Data Models, Data Sections and Data Tables using their "Variables" property.
:'''Used By:''' [[Data Model]], [[Data Section]], [[Data Table]]<section end="Variable Definition" />
 
=== Vertical Wrap Detection/Vertical Wrap ===
<section begin="Vertical Wrap Detection" />[[Vertical Wrap Detection]] enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a [[Labeling Behavior]] and by the [[List Match]] and [[Label Match]] Value Extractors.
*<li class="fyi-bullet"> "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.<section end="Vertical Wrap Detection" />
 
=== Properties ===
<section begin="Property" />A property is a configurable setting on a '''Grooper''' object that affects how the object performs its function.<section end="Property" />
<div style="padding-left: 1.5em;">
==== Alignment ====
<section begin="Alignment" />"[[Alignment]]" refers to how Grooper highlights text from an AI response on a document in a [[Document Viewer]]. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as [[AI Extract]].<section end="Alignment" />


==== Confidence Multiplier and Output Confidence ====
<section begin="Confidence Multiplier and Output Confidence" />Some results carry more weight than others.  The [[Confidence Multiplier and Output Confidence (Property)|Confidence Multiplier]] and [[Confidence Multiplier and Output Confidence (Property)|Output Confidence]] properties allow you to manually adjust an [[Data Extraction (Concept)|extraction]] result's confidence.<section end="Confidence Multiplier and Output Confidence" />


==== Constrained Wrap ====
<section begin="Constrained Wrap" />The [[Constrained Wrap]] property allows certain [[Value Extractor]]s and the [[Labeling Behavior]] to match values which wrap from one line to the next inside a box (such as a [https://en.wikipedia.org/wiki/Table_cell table cell]).<section end="Constrained Wrap" />


==== Content Type Filter ====
<section begin="Content Type Filter" />The [[Content Type Filter]] property restricts [[Activity|Activities]] to specific {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''' and/or {{DocumentTypeIcon}} '''[[Document Type]]s'''.<section end="Content Type Filter" />


==== Import Mode ====
<section begin="Import Mode" />[[Import Mode]] is a configurable property for [[CMIS Import]] providers. This controls how file content is loaded into a [[Grooper Repository]] during an [[Import Job]]. This property is key to setting up a "Sparse" import in Grooper.<section end="Import Mode" />


==== Output Extractor Key ====
<section begin="Output Extractor Key" />The [[Output Extractor Key]] property is another weapon in the arsenal of powerful '''Grooper''' [[Classification (Concept)|classification]] techniques.  It allows {{DataTypeIcon}} '''[[Data Type]]s''' to return results normalized in a way more beneficial to document classification.<section end="Output Extractor Key" />


==== Parameters ====
<section begin="Parameters" />[[Parameters]] is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.<section end="Parameters" />

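The effect of these parameters on a model's next-token choice can be illustrated with toy numbers. This sketch shows the general sampling math, not Grooper code; the LLM service, rather than Grooper, typically applies these values during generation. The penalty adjustment follows the formula described in OpenAI's API documentation, and the token names and logit values are made up.

```python
import math


def apply_penalties(logits, counts, presence_penalty=0.0, frequency_penalty=0.0):
    """Lower the logits of tokens that have already been generated:
    logit - count * frequency_penalty - (count > 0) * presence_penalty
    """
    adjusted = {}
    for tok, logit in logits.items():
        count = counts.get(tok, 0)
        adjusted[tok] = logit - count * frequency_penalty - (1 if count > 0 else 0) * presence_penalty
    return adjusted


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs, top_p=1.0):
    """Nucleus (top-p) filtering: keep the most likely tokens until their
    cumulative probability reaches top_p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Made-up logits for three candidate next tokens.
logits = {"paid": 2.0, "due": 1.0, "void": 0.0}

warm = softmax_with_temperature(logits, temperature=1.0)
cold = softmax_with_temperature(logits, temperature=0.5)
print(round(warm["paid"], 3), round(cold["paid"], 3))  # 0.665 0.867
print(sorted(top_p_filter(warm, top_p=0.9)))            # ['due', 'paid']
print(apply_penalties(logits, {"paid": 2}, presence_penalty=0.5, frequency_penalty=0.5))
# {'paid': 0.5, 'due': 1.0, 'void': 0.0}
```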

==== Scope ====
<section begin="Scope" />The [[Scope]] property of a {{BatchProcessStepIcon}} '''[[Batch Process Step]]''', as it relates to an [[Activity]], determines at which level in a {{BatchIcon}} '''[[Batch]]''' hierarchy the Activity runs.<section end="Scope" />


==== Secondary Types ====
<section begin="Secondary Types" />[[Secondary Types]] allow the application of multiple '''[[Content Type]]s''' to a single {{BatchFolderIcon}} '''[[Batch Folder]]'''.<section end="Secondary Types" />


==== Tab Marking ====
<section begin="Tab Marking" />[[Tab Marking]] allows you to insert [https://en.wikipedia.org/wiki/Tab_key#Tab_characters tab characters] into a document's text data.<section end="Tab Marking" />
</div>
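One way to decide where a tab character belongs is to look at the horizontal gap between neighboring words on a line. In this sketch the `(text, left, right)` word tuples and the `min_gap` threshold are hypothetical; Tab Marking's own settings are not modeled here.

```python
def tab_mark_line(words, min_gap=0.5):
    """Join the words on one line, inserting a tab wherever the horizontal gap
    between consecutive words is at least `min_gap` (same units as the
    coordinates) and a single space otherwise.

    `words` is a list of (text, left, right) tuples for one text line.
    """
    words = sorted(words, key=lambda w: w[1])
    parts = [words[0][0]] if words else []
    for prev, cur in zip(words, words[1:]):
        gap = cur[1] - prev[2]
        parts.append("\t" if gap >= min_gap else " ")
        parts.append(cur[0])
    return "".join(parts)


line = [("Qty", 0.5, 0.8), ("Unit", 2.0, 2.4), ("Price", 2.45, 2.9), ("Amount", 4.0, 4.7)]
marked = tab_mark_line(line)
print(repr(marked))         # 'Qty\tUnit Price\tAmount'
print(marked.split("\t"))   # ['Qty', 'Unit Price', 'Amount']
```

With tabs in place, a pattern can treat `\t` as a column separator instead of guessing from runs of spaces.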


== Misc Features and Functionality ==
<div style="padding-left: 1.5em;">
=== CSS Data Viewer Styling ===
<section begin="CSS Data Viewer Styling" />[[CSS Data Viewer Styling]] refers to using [https://en.wikipedia.org/wiki/CSS CSS] to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a {{DataModelIcon}} '''[[Data Model]]'s''' appearance and layout during document review.<section end="CSS Data Viewer Styling" />


=== EDI Integration ===
<section begin="EDI Integration" />[[EDI Integration]] refers to '''Grooper's''' ability to process [https://en.wikipedia.org/wiki/Electronic_data_interchange EDI] files.<section end="EDI Integration" />


=== Data Connection ===
=== Fine-Tuning for AI Extract ===
<section begin="Data Connection" />
<section end="Data Connection" />


=== Data Field ===
<section begin="Fine Tuning" />Fine-tuning is the process of further training a large language model (LLM) on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on.
<section begin="Data Field" />
<br>
<section end="Data Field" />
In Grooper, you can start fine-tuning a model based on a {{DataModelIcon}} [[Data Model]] to improve extraction results when using [[AI Extract]].<section end="Fine Tuning" />


=== Data Model ===
=== Footer Rows and Footer Modes ===
<section begin="Data Model" />
<section begin="Footer Rows and Footer Modes" />A "[[Footer Rows and Footer Modes|Footer Row]]" is a row at the bottom of a {{DataTableIcon}} [[Data Table]] that displays sum totals for numerical {{DataColumnIcon}} [[Data Column]]s. This can help [[Data Viewer]] users validate data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed and, when Tabular Layout's "Capture Footer Row" property creates the Footer Row, whether and how document data is used to capture and validate the footer value.<section end="Footer Rows and Footer Modes" />
<section end="Data Model" />
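Grooper performs this calculation itself; the Python sketch below only illustrates the validation idea: sum a numeric column's cells and compare the result with a footer total captured from the document. The helper names and the cleanup of commas and dollar signs are assumptions. ''Decimal'' is used instead of floating point so that currency sums compare exactly.

```python
from decimal import Decimal


def to_decimal(text):
    """Parse a currency-like string such as '$1,200.50' (hypothetical cleanup rules)."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def footer_sum(column_values):
    """Sum every cell in a numeric Data Column."""
    return sum((to_decimal(value) for value in column_values), Decimal("0"))


def footer_is_valid(column_values, captured_footer):
    """True when the calculated sum matches the footer value captured from the document."""
    return footer_sum(column_values) == to_decimal(captured_footer)
```

A mismatch between the two values is the kind of discrepancy a Data Viewer user would want flagged for review.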


=== Data Rule ===
=== Label Sets ===
<section begin="Data Rule" />
<section begin="Label Sets" />[[Label Sets]] are collections of label definitions used in Grooper to identify and extract information from documents. A label set maps document text—such as field names, headers, or column titles—to corresponding [[Data Field]], [[Data Section]], or [[Data Table]] elements in the [[Data Model]]. Label sets are essential for automating extraction and classification, especially in environments where document layouts and terminology may vary.<section end="Label Sets" />
<section end="Data Rule" />
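The Python sketch below illustrates the mapping idea with a hypothetical label set: each Data Field name lists several label variants, and the first variant found in the text identifies the value that follows it on the same line. The field names, label variants, and same-line rule are assumptions; this is not how Grooper stores or matches label sets.

```python
import re

# Hypothetical label set: each Data Field name maps to label text variants
# that may identify it on different document layouts.
LABEL_SET = {
    "Invoice Number": ["Invoice No", "Invoice #", "Inv Number"],
    "Invoice Date": ["Invoice Date", "Date of Invoice"],
}


def find_labeled_values(text, label_set):
    """Return {field: value} using the first label variant found in the text."""
    results = {}
    for field_name, variants in label_set.items():
        for label in variants:
            # Label, optional colon, then the rest of the same line as the value.
            pattern = re.escape(label) + r"[ \t]*:?[ \t]*(.+)"
            match = re.search(pattern, text, re.IGNORECASE)
            if match:
                results[field_name] = match.group(1).strip()
                break
    return results
```

Because each field lists several variants, the same lookup works for a vendor that prints "Inv Number" and another that prints "Invoice #".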


=== Data Section ===
=== URL Endpoints for Review ===
<section begin="Data Section" />
<section begin="URL Endpoints for Review" />Three different URL endpoints can be used to open [[Review (Activity)|Review]] tasks in the '''[[Grooper Web Client]]''', using information such as the '''Grooper''' Repository ID, {{BatchProcessIcon}} '''[[Batch Process]]''' name, {{BatchIcon}} '''[[Batch]]''' ID and more. This allows Grooper users to link directly to a '''Batch''' in Review with a URL.<section end="URL Endpoints for Review" />
<section end="Data Section" />


=== Data Table ===
=== XML Schema Integration ===
<section begin="Data Table" />
<section begin="XML Schema Integration" />[[XML Schema Integration]] refers to '''Grooper's''' ability to use [https://en.wikipedia.org/wiki/XML_schema XML schemas] to build '''[[Data Model]]s''', extract XML documents, and more.<section end="XML Schema Integration" />
<section end="Data Table" />
</div>


=== Data Type ===
== UI Element ==
<section begin="Data Type" />
<section begin="UI Element" />A UI Element is a portion of the '''Grooper''' interface that allows users to interact with or otherwise receive information about the application.<section end="UI Element" />
<section end="Data Type" />
<div style="padding-left: 1.5em;">
=== Data Inspector ===
<section begin="Data Inspector" />The Grooper [[Data Inspector]] is a UI Element that can be found anywhere there is a [[Document Viewer]] showing extraction results. This UI Element allows a user to inspect the [[Data Instance]] hierarchies of an extracted result.<section end="Data Inspector" />
=== Design Page ===
''GrooperReview.Pages.Design.DesignPage''


=== Document Type ===
<section begin="Design Page" />The [[Design Page]] is the primary user interface for Grooper configuration. It is the central workplace for Grooper designers and administrators. From the Design page, users create, test and administer nodes in a Grooper Repository.<section end="Design Page" />
<section begin="Document Type" />
=== Document Viewer ===
<section end="Document Type" />
<section begin="Document Viewer" />The Grooper [[Document Viewer]] is the portal to your documents. It is the UI that allows you to see a {{BatchFolderIcon}} '''[[Batch Folder]]'s''' (or a {{BatchPageIcon}} '''[[Batch Page]]'s''') image, text content, and more.<section end="Document Viewer" />


=== Field Class ===
=== Node Tree ===
<section begin="Field Class" />
<section begin="Node Tree" />The [[Node Tree]] is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.<section end="Node Tree" />
<section end="Field Class" />


=== File Store ===
=== Overrides ===
<section begin="File Store" />
<section begin="Overrides" />[[Overrides]] is a tab that allows users to override default properties set on a '''[[Data Element]]'''.<section end="Overrides" />
<section end="File Store" />


=== Form Type ===
=== Search Page ===
<section begin="Form Type" />
<section begin="Search Page" />The [[AI Search and the Search Page|Search Page]] allows users to leverage [[AI Search and the Search Page|AI Search]] indexes to query indexed documents. Both full text and metadata searches are supported, with feature rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the [[Document Viewer]], review documents' extracted data, create new {{IconName|Batch}} [[Batch]]es from the result set, submit [[Activity Processing|processing jobs]], start a conversation with an {{IconName|AI Assistant}} [[AI Assistant]] and more.<section end="Search Page" />
<section end="Form Type" />


=== IP Profile ===
=== Scan Viewer ===
<section begin="IP Profile" />
<section begin="Scan Viewer" />The [[Scan Viewer]] is a user interface that can be added to the user-attended {{IconName|person_search}} [[Review]] step in a {{BatchProcessIcon}} '''[[Batch Process]]'''. It is used to scan documents into {{BatchIcon}} '''[[Batch]]es''' from one or more scanning workstations.<section end="Scan Viewer" />
<section end="IP Profile" />


=== Lexicon ===
=== Summary Tabs ===
<section begin="Lexicon" />
<section begin="Summary Tabs" />{{ContentModelIcon}} '''[[Content Model]]s''' and {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''' have a [[Summary Tabs|Summary]] tab where you can view "Descendant Node Types", {{DocumentTypeIcon}} '''[[Document Type]]s''', and '''[[Expressions]]'''.<section end="Summary Tabs" />
<section end="Lexicon" />
</div>


=== Machine ===
== Other ==
<section begin="Machine" />
<section end="Machine" />


=== OCR Profile ===
== Concepts ==
<section begin="OCR Profile" />
<section begin="Concept" />There are many objects and properties a user can configure in '''Grooper'''. However, knowing how, why, and when to use these objects and properties depends on understanding the underlying concepts that define what these objects and properties do and why.<section end="Concept" />
<section end="OCR Profile" />
<div style="padding-left: 1.5em;">
=== Activity Processing ===
<section begin="Activity Processing Concept" />[[Activity Processing]] is the execution of a sequence of configured tasks which are performed within a {{BatchProcessIcon}} '''[[Batch Process]]''' to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper [[Activity|Activities]] configured to perform document [[Classification (Concept)|classification]], [[Data Extraction (Concept)|extraction]], or data enhancement.<section end="Activity Processing Concept" />


=== Object Library ===
=== CMIS+ ===
<section begin="Object Library" />
<section begin="CMIS+" />[[CMIS+]] is a conceptual term that refers to '''Grooper's''' connectivity architecture to external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] standard. This provides a standardized setup that allows Grooper to interoperate with both CMIS-compliant systems and non-CMIS systems. It further provides normalized access to document content and metadata for import ([[CMIS Import]]) and export ([[CMIS Export]]) operations.<section end="CMIS+" />
<section end="Object Library" />


=== Page Type ===
=== CMIS ===
<section begin="Page Type" />
<section begin="CMIS" />[[CMIS]] ([https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services Content Management Interoperability Services]) is an open standard allowing different [https://en.wikipedia.org/wiki/Content_management_system content management systems] to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.<section end="CMIS" />
<section end="Page Type" />


=== Processing Queue ===
=== Classification ===
<section begin="Processing Queue" />
<section begin="Classification" />[[Classification]] is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and [[Data Extraction (Concept)|data extraction]] workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper '''Batch Process''', the [[Classify]] [[Activity]] will assign a '''[[Content Type]]''' to a {{BatchFolderIcon}} '''[[Batch Folder]]'''.<section end="Classification" />
<section end="Processing Queue" />


=== Project ===
=== Code Expressions ===
<section begin="Project" />
<section begin="Code Expressions" />[[Code Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) are snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.NET] code that expand '''Grooper's''' core functionality.<section end="Code Expressions" />
<section end="Project" />


=== Review Queue ===
=== Data Context ===
<section begin="Review Queue" />
<section begin="Data Context" />[[Data Context]] refers to contextual information used to extract data, such as a label that identifies the value you want to collect.<section end="Data Context" />
<section end="Review Queue" />


=== Scanner Profile ===
=== Data Extraction ===
<section begin="Scanner Profile" />
<section begin="Data Extraction" />[[Data Extraction]] involves identifying and capturing specific information from documents (represented by {{BatchFolderIcon}} '''[[Batch Folder]]s''' in '''Grooper'''). Extraction is performed by configurable [[Data Extractor]]s, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.<section end="Data Extraction" />
<section end="Scanner Profile" />


=== Separation Profile ===
=== Data Extractor ===
<section begin="Separation Profile" />
<section begin="Data Extractor" />[[Data Extractor]] (or just "extractor") refers to all [[Value Extractor]]s and [[Extractor Node]]s. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).<section end="Data Extractor" />
<section end="Separation Profile" />
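The distinction between general and specific data can be sketched with two Python regular expressions: one returns every date, while the other uses the label "Agreement Date" as context to return only the date it identifies. The MM/DD/YYYY date format and the function names are assumptions; this is a sketch of the idea, not how Grooper extractors are configured.

```python
import re

# General pattern: any date written as MM/DD/YYYY (a simplifying assumption).
DATE_PATTERN = r"\d{1,2}/\d{1,2}/\d{4}"


def extract_all_dates(text):
    """General data: every date in the text."""
    return re.findall(DATE_PATTERN, text)


def extract_agreement_date(text):
    """Specific data: only the date that follows an 'Agreement Date' label."""
    match = re.search(r"Agreement Date\s*:?\s*(" + DATE_PATTERN + r")", text, re.IGNORECASE)
    return match.group(1) if match else None
```

On a contract that also mentions a signing date and an expiration date, the first function returns all three dates, while the second returns only the agreement date.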


=== Value Reader ===
=== Data Instance ===
<section begin="Value Reader" />
<section begin="Data Instance" />A [[Data Instance]] is an encapsulation of text data within a document returned by '''Grooper's''' [[Data Extractor|extractors]]. Data Instances form the hierarchy of text data created by '''Grooper's''' extractors.<section end="Data Instance" />
<section end="Value Reader" />


== Property ==
=== Expressions ===
<section begin="Property" />
<section begin="Expressions" />[[Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) are snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.NET] code that expand '''Grooper's''' core functionality.<section end="Expressions" />
<section end="Property" />


=== Confidence Multiplier and Output Confidence ===
=== Expressions Cookbook ===
<section begin="Confidence Multiplier and Output Confidence" />
<section begin="Expressions Cookbook" />The "[[Expressions Cookbook]]" is a reference list for commonly used [[Code Expressions]] in '''Grooper'''.<section end="Expressions Cookbook" />
<section end="Confidence Multiplier and Output Confidence" />


=== Constrained Wrap ===
=== Field Mapping ===
<section begin="Constrained Wrap" />
<section begin="Field Mapping" />[[Field Mapping]] refers to how logical connections are made between [https://en.wikipedia.org/wiki/Metadata metadata] content in '''Grooper''' and an external storage platform.<section end="Field Mapping" />
<section end="Constrained Wrap" />


=== Content Type Filter ===
=== Five Phases of Grooper ===
<section begin="Content Type Filter" />
<section begin="Five Phases of Grooper" />The "[[Five Phases of Grooper]]" is a conceptual term that seeks to build understanding of how documents are processed through '''Grooper'''.<section end="Five Phases of Grooper" />
<section end="Content Type Filter" />


=== OCR Engine ===
=== Flow Collation ===
<section begin="OCR Engine" />
<section begin="Flow Collation" />"[[Flow Collation]]" refers to the text-flow based layout option used by various [[Collation Provider]]s for {{DataTypeIcon}} '''[[Data Type]]''' extractors.<section end="Flow Collation" />
<section end="OCR Engine" />


=== Output Extractor Key ===
=== Fuzzy RegEx ===
<section begin="Output Extractor Key" />
<section begin="Fuzzy RegEx" />[[Fuzzy RegEx]] is '''Grooper's''' use of [https://en.wikipedia.org/wiki/Fuzzy_logic fuzzy logic] within [[Value Extractor]]s that leverage [https://en.wikipedia.org/wiki/Regular_expression regular expressions] to match patterns. Fuzzy RegEx allows extractors to return accurate results despite defects in a document's OCR results. It is turned on by enabling the Fuzzy Matching property.<section end="Fuzzy RegEx" />
<section end="Output Extractor Key" />
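The Python sketch below shows the general idea behind tolerating OCR defects: instead of requiring an exact match, it finds the substring of the text with the fewest character edits (substitutions, insertions, deletions) relative to a literal search string. This is Sellers' approximate string matching, swapped in for illustration; it handles literal strings only and is not Grooper's Fuzzy RegEx engine. The function name and the ''max_errors'' threshold are hypothetical.

```python
def fuzzy_find(pattern, text, max_errors):
    """Return the smallest edit distance between `pattern` and any substring of
    `text`, or None if that distance exceeds `max_errors`.

    A Levenshtein table whose first row is all zeros, so a match may start
    anywhere in the text (Sellers' algorithm).
    """
    # prev[j]: best edit distance of pattern[:i] against a substring ending at text[j-1]
    prev = [0] * (len(text) + 1)
    for i in range(1, len(pattern) + 1):
        curr = [i] + [0] * len(text)  # matching i pattern chars against nothing costs i
        for j in range(1, len(text) + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution (e.g. OCR read "0" for "O")
                prev[j] + 1,         # pattern character missing from the text
                curr[j - 1] + 1,     # extra character in the text
            )
        prev = curr
    best = min(prev)
    return best if best <= max_errors else None
```

For example, OCR text reading "lNV0ICE" is two edits away from "INVOICE", so it is accepted when two errors are allowed and rejected when only one is.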


=== Paragraph Marking ===
=== GPT Integration ===
<section begin="Paragraph Marking" />
<section begin="GPT Integration" />Grooper's [[GPT Integration]] refers to the usage of [https://en.wikipedia.org/wiki/OpenAI OpenAI's] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] models within '''Grooper''' to enhance the capabilities of [[Data Extractor (Concept)|data extractors]], [[Classification (Concept)|classification]], and lookups.<section end="GPT Integration" />
<section end="Paragraph Marking" />


=== Permission Sets ===
=== Grooper Infrastructure ===
<section begin="Permission Sets" />
<section begin="Grooper Infrastructure" />[[Grooper Infrastructure]] refers to the computing underpinnings of what makes up a [[Grooper Repository]] and the software that allows the Grooper platform to automate tasks and users to interface with it.<section end="Grooper Infrastructure" />
<section end="Permission Sets" />


=== Scope ===
=== Grooper Repository ===
<section begin="Scope" />
<section begin="Grooper Repository" />A '''[[Grooper Repository]]''' is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a '''Grooper Repository''' is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the '''Grooper Repository''' to automate tasks and provide the Grooper user interface.<section end="Grooper Repository" />
<section end="Scope" />


=== Secondary Types ===
=== Image Processing ===
<section begin="Secondary Types" />
<section begin="Image Processing Concept" />"[[Image Processing (Concept)|Image processing]]", as a general term, refers to software techniques that manipulate and enhance images. Image processing removes imperfections and adjusts images to improve [[OCR]] accuracy. In Grooper, images are processed primarily by two [[Activity|Activities]]:
<section end="Secondary Types" />
* [[Image Processing (Activity)|Image Processing]] - This Activity permanently adjusts the image. It is primarily used to compensate for defects produced by a document scanner (like border artifacts and skewed images). It does so by applying [[IP Commands]] in an {{IPProfileIcon}} [[IP Profile]].
* [[Recognize]] - This Activity performs OCR. When an {{OCRProfileIcon}} [[OCR Profile]] references an {{IPProfileIcon}} IP Profile, the image will be processed temporarily. A temporary image is handed to the OCR engine and discarded once characters are recognized.
:*<li class="fyi-bullet" style="padding-left:20px">Grooper also has "computer vision" capabilities that analyze and interpret images. These capabilities are also executed during Grooper's image processing. For example, Grooper's "[[Line Removal]]" command will locate lines on an image (''computer vision''), remove those artifacts to improve OCR results during Recognize (''image processing'') and store that data for later use in Grooper (''computer vision'').<section end="Image Processing Concept" />


=== Tab Marking ===
=== LINQ to Grooper Objects ===
<section begin="Tab Marking" />
<section begin="LINQ to Grooper Objects" />[https://en.wikipedia.org/wiki/Language_Integrated_Query LINQ] is a Microsoft .NET component that provides data querying capabilities to the .NET framework. In Grooper, you can use LINQ syntax in [[Code Expressions]] to query Grooper objects, a technique referred to as "[[LINQ to Grooper Objects]]". This allows expressions to access information from collections of data, such as from multi-instance '''Data Sections''' or '''Data Tables'''.<section end="LINQ to Grooper Objects" />
<section end="Tab Marking" />


=== Vertical Wrap ===
=== Layout Data ===
<section begin="Vertical Wrap" />
<section begin="Layout Data" />[[Layout Data]] refers to visual information that certain [[IP Command]]s collect, such as [[Line Detection and Line Removal|lines]], [[Box Detection and Box Removal|checkboxes]], [[Barcode Detection and Barcode Removal|barcodes]], and [[Shape Detection and Shape Removal|detected shapes]]. This data is stored in a "''Grooper.Layout.json''" file attached to {{BatchPageIcon}} '''[[Batch Page]]s'''. Layout data is used by certain extractors and other features that rely on the presence of that data to function.<section end="Layout Data" />
<section end="Vertical Wrap" />


== Section Extract Method ==
=== Microfiche Processing ===
<section begin="Section Extract Method" />
<section begin="Microfiche Processing" />[[Microfiche Processing]] refers to Grooper's suite of specialized [[Activity|Activities]] and [[IP Command]]s that process [https://en.wikipedia.org/wiki/Microform#Microfiche microfiche] documents.<section end="Microfiche Processing" />
<section end="Section Extract Method" />


=== Nested Table ===
=== Microsoft Office Integration ===
<section begin="Nested Table" />
<section begin="Microsoft Office Integration" />Grooper's [[Microsoft Office Integration]] allows the platform to easily convert [https://en.wikipedia.org/wiki/Microsoft_Word Microsoft Word] and [https://en.wikipedia.org/wiki/Microsoft_Excel Microsoft Excel] files into formats that Grooper can read natively (PDF and CSV).<section end="Microsoft Office Integration" />
<section end="Nested Table" />


=== Transaction Detection ===
=== Mixed Classification ===
<section begin="Transaction Detection" />
<section begin="Mixed Classification" />"[[Mixed Classification]]" refers to leveraging a [[Classify Method]] and "rules" defined on a {{DocumentTypeIcon}} '''[[Document Type]]''' to overcome the shortcomings of an individual method.<section end="Mixed Classification" />
<section end="Transaction Detection" />


== Separation Provider ==
=== OCR ===
<section begin="Separation Provider" />
<section begin="OCR" />[[OCR]] stands for [https://en.wikipedia.org/wiki/Optical_character_recognition Optical Character Recognition]. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.<section end="OCR" />
<section end="Separation Provider" />


=== Separation Provider ===
=== OCR Synthesis ===
<section begin="Separation Provider" />
<section begin="OCR Synthesis" />[[OCR Synthesis]] refers to a suite of [[OCR]] related functionality unique to Grooper. The OCR Synthesis suite will pre-process and re-process raw results from the [[OCR Engine]] and synthesize its results into a single, more accurate OCR result.<section end="OCR Synthesis" />
<section end="Separation Provider" />


=== Change in Value Separation ===
=== Object Nomenclature ===
<section begin="Change in Value Separation" />
<section begin="Object Nomenclature" />The Grooper Wiki's [[Object Nomenclature]] defines how Grooper users categorize and refer to different types of Node Objects in a '''[[Grooper Repository]]'''. Knowing what objects can be added to the Grooper [[Node Tree]] and how they are related is a critical part of understanding Grooper itself.<section end="Object Nomenclature" />
<section end="Change in Value Separation" />


=== Control Sheet Separation ===
=== PDF Page Types ===
<section begin="Control Sheet Separation" />
<section begin="PDF Page Types" />[https://en.wikipedia.org/wiki/PDF PDF] pages can be one of several [[PDF Page Types]]. "Page types" describe the kind of content in a PDF page. This informs '''Grooper''' how certain [[Activity|Activities]] should process the page. For example, "single image" pages are [[OCR|OCR'd]] by the [[Recognize]] activity, whereas "text only" pages have their native text extracted by Recognize.<section end="PDF Page Types" />
<section end="Control Sheet Separation" />


=== EPI Separation ===
=== Prompt Engineering ===
<section begin="EPI Separation" />
<section begin="Prompt Engineering" />"[[Prompt Engineering]]" is the process of designing and refining prompts to interact more effectively with [https://en.wikipedia.org/wiki/Large_language_model large language models (LLMs)] like [https://en.wikipedia.org/wiki/GPT-4 GPT-4]. The goal is to guide the model to produce desired outputs by carefully crafting the input queries.<section end="Prompt Engineering" />
<section end="EPI Separation" />


=== ESP Auto Separation ===
=== Regular Expression ===
<section begin="ESP Auto Separation" />
<section begin="Regular Expression" />[[Regular Expression]] (or [https://en.wikipedia.org/wiki/Regular_expression regex]) is a standard syntax designed to parse [https://en.wikipedia.org/wiki/String_(computer_science) text strings]. This is a way of finding information in text. It is the primary method by which '''Grooper''' extracts and returns data from documents.<section end="Regular Expression" />
<section end="ESP Auto Separation" />


=== Event-Based Separation ===
=== Separation ===
<section begin="Event-Based Separation" />
<section begin="Separation" />[[Separation]] is the process of taking an unorganized {{BatchIcon}} '''[[Batch]]''' of loose {{BatchPageIcon}} '''[[Batch Page]]s''' and organizing them into documents represented by {{BatchFolderIcon}} '''[[Batch Folder]]s''' in Grooper. This is done so Grooper can later assign a {{DocumentTypeIcon}} '''[[Document Type]]''' to each document folder in a process known as "[[Classification (Concept)|classification]]".<section end="Separation" />
<section end="Event-Based Separation" />
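The Python sketch below illustrates separation with a simple, hypothetical rule: a new document (folder) starts whenever a page's text matches a "first page" pattern, and every other page joins the current document. Grooper offers several separation approaches; this rule, the default pattern, and the function name are assumptions for illustration.

```python
import re


def separate(pages, first_page_pattern=r"\bPage 1 of \d+\b"):
    """Group loose page texts into documents (lists of pages).

    A new document starts whenever a page matches `first_page_pattern`
    (a hypothetical rule). The very first page always starts a document,
    so no page is left without a folder.
    """
    documents = []
    for page in pages:
        if not documents or re.search(first_page_pattern, page):
            documents.append([])
        documents[-1].append(page)
    return documents
```

Each inner list corresponds to one '''Batch Folder''', which classification can then assign a '''Document Type'''.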


=== Multi Separator ===
=== TF-IDF ===
<section begin="Multi Separator" />
<section begin="TF-IDF" />[[TF-IDF]] stands for [https://en.wikipedia.org/wiki/Tf%E2%80%93idf term frequency-inverse document frequency]. It is a statistical calculation intended to reflect how important a word is to a document within a document set (or "corpus"). It is how '''Grooper''' uses [https://en.wikipedia.org/wiki/Machine_learning machine learning] for training-based document [[Classification (Concept)|classification]] (via the [[Lexical]] method) and [[Data Extraction (Concept)|data extraction]] (via the {{FieldClassIcon}} [[Field Class]] extractor).<section end="TF-IDF" />
<section end="Multi Separator" />
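A minimal Python sketch of the calculation, assuming whitespace tokenization, term frequency as a term's share of a document's tokens, and inverse document frequency as log(N / df), where N is the number of documents and df is the number of documents containing the term. Several TF-IDF weighting variants exist, and this is not necessarily the one Grooper's Lexical method uses.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights
```

A word that appears in every document (such as "the") receives a weight of zero, while a word that appears in only one document receives the highest weight, which is what makes it useful for telling document types apart.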


=== Pattern-Based Separation ===
=== Table Extraction ===
<section begin="Pattern-Based Separation" />
<section begin="Table Extraction" />"[[Table Extraction]]" refers to '''Grooper's''' ability to extract data from cells in tables on documents. This is accomplished by configuring the {{DataTableIcon}} '''[[Data Table]]''' and its child {{DataColumnIcon}} '''[[Data Column]]''' elements in a {{DataModelIcon}} '''[[Data Model]]'''.<section end="Table Extraction" />
<section end="Pattern-Based Separation" />


=== Undo Separation ===
=== Thread ===
<section begin="Undo Separation" />
<section begin="Thread" />A [[Thread]] is the smallest unit of processing that can be performed within an [https://en.wikipedia.org/wiki/Operating_system operating system]. In Grooper, threads are allocated for processing by [[Activity Processing]] services.<section end="Thread" />
<section end="Undo Separation" />
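The Python sketch below shows the general idea of allocating a fixed number of threads to a queue of work: a pool of worker threads picks up tasks as threads become free. The function name and the representation of a task as a zero-argument function are assumptions; this is not Grooper's Activity Processing implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def process_tasks(tasks, thread_count):
    """Run each task on a pool of `thread_count` worker threads.

    Returns the results in task order plus the number of distinct threads
    that did work (never more than thread_count).
    """
    used_threads = set()
    lock = threading.Lock()

    def run(task):
        with lock:
            used_threads.add(threading.get_ident())
        return task()  # hypothetical: each task is a zero-argument callable

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        results = list(pool.map(run, tasks))
    return results, len(used_threads)
```

Raising ''thread_count'' lets more tasks run at the same time, at the cost of more load on the machine.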


== Service ==
=== Training-Based Approaches to Document Classification ===
<section begin="Service" />
<section begin="Training-Based Approaches to Document Classification" />"[[Training-Based Approaches to Document Classification]]" refers to Grooper [[Classify Method]]s that classify {{BatchFolderIcon}} '''[[Batch Folder]]s''' using document examples for each {{DocumentTypeIcon}} '''[[Document Type]]'''. The Classify activity then assigns each unclassified '''Batch Folder''' a '''Document Type''' based on how similar the folder is to that '''Document Type's''' training data.<section end="Training-Based Approaches to Document Classification" />
<section end="Service" />
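The Python sketch below illustrates training-based classification in its simplest form: each Document Type keeps a few example texts, and an unclassified document is assigned the type whose closest example has the highest cosine similarity, provided it clears a minimum threshold. Raw word counts stand in for the TF-IDF weights the Lexical method uses, and the training data, threshold value, and function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Word-count vector for a text (a simplification of real feature weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, training, threshold=0.2):
    """Return the Document Type whose most similar example best matches `text`,
    or None when no type reaches the (hypothetical) threshold."""
    vec = vectorize(text)
    best_type, best_score = None, 0.0
    for doc_type, examples in training.items():
        score = max(cosine(vec, vectorize(example)) for example in examples)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type if best_score >= threshold else None


# Hypothetical training examples for two Document Types.
TRAINING = {
    "Invoice": ["invoice number total amount due", "invoice date balance due"],
    "Lease": ["lease agreement lessee lessor rent term"],
}
```

A document that shares no vocabulary with any training example is left unclassified rather than forced into the closest type.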


=== API Services ===
=== Training Batch ===
<section begin="API Services" />
<section begin="Training Batch" />The '''[[Training Batch]]''' is a special {{BatchIcon}} '''[[Batch]]''' created when training document examples using the [[Lexical]] classification method. The '''Training Batch''' serves two purposes: (1) It is a '''Batch''' that holds all previously trained {{BatchFolderIcon}} '''[[Batch Folder]]s'''. Designers can go to this '''Batch''' to view these documents and copy and paste them into other '''Batches''' if needed. (2) '''Batch Folders''' in the '''Training Batch''' will be used to re-train the '''Content Model's''' classification data when the Rebuild Training command is executed.<section end="Training Batch" />
<section end="API Services" />


=== Activity Processing ===
=== UNC Path ===
<section begin="Activity Processing" />
<section begin="UNC Path" />[[UNC Path]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Path_(computing)#UNC UNC (Universal Naming Convention)] which is a standard used in [https://en.wikipedia.org/wiki/Microsoft_Windows Microsoft Windows] for accessing [https://en.wikipedia.org/wiki/Shared_resource shared network folders].<section end="UNC Path" />
<section end="Activity Processing" />
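Python's standard ''pathlib.PureWindowsPath'' understands UNC syntax, so a short sketch can split a path such as \\fileserver\grooper\filestore\batch1.tif into its server, share, and remaining path. The server, share, and file names are made-up examples, and the ''split_unc'' helper is hypothetical.

```python
from pathlib import PureWindowsPath


def split_unc(path):
    """Split a UNC path into (server, share, path relative to the share)."""
    p = PureWindowsPath(path)  # also accepts forward slashes
    # For UNC paths, pathlib treats "\\server\share" as the drive.
    if not p.drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path}")
    server, share = p.drive.lstrip("\\").split("\\", 1)
    return server, share, str(p.relative_to(p.anchor))
```

Because the check is purely syntactic, this works on any operating system and does not touch the network; a drive-letter path such as C:\data\file.txt is rejected with a ''ValueError''.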


=== Grooper Licensing ===
=== Waterfall Classification ===
<section begin="Grooper Licensing" />
<section begin="Waterfall Classification" />[[Waterfall Classification]] is a [[Classification (Concept)|classification]] technique in Grooper that prioritizes training similarity over classification "rules" set by a {{DocumentTypeIcon}} '''Document Type's''' Positive Extractor. This can be helpful in scenarios where {{BatchFolderIcon}} '''[[Batch Folder]]s''' get misclassified and simply retraining won't help.<section end="Waterfall Classification" />
<section end="Grooper Licensing" />
</div>


== Table Extract Method ==
== Disambiguation ==
<section begin="Table Extract Method" />
<section end="Table Extract Method" />


=== Delimited Extract ===
=== Repository ===
<section begin="Delimited Extract" />
<section begin="Repository" />A "repository" is a general term in computer science referring to where files and/or data are stored and managed. In Grooper, the term "repository" may refer to:
<section end="Delimited Extract" />
* PRIMARILY a '''[[Grooper Repository]]'''. This is most commonly what people are referring to when they simply say "repository".
* Less commonly a '''[[CMIS Repository]]'''<section end="Repository" />


=== Fluid Layout ===
== Base Types ==
<section begin="Fluid Layout" />
<section end="Fluid Layout" />


=== Grid Layout ===
=== Grooper Object ===
<section begin="Grid Layout" />
<section end="Grid Layout" />


=== Row Match ===
{{HelpLink|Grooper Object}}
<section begin="Row Match" />
<section end="Row Match" />


=== Tabular Layout ===
=== Connected Object ===
<section begin="Tabular Layout" />
<section end="Tabular Layout" />


== UI Element ==
{{HelpLink|Connected Object}}
<section begin="UI Element" />
<section end="UI Element" />


=== Document Viewer ===
=== Database Row ===
<section begin="Document Viewer" />
<section end="Document Viewer" />


=== Node Tree ===
{{HelpLink|Database Row}}
<section begin="Node Tree" />
<section end="Node Tree" />


=== Overrides ===
=== Embedded Object ===
<section begin="Overrides" />
<section end="Overrides" />


=== Summary Tabs ===
{{HelpLink|Embedded Object}}
<section begin="Summary Tabs" />
<section end="Summary Tabs" />

Latest revision as of 09:39, 27 October 2025

This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms.

Each entry is organized according to what major Grooper entity they belong to. For example, "Classify" is an "Activity". It is found in the "Activity" section of the Glossary.

Application

Grooper is an intelligent document processing platform that uses an array of sophisticated techniques to automate end-to-end content capture and delivery. From a technical standpoint, Grooper consists of a Grooper Repository and the applications that support the management and execution of configuration assets.

  • A Grooper Repository consists of two things: (1) A series of tables in a SQL database (containing configuration nodes and their properties) and (2) a File Store (containing files associated to nodes in the database).

The Grooper applications are as follows:

  • Grooper - The primary program files for the Grooper platform. This application will need to be installed on any Grooper web server hosting the Grooper UI and processing servers running Activity Processing services to automate task processing.
  • Grooper Command Console - This is an administrative utility that gets installed with the Grooper application.
  • Grooper Web Client - This application installs the Grooper user interface. It will need to be installed on the Grooper web server. The Grooper web server hosts the Grooper web app which is accessed via a URL.
  • Grooper Desktop - This is a lightweight application required to scan documents using the Grooper web app. It runs in the background and helps operate the Scan Viewer in Grooper. It needs to be installed on any workstation connected to a document scanner.

Grooper Command Console

Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.

Grooper Web Client

The Grooper user interface is accessed using a web browser from a URL. The Grooper Web Client is the application that installs the Grooper website on a web server.

Node Types

Grooper.GrooperNode

Nodes are the main configuration objects in Grooper. They are created and accessed in the Node Tree from the Design page. The different types of nodes ("Node Types") serve different functions in Grooper. For example, "Batch" nodes are the primary container for document content. They contain "Batch Folder" nodes which represent documents and "Batch Page" nodes which represent individual pages of documents.

AI Analyst

BE AWARE: AI Analysts are obsolete as of version 2025. See AI Assistant for the new and improved version of AI Analyst. An AI Analyst facilitates the ability to interact with a document as you might with an AI chatbot.

AI Assistant

Grooper.GPT.AIAssistant

AI Assistants are Grooper's conversational AI personas. They answer questions about resources they can access (including content from documents, databases and/or web services). This greatly increases an AI's ability to answer domain-specific questions that require access to these resources.

Batch Objects

Grooper.Core.BatchObject

Batch Objects are the foundational elements of Grooper's document processing system, providing a unified structure for organizing, processing, and reviewing document content within a Batch. Every item within a Batch, whether a document, folder, or page, is represented as a Batch Object (and Batches themselves are Batch Objects too).

Batch

Grooper.Core.Batch

Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Batch Folder

Grooper.Core.BatchFolder

The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.
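To make the Batch / Batch Folder / Batch Page hierarchy concrete, here is a minimal sketch in plain Python. The classes below are hypothetical stand-ins, not Grooper's object model. Modeling the Batch as a root folder is a simplification of the idea that Batches are Batch Objects too. The sketch shows folders acting as documents (holding pages) and as plain folders (holding other folders).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BatchPage:
    # Hypothetical stand-in for a Batch Page node (one page image)
    name: str


@dataclass
class BatchFolder:
    # Hypothetical stand-in for a Batch Folder node; it may hold pages and/or other folders
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)

    def pages(self) -> List[BatchPage]:
        """Return every page under this folder, depth-first, in child order."""
        result: List[BatchPage] = []
        for child in self.children:
            if isinstance(child, BatchPage):
                result.append(child)
            else:
                result.extend(child.pages())
        return result


@dataclass
class Batch(BatchFolder):
    # Simplification: the Batch is modeled as the root folder of the hierarchy
    pass


batch = Batch("Batch 001", [
    BatchFolder("Invoice", [BatchPage("p1"), BatchPage("p2")]),          # a "document"
    BatchFolder("Loan Packet", [                                         # a plain "folder"
        BatchFolder("Note", [BatchPage("p3")]),
        BatchPage("p4"),
    ]),
])
print([p.name for p in batch.pages()])  # ['p1', 'p2', 'p3', 'p4']
```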

Batch Page

Grooper.Core.BatchPage

Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) when images are scanned into a Batch using the Scan Viewer, or (2) when they are split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

Batch Process

Grooper.Core.BatchProcess

Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step executes either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

Grooper.Core.BatchProcessStep

Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a "Review" activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".
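The ordered, one-Activity-per-step structure of a Batch Process can be pictured with a short Python sketch. The `Step` class and `describe_process` function are hypothetical illustrations, not Grooper APIs. The sketch only walks the steps in order and reports whether each would be handled by an Activity Processing service (a Code Activity) or by a human operator (Review).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    activity: str   # the Activity this step executes, e.g. "Recognize"
    attended: bool  # True for Review (human-performed), False for a Code Activity


def describe_process(steps: List[Step]) -> List[str]:
    """List the steps in execution order and who would perform each one."""
    lines = []
    for step in steps:
        performer = "human operator" if step.attended else "Activity Processing service"
        # Steps are commonly referred to by their Activity, e.g. "Recognize step"
        lines.append(f"{step.activity} step -> {performer}")
    return lines


process = [
    Step("Split Pages", attended=False),
    Step("Recognize", attended=False),
    Step("Classify", attended=False),
    Step("Review", attended=True),
]
for line in describe_process(process):
    print(line)
```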

CMIS Connection

Grooper.CMIS.CmisConnection

CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.

  • For content management systems that support the CMIS standard, the CMIS Connection connects to the CMS using that standard.
  • For those that do not, the CMIS Connection normalizes the connection and transfer protocols as if the CMS were a CMIS platform.

CMIS Repository

Grooper.CMIS.CmisRepository

CMIS Repository nodes provide document access in external storage platforms through a CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.

  • CMIS Repositories are created as a child node of a CMIS Connection using the "Import Repository" command.

Content Types

Grooper.Core.ContentType

Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Content Model

Grooper.Core.ContentModel

Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Category

Grooper.Core.ContentCategory

A Content Category is a container for other Content Category or Document Type nodes in a Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.

Document Type

Grooper.Core.DocumentType

Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).
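Data Element inheritance, as described in point 2, can be sketched in a few lines. This is a hypothetical model only (the `ContentType` class and `effective_data_elements` method are illustrative names, not Grooper's). It assumes a Document Type's effective schema is its ancestors' Data Elements, root first, followed by its own. It ignores how Grooper handles same-named elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    # Hypothetical model of a Content Model, Content Category or Document Type node
    name: str
    parent: Optional["ContentType"] = None
    data_elements: List[str] = field(default_factory=list)  # locally defined Data Elements

    def effective_data_elements(self) -> List[str]:
        """Inherited Data Elements (root first) followed by this node's own."""
        inherited = self.parent.effective_data_elements() if self.parent else []
        return inherited + self.data_elements


model = ContentType("Financial Docs", data_elements=["Company Name"])          # Content Model
category = ContentType("Payables", parent=model, data_elements=["Vendor"])    # Content Category
invoice = ContentType("Invoice", parent=category,                             # Document Type
                      data_elements=["Invoice Number", "Total"])
print(invoice.effective_data_elements())
# ['Company Name', 'Vendor', 'Invoice Number', 'Total']
```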

Form Type

Grooper.Core.FormType

Form Types represent trained variations of a Document Type. These nodes store machine learning training data for Lexical and Visual document classification methods.

Page Type

Grooper.Core.PageType

Page Types represent individual pages of a Form Type. These nodes store page-level machine learning training data for Lexical and Visual document classification methods. Page Types are used by ESP Auto Separation to make document separation decisions based on page classification.

Control Sheet

Grooper.Capture.ControlSheet

Control Sheets are printable pages used to automate document separation at scan time. A Control Sheet is placed in front of each new document before the pages are loaded into the scanner. Then, when pages are scanned using the Scan Viewer and Control Sheet Separation is executed, a new Batch Folder is created for every Control Sheet scanned. Control Sheets can also be configured to assign the Batch Folder a Document Type, thus classifying the document at scan time as well.
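The separation rule above ("a new Batch Folder for every Control Sheet scanned") can be sketched as a single pass over the scanned sheets. Everything here is a hypothetical illustration: the `(kind, value)` tuples and `separate_by_control_sheets` are made-up names. The choices to drop the Control Sheet image itself and to reject pages scanned before the first Control Sheet are assumptions of this sketch, not documented Grooper behavior.

```python
from typing import List, Optional, Tuple

# Each scanned sheet is ("control", document_type_or_None) or ("page", page_name)
Sheet = Tuple[str, Optional[str]]


def separate_by_control_sheets(sheets: List[Sheet]) -> List[dict]:
    """Start a new folder at every Control Sheet and add following pages to it."""
    folders: List[dict] = []
    for kind, value in sheets:
        if kind == "control":
            # A Control Sheet starts a new folder and may also carry a Document Type
            folders.append({"document_type": value, "pages": []})
        elif folders:
            folders[-1]["pages"].append(value)
        else:
            # Assumption of this sketch: pages before the first Control Sheet are an error
            raise ValueError("page scanned before the first Control Sheet")
    return folders


scanned = [("control", "Invoice"), ("page", "p1"), ("page", "p2"),
           ("control", None), ("page", "p3")]
print(separate_by_control_sheets(scanned))
# [{'document_type': 'Invoice', 'pages': ['p1', 'p2']}, {'document_type': None, 'pages': ['p3']}]
```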

Data Connection

Grooper.Core.DataConnection

Data Connections connect Grooper to Microsoft SQL Server and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, to perform database lookups that validate the data Grooper collects, and to carry out other actions related to database management systems (DBMS).

  • Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
  • Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.

Data Elements

Grooper.Core.DataElement

Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Model

Grooper.Core.DataModel

Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) defines data extraction logic and settings for how data is reviewed in a Data Viewer.

Data Field

Grooper.Core.DataField

Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a Data Model and/or Data Sections.

  • Data Fields are frequently referred to simply as "fields".

Data Section

Grooper.Core.DataSection

A Data Section is a container for Data Elements in a Data Model. Data Sections can contain Data Fields, Data Tables, and even other Data Sections as child nodes, adding hierarchy to a Data Model. They serve two main purposes:

  1. They can simply act as organizational buckets for Data Elements in larger Data Models.
  2. By configuring its "Extract Method", a Data Section can subdivide larger and more complex documents into smaller parts to assist in extraction.
    • "Single Instance" sections define a division (or "record") that appears only once on a document.
    • "Multi-Instance" sections define a collection of repeating divisions (or "records").
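The subdivision idea behind a "Multi-Instance" section can be illustrated with a simple regex splitter. This is not how Grooper's section Extract Methods are implemented. `split_records` is a hypothetical helper that assumes each record begins at a line matching a known pattern. Under the same assumption, a "Single Instance" section would correspond to keeping only the first record.

```python
import re
from typing import List


def split_records(text: str, record_start: str) -> List[str]:
    """Split text into repeating records, each beginning where `record_start` matches."""
    starts = [m.start() for m in re.finditer(record_start, text, flags=re.MULTILINE)]
    # Each record runs from one match to the next (the last one runs to the end of the text)
    bounds = zip(starts, starts[1:] + [len(text)])
    return [text[a:b].strip() for a, b in bounds]


doc = """Claim No: 1001
Amount: 250.00
Claim No: 1002
Amount: 75.50
"""
records = split_records(doc, r"^Claim No:")
print(len(records))        # 2
print(repr(records[1]))    # 'Claim No: 1002\nAmount: 75.50'
```

Once a document is split this way, fields such as "Amount" only need to be found within each smaller record rather than across the whole document.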

Data Table

Grooper.Core.DataTable

A Data Table is a Data Element specialized in extracting tabular data from documents (i.e., data formatted in rows and columns).

  • The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
  • The table's columns are defined by adding Data Column nodes to the Data Table (as its children).

Data Column

Grooper.Core.DataColumn

Data Columns represent columns in a table extracted from a document. They are added as child nodes of a Data Table. They define the type of data each column holds along with its data extraction properties.

  • Data Columns are frequently referred to simply as "columns".
  • In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".

Data Field Container and Data Element Container

Grooper.Core.DataFieldContainer
Grooper.Core.DataElementContainer

Data Field Container and Data Element Container are two base types in Grooper from which "container" Data Elements are derived. Container Data Elements (Data Models, Data Sections, and Data Tables) serve an important function in organizing and defining behavior and extraction logic for the Data Fields and Data Columns they contain.

  • Although "Data Field Container" and "Data Element Container" are distinct classes in the Grooper Object Model, they are closely related. Grooper scripters and experts should know the difference, but for most practical purposes the terms are used interchangeably (or shortened to "containers" or "container elements"). See Object Model info for more.

Data Rule

Grooper.Core.DataRule

Data Rules are used to normalize or otherwise prepare data collected in a Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action", which does things like computing a field's value, parsing a field into other fields, performing lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity (when converting a document to another Document Type)
    • Manually, in a Data Viewer, with the "Run Rule" command.
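A Data Rule hierarchy, where each rule has an optional Trigger and an optional Data Action, can be sketched in plain Python. The `DataRule` class below is hypothetical, and triggers and actions are ordinary Python callables rather than Grooper expressions. Skipping a rule's children when its trigger is false is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class DataRule:
    # Hypothetical model: an optional trigger, an optional action, and child rules
    name: str
    trigger: Optional[Callable[[Record], bool]] = None
    action: Optional[Callable[[Record], None]] = None
    children: List["DataRule"] = field(default_factory=list)

    def apply(self, data: Record) -> None:
        # Assumption: when the trigger is false, this rule and its children are skipped
        if self.trigger is not None and not self.trigger(data):
            return
        if self.action is not None:
            self.action(data)
        for child in self.children:
            child.apply(data)


def strip_currency(d: Record) -> None:
    # Normalize "$1,250.00" to "1250.00"
    d["Total"] = d["Total"].replace("$", "").replace(",", "")


def flag_for_review(d: Record) -> None:
    d["Review"] = "Yes"


rules = DataRule("Root", children=[
    DataRule("Normalize Total", action=strip_currency),
    DataRule("Flag Large", trigger=lambda d: float(d["Total"]) > 1000, action=flag_for_review),
])
data = {"Total": "$1,250.00"}
rules.apply(data)
print(data)  # {'Total': '1250.00', 'Review': 'Yes'}
```

The child order matters here: the normalization rule runs first so that the second rule's trigger can parse the total as a number.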

Extractor Nodes

Grooper.Core.ExtractorNode

Data Type

Grooper.Extract.DataType

Data Types are nodes used to extract text data from a document. Data Types have more capabilities than Value Readers. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. Data Types can also collate results using Collation Providers to combine, sift and manipulate results further.

Value Reader

Grooper.Extract.ValueReader

Value Reader nodes define a single data extraction operation. Each Value Reader executes a single Value Extractor configuration. The Value Extractor determines the logic for returning data from a text-based document or page. (Example: Pattern Match is a Value Extractor that returns data using regular expressions).

  • Value Readers can be used on their own or in conjunction with Data Types for more complex data extraction and collation.
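The difference between a Value Reader (one extraction operation) and a Data Type (results gathered from several sources) can be illustrated with Python's `re` module. Both functions are hypothetical stand-ins. `pattern_match_reader` only mimics the idea of Pattern Match, and `data_type_collect` uses simple "keep first occurrence" de-duplication in place of Grooper's configurable Collation Providers.

```python
import re
from typing import Callable, List

# A "reader" here is any function that takes text and returns matched values
Reader = Callable[[str], List[str]]


def pattern_match_reader(pattern: str) -> Reader:
    """Hypothetical Value Reader stand-in: one regular expression, all matches returned."""
    compiled = re.compile(pattern)
    return lambda text: [m.group(0) for m in compiled.finditer(text)]


def data_type_collect(text: str, readers: List[Reader]) -> List[str]:
    """Hypothetical Data Type stand-in: gather results from several readers, dropping duplicates."""
    seen = set()
    results = []
    for read in readers:
        for value in read(text):
            if value not in seen:
                seen.add(value)
                results.append(value)
    return results


text = "Call 555-0100 or 555.0199. Office: 555-0100."
dashed = pattern_match_reader(r"\d{3}-\d{4}")
dotted = pattern_match_reader(r"\d{3}\.\d{4}")
print(dashed(text))                               # ['555-0100', '555-0100']
print(data_type_collect(text, [dashed, dotted]))  # ['555-0100', '555.0199']
```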

Field Class

Grooper.Extract.FieldClass

Field Classes are NLP (natural language processing) based extractor nodes. They find values based on natural language context near those values. Training the extractor positively or negatively associates values with nearby text-based "features". During extraction, the extractor collects values based on these training weightings.

  • Field Classes are most useful when attempting to find values within the flow of natural language.
  • Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "extractor nodes" like Value Readers or Data Types.
  • Advances in large language models (LLMs) have largely made Field Classes obsolete. LLM-based extraction methods in Grooper (such as AI Extract) can achieve similar results with far less setup.

File Store

Grooper.FileStore

File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes is saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".

  • Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.

Folder

Grooper.Folder

Batches Folder

Grooper.Core.BatchesFolder

Projects Folder

Grooper.ProjectsFolder

Machines Folder

Grooper.MachinesFolder

Local Resources Folder

Grooper.Core.LocalResourcesFolder

IP Elements

Grooper.IP.IpElement

IP Group

Grooper.IP.IpGroup

IP Groups are containers of IP Steps and/or IP Groups that can be added to IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:

  1. They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
  2. They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionally execute a sequence of IP Steps.

IP Profile

Grooper.IP.IpProfile

IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:

  1. Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
  2. Cleaning up an image in memory during the Recognize activity to improve OCR accuracy, without permanently altering the image.
  3. Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode values and more) utilized in data extraction.

IP Step

Grooper.IP.IpStep

IP Steps are the basic units of an IP Profile. They define a single image processing operation, called an IP Command in Grooper.

Lexicon

Grooper.Core.Lexicon

Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.
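The combination of local entries, entries imported from referenced Lexicons, and a list-based match can be sketched as follows. The `Lexicon` class and `list_match` function are hypothetical and much simpler than Grooper's List Match extractor. This sketch does only exact, case-insensitive, whole-phrase matching and has no fuzzy matching or weightings.

```python
import re
from typing import List, Optional


class Lexicon:
    """Hypothetical model: local entries plus entries imported from referenced Lexicons."""

    def __init__(self, entries: List[str], references: Optional[List["Lexicon"]] = None):
        self.entries = entries
        self.references = references or []

    def all_entries(self) -> List[str]:
        combined = list(self.entries)
        for ref in self.references:
            combined.extend(ref.all_entries())
        return combined


def list_match(text: str, lexicon: Lexicon) -> List[str]:
    """Return each entry found in the text as a whole word or phrase (case-insensitive)."""
    found = []
    for entry in lexicon.all_entries():
        if re.search(r"\b" + re.escape(entry) + r"\b", text, flags=re.IGNORECASE):
            found.append(entry)
    return found


states = Lexicon(["Oklahoma", "Texas"])
regions = Lexicon(["Gulf Coast"], references=[states])  # imports entries from `states`
print(list_match("Wells in texas and along the Gulf Coast", regions))
# ['Gulf Coast', 'Texas']
```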

Machine

Grooper.Machine

Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

OCR Profile

Grooper.OCR.OcrProfile

OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.

Object Library

Grooper.ObjectLibrary

Object Library nodes are .NET libraries that contain code files for customizing Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:
  • Adding custom Activities that execute within Batch Processes
  • Creating custom commands available during the Review activity and in the Design page.
  • Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects.
  • Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems.
  • Establishing custom Grooper Services that perform automated background tasks at regular intervals

Project

Grooper.Project

Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Resource File

Grooper.ResourceFile

Resource Files are nodes you can add to a Project to store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, several areas of Grooper can reference Resource Files for specific purposes, such as the XML schema files used for Grooper's XML Schema Integration.

Root

Grooper.GrooperRoot

The Grooper Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.

Scanner Profile

Grooper.Capture.ScannerProfile

Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.

Separation Profile

Grooper.Capture.SeparationProfile

Separation Profiles store settings that determine how Batch Pages are separated into Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Work Queue

Grooper.Core.WorkQueue

Processing Queue

Grooper.Core.ThreadPool

Processing Queues help automate "machine performed tasks" (that is, Code Activity tasks performed by Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, that can only have one instance running per machine (this is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.
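The effect of capping concurrency on a queue can be demonstrated on a single machine with Python's `threading` and `concurrent.futures`. This is only an analogy, not how Grooper implements Processing Queues. `run_with_cap` and `fake_task` are hypothetical helpers. A cap of 1 loosely resembles a single instance per machine or a one-document-at-a-time export throttle.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def run_with_cap(tasks: List[Callable[[], str]], max_concurrent: int) -> Tuple[List[str], int]:
    """Run tasks on a thread pool, but let at most `max_concurrent` execute at once."""
    gate = threading.BoundedSemaphore(max_concurrent)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def guarded(task: Callable[[], str]) -> str:
        with gate:  # blocks while the cap is reached
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                return task()
            finally:
                with lock:
                    state["running"] -= 1

    # The pool has more threads than the cap; the semaphore does the throttling
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(guarded, tasks))
    return results, state["peak"]


def fake_task() -> str:
    time.sleep(0.01)  # stand-in for a resource-hungry task such as OCR
    return "done"


results, peak = run_with_cap([fake_task] * 6, max_concurrent=1)
print(results.count("done"), peak)  # 6 1
```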

Review Queue

Grooper.Core.ReviewQueue

Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set on either a Batch Process or a Review step. A user's membership in Review Queues affects how Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.

Core Configuration Types

In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. These objects are designed to be part of a larger configuration.

  • These "core configuration types" are found most commonly in the property settings on a node in the Grooper node tree.
  • However, they may also be configured when configuring commands or as part of a larger property configuration.

This includes:

  • Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.

Activity

Grooper.Core.BatchProcessingActivity

Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Attended Activities

Grooper.Core.AttendedActivity

Attended Activities are a type of Activity in Grooper that requires direct user interaction within a Batch Process workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or automated processing is simply insufficient). The only current Attended Activity in Grooper is Review.

Review

Grooper.Activities.Review

Review is an Activity that allows user attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.

Code Activities

Grooper.Core.CodeActivity

AI Dialogue

BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024. AI Dialogue is an Activity that executes a scripted conversation with an AI Analyst and saves the resulting conversation on the document as a JSON file.

Apply Rules

Grooper.Activities.ApplyRules

Apply Rules is an Activity that runs Data Rules on data that has previously been extracted from documents (Batch Folders).

  • The Apply Rules activity will always need to run after an Extract activity runs (an Extract step must come before an Apply Rules step in the order of Batch Process Steps in a Batch Process).
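The ordering requirement above is easy to check mechanically. The sketch below is a hypothetical lint over a list of step Activity names; `check_apply_rules_order` is not a Grooper feature. It flags any Apply Rules step that has no Extract step before it.

```python
from typing import List


def check_apply_rules_order(steps: List[str]) -> List[str]:
    """Return a problem message for each Apply Rules step with no earlier Extract step."""
    problems = []
    extract_seen = False
    for index, activity in enumerate(steps):
        if activity == "Extract":
            extract_seen = True
        elif activity == "Apply Rules" and not extract_seen:
            problems.append(f"Step {index + 1}: Apply Rules runs before any Extract step")
    return problems


print(check_apply_rules_order(["Recognize", "Extract", "Apply Rules", "Export"]))  # []
print(check_apply_rules_order(["Recognize", "Apply Rules", "Extract"]))
# ['Step 2: Apply Rules runs before any Extract step']
```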

Attach

Grooper.GPT.Attach

Attach is an Activity that physically moves and nests documents within a Batch Folder based on attachment markers set by the Mark Attachments activity. It consolidates related documents (such as addenda or supporting documents) under their host documents, updating the Batch hierarchy for downstream processing.

Batch Transfer

Grooper.Activities.BatchTransfer

Batch Transfer is an Activity that

Burst Book

Grooper.Microform.BurstBook

Burst Book is an Activity that

Classify

Grooper.Activities.ClassifyFolders

Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Document Type.

  • Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
  • Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Clip Frames

Clip Frames is a specialized Activity for processing microfiche in Grooper. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Convert Data

Convert Data is an Activity that converts a document (Batch Folder) to another Document Type using Data Actions to copy and convert Data Elements from the source Document Type to those in the target Document Type. Convert Data is a specialized Activity for use cases requiring a great deal of data transformation before export.

Correct

Correct is an Activity that performs spell correction. It can correct a Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data or otherwise enhance text data.

Deduplicate

Deduplicate is an Activity that

Detect Frames

Detect Frames is a specialized Activity for processing microfiche in Grooper. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Detect Language

Grooper.GPT.DetectLanguage

Detect Language is an Activity that uses a large language model (LLM) to determine the primary language (English, Spanish, French, etc.) of a document. Activities executed downstream, such as Extract, can use this information to apply language specific logic.

Execute

Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export

Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract

Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Image Processing

Image Processing is an Activity that enhances Batch Page images and optimizes them for better OCR text recognition and data extraction results.

Initialize Card

Initialize Card is a specialized Activity for processing microfiche in Grooper. It prepares and configures microfiche card images for further processing.

Launch Process

Launch Process is an Activity that

Mark Attachments

Grooper.GPT.MarkAttachments

Mark Attachments is an Activity that analyzes documents (Batch Folders) to determine attachment relationships using configurable rules ("Attachment Rules"). It sets attachment markers on documents, indicating whether they should be attached to neighboring Batch Folders. These markers are then used by the Attach activity to group and nest related documents.
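The two-phase design of Mark Attachments and Attach (set markers first, then nest) can be sketched in Python. All names here are hypothetical. A plain callable stands in for Attachment Rules. The sketch assumes a marker means "attach to the nearest preceding unmarked document", which is only one possible reading of "neighboring Batch Folders".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    name: str
    attach_to_previous: bool = False  # hypothetical marker set by the marking pass
    children: List["Doc"] = field(default_factory=list)


def mark_attachments(docs: List[Doc], is_attachment: Callable[[Doc], bool]) -> None:
    # Phase 1: only set markers; nothing is moved yet
    for doc in docs:
        doc.attach_to_previous = is_attachment(doc)


def attach(docs: List[Doc]) -> List[Doc]:
    """Phase 2: nest each marked document under the nearest preceding host document."""
    hosts: List[Doc] = []
    for doc in docs:
        if doc.attach_to_previous and hosts:
            hosts[-1].children.append(doc)
        else:
            hosts.append(doc)
    return hosts


docs = [Doc("Lease"), Doc("Addendum A"), Doc("Addendum B"), Doc("Invoice")]
mark_attachments(docs, lambda d: d.name.startswith("Addendum"))
top = attach(docs)
print([(d.name, [c.name for c in d.children]) for d in top])
# [('Lease', ['Addendum A', 'Addendum B']), ('Invoice', [])]
```

Keeping the marking separate from the moving lets the markers be inspected, or corrected in review, before the Batch hierarchy is actually changed.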

Merge

Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.

Recognize

Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Redact

Redact is an Activity that visibly obscures (or "redacts") text information on a page based on results returned from an extractor. Be aware, Redact does not alter the text data. It only alters the image.

Remove Level

Remove Level is an Activity that

Render

Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

Route

Route is an Activity that

Send Mail

Send Mail is an Activity that automates email notifications from Grooper based on events and conditions set by a Batch Process. Optionally, documents in the Batch may be attached to the generated email.

Separate

Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export and other Activities that need to run on the folder (i.e., document) level.

Spawn Batch

Spawn Batch is an Activity that

Split Pages

Multi-page PDF and TIF files come into Grooper as files attached to single Batch Folders. Split Pages is an Activity that creates child Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.

Split Text

Split Text is an Activity that

Text Transform

Text Transform is an Activity that

Train Lexicon

Train Lexicon is an Activity that

Translate

Translate is an Activity that

XML Transform

XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

Behavior

A "Behavior" is one of several features applied to a Content Type (such as a Document Type). Behaviors affect how certain Activities and Commands are executed, based on how a document (Batch Folder) is classified, so documents are handled differently according to their Document Type. This includes how they are exported (how Export behaves), if and how they are added to a document search index (how the various indexing commands behave), and if and how Label Sets are used (how Classify and Extract behave in the presence of Label Sets).

  • Each Behavior is enabled by adding it to a Content Type. They are configured in the Behaviors editor.
  • Behaviors extend to descendant Content Types, if the descendant Content Type has no Behavior configuration of its own.
    • For example, all Document Types will inherit their parent Content Model's Behaviors.
    • However, if a Document Type has its own Behavior configuration, it will be used instead.

Export Behavior

An Export Behavior defines the parameters for exporting classified Batch Folder content from Grooper to other systems. This includes where content is exported to (what content management system, file system, database, etc.), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML, etc.), folder pathing, file naming and data mappings (for Data Export and CMIS Export).

Import Behavior

An Import Behavior defines how data is mapped from files in an external content management system to Batch Folders created on import when using CMIS Import.

Indexing Behavior

An Indexing Behavior allows documents (Batch Folders) to be indexed via AI Search. Once indexed, users can search for and retrieve documents from the Search Page.

Labeling Behavior

A Labeling Behavior extends "label set" functionality to Document Types. This allows you to collect field labels and other labels present on a document and use them in a variety of ways. This includes functionality for classification, field extraction, table extraction, and section extraction.

PDF Data Mapping

PDF Data Mapping is a Behavior that enhances PDF files generated by the Merge or Export activities with metadata, bookmarks, annotations and/or different kinds of widgets.

Text Rendering

Text Rendering is a Behavior that causes text documents (e.g. TXT files) to be interpreted and displayed as paginated documents rather than a raw text stream.

  • By default, this renders TXT files to an 8.5 by 11 inch page format, but this can be altered in the Text Rendering settings.

Classify Method

"Classify Methods" define classification logic used by Content Models during the Classify activity. Classify Methods organize document content in Grooper by assigning Batch Folders a Document Type.

  • Classify Methods analyze documents (Batch Folders) to determine what kind of document each one is.
  • Each Classify Method analyzes documents according to a different methodology to organize documents accurately. This includes text-based pattern matching, computer vision, machine learning models, label sets and more.
  • Classify Methods are configured by setting and configuring a Content Model's "Classification Method" property.

GPT Embeddings

BE AWARE: GPT Embeddings is obsolete as of version 2025. The LLM Classifier and Search Classifier methods are the new and improved AI-enabled classification methods. GPT Embeddings is a Classify Method that uses an OpenAI embeddings model and trained document samples to tell one document from another.

Labelset-Based

"Labelset-Based" is a Classify Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.

Lexical

"Lexical" is a Classify Method that classifies Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify Document Types.
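The following Python sketch illustrates the general idea of word-frequency classification. It is a simplified stand-in, not Grooper's actual Lexical algorithm: it assumes each Document Type is trained with plain-text samples, builds a word-count profile per type, and picks the type whose profile is most similar (by cosine similarity) to the unclassified document. The function names `train` and `classify` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """examples maps a Document Type name to a list of sample texts."""
    profiles = {}
    for doc_type, texts in examples.items():
        counts = Counter()
        for text in texts:
            counts.update(tokenize(text))
        profiles[doc_type] = counts
    return profiles


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(text, profiles):
    """Return the Document Type whose word profile best matches the text."""
    counts = Counter(tokenize(text))
    return max(profiles, key=lambda doc_type: cosine(counts, profiles[doc_type]))


profiles = train({
    "Invoice": ["invoice number total due amount"],
    "Lease": ["lessor lessee lease term rent"],
})
```

Real classifiers typically weight words (for example, down-weighting words common to every type); plain counts keep the example short.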

LLM Classifier

"LLM Classifier" is a Classify Method that classifies documents (Batch Folders) by asking a large language model (LLM) to select its Document Type from a list.

Rules-Based

"Rules-Based" is a Classify Method that employs "rules" defined on each Document Type to classify Batch Folders. Positive Extractor and Negative Extractor properties are configured for each Document Type to positively or negatively associate a Batch Folder with that Document Type based on predefined criteria.

  • While the Positive and Negative Extractors affect the results of every Classify Method, the Rules-Based method classifies using only these properties and nothing else.

Search Classifier

"Search Classifier" is a Classify Method that classifies documents (Batch Folders) by finding similar documents in a document search index. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same Document Type as its closest match in the search index.

Visual

"Visual" is a Classify Method that uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.

IP Command

IP Commands specify an image processing (IP) operation (such as image cleanup, format conversion or feature detection) and are used to construct IP Steps in an IP Profile. IP Commands are configured using an IP Step's Command property.

Barcode Detection

Barcode Detection is an IP Command that detects and reads barcode data. The detected barcode information is stored as part of the page's layout data.

Barcode Removal

Barcode Removal is an IP Command that detects, reads and digitally removes barcodes from an image. The detected barcode information is stored as part of the page's layout data.

Binarize

Binarize is an IP Command that converts a color or grayscale image to a bi-tonal (black and white) image using various thresholding methods.
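One widely used global thresholding method is Otsu's method, which picks the gray level that best separates dark and light pixels. The sketch below is illustrative only; it assumes an 8-bit grayscale image stored as a list of rows and does not claim to reproduce any specific Grooper thresholding option.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of those pixels' gray levels
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Unnormalized between-class variance; its maximum is at the same t.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image):
    """Convert a grayscale image (list of rows) to black (0) and white (255)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[0 if p <= t else 255 for p in row] for row in image]
```

A global threshold like this works well on evenly lit scans; adaptive (local) thresholding is the usual alternative when lighting varies across the page.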

Box Detection

Box Detection is an IP Command that detects checkboxes and determines their check state (checked or unchecked). The detected checkbox information is stored as part of the page's layout data.

Box Removal

Box Removal is an IP Command that detects checkboxes, determines their check state (checked or unchecked) and digitally removes them from an image. The detected checkbox information is stored as part of the page's layout data.

Extract Page

Extract Page is an IP Command that removes an image from a carrier image while simultaneously removing any image warping or skewing.

Line Detection

Line Detection is an IP Command that locates horizontal and vertical lines on documents. The detected line locations are stored as part of the page's layout data.

Line Removal

Line Removal is an IP Command that locates and removes horizontal and vertical lines from documents. The detected line locations are stored as part of the page's layout data.

Scratch Removal

Scratch Removal is an IP Command that detects and removes or repairs scratches from film-based images.

Shape Detection

Shape Detection is an IP Command that locates shapes on a document that match one or more sample images. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.

Shape Removal

Shape Removal is an IP Command that detects and removes shapes from documents. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that interfere with OCR and/or can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.

OCR Engine

An "OCR engine" is the part of OCR software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an OCR Profile's OCR Engine property.

Azure OCR

Azure OCR is an OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.

Repository Option

Repository Options are optional features that affect the entire repository. They enable functionality that does not work without first establishing the connections these options provide. Repository Options are added to a Grooper Repository and configured using the Root node's Options property.

LLM Connector

LLM Connector is a Repository Option that enables large language model (LLM) powered AI features for a Grooper Repository.

AI Search

AI Search is a Repository Option that enables Grooper's document search and retrieval features in the Search Page. Once enabled, Indexing Behaviors can be added to Content Types (such as Content Models), which will allow users to submit documents to a search index. Once indexed, documents can be retrieved by full text and metadata searches in the Search Page.

Separation Provider

The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Change in Value Separation

Change in Value Separation is a Separation Provider that creates a new folder every time an extracted value changes from one Batch Page to the next.
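A minimal sketch of the change-in-value rule, assuming the value extracted from each page has already been collected into a list in page order. Treating a missing value (`None`) as a continuation of the current folder is an assumption made for this illustration, and `separate_on_change` is a hypothetical name.

```python
def separate_on_change(page_values):
    """Group page indexes into folders, starting a new folder whenever the
    extracted value differs from the last non-missing value seen."""
    folders = []
    current = None
    for index, value in enumerate(page_values):
        if value is not None and (not folders or value != current):
            folders.append([])  # value changed: start a new document folder
            current = value
        elif not folders:
            folders.append([])  # leading pages with no value still need a folder
        folders[-1].append(index)
    return folders
```

For example, pages whose values are `"A", "A", None, "B", "B", "A"` group into three folders: pages 0 to 2, pages 3 and 4, and page 5.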

Control Sheet Separation

Control Sheet Separation is a Separation Provider that uses Grooper Control Sheets to separate documents.

EPI Separation

EPI Separation is a Separation Provider that uses embedded page information ("EPI") to separate loose pages into document folders. A Data Extractor is used to find page numbers in the text of each page, and Grooper uses this information to separate the pages.
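The sketch below illustrates embedded page information separation under stated assumptions: page text is already available, page numbers follow a hypothetical "Page N of M" pattern (standing in for a configured Data Extractor), and a new document starts whenever page number 1 is found.

```python
import re

# Hypothetical page-number pattern standing in for a Data Extractor.
PAGE_NUMBER = re.compile(r"page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def separate_by_page_numbers(page_texts):
    """Group page indexes into folders, starting a folder at each "page 1"."""
    folders = []
    for index, text in enumerate(page_texts):
        match = PAGE_NUMBER.search(text)
        starts_document = match is not None and int(match.group(1)) == 1
        if starts_document or not folders:
            folders.append([])
        folders[-1].append(index)  # pages without a number join the current folder
    return folders
```

A fuller implementation might also use the "of M" total to detect missing or out-of-order pages; this sketch ignores it.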

ESP Auto Separation

ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a Batch.

Event-Based Separation

Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

Multi Separator

Multi Separator is a Separation Provider that performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.

Pattern-Based Separation

Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.

Undo Separation

Undo Separation is a Separation Provider. Instead of putting loose Batch Pages into Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

Service

Grooper.ServiceInstance

Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

Activity Processing

Grooper.Services.ActivityProcessing

Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

API Services

Grooper.Services.ApiServices

Installing API Services allows you to perform Batch processing via REST API web calls.

  • As of version 2025, the Grooper Web Services (GWS) web app hosts additional API endpoints. Some of these endpoints overlap with the API Services endpoints. Refer to the GWS documentation for more information on its endpoint offerings. You can locate the GWS documentation for your Grooper install at https://{webserver-name-or-domain-name}/GWS

Grooper Licensing

Grooper.Services.LicenseService

Grooper Licensing is a Grooper Service that distributes licenses to multiple workstations running Grooper applications.

Import Watcher

Grooper.Services.ImportWatcher

An Import Watcher is a Grooper Service that schedules and runs Import Jobs. It uses an Import Provider to query files in a file system or content management system that meet specified criteria according to a defined schedule (every minute, every day, only on Sundays, etc.). These files are imported into Grooper as documents (Batch Folders) in a new Batch.

  • Afterward, the imported files can be (and should be) moved, deleted, or modified to prevent repeat imports in the next polling cycle.

Indexing Service

Grooper.GPT.IndexingService

An Indexing Service is a Grooper Service that periodically polls the Grooper database to automate AI Search indexing. It checks to see if any documents in a Grooper Repository are classified as a Document Type that inherits from a Content Type configured with an Indexing Behavior. If there are any, and they need to be added, updated, or deleted to/from the search index, the Indexing Service will submit an "Indexing Job" to be picked up by an Activity Processing service.

Extraction Related Types

These are configuration objects in Grooper that relate to extracting data from documents. These objects include specialized items such as "Table Extract Methods" which pertain only to configuring Data Table nodes. These also include more general items such as Value Extractors which are used by various extractor related properties on a variety of node types in Grooper.

These "extraction related types" are always found when configuring properties of:


This includes:

  • Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.

Collation Provider

The Collation property of a Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.

AND

AND is a Collation Provider option for Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
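As an illustration only, AND collation can be modeled as running every child extractor and discarding the combined output unless each one returned at least one hit. Here extractors are hypothetical Python functions that return lists of matches; returning all hits together is an assumption about how the results are combined.

```python
import re


def and_collation(extractors, text):
    """Return every child extractor's hits, but only if each one found something."""
    results = [extractor(text) for extractor in extractors]
    if all(results):  # an empty list means that extractor had no hits
        return [hit for hits in results for hit in hits]
    return []


# Hypothetical child extractors: a ZIP code and a two-letter state code.
def zip_code(text):
    return re.findall(r"\b\d{5}\b", text)


def state(text):
    return re.findall(r"\b[A-Z]{2}\b", text)
```

With these two children, "Tulsa OK 74103" yields both hits, while "Tulsa 74103" yields nothing because the state extractor finds no match.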

Array

Array is a Collation Provider option for Data Type extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.

Combine

Combine is a Collation Provider option for Data Type extractors. Combine combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.

Key-Value List

Key-Value List is a Collation Provider option for Data Type extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.

Key-Value Pair

Key-Value Pair is a Collation Provider option for Data Type extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. Note: Key-Value Pair is an older technique in Grooper. In most cases, the Labeled Value extractor is preferable to Key-Value Pair collation.

Multi-Column

Multi-Column is a Collation Provider option for Data Type extractors. Multi-Column combines multiple columns on a page into a single column for extraction.

Ordered Array

Ordered Array is a Collation Provider option for Data Type extractors. Ordered Array finds sequences of values where one result is present for each extractor, in the order they appear, according to a specified horizontal, vertical or text-flow layout.

Pattern-Based

Pattern-Based is a Collation Provider option for Data Type extractors. Pattern-Based uses regular expressions to sequence returned results into a final result set.

Split

Split is a Collation Provider option for Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.
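A rough sketch of splitting text at anchor points, assuming the Data Type's matches are represented by a regular expression. Whether the anchor text belongs to the following part (as here) or is discarded is a design choice made for this example, not documented Grooper behavior; `split_at_anchors` is a hypothetical name.

```python
import re


def split_at_anchors(text, anchor_pattern):
    """Split text so that each anchor match begins a new part."""
    starts = [match.start() for match in re.finditer(anchor_pattern, text)]
    boundaries = [0] + starts + [len(text)]
    parts = []
    for begin, end in zip(boundaries, boundaries[1:]):
        part = text[begin:end].strip()
        if part:  # skip the empty slice produced when text begins with an anchor
            parts.append(part)
    return parts
```

Any text before the first anchor is kept as its own part, so nothing in the original instance is lost.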

Fill Method

Fill Methods provide various mechanisms for populating child Data Elements of a Data Model, Data Section or Data Table. Fill Methods can be added to these nodes using their "Fill Methods" property and editor.

  • Fill Methods are secondary extraction operations. They populate descendant Data Elements after normal extraction when the Extract activity runs.

AI Extract

Grooper.GPT.AIExtract

AI Extract is a Fill Method that leverages a Large Language Model (LLM) to return extraction results to Data Elements in a Data Model or Data Section. This mechanism provides powerful AI-based data extraction with minimal setup.

Fill Descendants

Grooper.GPT.FillDescendants

Fill Descendants is a Fill Method that executes any Fill Methods on child Data Elements in parallel. This has been shown to dramatically increase efficiency on larger Data Models with multiple Data Sections using AI Extract.
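The benefit of running child Fill Methods in parallel is easiest to see with I/O-bound work such as LLM calls. In this sketch the fill functions are hypothetical placeholders for per-element extraction; a thread pool runs them concurrently and collects each result under its element name. It illustrates the concurrency pattern only, not Grooper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_descendants(fill_methods, document_text):
    """Run each child element's fill function concurrently.

    fill_methods maps a child element name to a function that takes the
    document text and returns that element's value.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fill, document_text)
                   for name, fill in fill_methods.items()}
        # result() blocks until each task finishes, so the dict is complete.
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this case because each task mostly waits on a network response; CPU-heavy work would gain little from a thread pool in Python.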

Run Child Extractors

Grooper.Core.RunChildExtractors

Run Child Extractors is a Fill Method that executes extraction for a subset of child Data Elements. This allows you to selectively run extraction logic for one or more Data Elements in a Data Model, Data Section, or Data Table.

Section Extract Method

The Extract Method property of a Data Section defines a "Section Extract Method" which specifies how section instances will be identified and extracted.

Clause Detection

Clause Detection is a Data Section Extract Method. It leverages LLM text embedding models to compare supplied samples of text against the text of a document to return what the AI determines is the "chunk" of text that most closely resembles the supplied samples.

Nested Table

Nested Table is a Data Section Extract Method. This method divides a document into sections by extracting table data within those sections. This gives Grooper users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections has the same table (or at least tabular data which can be extracted by a single Data Table object).

Transaction Detection

Transaction Detection is a Data Section Extract Method. This extraction method produces section instances by detecting repeating patterns of text around the Data Section's child Data Fields.

Lookup Specification

A Lookup Specification defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" (Data Models, Data Sections and Data Tables) using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root Data Model), but cannot be used to populate a field value on a parent of the container element.

CMIS Lookup

CMIS Lookup is a Lookup Specification that performs a lookup against a CMIS Repository via a "CMISQL query" (a specialized query language based on SQL database queries).

Database Lookup

Database Lookup is a Lookup Specification that performs a lookup against a Data Connection via a SQL query.
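The general shape of a database lookup is sketched below with Python's built-in `sqlite3` module standing in for a real Data Connection. The table, column and field names are hypothetical: the lookup field ("Vendor ID") parameterizes the SQL query, and the returned row populates the target fields.

```python
import sqlite3


def database_lookup(connection, lookup_fields):
    """Query with a lookup field and return values for the target fields."""
    row = connection.execute(
        # A parameterized query keeps field values out of the SQL text.
        "SELECT vendor_name, vendor_city FROM vendors WHERE vendor_id = ?",
        (lookup_fields["Vendor ID"],),
    ).fetchone()
    if row is None:
        return None  # no match: leave the target fields unpopulated
    return {"Vendor Name": row[0], "Vendor City": row[1]}


# An in-memory database standing in for an external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT, vendor_name TEXT, vendor_city TEXT)")
conn.execute("INSERT INTO vendors VALUES ('1001', 'Acme Supply', 'Tulsa')")
```

Calling `database_lookup(conn, {"Vendor ID": "1001"})` returns the vendor's name and city; an unknown ID returns `None`, which a validation step could flag.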

GPT Lookup

PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as AI Extract. If absolutely necessary, its functionality could also be replicated with a Web Service Lookup implementation. GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model.

Lexicon Lookup

Lexicon Lookup is a Lookup Specification that performs a lookup against a Lexicon.

Web Service Lookup

Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.

XML Lookup

XML Lookup is a Lookup Specification that performs a lookup against an XML file stored as a Resource File in the Project. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.
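A hedged sketch of the XML Lookup idea using Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The XML content, element and attribute names, and target field names are all hypothetical. An XPath predicate selects the matching node, then an attribute and an element's text are mapped to fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML content standing in for a Resource File.
VENDORS_XML = """<vendors>
  <vendor id="1001" name="Acme Supply"><city>Tulsa</city></vendor>
  <vendor id="1002" name="Globex"><city>Dallas</city></vendor>
</vendors>"""


def xml_lookup(xml_text, vendor_id):
    """Select a vendor node by its id attribute and map it to target fields."""
    root = ET.fromstring(xml_text)
    node = root.find(f".//vendor[@id='{vendor_id}']")
    if node is None:
        return None
    # Map an attribute and an element's text to Grooper-style field names.
    return {"Vendor Name": node.get("name"), "City": node.findtext("city")}
```

A full XPath engine would allow richer expressions (functions, axes), but an attribute predicate is enough to show how a lookup field selects a node.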

Table Extract Method

A Table Extract Method defines the settings and logic for a Data Table to perform extraction. It is set by configuring the Extract Method property of the Data Table.

Delimited Extract

The Delimited Extract Table Extract Method extracts tabular data from a delimiter-separated text file, such as a CSV file.
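For a sense of what delimited extraction produces, the following sketch parses CSV text into one dictionary per row using Python's `csv` module, taking column names from the header row. It illustrates the concept only and is not how Grooper maps columns to Data Columns; `delimited_extract` is a hypothetical name.

```python
import csv
import io


def delimited_extract(text, delimiter=","):
    """Parse delimiter-separated text into one dict per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]
```

The `csv` module handles quoted values that contain the delimiter, such as `"Bolts, hex"`, which a naive string split would break apart.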

Fluid Layout

The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.

Grid Layout

The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.

Row Match

The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extract cell data from each column.
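The row-pattern idea can be sketched with a single regular expression whose named groups stand in for table columns. The column names and the pattern itself are hypothetical examples for a simple invoice line-item layout; lines that do not fit the pattern (headers, totals) are simply not matched.

```python
import re

# Hypothetical row pattern: quantity, description, then a two-decimal amount.
ROW_PATTERN = re.compile(
    r"^(?P<quantity>\d+)\s+(?P<description>.+?)\s+(?P<amount>\d+\.\d{2})$",
    re.MULTILINE,
)


def row_match(text):
    """Return one dict of column values per line that matches the row pattern."""
    return [match.groupdict() for match in ROW_PATTERN.finditer(text)]
```

The lazy `.+?` lets the description absorb spaces while leaving the trailing amount to its own group.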

Tabular Layout

The Tabular Layout Table Extract Method uses column header values determined by the Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled) as well as Data Column Value Extractor results to model a table's structure and return its values.

Value Extractor

Grooper.Core.ValueExtractor

Value Extractors define an operation that reads data from the text (and sometimes visual) content of a page or document. There are over 20 unique Value Extractors, each using specialized logic to return results. Value Extractors are consumed by multiple higher-level objects in Grooper (such as Data Elements, Extractor Nodes, various Activities and more) to perform a diverse set of document processing duties.

  • Value Extractors return a list of one or more "data instances". Data instances contain both the value and its page location, which allows Grooper to highlight results in a Document Viewer.

Ask AI

Grooper.GPT.OpenAI.Chat.AskAI

Ask AI is a Value Extractor that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper a Value Extractor is referenced. It can complete a wide array of tasks in Grooper with simple text prompts.

Detect Signature

Grooper.Extract.DetectSignature

Detect Signature is a Value Extractor that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).
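The fill-percentage measurement can be sketched as follows, assuming a binarized page stored as rows of 0/1 values (1 meaning a dark pixel) and a zone given as pixel coordinates. The 5% threshold is a hypothetical default, not a documented Grooper setting.

```python
def fill_percentage(image, left, top, right, bottom):
    """Percent of dark pixels in the zone; right and bottom are exclusive."""
    region = [row[left:right] for row in image[top:bottom]]
    total = sum(len(row) for row in region)
    filled = sum(sum(row) for row in region)
    return 100.0 * filled / total if total else 0.0


def signature_present(image, zone, min_fill=5.0):
    """zone is (left, top, right, bottom); min_fill is a hypothetical threshold."""
    return fill_percentage(image, *zone) >= min_fill
```

In practice the threshold must sit above the noise level of a blank signature line, which is why the zone usually excludes the printed line itself.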

Field Match

Grooper.Extract.FieldMatch

Field Match is a Value Extractor that matches the value stored in a previously-extracted Data Field or Data Column.

Find Barcode

Grooper.Extract.FindBarcode

Find Barcode is a Value Extractor that searches for and returns barcode values previously stored in a Batch Folder or Batch Page's layout data.

  • Note: Find Barcode differs slightly from Read Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

GPT Complete

Removed in version 2025

GPT Complete is a Value Extractor that leverages OpenAI's GPT models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.

PLEASE NOTE: GPT Complete is a deprecated Value Extractor. It uses an outdated method to call the OpenAI API. Please use the Ask AI extractor going forward.

Highlight Zone

Grooper.Extract.HighlightZone

Highlight Zone is a Value Extractor that sets a highlight region on a document without performing any actual data extraction. This "extractor" is used to mark areas of interest or importance for Review users or for uncommon scenarios where a data instance location is needed with no actual value.

Label Match

Grooper.Extract.LabelMatch

Label Match is a Value Extractor that matches a list of one or more values using matching options defined by a Labeling Behavior. It is similar to List Match but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.

Labeled OMR

Grooper.Extract.LabeledOMR

Labeled OMR is a Value Extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.

Labeled Value

Grooper.Extract.LabeledValue

Labeled Value is a Value Extractor that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).

List Match

Grooper.Extract.ListMatch

List Match is a Value Extractor designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions, but it can be configured to use regular expression syntax.

Ordered OMR

Grooper.Extract.OrderedOMR

Ordered OMR is a Value Extractor used to return OMR check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

Pattern Match

Grooper.Extract.PatternMatch

Pattern Match is a Value Extractor that extracts values from a document that match a specified regular expression, providing data collection following a known format or pattern.
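The following sketch shows pattern matching that returns "data instances" carrying both the matched value and its location. Character offsets stand in for the page coordinates Grooper actually records, and `DataInstance` and `pattern_match` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class DataInstance:
    value: str
    start: int  # character offset where the match begins in the text
    end: int    # character offset just past the end of the match


def pattern_match(text, pattern):
    """Return a DataInstance for every match of the regular expression."""
    return [DataInstance(m.group(0), m.start(), m.end())
            for m in re.finditer(pattern, text)]
```

Keeping the location alongside the value is what allows a viewer to highlight each result on the page.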

Query HTML

Grooper.Messaging.QueryHTML

Query HTML is a Value Extractor specialized for HTML documents. It uses either CSS or XPath selectors to return the inner text or an attribute of an HTML element.

Read Barcode

Grooper.Extract.ReadBarcode

Read Barcode is a Value Extractor that uses barcode recognition technology to read and extract values from barcodes found in the document content.

  • Note: Read Barcode differs slightly from Find Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

Read Metadata

Grooper.Extract.ReadMetaData

Read Metadata is a Value Extractor that retrieves metadata values associated with a document. Read Metadata can return metadata from a Batch Folder's attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.

Read Zone

Grooper.Extract.ReadZone

Read Zone is a Value Extractor that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.

Reference

Grooper.Extract.ReferenceExtractor

Reference is a Value Extractor used to reference an Extractor Node. This allows users to create reusable extractors and use the more complex Data Type and Field Class extractors throughout Grooper.

Word Match

Grooper.Extract.WordMatch

Word Match is a Value Extractor that extracts individual words or phrases from documents. It is used for n-gram extraction. Each gram may be optionally executed against a Lexicon to ensure words and phrases only match a set vocabulary.
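An illustrative n-gram sketch: words are split out of the text, joined into grams of length `n`, and optionally kept only if they appear in a vocabulary list standing in for a Lexicon. Case-insensitive vocabulary matching is an assumption of this example, and `word_match` is a hypothetical name.

```python
import re


def word_match(text, n=1, lexicon=None):
    """Return n-grams from the text, optionally filtered by a vocabulary list."""
    words = re.findall(r"[A-Za-z']+", text)
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if lexicon is not None:
        vocabulary = {entry.lower() for entry in lexicon}
        grams = [gram for gram in grams if gram.lower() in vocabulary]
    return grams
```

Without a vocabulary every adjacent word pair is returned; with one, only the phrases of interest survive.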

Zonal OMR

Grooper.Extract.ZonalOMR

Zonal OMR is a Value Extractor that reads one or more OMR checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

BE AWARE: Zonal OMR is outdated compared to Labeled OMR and Ordered OMR. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.

Import and Export Related Types

These are configuration objects in Grooper that relate to importing documents into Grooper, exporting processed content (files and data) out of Grooper, and otherwise accessing document content linked in Grooper to external file systems and content management systems.

This includes:

Please Note: Import Behavior and Export Behavior are obviously import and export related. Because their parent type is "Behavior", they are found in the Core Configuration Types portion of this Glossary.

  • Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.

CMIS Binding

CMIS Bindings are the platform connection types for CMIS Connections. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems.

CMIS Bindings use the CMIS standard as a model to define connectivity. Even when connecting to CMS platforms that are not truly CMIS systems (such as a Windows file system), Grooper normalizes connection to them as if they were. This allows Grooper to use CMIS Import and CMIS Export for all storage platforms.

  • You will commonly hear CMIS Binding referred to as a "CMIS connection type", "connection type", or just "connection", as in an "Exchange connection".

AppXtender

AppXtender is a connection option for CMIS Connections. It allows Grooper to connect to the AppEnhancer (formerly ApplicationXtender) content management system for import and export operations.

Box

Box is a connection option for CMIS Connections. It connects Grooper to the Box content management system for import and export operations.

CMIS

CMIS is a connection option for CMIS Connections. It connects Grooper to a CMIS 1.0 or CMIS 1.1 server for import and export operations. This can be used to connect to CMS platforms that implement the CMIS protocol.

Exchange

Exchange is a connection option for CMIS Connections. It connects Grooper to Microsoft Exchange email servers (including Outlook servers) for import and export operations.

FTP

FTP is a connection option for CMIS Connections. It connects Grooper to FTP directories for import and export operations.

IMAP

IMAP is a connection option for CMIS Connections. It connects Grooper to email messages and folders through an IMAP email server for import and export operations.

NTFS

NTFS is a connection option for CMIS Connections. It connects Grooper to files and folders in the Microsoft Windows NTFS file system for import and export operations.

OneDrive

OneDrive is a connection option for CMIS Connections. It connects Grooper to Microsoft OneDrive cloud services for import and export operations.

SFTP

SFTP is a connection option for CMIS Connections. It connects Grooper to SFTP directories for import and export operations.

SharePoint

SharePoint is a connection option for CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

Content Link

Grooper.Core.ContentLink

Content Links define references to files or folders stored outside of Grooper, such as in a Windows folder or in a CMIS Repository.

  • Content Link has two sub-types: Document Link and Folder Link. There are 9 types of "Document Link" and only 1 type of "Folder Link". Due to this, Document Link is a more common term than "Content Link".

Document Links

Grooper.Core.DocumentLink

CMIS Document Link

Grooper.CMIS.CmisLink

File System Link

Grooper.Core.FileSystemLink

FTP Link

Grooper.Messaging.FtpLink

HTTP Link

Grooper.Messaging.HTTPLink

Mail Link

Grooper.Messaging.MailLink

PST Link

Grooper.Office.PstLink

SFTP Link

Grooper.Messaging.SftpLink

Subfile Link

Grooper.Core.SubfileLink

ZIP Link

Grooper.Messaging.FtpLink

Folder Links

Grooper.Core.FolderLink

CMIS Folder Link

Grooper.CMIS.CmisFolderLink

Export Definition

Export Behaviors are defined by adding and configuring one or more Export Definitions (See Export Definition Types or the Export Definitions section of the Export article). An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.

CMIS Export

CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

Data Export

Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC-compliant database.

Import Provider

Grooper.Core.ImportProvider

Import Providers enable Grooper to import file-based content from numerous sources, including Windows file systems, SFTP file systems, mail servers and various content management systems (CMS). An Import Provider is selected and configured when configuring "Import Jobs". Import Jobs are submitted in one of two ways:

  • By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
  • From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.

In both cases, an Import Provider is selected and configured using the "Provider" property.

CMIS Import

Grooper.CMIS.CmisImportBase

CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

Import Descendants

Grooper.CMIS.ImportDescendants

Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files from a CMIS Repository folder location, including any files in any sub-folders (i.e. all "descendant" files).

Import Query Results

Grooper.CMIS.ImportQueryResults

Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Query Results imports files from a CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

File System Import

Grooper.Core.FileSystemImport

File System Import refers to a Legacy Import Provider used to import documents directly from your Windows File System into Grooper.

HTTP Import

Grooper.Messaging.HTTPImport

HTTP Import is an Import Provider used to import web-based content (web pages and files hosted on an HTTP server). HTTP Import can be used to ingest individual web pages, defined portions of a website or entire websites into Grooper.

Test Batch

Grooper.Core.TestBatchImport

"Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.


Misc Properties and Other Configuration Types

AI Generator/Generators

AI Generators create custom documents using the results of a Search Page query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file.

  • AI Generators are added and configured using an Indexing Behavior's "Generators" property and editor. They are executed from the Search Page using the "Download" command and "Download Custom" format.

CMISQL Query/CMIS Query

Grooper.CMIS.CmisQuery

A CMISQL Query (aka CMIS Query) is Grooper's way of searching for documents in CMIS Repositories. Commonly, CMISQL Queries are used by Import Query Results to import documents from a CMIS Repository. CMISQL Queries are also used by CMIS Lookup to look up data from a CMIS Repository. CMISQL Queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.

  • CMISQL Queries are configured using the "CMIS Query" property found in "Import Query Results" and "CMIS Lookup".
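For orientation, the sketch below builds a query in standard CMIS 1.x query syntax (`cmis:document`, `cmis:name`, `cmis:creationDate`, `LIKE`, `TIMESTAMP`), including the backslash escaping the CMIS specification defines for quotes and LIKE wildcards. The helper names `escape_like` and `build_name_query` are hypothetical. Because Grooper's CMISQL supports a subset of this syntax plus its own extensions, which clauses a given connection type accepts is an assumption you should verify.

```python
def escape_like(value: str) -> str:
    """Escape a value for use inside a quoted CMIS LIKE pattern."""
    return (value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
                 .replace("'", "\\'")    # single quote inside a string literal
                 .replace("%", "\\%")    # literal percent, not a wildcard
                 .replace("_", "\\_"))   # literal underscore, not a wildcard


def build_name_query(name_prefix: str, created_after: str) -> str:
    """Query documents whose name starts with name_prefix, created after an ISO 8601 timestamp."""
    # created_after is assumed to already be a valid CMIS timestamp string
    return (
        "SELECT cmis:objectId, cmis:name FROM cmis:document "
        f"WHERE cmis:name LIKE '{escape_like(name_prefix)}%' "
        f"AND cmis:creationDate > TIMESTAMP '{created_after}'"
    )
```

Escaping matters because `_` and `%` are wildcards in LIKE; without escaping, a prefix such as `Invoice_` would also match `InvoiceX`.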

Paragraph Marker/Paragraph Marking

Grooper.Core.ParagraphMarker

Paragraph Marking is a component of Grooper's Text Preprocessor. It enables the "Paragraph Marker", which detects paragraph boundaries and marks them by altering the normal carriage return and line feed (CR/LF) pairs at the end of each line. Instead of placing line breaks at the end of each line, the Paragraph Marker places them at the end of each paragraph. This produces a normalized text flow, making it easier to extract values that span lines.

  • "Paragraph Marker" is the embedded object that actually performs paragraph detection and marking in Grooper. "Paragraph Marking" is the property that enables the Paragraph Marker and allows users to configure it.
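A minimal sketch of the idea, assuming paragraphs are separated by blank lines (Grooper's Paragraph Marker detects boundaries from the document itself, so this is a simplification): wrapped lines inside a paragraph are joined with spaces, and a single CR/LF pair is kept at the end of each paragraph. `mark_paragraphs` is a hypothetical name.

```python
def mark_paragraphs(text: str) -> str:
    """Join wrapped lines inside each paragraph; keep one CR/LF pair per paragraph."""
    paragraphs, current = [], []
    for line in text.split("\r\n"):
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line ends the current paragraph (simplifying assumption)
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\r\n".join(paragraphs)
```

After marking, a pattern looking for "made between" matches even though the original text wrapped between those two words.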

Preprocessing/Text Preprocessor

Grooper.Core.TextPreprocessor

Grooper's "Text Preprocessor" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) to allow regular expression patterns to match (or ignore) structural elements, such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following:

Permission Set/Permission Sets

Grooper.PermissionSet

Permission Sets define security permissions in a Grooper Repository for a user or group. This allows you to restrict user access to specified Grooper pages (such as the Design Page) and Grooper Commands.

  • "Permission Set" is the embedded object that defines security permissions for a user or group. Permission Sets are added to a Grooper Repository and configured using the "Permission Sets" property found on the Root node.

Quoting Method/Document Quoting

Grooper.GPT.QuotingMethod

Quoting Methods provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, allowing users to give the AI only the context it needs to respond, or to reduce costs by reducing the number of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of a document's text, data extracted from the document, layout data, or a combination of these.

  • "Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including AI Extract) using a "Document Quoting" property.

Variable Definition

Grooper.Core.VariableDefinition

Variable Definitions define a variable with a computed value that can be referenced by various code expressions. Variable Definitions are added to Data Models, Data Sections and Data Tables using their "Variables" property.

Used By: Data Model, Data Section, Data Table

Vertical Wrap Detection/Vertical Wrap

Vertical Wrap Detection enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a Labeling Behavior and by the List Match and Label Match Value Extractors.

  • "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.

Properties

A property is a configurable setting on a Grooper object that affects how the object performs its function.

Alignment

"Alignment" refers to how Grooper highlights text from an AI response on a document in a Document Viewer. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as AI Extract.

Confidence Multiplier and Output Confidence

Some results carry more weight than others. The Confidence Multiplier and Output Confidence properties allow you to manually adjust an extraction result's confidence.

Constrained Wrap

The Constrained Wrap property allows certain Value Extractors and the Labeling Behavior to match values which wrap from one line to the next inside a box (such as a table cell).

Content Type Filter

The Content Type Filter property restricts Activities to specific Content Categories and/or Document Types.

Import Mode

Import Mode is a configurable property for CMIS Import providers. This controls how file content is loaded into a Grooper Repository during an Import Job. This property is key to setting up a "Sparse" import in Grooper.

Output Extractor Key

The Output Extractor Key property is another weapon in the arsenal of powerful Grooper classification techniques. It allows Data Types to return results normalized in a way more beneficial to document classification.

Parameters

Parameters is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.
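The sketch below shows how these four parameters typically shape a model's next-token choice. The penalty formula follows the one OpenAI documents for its API (subtract `count * frequency_penalty`, plus `presence_penalty` once a token has appeared); other LLM services may differ, and these settings are applied by the LLM service during generation. The function name and the toy logits are illustrative assumptions.

```python
import math
from collections import Counter


def next_token_distribution(logits: dict[str, float], generated: list[str],
                            temperature: float = 1.0, top_p: float = 1.0,
                            presence_penalty: float = 0.0,
                            frequency_penalty: float = 0.0) -> dict[str, float]:
    """Return the sampling distribution over candidate tokens after applying all four parameters."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    counts = Counter(generated)
    # Frequency penalty grows with each reuse; presence penalty applies once a token has appeared
    adjusted = {tok: logit - counts[tok] * frequency_penalty
                - (1.0 if counts[tok] > 0 else 0.0) * presence_penalty
                for tok, logit in logits.items()}
    # Temperature scales logits before softmax: below 1 sharpens, above 1 flattens
    scaled = {t: v / temperature for t, v in adjusted.items()}
    peak = max(scaled.values())
    exps = {t: math.exp(v - peak) for t, v in scaled.items()}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}
    # Top-p keeps the smallest set of most likely tokens whose probability mass reaches top_p
    kept, mass = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    norm = sum(kept.values())
    return {t: p / norm for t, p in kept.items()}
```

With logits `{"a": 2.0, "b": 1.0, "c": 0.0}` and `top_p=0.5`, only "a" survives, because it alone carries about 67% of the probability mass.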

Scope

The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Secondary Types

Secondary Types allow the application of multiple Content Types to a single Batch Folder.

Tab Marking

Tab Marking allows you to insert tab characters into a document's text data.
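One way to picture tab marking: rebuild a line of text from word positions and emit a tab wherever the horizontal gap between two words is wide. The `Word` class, its units, and the 0.5 gap threshold are hypothetical; Grooper's own tab detection settings are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    left: float   # left edge of the word (hypothetical unit, e.g. inches)
    right: float  # right edge of the word


def mark_tabs(line: list[Word], min_gap: float = 0.5) -> str:
    """Rebuild a line of text, using a tab wherever the gap between words is at least min_gap."""
    if not line:
        return ""
    parts = [line[0].text]
    for prev, word in zip(line, line[1:]):
        gap = word.left - prev.right
        parts.append("\t" if gap >= min_gap else " ")
        parts.append(word.text)
    return "".join(parts)
```

A label and its value separated by white space then become `Invoice Date\t01/15/2024`, so a pattern can treat the tab as a field boundary.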

Misc Features and Functionality

CSS Data Viewer Styling

CSS Data Viewer Styling refers to using CSS to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a Data Model's appearance and layout during document review.

EDI Integration

EDI Integration refers to Grooper's ability to process EDI files.

Fine-Tuning for AI Extract

Fine-tuning is the process of further training a large language model (LLM) on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on.
In Grooper, you can start fine-tuning a model based on a Data Model to improve extraction when using AI Extract.

Footer Rows and Footer Modes

A "Footer Row" is a row at the bottom of a Data Table that displays sum totals for numerical Data Columns. This can help Data Viewer users validate the data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum is calculated. When Tabular Layout's "Capture Footer Row" property creates the Footer Row, Footer Mode also controls whether and how document data is used to capture and validate the footer value.
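To make the validation idea concrete, this sketch sums a numeric column and compares the total to a footer value captured from the document. The function name and the simple currency cleanup are assumptions; the actual Footer Mode options are not modeled. `Decimal` is used so that money values add up exactly.

```python
from decimal import Decimal


def check_footer(column_values: list[str], captured_footer: str) -> tuple[Decimal, bool]:
    """Sum a numeric column and report whether the total matches the captured footer value."""
    def to_decimal(s: str) -> Decimal:
        # Strip a dollar sign and thousands separators (simplifying assumption)
        return Decimal(s.replace("$", "").replace(",", ""))

    total = sum((to_decimal(v) for v in column_values), Decimal("0"))
    return total, total == to_decimal(captured_footer)
```

A mismatch (for example, a missed row) shows up as `False`, which is the kind of signal a reviewer would use to go back and check the table.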

Label Sets

Label Sets are collections of label definitions used in Grooper to identify and extract information from documents. A label set maps document text—such as field names, headers, or column titles—to corresponding Data Field, Data Section, or Data Table elements in the Data Model. Label sets are essential for automating extraction and classification, especially in environments where document layouts and terminology may vary.

URL Endpoints for Review

Three different URL endpoints can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID, Batch Process name, Batch Id and more. This allows Grooper users to link directly to a Batch in Review with a URL.

XML Schema Integration

XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.

UI Element

A UI Element is a portion of the Grooper interface that allows users to interact with or otherwise receive information about the application.

Data Inspector

The Grooper Data Inspector is a UI Element that can be found anywhere there is a Document Viewer showing extraction results. This UI Element allows a user to inspect the Data Instance hierarchies of an extracted result.

Design Page

GrooperReview.Pages.Design.DesignPage

The Design Page is the primary user interface for Grooper configuration. It is the central workplace for Grooper designers and administrators. From the Design page, users create, test and administer nodes in a Grooper Repository.

Document Viewer

The Grooper Document Viewer is the portal to your documents. It is the UI that allows you to see a Batch Folder's (or a Batch Page's) image, text content, and more.

Node Tree

The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Overrides

Overrides is a tab that allows you to override the default property values set on a Data Element.

Search Page

The Search Page allows users to leverage AI Search indexes to query indexed documents. Both full text and metadata searches are supported, with feature-rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the Document Viewer, review documents' extracted data, create new Batches from the result set, submit processing jobs, start a conversation with an AI Assistant and more.

Scan Viewer

The Scan Viewer is a user interface that can be added to the user-attended Review step in a Batch Process. It is used to scan documents into Batches from one or more scanning workstations.

Summary Tabs

Content Models and Content Categories have a Summary tab where you can view "Descendant Node Types", Document Types, and Expressions.

Other

Concepts

There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use these objects and properties depends on understanding the underlying concepts that define what they are doing and why.

Activity Processing

Activity Processing is the execution of a sequence of configured tasks within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

CMIS+

CMIS+ is a conceptual term that refers to Grooper's connectivity architecture for external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the CMIS standard. This provides a standardized setup that allows Grooper to interoperate with both CMIS-compliant and non-CMIS systems. It further provides normalized access to document content and metadata for import (CMIS Import) and export (CMIS Export) operations.

CMIS

CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata, as well as allowing programmatic control of the platform over the internet.

Classification

Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity will assign a Content Type to a Batch Folder.
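As an illustration of one training-based approach, the sketch below compares an unclassified document's words to example documents for each Document Type using cosine similarity, and leaves the document unclassified when no example is similar enough. It uses raw word counts and nearest-example scoring for brevity; Grooper's Lexical method uses TF-IDF weighting, and the function names and 0.5 threshold here are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(words: list[str], training: dict[str, list[list[str]]],
             min_similarity: float = 0.5) -> Optional[str]:
    """Return the document type of the most similar training example, or None if too dissimilar."""
    doc = Counter(words)
    best_type, best_score = None, 0.0
    for doc_type, examples in training.items():
        for example in examples:
            score = cosine(doc, Counter(example))
            if score > best_score:
                best_type, best_score = doc_type, score
    return best_type if best_score >= min_similarity else None
```

Adding more varied examples to a type's training list is what lets this kind of classifier recognize new variations of the same document.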

Code Expressions

Code Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Data Context

Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.

Data Extraction

Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.

Data Extractor

Data Extractor (or just "extractor") refers to all Value Extractors and Extractor Nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Data Instance

A Data Instance is an encapsulation of text data within a document returned by Grooper's extractors. Data instances are the hierarchy of text data created by Grooper's extractors.

Expressions

Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Expressions Cookbook

The "Expressions Cookbook" is a reference list for commonly used Code Expressions in Grooper.

Field Mapping

Field Mapping refers to how logical connections are made between metadata content in Grooper and an external storage platform.

Five Phases of Grooper

The "Five Phases of Grooper" is a conceptual term that seeks to build understanding of how documents are processed through Grooper.

Flow Collation

"Flow Collation" refers to the text-flow based layout option used by various Collation Providers for Data Type extractors.

Fuzzy RegEx

Fuzzy RegEx is Grooper's use of fuzzy logic within Value Extractors that leverage regular expressions to match patterns. Fuzzy RegEx allows extractors to overcome defects in a document's OCR results to accurately return results. Fuzzy RegEx is turned on using the Fuzzy Matching property.
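The sketch below illustrates the general idea of tolerating OCR errors with approximate string matching: it finds the place in the text where a literal pattern matches with the fewest character edits (Levenshtein distance, using Sellers' dynamic-programming variant that lets a match start anywhere). This is a swapped-in, simpler technique, not Grooper's Fuzzy RegEx algorithm, which works on full regular expressions. `best_fuzzy_match` is a hypothetical name.

```python
def best_fuzzy_match(pattern: str, text: str) -> tuple[int, int]:
    """Return (edit_distance, end_index) of the closest match of pattern anywhere in text.

    end_index is the 1-based position in text where the best match ends (0 if text is empty).
    """
    m = len(pattern)
    # col[i] = fewest edits to match pattern[:i] ending at the current text position
    col = list(range(m + 1))
    best = (col[m], 0)
    for j, ch in enumerate(text, start=1):
        new = [0] * (m + 1)  # new[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new[i] = min(
                col[i] + 1,         # text has an extra character
                new[i - 1] + 1,     # pattern character missing from text
                col[i - 1] + cost,  # match, or substitution (e.g. OCR read "I" as "1")
            )
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best
```

An extractor built on this idea would accept a match when the distance is within some error budget, for example one edit for a seven-letter label.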

GPT Integration

Grooper's GPT Integration refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.

Grooper Infrastructure

Grooper Infrastructure refers to the computing underpinnings of what makes up a Grooper Repository and the software that allows the Grooper platform to automate tasks and users to interface with it.

Grooper Repository

A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Image Processing

"Image processing", as a general term, refers to software techniques that manipulate and enhance images. Image processing removes imperfections and adjusts images to improve OCR accuracy. In Grooper, images are processed primarily by two Activities:

  • Image Processing - This Activity permanently adjusts the image. It is primarily used to compensate for defects produced by a document scanner (like border artifacts and skewed images). It does so by applying IP Commands in an IP Profile.
  • Recognize - This Activity performs OCR. When an OCR Profile references an IP Profile, the image is processed temporarily: a temporary image is handed to the OCR engine and discarded once characters are recognized.
  • Grooper also has "computer vision" capabilities that analyze and interpret images. These capabilities are also executed during Grooper's image processing. For example, Grooper's "Line Removal" command will locate lines on an image (computer vision), remove those artifacts to improve OCR results during Recognize (image processing) and store that data for later use in Grooper (computer vision).

LINQ to Grooper Objects

LINQ is a Microsoft .NET component that provides data querying capabilities to the .NET framework. In Grooper, you can use the LINQ syntax in Code Expressions to "LINQ to Grooper Objects". This allows expressions to access information from collections of data, such as from multi-instance Data Sections or Data Tables.

Layout Data

Layout Data refers to visual information that certain IP Commands collect, such as lines, checkboxes, barcodes, and detected shapes. This data is stored in a "Grooper.Layout.json" file attached to Batch Pages. Layout Data is used by certain extractors and other features that rely on the presence of that data to function.

Microfiche Processing

Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.

Microsoft Office Integration

Grooper's Microsoft Office Integration allows the platform to easily convert Microsoft Word and Microsoft Excel files into formats that Grooper can read natively (PDF and CSV).

Mixed Classification

"Mixed Classification" refers to leveraging a Classify Method and "rules" defined on a Document Type to overcome the shortcomings of an individual method.

OCR

OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.

OCR Synthesis

OCR Synthesis refers to a suite of OCR related functionality unique to Grooper. The OCR Synthesis suite will pre-process and re-process raw results from the OCR Engine and synthesize its results into a single, more accurate OCR result.

Object Nomenclature

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

PDF Page Types

PDF pages can be one of several PDF Page Types. "Page types" describe the kind of content in a PDF page. This informs Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize activity, whereas "text only" pages have their native text extracted by Recognize.

Prompt Engineering

"Prompt Engineering" is the process of designing and refining prompts to interact more effectively with large language models (LLMs) like GPT-4. The goal is to guide the model to produce desired outputs by carefully crafting the input queries.

Regular Expression

Regular Expression (or regex) is a standard syntax designed to parse text strings. This is a way of finding information in text. It is the primary method by which Grooper extracts and returns data from documents.
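A small example of the kind of pattern an extractor might use to find dates. The pattern and the `find_dates` helper are hypothetical illustrations in Python's `re` module; Grooper runs on the .NET regex engine, but this simple pattern uses only syntax common to both.

```python
import re

# Hypothetical pattern for US-style dates such as 01/15/2024 or 2-14-24:
# one or two digits, a slash or hyphen, one or two digits, a separator, then a 2- or 4-digit year
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{2}|\d{4})\b")


def find_dates(text: str) -> list[str]:
    """Return every substring of text that matches DATE_PATTERN."""
    return [m.group(0) for m in DATE_PATTERN.finditer(text)]
```

The `\b` word boundaries keep the pattern from matching inside longer digit runs, such as part of an account number.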

Separation

Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".
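A minimal sketch of rule-based separation: walk the pages in order and start a new document whenever a page looks like a first page. The `separate` function and the "starts with INVOICE" rule are hypothetical; Grooper offers several separation approaches, and this models only the simplest idea.

```python
from typing import Callable


def separate(pages: list[str], is_first_page: Callable[[str], bool]) -> list[list[str]]:
    """Group an ordered list of page texts into documents."""
    documents: list[list[str]] = []
    for page in pages:
        if not documents or is_first_page(page):
            documents.append([page])    # start a new document (a new folder)
        else:
            documents[-1].append(page)  # continue the current document
    return documents
```

For example, `separate(pages, lambda p: p.startswith("INVOICE"))` groups each invoice's continuation pages with its first page.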

TF-IDF

TF-IDF stands for term frequency-inverse document frequency. It is a statistical calculation intended to reflect how important a word is to a document within a document set (or "corpus"). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
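The textbook TF-IDF calculation is sketched below: a term's frequency within a document multiplied by the log of (number of documents / number of documents containing the term). Terms that appear in every document score zero, because they do not help tell documents apart. This is the basic formula only; the exact weighting and normalization Grooper uses are not shown here and may differ.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Compute a TF-IDF weight for every term in every tokenized document of the corpus."""
    n_docs = len(corpus)
    # Document frequency: how many documents contain each term at least once
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        if not doc:
            weights.append({})  # an empty document has no terms to weight
            continue
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

In a corpus where every document contains "invoice", that word gets weight 0, while a rarer word like "due" gets a positive weight.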

Table Extraction

"Table Extraction" refers to Grooper's ability to extract data from cells in tables on documents. This is accomplished by configuring the Data Table and its child Data Column elements in a Data Model.

Thread

A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.
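As a rough analogy (Grooper's services run on .NET, and this is Python), a fixed-size thread pool shows how a thread count caps how many tasks run at once. `process_task` and `run_tasks` are hypothetical stand-ins for units of work such as individual Activity tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def process_task(task_id: int) -> str:
    # Stand-in for one unit of work (hypothetical)
    return f"task {task_id} done"


def run_tasks(task_ids: list[int], thread_count: int) -> list[str]:
    """Run tasks on at most thread_count threads; results come back in submission order."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(process_task, task_ids))
```

More threads let more tasks run concurrently, but only up to what the machine's CPU and memory can actually support.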

Training-Based Approaches to Document Classification

"Training-Based Approaches to Document Classification" refers to Grooper Classify Methods that classify Batch Folders using document examples for each Document Type. The Classify activity then assigns each unclassified Batch Folder a Document Type based on how similar it is to that Document Type's training data.

Training Batch

The Training Batch is a special Batch created when training document examples using the Lexical classification method. The Training Batch serves two purposes: (1) It is a Batch that holds all previously trained Batch Folders. Designers can go to this Batch to view these documents and copy and paste them into other Batches if needed. (2) Batch Folders in the Training Batch are used to retrain the Content Model's classification data when the Rebuild Training command is executed.

UNC Path

UNC Path is a conceptual term that refers to UNC (Universal Naming Convention) which is a standard used in Microsoft Windows for accessing shared network folders.
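Python's `pathlib.PureWindowsPath` understands UNC paths of the form `\\server\share\folder\file`, which makes their structure easy to see: the server and share together form the path's "drive". The `split_unc` helper and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def split_unc(path: str) -> tuple[str, str, list[str]]:
    """Split a UNC path into (server, share, remaining path parts)."""
    p = PureWindowsPath(path)
    # For a UNC path, the drive is "\\server\share"
    if not p.drive.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path}")
    server, share = p.drive.lstrip("\\").split("\\", 1)
    # parts[0] is the anchor ("\\server\share\"); the rest are folders and the file name
    return server, share, list(p.parts[1:])
```

For example, `split_unc(r"\\fileserver\scans\2024\batch01.pdf")` returns `("fileserver", "scans", ["2024", "batch01.pdf"])`.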

Waterfall Classification

Waterfall Classification is a classification technique in Grooper that prioritizes training similarity over classification "rules" set by a Document Type's Positive Extractor. This can be helpful in scenarios where Batch Folders get misclassified and simply retraining won't help.

Disambiguation

Repository

A "repository" is a general term in computer science referring to where files and/or data is stored and managed. In Grooper, the term "repository" may refer to:

Base Types

Grooper Object

Grooper.GrooperObject

Connected Object

Grooper.ConnectedObject

Database Row

Grooper.DatabaseRow

Embedded Object

Grooper.EmbeddedObject