Glossary: Difference between revisions
Dgreenwood (talk | contribs) |
|||
| (85 intermediate revisions by 3 users not shown) | |||
| Line 1: | Line 1: | ||
This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms. | This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms. | ||
| Line 198: | Line 196: | ||
''{{TypeName|Data Type}}'' | ''{{TypeName|Data Type}}'' | ||
<section begin="Data Type" />{{DataTypeIcon}} '''[[Data Type]]s''' are nodes used to extract text data from a document. '''Data Types''' have more capabilities than {{ValueReaderIcon}} | <section begin="Data Type" />{{DataTypeIcon}} '''[[Data Type]]s''' are nodes used to extract text data from a document. '''Data Types''' have more capabilities than {{ValueReaderIcon}} [[Value Reader]]s. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. '''Data Types''' can also collate results using [[Collation Provider]]s to combine, sift and manipulate results further.<section end="Data Type" /> | ||
==== Value Reader ==== | ==== Value Reader ==== | ||
''{{TypeName|Value Reader}}'' | ''{{TypeName|Value Reader}}'' | ||
<section begin="Value Reader" />{{ | <section begin="Value Reader" />{{IconName|Value Reader}} [[Value Reader]] nodes define a single [[Data Extraction (Concept)|data extraction]] operation. Each Value Reader executes a single [[Value Extractor]] configuration. The Value Extractor determines the logic for returning data from a text-based document or page. (Example: [[Pattern Match]] is a Value Extractor that returns data using regular expressions). | ||
*<li class="fyi-bullet"> Value Readers can be used on their own or in conjunction with {{IconName|Data Type}} [[Data Type]]s for more complex data extraction and collation.<section end="Value Reader" /> | |||
==== Field Class ==== | ==== Field Class ==== | ||
| Line 332: | Line 331: | ||
In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. These objects are designed to be part of a larger configuration. | In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. These objects are designed to be part of a larger configuration. | ||
* | * These "core configuration types" are found most commonly in the property settings on a node in the Grooper node tree. | ||
* However, they may also be configured when configuring commands or as part of a larger property configuration. | |||
* However, they | |||
This includes: | This includes: | ||
| Line 341: | Line 338: | ||
* [[#Behaviors|Behaviors]] | * [[#Behaviors|Behaviors]] | ||
* [[#Classify Method|Classify Methods]] | * [[#Classify Method|Classify Methods]] | ||
* [[#IP Command|IP Commands]] | |||
* [[#OCR Engine|OCR Engines]] | |||
* [[#Repository Option|Repository Options]] | |||
* [[#Separation Provider|Separation Providers]] | |||
* [[#Service|Services]] | |||
*<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties. | |||
== Activity == | == Activity == | ||
| Line 352: | Line 356: | ||
Attended Activities are a type of [[Activity]] in Grooper that require direct user interaction within a {{BatchProcessIcon}} [[Batch Process]] workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or automated processing is simply insufficient). The only current Attended Activity in Grooper is {{IconName|person_search}} [[Review]]. | Attended Activities are a type of [[Activity]] in Grooper that require direct user interaction within a {{BatchProcessIcon}} [[Batch Process]] workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or automated processing is simply insufficient). The only current Attended Activity in Grooper is {{IconName|person_search}} [[Review]]. | ||
<div style="padding-left: 1.5em;"> | |||
=== Review ==== | ==== Review ==== | ||
''{{TypeName|Review}}'' | ''{{TypeName|Review}}'' | ||
<section begin="Review" />{{IconName|person_search}} [[Review]] is an [[Activity]] that allows user-attended review of Grooper's results. This allows human operators to validate processed {{BatchPageIcon}} '''[[Batch Page]]''' and {{BatchFolderIcon}} '''[[Batch Folder]]''' content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.<section end="Review" /> | <section begin="Review" />{{IconName|person_search}} [[Review]] is an [[Activity]] that allows user-attended review of Grooper's results. This allows human operators to validate processed {{BatchPageIcon}} '''[[Batch Page]]''' and {{BatchFolderIcon}} '''[[Batch Folder]]''' content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.<section end="Review" /> | ||
</div> | </div> | ||
=== Code Activities === | === Code Activities === | ||
''{{TypeName|Code Activity}}'' | ''{{TypeName|Code Activity}}'' | ||
<div style="padding-left: 1.5em;"> | |||
==== AI Dialogue ==== | ==== AI Dialogue ==== | ||
<section begin="AI Dialogue" />''BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024.'' {{AIDialogueIcon}} [[AI Dialogue]] is an [[Activity]] that executes a scripted conversation with an {{AIAnalystIcon}} '''[[AI Analyst]]''' and saves the resulting conversation on the document as a [https://en.wikipedia.org/wiki/JSON JSON] file.<section end="AI Dialogue" /> | <section begin="AI Dialogue" />''BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024.'' {{AIDialogueIcon}} [[AI Dialogue]] is an [[Activity]] that executes a scripted conversation with an {{AIAnalystIcon}} '''[[AI Analyst]]''' and saves the resulting conversation on the document as a [https://en.wikipedia.org/wiki/JSON JSON] file.<section end="AI Dialogue" /> | ||
| Line 401: | Line 404: | ||
==== Correct ==== | ==== Correct ==== | ||
<section begin="Correct" />{{ | <section begin="Correct" />{{IconName|Correct}} [[Correct]] is an [[Activity]] that performs spell correction. It can correct a {{BatchFolderIcon}} '''[[Batch Folder|Batch Folder's]]''' text content or specific '''[[Data Element]]''' values to resolve OCR errors, deidentify data or otherwise enhance text data.<section end="Correct" /> | ||
==== Deduplicate ==== | ==== Deduplicate ==== | ||
| Line 481: | Line 484: | ||
==== XML Transform ==== | ==== XML Transform ==== | ||
<section begin="XML Transform" />{{ | <section begin="XML Transform" />{{IconName|XML Transform}} [[XML Transform]] is an [[Activity]] that applies [https://en.wikipedia.org/wiki/XSLT XSLT] stylesheets to XML data to modify or reformat the output structure for various purposes.<section end="XML Transform" /> | ||
</div> | </div> | ||
| Line 578: | Line 581: | ||
</div> | </div> | ||
== | == OCR Engine == | ||
<section begin=" | <section begin="OCR Engine" />An "[[OCR Engine|OCR engine]]" is the part of [[OCR]] software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an '''[[OCR Profile]]'s''' OCR Engine property.<section end="OCR Engine" /> | ||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== | === Azure OCR === | ||
<section begin=" | <section begin="Azure OCR" />[[Azure OCR]] is an [[OCR Engine]] option for '''[[OCR Profile|OCR Profiles]]''' that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.<section end="Azure OCR" /> | ||
</div> | </div> | ||
== | == Repository Option == | ||
<section begin=" | <section begin="Repository Option" />[[Repository Option]]s are optional features that affect the entire repository. These optional features enable functionality that would not otherwise work without first establishing the connections these options provide. Repository Options are added to a '''Grooper Repository''' and configured using the {{GrooperRootIcon}} '''[[Root]]''' node's Options property. <section end="Repository Option" /> | ||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== LLM Connector === | |||
<section begin="LLM Connector" />[[LLM Connector]] is a [[Repository Option]] that enables large language model (LLM) powered AI features for a Grooper Repository.<section end="LLM Connector" /> | |||
=== | === AI Search === | ||
<section begin=" | <section begin="AI Search" />[[AI Search and the Search Page|AI Search]] is a [[Repository Option]] that enables Grooper's document search and retrieval features in the Search page. Once enabled, [[Indexing Behavior]]s can be added to [[Content Type]]s (such as {{IconName|Content Model}} [[Content Model]]s), which will allow users to submit documents to a search index. Once indexed, documents can be retrieved by full text and metadata searches in the [[AI Search and the Search Page|Search Page]].<section end="AI Search" /> | ||
</div> | </div> | ||
| Line 632: | Line 626: | ||
<section begin="Undo Separation" />[[Undo Separation]] is a [[Separation Provider]]. Instead of putting loose {{BatchPageIcon}} '''[[Batch Page]]s''' into {{BatchFolderIcon}} '''[[Batch Folder]]s''', this Separation Provider removes '''Batch Folders''', leaving only loose pages.<section end="Undo Separation" /> | <section begin="Undo Separation" />[[Undo Separation]] is a [[Separation Provider]]. Instead of putting loose {{BatchPageIcon}} '''[[Batch Page]]s''' into {{BatchFolderIcon}} '''[[Batch Folder]]s''', this Separation Provider removes '''Batch Folders''', leaving only loose pages.<section end="Undo Separation" /> | ||
</div> | </div> | ||
== Service == | == Service == | ||
''{{TypeName|Service Instance}}'' | ''{{TypeName|Service Instance}}'' | ||
| Line 668: | Line 663: | ||
These are configuration objects in Grooper that relate to extracting data from documents. These objects include specialized items such as "Table Extract Methods" which pertain only to configuring Data Table nodes. These also include more general items such as Value Extractors which are used by various extractor related properties on a variety of node types in Grooper. | These are configuration objects in Grooper that relate to extracting data from documents. These objects include specialized items such as "Table Extract Methods" which pertain only to configuring Data Table nodes. These also include more general items such as Value Extractors which are used by various extractor related properties on a variety of node types in Grooper. | ||
These "extraction related types" are always found when configuring properties of: | |||
* Extractor Nodes ([[Data Type]], [[Value Reader]] and [[Field Class]]) | |||
* Data Elements ([[Data Model]], [[Data Field]], [[Data Section]], [[Data Table]] and [[Data Column]]) | |||
This includes: | This includes: | ||
| Line 673: | Line 673: | ||
* [[#Collation Provider|Collation Providers]] | * [[#Collation Provider|Collation Providers]] | ||
* [[#Fill Method|Fill Methods]] | * [[#Fill Method|Fill Methods]] | ||
* [[#Lookup Specification|Lookup Specifications]] | |||
* [[#Section Extract Method|Section Extract Methods]] | * [[#Section Extract Method|Section Extract Methods]] | ||
* [[#Table Extract Method|Table Extract Methods]] | * [[#Table Extract Method|Table Extract Methods]] | ||
*<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties. | |||
== Collation Provider == | == Collation Provider == | ||
| Line 692: | Line 695: | ||
===Key-Value Pair === | ===Key-Value Pair === | ||
<section begin="Key-Value Pair" />[[Key-Value Pair]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. ''Note: Key-Value Pair is an older technique in Grooper. In most cases, the [[Labeled Value]] extractor | <section begin="Key-Value Pair" />[[Key-Value Pair]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. ''Note: Key-Value Pair is an older technique in Grooper. In most cases, the [[Labeled Value]] extractor is preferable to Key-Value Pair collation.''<section end="Key-Value Pair" /> | ||
=== Multi-Column === | === Multi-Column === | ||
| Line 704: | Line 707: | ||
=== Split === | === Split === | ||
<section begin="Split" />[[Split]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Split separates a [[Data Instance | <section begin="Split" />[[Split]] is a [[Collation Provider]] option for {{DataTypeIcon}} '''[[Data Type]]''' extractors. Split separates a [[Data Instance|data instance]] at each match returned by the '''Data Type'''. The results are used as anchor points to "split" text into one or more smaller parts.<section end="Split" /> | ||
</div> | |||
== Fill Method == | |||
<section begin="Fill Method" />[[Fill Method]]s provide various mechanisms for populating child [[Data Element]]s of a {{IconName|Data Model}} [[Data Model]], {{IconName|Data Section}} [[Data Section]] or {{IconName|Data Table}} [[Data Table]]. Fill Methods can be added to these nodes using their "Fill Methods" property and editor. | |||
*<li class="attn-bullet"> Fill Methods are secondary extraction operations. They populate descendant Data Elements '''after''' normal extraction when the {{IconName|Extract}} [[Extract]] activity runs.<section end="Fill Method" /> | |||
<div style="padding-left: 1.5em;"> | |||
=== AI Extract === | |||
''{{TypeName|AI Extract}}'' | |||
<section begin="AI Extract" />[[AI Extract]] is a [[Fill Method]] that leverages a [https://en.wikipedia.org/wiki/Large_language_model Large Language Model (LLM)] to return extraction results to [[Data Element]]s in a {{DataModelIcon}} [[Data Model]] or {{DataSectionIcon}} [[Data Section]]. This mechanism provides powerful AI-based data extraction with minimal setup.<section end="AI Extract" /> | |||
=== Fill Descendants === | |||
''{{TypeName|Fill Descendants}}'' | |||
<section begin="Fill Descendants" />[[Fill Descendants]] is a [[Fill Method]] that executes any Fill Methods on child [[Data Element]]s in parallel. This has been shown to dramatically increase efficiency on larger {{IconName|Data Model}} [[Data Model]]s with multiple {{IconName|Data Section}} [[Data Section]]s using [[AI Extract]].<section end="Fill Descendants" /> | |||
=== Run Child Extractors === | |||
''{{TypeName|Run Child Extractors}}'' | |||
<section begin="Run Child Extractors" />[[Run Child Extractors]] is a [[Fill Method]] that executes extraction for a subset of child [[Data Element]]s. This allows you to selectively run extraction logic for one or more Data Elements in a {{IconName|Data Model}} [[Data Model]], {{IconName|Data Section}} [[Data Section]], or {{IconName|Data Table}} [[Data Table]].<section end="Run Child Extractors" /> | |||
</div> | </div> | ||
| Line 718: | Line 741: | ||
=== Transaction Detection === | === Transaction Detection === | ||
<section begin="Transaction Detection" />[[Transaction Detection]] is a {{DataSectionIcon}} '''[[Data Section]]''' Extract Method. This [[Data Extraction (Concept)|extraction]] method produces section instances by detecting repeating patterns of text around the '''Data Section's''' child {{DataFieldIcon}} '''[[Data Field]]s'''.<section end="Transaction Detection" /> | <section begin="Transaction Detection" />[[Transaction Detection]] is a {{DataSectionIcon}} '''[[Data Section]]''' Extract Method. This [[Data Extraction (Concept)|extraction]] method produces section instances by detecting repeating patterns of text around the '''Data Section's''' child {{DataFieldIcon}} '''[[Data Field]]s'''.<section end="Transaction Detection" /> | ||
</div> | |||
== Lookup Specification == | |||
<section begin="Lookup" />A [[Lookup Specification]] defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" ({{DataModelIcon}} '''[[Data Model]]s''', {{DataSectionIcon}} '''[[Data Section]]s''' and {{DataTableIcon}} '''[[Data Table]]s''') using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root '''Data Model'''), but ''cannot'' be used to populate a field value on a parent of the container element.<section end="Lookup" /> | |||
<div style="padding-left: 1.5em;"> | |||
=== CMIS Lookup === | |||
<section begin="CMIS Lookup" />[[CMIS Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{CMISRepositoryIcon}} '''[[CMIS Repository]]''' via a "[[CMIS Query|CMISQL query]]" (a specialized query language based on SQL database queries).<section end="CMIS Lookup" /> | |||
=== Database Lookup === | |||
<section begin="Database Lookup" />[[Database Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{DataConnectionIcon}} '''[[Data Connection]]''' via a [https://en.wikipedia.org/wiki/SQL SQL query].<section end="Database Lookup" /> | |||
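The lookup-field and target-field relationship described for Lookup Specifications can be sketched with an in-memory SQLite database. This is only an illustration of the general idea, not Grooper's implementation: the <code>vendors</code> table, its columns, the field names and the <code>database_lookup</code> helper are all hypothetical.

```python
import sqlite3


def database_lookup(conn, lookup_fields):
    """Query a (hypothetical) vendors table with a lookup field and
    return values for the target fields, or an empty dict on no match."""
    row = conn.execute(
        "SELECT name, terms FROM vendors WHERE vendor_id = ?",
        (lookup_fields["Vendor ID"],),
    ).fetchone()
    if row is None:
        return {}
    return {"Vendor Name": row[0], "Payment Terms": row[1]}


# Stand-in for an external data source (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, name TEXT, terms TEXT)"
)
conn.execute("INSERT INTO vendors VALUES ('V001', 'Acme Supply', 'Net 30')")

# "Vendor ID" acts as the lookup field; the query result fills the target fields.
fields = {"Vendor ID": "V001"}
fields.update(database_lookup(conn, fields))
```

The parameterized query keeps the extracted field value out of the SQL text itself, which is a sensible default whenever document data is used in a query.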
=== GPT Lookup === | |||
<section begin="GPT Lookup" />''PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as [[AI Extract]]. If absolutely necessary, its functionality could also be replicated with a [[Web Service Lookup]] implementation.'' [[GPT Lookup]] is a [[Lookup Specification]] that performs a lookup using an [https://en.wikipedia.org/wiki/OpenAI OpenAI] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] model.<section end="GPT Lookup" /> | |||
=== Lexicon Lookup === | |||
<section begin="Lexicon Lookup" />[[Lexicon Lookup]] is a [[Lookup Specification]] that performs a lookup against a {{LexiconIcon}} '''[[Lexicon]]'''.<section end="Lexicon Lookup" /> | |||
=== Web Service Lookup === | |||
<section begin="Web Service Lookup" />[[Web Service Lookup]] is a [[Lookup Specification]] that looks up external data at an [https://en.wikipedia.org/wiki/API API] endpoint by calling a [https://en.wikipedia.org/wiki/Web_service web service].<section end="Web Service Lookup" /> | |||
=== XML Lookup === | |||
<section begin="XML Lookup" />[[XML Lookup]] is a [[Lookup Specification]] that performs a lookup against an XML file stored as a {{ResourceFileIcon}} [[Resource File]] in the {{ProjectIcon}} [[Project]]. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.<section end="XML Lookup" /> | |||
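As a rough sketch of how an XPath expression can select an XML node and map its attributes and element text to fields, the following uses Python's standard <code>xml.etree.ElementTree</code> module. The XML content, the field names and the <code>xml_lookup</code> helper are hypothetical and do not reflect Grooper's actual configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML file content (in Grooper this would be a Resource File).
VENDORS_XML = """
<vendors>
  <vendor id="V001"><name>Acme Supply</name><city>Tulsa</city></vendor>
  <vendor id="V002"><name>Globex</name><city>Austin</city></vendor>
</vendors>
"""


def xml_lookup(xml_text, vendor_id):
    """Select one <vendor> node by its id attribute and map its
    attribute and child element text to (hypothetical) field names."""
    root = ET.fromstring(xml_text)
    node = root.find(f"./vendor[@id='{vendor_id}']")
    if node is None:
        return None
    return {
        "Vendor ID": node.get("id"),             # XML attribute
        "Vendor Name": node.findtext("name"),    # element text
        "City": node.findtext("city"),           # element text
    }


result = xml_lookup(VENDORS_XML, "V002")
```

Note that <code>ElementTree</code> supports only a subset of XPath; the attribute predicate used here is within that subset.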
</div> | </div> | ||
| Line 740: | Line 785: | ||
== Value Extractor == | == Value Extractor == | ||
<section begin="Extractor | ''{{TypeName|Value Extractor}}'' | ||
<section begin="Value Extractor" />[[Value Extractor]]s define an operation that reads data from the text (and sometimes visual) content of a page or document. There are over 20 unique Value Extractors, each using specialized logic to return results. Value Extractors are consumed by multiple higher-level objects in Grooper (such as [[Data Element]]s, [[Extractor Node]]s, various [[Activity|Activities]] and more) to perform a diverse set of document processing duties. | |||
:*<li class="fyi-bullet">Value Extractors return a list of one or more "[[Data Instance|data instances]]". Data instances contain both the value and its page location, which allows Grooper to highlight results in a Document Viewer.<section end="Value Extractor" /> | |||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== Ask AI === | === Ask AI === | ||
<section begin="Ask AI" />[[Ask AI]] is | ''{{TypeName|Ask AI}}'' | ||
<section begin="Ask AI" />[[Ask AI]] is a [[Value Extractor]] that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper that a Value Extractor is referenced. It can complete a wide array of tasks in Grooper with simple text prompts.<section end="Ask AI" /> | |||
=== Detect Signature === | === Detect Signature === | ||
<section begin="Detect Signature" />[[Detect Signature]] is | ''{{TypeName|Detect Signature}}'' | ||
<section begin="Detect Signature" />[[Detect Signature]] is a [[Value Extractor]] that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).<section end="Detect Signature" /> | |||
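The "fill percentage" idea can be illustrated with a small sketch over a binarized page, where 1 is a filled pixel and 0 is background. The function names and the minimum and maximum thresholds below are assumptions made for illustration only; they are not Grooper's actual property names or defaults.

```python
from typing import List, Tuple


def fill_percentage(pixels: List[List[int]],
                    region: Tuple[int, int, int, int]) -> float:
    """Percentage of filled (1) pixels inside a rectangle given as
    (left, top, right, bottom), with right and bottom exclusive."""
    left, top, right, bottom = region
    total = (right - left) * (bottom - top)
    if total <= 0:
        raise ValueError("region must have positive area")
    filled = sum(pixels[y][x]
                 for y in range(top, bottom)
                 for x in range(left, right))
    return 100.0 * filled / total


def has_signature(pixels, region, min_fill=5.0, max_fill=60.0) -> bool:
    """Treat the region as signed when its fill percentage falls
    between two thresholds (hypothetical values)."""
    return min_fill <= fill_percentage(pixels, region) <= max_fill


# Tiny 6x4 "page": rows 1-2 contain some ink, row 3 is blank.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
signed = has_signature(page, (0, 1, 6, 3))
blank = has_signature(page, (0, 3, 6, 4))
```

An upper threshold is included in this sketch because a nearly solid region is more likely a stamp, redaction or scanning artifact than handwriting.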
=== Field Match === | === Field Match === | ||
<section begin="Field Match" />[[Field Match]] is | ''{{TypeName|Field Match}}'' | ||
<section begin="Field Match" />[[Field Match]] is a [[Value Extractor]] that matches the value stored in a previously-extracted {{DataFieldIcon}} '''[[Data Field]]''' or {{DataColumnIcon}} '''[[Data Column]]'''.<section end="Field Match" /> | |||
=== Find Barcode === | === Find Barcode === | ||
''{{TypeName|Find Barcode}}'' | |||
''Note: Find Barcode differs slightly from [[Read Barcode]]. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.''<section end="Find Barcode" /> | <section begin="Find Barcode" />[[Find Barcode]] is a [[Value Extractor]] that searches for and returns barcode values previously stored in a {{BatchFolderIcon}} [[Batch Folder]] or {{BatchPageIcon}} [[Batch Page]]'s [[Layout Data (Concept)|layout data]]. | ||
*<li class="fyi-bullet">''Note: Find Barcode differs slightly from [[Read Barcode]]. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.''<section end="Find Barcode" /> | |||
=== GPT Complete === | === GPT Complete === | ||
<section begin="GPT Complete" /> | ''Removed in version 2025'' | ||
[[GPT Complete]] is | |||
:<span style="color: red;">PLEASE NOTE</span>: GPT Complete is a deprecated | <section begin="GPT Complete" />[[GPT Complete]] is a [[Value Extractor]] that leverages OpenAI's [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response. | ||
:<span style="color: red;">PLEASE NOTE</span>: GPT Complete is a deprecated Value Extractor. It uses an outdated method to call the [https://en.wikipedia.org/wiki/OpenAI OpenAI] API. Please use the [[Ask AI]] extractor going forward.<section end="GPT Complete" /> | |||
=== Highlight Zone === | === Highlight Zone === | ||
<section begin="Highlight Zone" />[[Highlight Zone]] is | ''{{TypeName|Highlight Zone}}'' | ||
<section begin="Highlight Zone" />[[Highlight Zone]] is a [[Value Extractor]] that sets a highlight region on a document without performing any actual [[Data Extraction (Concept)|data extraction]]. This "extractor" is used to mark areas of interest or importance for '''Review''' users or for uncommon scenarios where a [[Data Instance|data instance]] location is needed with no actual value.<section end="Highlight Zone" /> | |||
=== Label Match === | === Label Match === | ||
<section begin="Label Match" />[[Label Match]] is | ''{{TypeName|Label Match}}'' | ||
<section begin="Label Match" />[[Label Match]] is a [[Value Extractor]] that matches a list of one or more values using matching options defined by a [[Labeling Behavior]]. It is similar to [[List Match]] but uses shared settings defined in a Labeling Behavior for [[Fuzzy RegEx|Fuzzy Matching]], [[Vertical Wrap]], and [[Constrained Wrap]].<section end="Label Match" /> | |||
=== Labeled OMR === | === Labeled OMR === | ||
<section begin="Labeled OMR" />[[Labeled OMR]] is | ''{{TypeName|Labeled OMR}}'' | ||
<section begin="Labeled OMR" />[[Labeled OMR]] is a [[Value Extractor]] used to output [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.<section end="Labeled OMR" /> | |||
=== Labeled Value === | === Labeled Value === | ||
<section begin="Labeled Value" />[[Labeled Value]] is | ''{{TypeName|Labeled Value}}'' | ||
<section begin="Labeled Value" />[[Labeled Value]] is a [[Value Extractor]] that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).<section end="Labeled Value" /> | |||
=== List Match === | === List Match === | ||
<section begin="List Match" />[[List Match]] is | ''{{TypeName|List Match}}'' | ||
<section begin="List Match" />[[List Match]] is a [[Value Extractor]] designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require [https://en.wikipedia.org/wiki/Regular_expression regular expressions], but can be configured to utilize regular expression syntax.<section end="List Match" /> | |||
=== Ordered OMR === | === Ordered OMR === | ||
<section begin="Ordered OMR" />[[Ordered OMR]] is | ''{{TypeName|Ordered OMR}}'' | ||
<section begin="Ordered OMR" />[[Ordered OMR]] is a [[Value Extractor]] used to return [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).<section end="Ordered OMR" /> | |||
=== Pattern Match === | === Pattern Match === | ||
<section begin="Pattern Match" />[[Pattern Match]] is | ''{{TypeName|Pattern Match}}'' | ||
<section begin="Pattern Match" />[[Pattern Match]] is a [[Value Extractor]] that extracts values from a document that match a specified [https://en.wikipedia.org/wiki/Regular_expression regular expression], providing data collection following a known format or pattern.<section end="Pattern Match" /> | |||
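A minimal sketch of regular-expression extraction, using Python's standard <code>re</code> module, is shown below. Each result keeps its character offsets alongside its value, loosely mirroring the value-plus-location idea behind data instances. The <code>Match</code> tuple, the <code>pattern_match</code> helper and the date pattern are illustrative assumptions, not Grooper objects.

```python
import re
from typing import List, NamedTuple


class Match(NamedTuple):
    value: str   # matched text
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


def pattern_match(text: str, pattern: str) -> List[Match]:
    """Return every non-overlapping match of pattern in text."""
    return [Match(m.group(0), m.start(), m.end())
            for m in re.finditer(pattern, text)]


text = "Invoice date: 03/14/2024. Due date: 04/13/2024."
# Values that follow a known MM/DD/YYYY layout.
dates = pattern_match(text, r"\b\d{2}/\d{2}/\d{4}\b")
```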
=== Query HTML === | === Query HTML === | ||
<section begin="Query HTML" />[[Query HTML]] is | ''{{TypeName|Query HTML}}'' | ||
<section begin="Query HTML" />[[Query HTML]] is a [[Value Extractor]] specialized for [https://www.w3schools.com/html/ HTML] documents. It uses either [https://www.w3schools.com/css/ CSS] or [https://www.w3schools.com/xml/xpath_intro.asp XPath] selectors to return the inner text or an attribute of an HTML element.<section end="Query HTML" /> | |||
=== Read Barcode === | === Read Barcode === | ||
<section begin="Read Barcode" />[[Read Barcode]] is | ''{{TypeName|Read Barcode}}'' | ||
<section begin="Read Barcode" />[[Read Barcode]] is a [[Value Extractor]] that uses barcode recognition technology to read and extract values from barcodes found in the document content. | |||
*<li class="fyi-bullet">''Note: Read Barcode differs slightly from [[Find Barcode]]. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's [[Layout Data|layout data]]. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.''<section end="Read Barcode" /> | |||
'' | === Read Metadata === | ||
''{{TypeName|Read Metadata}}'' | |||
<section begin="Read Meta Data" />[[Read Metadata]] is a [[Value Extractor]] that retrieves metadata values associated with a document. Read Metadata can return metadata from a {{BatchFolderIcon}} '''[[Batch Folder]]'s''' attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.<section end="Read Meta Data" /> | |||
<section begin="Read Meta Data" />[[Read | |||
=== Read Zone === | === Read Zone === | ||
<section begin="Read Zone" />[[Read Zone]] is | ''{{TypeName|Read Zone}}'' | ||
<section begin="Read Zone" />[[Read Zone]] is a [[Value Extractor]] that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.<section end="Read Zone" /> | |||
=== Reference === | === Reference === | ||
<section begin="Reference" />[[Reference]] is | ''{{TypeName|Reference}}'' | ||
<section begin="Reference" />[[Reference]] is a [[Value Extractor]] used to reference an [[Extractor Node]]. This allows users to create re-usable extractors and use the more complex {{DataTypeIcon}} '''[[Data Type]]''' and {{FieldClassIcon}} '''[[Field Class]]''' extractors throughout Grooper.<section end="Reference" /> | |||
=== Word Match === | === Word Match === | ||
<section begin="Word Match" />[[Word Match]] is | ''{{TypeName|Word Match}}'' | ||
<section begin="Word Match" />[[Word Match]] is a [[Value Extractor]] that extracts individual words or phrases from documents. It is used for [https://en.wikipedia.org/wiki/N-gram n-gram] extraction. Each gram may be optionally executed against a {{LexiconIcon}} '''[[Lexicon]]''' to ensure words and phrases only match a set vocabulary.<section end="Word Match" /> | |||
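The following sketch shows word-level n-gram extraction with an optional vocabulary filter. It is a simplified illustration: the tokenization rule, the <code>ngrams</code> helper and the use of a plain Python set as a stand-in for a Lexicon are assumptions, not a description of how Grooper tokenizes text.

```python
import re
from typing import List, Optional, Set


def ngrams(text: str, n: int, lexicon: Optional[Set[str]] = None) -> List[str]:
    """Return word n-grams from text; when a lexicon is given,
    keep only the grams that appear in it."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    if lexicon is not None:
        grams = [g for g in grams if g in lexicon]
    return grams


text = "Net amount due upon receipt"
all_bigrams = ngrams(text, 2)
# Restrict results to a set vocabulary (hypothetical lexicon entries).
known = ngrams(text, 2, lexicon={"amount due", "upon receipt"})
```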
=== Zonal OMR === | === Zonal OMR === | ||
<section begin="Zonal OMR" />[[Zonal OMR]] is | ''{{TypeName|Zonal OMR}}'' | ||
<section begin="Zonal OMR" />[[Zonal OMR]] is a [[Value Extractor]] that reads one or more [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label). | |||
BE AWARE: Zonal OMR is outdated compared to [[Labeled OMR]] and [[Ordered OMR]]. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.<section end="Zonal OMR" /> | BE AWARE: Zonal OMR is outdated compared to [[Labeled OMR]] and [[Ordered OMR]]. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.<section end="Zonal OMR" /> | ||
| Line 813: | Line 896: | ||
This includes: | This includes: | ||
* [[#CMIS Bindings (aka "connection types")]] | * [[#CMIS Binding|CMIS Bindings (aka "connection types")]] | ||
* [[#Content Link]] | * [[#Content Link|Content Links]] | ||
* [[#Export Definition]] | * [[#Export Definition|Export Definitions]] | ||
* [[#Import Provider]] | * [[#Import Provider|Import Providers]] | ||
''Please Note: [[Import Behavior]] and [[Export Behavior]] are obviously import and export related. Because their parent type is "Behavior", they are found in the [[#Core Configuration Types|Core Configuration Types]] portion of this Glossary.'' | |||
== CMIS | *<li class="fyi-bullet"> Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties. | ||
== CMIS Binding == | |||
<section begin="CMIS Binding" />[[CMIS Binding]]s are the platform connection types for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems. | <section begin="CMIS Binding" />[[CMIS Binding]]s are the platform connection types for {{CMISConnectionIcon}} '''[[CMIS Connection]]s'''. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems. | ||
| Line 891: | Line 978: | ||
== Export Definition == | == Export Definition == | ||
<section begin="Export Definition" />[[Export Behavior]]s are defined by adding and configuring one or more [[Export Definition]] | <section begin="Export Definition" />[[Export Behavior]]s are defined by adding and configuring one or more '''Export Definitions''' (See [[Export Definition Types]] or the [[Export#Export Definitions|Export Definitions]] section of the Export article). An Export Definition defines export parameters to external systems, such as [https://en.wikipedia.org/wiki/File_system file systems], [https://en.wikipedia.org/wiki/Content_management_system content management repositories], [https://en.wikipedia.org/wiki/Database databases], or [https://en.wikipedia.org/wiki/Message_transfer_agent mail servers].<section end="Export Definition" /> | ||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== CMIS Export === | === CMIS Export === | ||
| Line 942: | Line 1,029: | ||
</div> | </div> | ||
== Misc Configuration == | == Misc Properties and Other Configuration Types == | ||
<div style="padding-left: 1.5em;"> | |||
=== AI Generator/Generators === | |||
<section begin="AI Generator" />[[AI Generator]]s create custom documents using the results of a [[Search Page]] query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file. | |||
*<li class="fyi-bullet">AI Generators are added and configured using an [[Indexing Behavior]]'s "Generators" property and editor. They are executed from the [[Search Page]] using the "Download" command and "Download Custom" format. <section end="AI Generator" /> | |||
=== CMISQL Query/CMIS Query === | |||
''{{TypeName|CMISQL Query}}'' | |||
<section begin="CMIS Query" />A [[CMISQL Query]] (aka CMIS Query) is Grooper's way of searching for documents in [[CMIS Repository|CMIS Repositories]]. Commonly, CMISQL Queries are used by [[Import Query Results]] to import documents from a CMIS Repository. CMISQL Queries are also used by [[CMIS Lookup]] to lookup data from a CMIS Repository. CMISQL Queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources. | |||
*<li class="fyi-bullet"> CMISQL Queries are configured using the "CMIS Query" property found in "Import Query Results" and "CMIS Lookup".<section end="CMIS Query" /> | |||
=== Paragraph Marker/Paragraph Marking === | |||
''{{TypeName|Paragraph Marker}}'' | |||
== | <section begin="Paragraph Marking" />[[Paragraph Marking]] is a component of Grooper's [[Text Preprocessor]]. It enables the "Paragraph Marker", which detects paragraph boundaries and marks them by altering the normal [https://en.wikipedia.org/wiki/Carriage_return carriage return] and [https://en.wikipedia.org/wiki/Newline new line feed] pairs at the end of each line. Instead of placing line breaks at the end of each line, the Paragraph Marker places them at the end of each paragraph. This produces a normalized text flow, making it easier to extract values that span lines. | ||
*<li class="fyi-bullet"> "Paragraph Marker" is the embedded object that actually performs paragraph detection and marking in Grooper. "Paragraph Marking" is the property that enables the Paragraph Marker and allows users to configure it.<section end="Paragraph Marking" /> | |||
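The effect of paragraph marking can be sketched in Python. This is a minimal illustration, assuming a paragraph boundary is a blank line; how Grooper's Paragraph Marker actually detects boundaries is not described here, and the function name `mark_paragraphs` is hypothetical.

```python
import re


def mark_paragraphs(text: str) -> str:
    """Move CR/LF pairs from the end of each line to the end of each paragraph.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    normalized = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", normalized.strip())
    # Join the lines of each paragraph with a space, then end each paragraph with CR/LF.
    joined = [" ".join(line.strip() for line in p.split("\n")) for p in paragraphs]
    return "\r\n".join(joined)


raw = "The term of this\r\nagreement is one year.\r\n\r\nPayment is due\r\nin 30 days."
flowed = mark_paragraphs(raw)

# A pattern that spans the original line break now matches on a single line.
print(re.search(r"term of this agreement", flowed) is not None)
```

After the text is reflowed, a pattern written for a single line can match a value that originally wrapped across two lines.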
=== Preprocessing/Text Preprocessor === | |||
''{{TypeName|Text Preprocessor}}'' | |||
<section begin="Text Preprocessor" />Grooper's "[[Text Preprocessor]]" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) to allow regular expression patterns to match (or ignore) structural elements, such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following: | |||
* [[Paragraph Marking]] | |||
* [[Tab Marking]] | |||
* [[Text Preprocessor#Vertical Tab Marking|Vertical Tab Marking]] | |||
* [[Text Preprocessor#Ignore Control Characters|Ignore Control Characters]] | |||
*<li class="fyi-bullet"> "Text Preprocessor" is the embedded object that actually performs text preprocessing in Grooper. The Text Preprocessor can be enabled and configured by various items (mostly extractors such as [[Pattern Match]]) using either a "Preprocessing" or "Preprocessing Options" property.<section end="Text Preprocessor" /> | |||
=== Permission Set/Permission Sets === | |||
''{{TypeName|Permission Set}}'' | |||
<section begin="Permission Sets" />[[Permission Sets]] define security permissions in a Grooper Repository for a user or group. This allows you to restrict user access to specified Grooper pages (such as the [[Design Page]]) and Grooper [[Command]]s. | |||
*<li class="fyi-bullet">"Permission Set" is the embedded object that defines security principles. They are added to a Grooper Repository and configured using the "Permission Sets" property found on the {{IconName|Root}} Root node.<section end="Permission Sets" /> | |||
=== Quoting Method/Document Quoting === | |||
''{{TypeName|Quoting Method}}'' | |||
<section begin="Quoting Method" />[[Quoting Method]]s provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, allowing users to feed the AI only the necessary context needed to respond or reduce costs by reducing the amount of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of a document's text, data extracted from the document, layout data, or a combination of this data. | |||
*<li class="fyi-bullet"> "Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including [[AI Extract]]) using a "Document Quoting" property.<section end="Quoting Method" /> | |||
=== Variable Definition === | |||
''{{TypeName|Variable Definition}}'' | |||
<section begin="Variable Definition" />'''[[Variable Definition]]s''' define a variable with a computed value that can be called by various code expressions. Variable Definitions are added to Data Models, Data Sections and Data Tables using their "Variables" property. | |||
:'''Used By:''' [[Data Model]], [[Data Section]], [[Data Table]]<section end="Variable Definition" /> | |||
=== Vertical Wrap Detection/Vertical Wrap === | |||
<section begin="Vertical Wrap Detection" />[[Vertical Wrap Detection]] enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a [[Labeling Behavior]] and by the [[List Match]] and [[Label Match]] Value Extractors. | |||
*<li class="fyi-bullet"> "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.<section end="Vertical Wrap Detection" /> | |||
=== Properties === | |||
<section begin="Property" />A property is a mechanism by which an object in '''Grooper''' is configured that affects how the object performs its function.<section end="Property" /> | <section begin="Property" />A property is a mechanism by which an object in '''Grooper''' is configured that affects how the object performs its function.<section end="Property" /> | ||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== Alignment === | ==== Alignment ==== | ||
<section begin="Alignment" />"[[Alignment]]" refers to how Grooper highlights text from an AI response on a document in a [[Document Viewer]]. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as [[AI Extract]].<section end="Alignment" /> | <section begin="Alignment" />"[[Alignment]]" refers to how Grooper highlights text from an AI response on a document in a [[Document Viewer]]. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as [[AI Extract]].<section end="Alignment" /> | ||
=== Confidence Multiplier and Output Confidence === | ==== Confidence Multiplier and Output Confidence ==== | ||
<section begin="Confidence Multiplier and Output Confidence" />Some results carry more weight than others. The [[Confidence Multiplier and Output Confidence (Property)|Confidence Multiplier]] and [[Confidence Multiplier and Output Confidence (Property)|Output Confidence]] properties allow you to manually adjust an [[Data Extraction (Concept)|extraction]] result's confidence.<section end="Confidence Multiplier and Output Confidence" /> | <section begin="Confidence Multiplier and Output Confidence" />Some results carry more weight than others. The [[Confidence Multiplier and Output Confidence (Property)|Confidence Multiplier]] and [[Confidence Multiplier and Output Confidence (Property)|Output Confidence]] properties allow you to manually adjust an [[Data Extraction (Concept)|extraction]] result's confidence.<section end="Confidence Multiplier and Output Confidence" /> | ||
=== Constrained Wrap === | ==== Constrained Wrap ==== | ||
<section begin="Constrained Wrap" />The [[Constrained Wrap]] property allows certain [[Extractor | <section begin="Constrained Wrap" />The [[Constrained Wrap]] property allows certain [[Value Extractor]]s and the [[Labeling Behavior]] to match values which wrap from one line to the next inside a box (such as a [https://en.wikipedia.org/wiki/Table_cell table cell]).<section end="Constrained Wrap" /> | ||
=== Content Type Filter === | ==== Content Type Filter ==== | ||
<section begin="Content Type Filter" />The [[Content Type Filter]] property restricts [[Activity|Activities]] to specific {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''' and/or {{DocumentTypeIcon}} '''[[Document Type]]s'''.<section end="Content Type Filter" /> | <section begin="Content Type Filter" />The [[Content Type Filter]] property restricts [[Activity|Activities]] to specific {{ContentCategoryIcon}} '''[[Content Category|Content Categories]]''' and/or {{DocumentTypeIcon}} '''[[Document Type]]s'''.<section end="Content Type Filter" /> | ||
= | ==== Import Mode ==== | ||
=== Import Mode === | |||
<section begin="Import Mode" />[[Import Mode]] is a configurable property for [[CMIS Import]] providers. This controls how file content is loaded into a [[Grooper Repository]] during an [[Import Job]]. This property is key to setting up a "Sparse" import in Grooper.<section end="Import Mode" /> | <section begin="Import Mode" />[[Import Mode]] is a configurable property for [[CMIS Import]] providers. This controls how file content is loaded into a [[Grooper Repository]] during an [[Import Job]]. This property is key to setting up a "Sparse" import in Grooper.<section end="Import Mode" /> | ||
=== Output Extractor Key === | ==== Output Extractor Key ==== | ||
<section begin="Output Extractor Key" />The [[Output Extractor Key]] property is another weapon in the arsenal of powerful '''Grooper''' [[Classification (Concept)|classification]] techniques. It allows {{DataTypeIcon}} '''[[Data Type]]s''' to return results normalized in a way more beneficial to document classification.<section end="Output Extractor Key" /> | <section begin="Output Extractor Key" />The [[Output Extractor Key]] property is another weapon in the arsenal of powerful '''Grooper''' [[Classification (Concept)|classification]] techniques. It allows {{DataTypeIcon}} '''[[Data Type]]s''' to return results normalized in a way more beneficial to document classification.<section end="Output Extractor Key" /> | ||
== | ==== Parameters ==== | ||
=== | |||
<section begin="Parameters" />[[Parameters]] is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.<section end="Parameters" /> | <section begin="Parameters" />[[Parameters]] is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.<section end="Parameters" /> | ||
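How these four parameters typically shape token selection can be sketched in Python. The sketch follows the formulation commonly documented by LLM services (penalties subtracted from token scores, temperature-scaled softmax, nucleus filtering); it is an assumption that the model behind any given Grooper configuration behaves exactly this way, and all function names are hypothetical.

```python
import math
from typing import Dict


def apply_penalties(logits: Dict[str, float], counts: Dict[str, int],
                    presence_penalty: float, frequency_penalty: float) -> Dict[str, float]:
    """Lower the score of tokens already generated (counts = prior occurrences)."""
    adjusted = {}
    for token, logit in logits.items():
        count = counts.get(token, 0)
        presence = presence_penalty if count > 0 else 0.0
        adjusted[token] = logit - frequency_penalty * count - presence
    return adjusted


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over logits divided by temperature; lower values sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {token: logit / temperature for token, logit in logits.items()}
    peak = max(scaled.values())
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most probable tokens until their cumulative probability reaches top_p."""
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


logits = {"invoice": 2.0, "receipt": 1.0, "memo": 0.0}
penalized = apply_penalties(logits, {"invoice": 2}, presence_penalty=0.5, frequency_penalty=0.25)
print(top_p_filter(apply_temperature(penalized, temperature=0.7), top_p=0.9))
```

Lowering the temperature or TopP narrows the set of likely tokens, while raising either penalty makes already-used tokens less likely to repeat.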
== | ==== Scope ==== | ||
=== | |||
<section begin="Scope" />The [[Scope]] property of a {{BatchProcessStepIcon}} '''[[Batch Process Step]]''', as it relates to an [[Activity]], determines at which level in a {{BatchIcon}} '''[[Batch]]''' hierarchy the Activity runs.<section end="Scope" /> | <section begin="Scope" />The [[Scope]] property of a {{BatchProcessStepIcon}} '''[[Batch Process Step]]''', as it relates to an [[Activity]], determines at which level in a {{BatchIcon}} '''[[Batch]]''' hierarchy the Activity runs.<section end="Scope" /> | ||
=== Secondary Types === | ==== Secondary Types ==== | ||
<section begin="Secondary Types" />[[Secondary Types]] allow the application of multiple '''[[Content Type]]s''' to a single {{BatchFolderIcon}} '''[[Batch Folder]]'''.<section end="Secondary Types" /> | <section begin="Secondary Types" />[[Secondary Types]] allow the application of multiple '''[[Content Type]]s''' to a single {{BatchFolderIcon}} '''[[Batch Folder]]'''.<section end="Secondary Types" /> | ||
=== Tab Marking === | ==== Tab Marking ==== | ||
<section begin="Tab Marking" />[[Tab Marking]] allows you to insert [https://en.wikipedia.org/wiki/Tab_key#Tab_characters tab characters] into a document's text data.<section end="Tab Marking" /> | <section begin="Tab Marking" />[[Tab Marking]] allows you to insert [https://en.wikipedia.org/wiki/Tab_key#Tab_characters tab characters] into a document's text data.<section end="Tab Marking" /> | ||
</div> | </div> | ||
== | == Misc Features and Functionality == | ||
<div style="padding-left: 1.5em;"> | <div style="padding-left: 1.5em;"> | ||
=== | === CSS Data Viewer Styling === | ||
<section begin=" | <section begin="CSS Data Viewer Styling" />[[CSS Data Viewer Styling]] refers to using [https://en.wikipedia.org/wiki/CSS CSS] to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a {{DataModelIcon}} '''[[Data Model]]'s''' appearance and layout during document review.<section end="CSS Data Viewer Styling" /> | ||
== | === EDI Integration === | ||
<section begin=" | <section begin="EDI Integration" />[[EDI Integration]] refers to '''Grooper's''' ability to process [https://en.wikipedia.org/wiki/Electronic_data_interchange EDI] files.<section end="EDI Integration" /> | ||
=== AI | === Fine-Tuning for AI Extract === | ||
<section begin="Fine Tuning" />Fine-tuning is the process of further training a large language model (LLM) on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on. | |||
<br> | |||
In Grooper, you can easily start fine-tuning a model based on a {{DataModelIcon}} [[Data Model]] that will facilitate better extraction when using [[AI Extract]].<section end="Fine Tuning" /> | |||
<section begin=" | |||
=== | === Footer Rows and Footer Modes === | ||
<section begin=" | <section begin="Footer Rows and Footer Modes" />A "[[Footer Rows and Footer Modes|Footer Row]]" is a row at the bottom of a {{DataTableIcon}} [[Data Table]] that displays sum totals for numerical {{DataColumnIcon}} [[Data Column]]s. This can help [[Data Viewer]] users validate data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed. When Tabular Layout's "Capture Footer Row" creates the Footer Row, it also controls whether and how document data is used to capture and validate the footer value.<section end="Footer Rows and Footer Modes" /> | ||
=== | === Label Sets === | ||
<section begin=" | <section begin="Label Sets" />[[Label Sets]] are collections of label definitions used in Grooper to identify and extract information from documents. A label set maps document text—such as field names, headers, or column titles—to corresponding [[Data Field]], [[Data Section]], or [[Data Table]] elements in the [[Data Model]]. Label sets are essential for automating extraction and classification, especially in environments where document layouts and terminology may vary.<section end="Label Sets" /> | ||
=== | === URL Endpoints for Review === | ||
<section begin=" | <section begin="URL Endpoints for Review" />Three different URL endpoints can be used to open [[Review (Activity)|Review]] tasks in the '''[[Grooper Web Client]]''', given certain information like the '''Grooper''' Repository ID, {{BatchProcessIcon}} '''[[Batch Process]]''' name, {{BatchIcon}} '''[[Batch]]''' Id and more. This allows Grooper users to link directly to a '''Batch''' in Review with a URL.<section end="URL Endpoints for Review" /> | ||
=== XML Schema Integration === | === XML Schema Integration === | ||
| Line 1,033: | Line 1,145: | ||
=== Data Inspector === | === Data Inspector === | ||
<section begin="Data Inspector" />The Grooper [[Data Inspector]] is a UI Element that can be found anywhere there is a [[Document Viewer]] showing extraction results. This UI Element allows a user to inspect the [[Data Instance]] hierarchies of an extracted result.<section end="Data Inspector" /> | <section begin="Data Inspector" />The Grooper [[Data Inspector]] is a UI Element that can be found anywhere there is a [[Document Viewer]] showing extraction results. This UI Element allows a user to inspect the [[Data Instance]] hierarchies of an extracted result.<section end="Data Inspector" /> | ||
=== Design Page === | |||
''GrooperReview.Pages.Design.DesignPage'' | |||
<section begin="Design Page" />The [[Design Page]] is the primary user interface for Grooper configuration. It is the central workplace for Grooper designers and administrators. From the Design page, users create, test and administer nodes in a Grooper Repository.<section end="Design Page" /> | |||
=== Document Viewer === | === Document Viewer === | ||
<section begin="Document Viewer" />The Grooper [[Document Viewer]] is the portal to your documents. It is the UI that allows you to see a {{BatchFolderIcon}} '''[[Batch Folder]]'s''' (or a {{BatchPageIcon}} '''[[Batch Page]]'s''') image, text content, and more.<section end="Document Viewer" /> | <section begin="Document Viewer" />The Grooper [[Document Viewer]] is the portal to your documents. It is the UI that allows you to see a {{BatchFolderIcon}} '''[[Batch Folder]]'s''' (or a {{BatchPageIcon}} '''[[Batch Page]]'s''') image, text content, and more.<section end="Document Viewer" /> | ||
| Line 1,044: | Line 1,159: | ||
=== Search Page === | === Search Page === | ||
<section begin="Search Page" />The | <section begin="Search Page" />The [[AI Search and the Search Page|Search Page]] allows users to leverage [[AI Search and the Search Page|AI Search]] indexes to query indexed documents. Both full text and metadata searches are supported, with feature rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the [[Document Viewer]], review documents' extracted data, create new {{IconName|Batch}} [[Batch]]es from the result set, submit [[Activity Processing|processing jobs]], start a conversation with an {{IconName|AI Assistant}} [[AI Assistant]] and more.<section end="Search Page" /> | ||
=== Scan Viewer === | === Scan Viewer === | ||
| Line 1,066: | Line 1,181: | ||
=== CMIS === | === CMIS === | ||
<section begin="CMIS" />[[CMIS]] ([https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services Content Management Interoperability Services]) is an open standard allowing different [https://en.wikipedia.org/wiki/Content_management_system content management systems] to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.<section end="CMIS" /> | <section begin="CMIS" />[[CMIS]] ([https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services Content Management Interoperability Services]) is an open standard allowing different [https://en.wikipedia.org/wiki/Content_management_system content management systems] to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.<section end="CMIS" /> | ||
=== Classification === | === Classification === | ||
| Line 1,086: | Line 1,195: | ||
=== Data Extractor === | === Data Extractor === | ||
<section begin="Data Extractor" />[[Data Extractor]] (or just "extractor") refers to all [[Extractor | <section begin="Data Extractor" />[[Data Extractor]] (or just "extractor") refers to all [[Value Extractor]]s and [[Extractor Node]]s. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).<section end="Data Extractor" /> | ||
=== Data Instance === | === Data Instance === | ||
| Line 1,107: | Line 1,216: | ||
=== Fuzzy RegEx === | === Fuzzy RegEx === | ||
<section begin="Fuzzy RegEx" />[[Fuzzy RegEx]] is '''Grooper's''' use of [https://en.wikipedia.org/wiki/Fuzzy_logic fuzzy logic] within [[Extractor | <section begin="Fuzzy RegEx" />[[Fuzzy RegEx]] is '''Grooper's''' use of [https://en.wikipedia.org/wiki/Fuzzy_logic fuzzy logic] within [[Value Extractor]]s that leverage [https://en.wikipedia.org/wiki/Regular_expression regular expressions] to match patterns. Fuzzy RegEx allows extractors to overcome defects in a document's OCR results to accurately return results. Fuzzy RegEx is enabled by enabling the Fuzzy Matching property.<section end="Fuzzy RegEx" /> | ||
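The general idea of tolerating OCR defects while matching can be sketched in Python. Grooper's actual Fuzzy RegEx algorithm is not described here; this sketch swaps in the standard library's `difflib` similarity ratio over a sliding window and matches a literal phrase rather than a full regular expression. The function name `fuzzy_find` and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def fuzzy_find(pattern: str, text: str, threshold: float = 0.8) -> Optional[Tuple[str, float]]:
    """Return the best-scoring window of `text` that resembles `pattern`, or None.

    Each window has the same length as the pattern and is scored with
    difflib's similarity ratio (1.0 means identical, ignoring case).
    """
    size = len(pattern)
    best: Optional[Tuple[str, float]] = None
    for start in range(max(1, len(text) - size + 1)):
        window = text[start:start + size]
        score = SequenceMatcher(None, pattern.lower(), window.lower()).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (window, score)
    return best


# OCR has misread "I" as "l" and "e" as "c"; an exact match would fail.
print(fuzzy_find("Invoice Number", "Total lnvoice Numbcr: 12345"))
print(fuzzy_find("Invoice Number", "Purchase order date"))
```

A lower threshold accepts more OCR damage at the cost of more false positives, which is the same trade-off a similarity setting on a fuzzy extractor has to balance.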
=== GPT Integration === | === GPT Integration === | ||
| Line 1,181: | Line 1,290: | ||
<section begin="Waterfall Classification" />[[Waterfall Classification]] is a [[Classification (Concept)|classification]] technique in Grooper that prioritizes training similarity over classification "rules" set by a {{DocumentTypeIcon}} '''Document Type's''' Positive Extractor. This can be helpful in scenarios where {{BatchFolderIcon}} '''[[Batch Folder]]s''' get misclassified and simply retraining won't help.<section end="Waterfall Classification" /> | <section begin="Waterfall Classification" />[[Waterfall Classification]] is a [[Classification (Concept)|classification]] technique in Grooper that prioritizes training similarity over classification "rules" set by a {{DocumentTypeIcon}} '''Document Type's''' Positive Extractor. This can be helpful in scenarios where {{BatchFolderIcon}} '''[[Batch Folder]]s''' get misclassified and simply retraining won't help.<section end="Waterfall Classification" /> | ||
</div> | </div> | ||
== Disambiguation == | == Disambiguation == | ||
| Line 1,217: | Line 1,315: | ||
{{HelpLink|Embedded Object}} | {{HelpLink|Embedded Object}} | ||
Latest revision as of 09:39, 27 October 2025
This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms.
Each entry is organized according to what major Grooper entity they belong to. For example, "Classify" is an "Activity". It is found in the "Activity" section of the Glossary.
Application
Grooper is an intelligent document processing platform that uses an array of sophisticated techniques to automate end-to-end content capture and delivery. From a technical standpoint, Grooper consists of a Grooper Repository and the applications that support the management and execution of configuration assets.
- A Grooper Repository consists of two things: (1) A series of tables in a SQL database (containing configuration nodes and their properties) and (2) a File Store (containing files associated to nodes in the database).
The Grooper applications are as follows:
- Grooper - The primary program files for the Grooper platform. This application will need to be installed on any Grooper web server hosting the Grooper UI and processing servers running Activity Processing services to automate task processing.
- Grooper Command Console - This is an administrative utility that gets installed with the Grooper application.
- Grooper Web Client - This application installs the Grooper user interface. It will need to be installed on the Grooper web server. The Grooper web server hosts the Grooper web app which is accessed via a URL.
- Grooper Desktop - This is a lightweight application required to scan documents using the Grooper web app. It runs in the background and helps operate the Scan Viewer in Grooper. It needs to be installed on any workstation connected to a document scanner.
Grooper Command Console
Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.
Grooper Web Client
The Grooper user interface is accessed using a web browser from a URL. The Grooper Web Client is the application that installs the Grooper website on a web server.
Node Types
Grooper.GrooperNode
Nodes are the main configuration objects in Grooper. They are created and accessed in the Node Tree from the Design page. The different types of nodes ("Node Types") serve different functions in Grooper. For example, "Batch" nodes are the primary container for document content. They contain "Batch Folder" nodes which represent documents and "Batch Page" nodes which represent individual pages of documents.
AI Analyst
BE AWARE: AI Analysts are obsolete as of version 2025. See AI Assistant for the new and improved version of AI Analyst. An AI Analyst facilitates the ability to interact with a document as you might with an AI chatbot.
AI Assistant
Grooper.GPT.AIAssistant
AI Assistants are Grooper's conversational AI personas. They answer questions about resources they can access (including content from documents, databases and/or web services). This greatly increases an AI's ability to answer domain-specific questions that require access to these resources.
Batch Objects
Grooper.Core.BatchObject
Batch Objects are the foundational elements of Grooper's document processing system, providing a unified structure for organizing, processing, and reviewing document content within a Batch. Every item within a Batch, whether a document, folder, or page, is represented as a Batch Object (and Batches themselves are Batch Objects too).
Batch
Grooper.Core.Batch
Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.
Batch Folder
Grooper.Core.BatchFolder
The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.
- Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.
Batch Page
Grooper.Core.BatchPage
Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.
- Batch Pages are frequently referred to simply as "pages".
Batch Process
Grooper.Core.BatchProcess
Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.
- Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps which are added as child nodes.
- A Batch Process is often referred to as simply a "process".
Batch Process Step
Grooper.Core.BatchProcessStep
Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.
- Batch Process Steps are frequently referred to as simply "steps".
- Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".
CMIS Connection
Grooper.CMIS.CmisConnection
CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.
- For content management systems that support the CMIS standard, the CMIS Connection connects using the CMIS standard.
- For systems that do not, the CMIS Connection normalizes the connection and transfer protocols so the system can be treated as if it were a CMIS platform.
CMIS Repository
Grooper.CMIS.CmisRepository
settings_system_daydream CMIS Repository nodes provide document access in external storage platforms through a cloud CMIS Connection. With a CMIS Repository, users can manage and interact with those documents within Grooper. They are used primarily for import using Import Descendants and Import Query Results and for export using CMIS Export.
- CMIS Repositories are created as a child node of a CMIS Connection using the "Import Repository" command.
Content Types
Grooper.Core.ContentType
Content Types are a class of node types used to classify folder Batch Folders. They represent categories of documents (stacks Content Models and collections_bookmark Content Categories) or distinct types of documents (description Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.
Content Model
Grooper.Core.ContentType
stacks Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the collections_bookmark Content Categories and description Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.
Content Category
Grooper.Core.ContentCategory
collections_bookmark A Content Category is a container for other Content Category or description Document Type nodes in a stacks Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.
Document Type
Grooper.Core.DocumentType
description Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a stacks Content Model or a collections_bookmark Content Category. They serve three primary purposes:
- They are used to classify documents. Documents are considered "classified" when the folder Batch Folder is assigned a Content Type (most typically, a Document Type).
- The Document Type's data_table Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
- The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).
Form Type
Grooper.Core.FormType
two_pager Form Types represent trained variations of a description Document Type. These nodes store machine learning training data for Lexical and Visual document classification methods.
Page Type
Grooper.Core.PageType
article Page Types represent individual pages of a two_pager Form Type. These nodes store page-level machine learning training data for Lexical and Visual document classification methods. Page Types are used by ESP Auto Separation to make document separation decisions based on page classification.
Control Sheet
Grooper.Capture.ControlSheet
document_scanner Control Sheets are printable pages used to automate document separation at scan time. A Control Sheet is placed in front of each new document before the pages are loaded into the scanner. Then, when pages are scanned using the Scan Viewer and Control Sheet Separation is executed, a new folder Batch Folder is created for every Control Sheet scanned. Control Sheets can also be configured to assign the Batch Folder a description Document Type, thus classifying the document at scan time as well.
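The separation logic described above can be sketched in a few lines of Python. This is only an illustration, not Grooper code: `ScannedPage`, `Folder` and `separate_by_control_sheet` are hypothetical names, and the sketch assumes the control sheet itself is not kept as a page of the new folder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScannedPage:
    """Hypothetical stand-in for a scanned page (not a Grooper class)."""
    name: str
    is_control_sheet: bool = False
    document_type: Optional[str] = None  # only meaningful on a control sheet


@dataclass
class Folder:
    """Hypothetical stand-in for a Batch Folder."""
    document_type: Optional[str] = None
    pages: List[str] = field(default_factory=list)


def separate_by_control_sheet(pages):
    """Start a new folder at every control sheet and file later pages into it."""
    folders = []
    current = None
    for page in pages:
        if page.is_control_sheet:
            # A control sheet opens a new folder and may pre-assign its type.
            current = Folder(document_type=page.document_type)
            folders.append(current)
        elif current is not None:
            current.pages.append(page.name)
        # Assumption: pages scanned before the first control sheet are skipped.
    return folders
```

Scanning a control sheet marked "Invoice", two pages, a control sheet marked "Contract", and one more page would produce two folders in this sketch, one per control sheet.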
Data Connection
Grooper.Core.DataConnection
database Data Connections connect Grooper to Microsoft SQL and supported ODBC databases. Once configured, Data Connections can be used to export data extracted from a document to a database, perform database lookups to validate data Grooper collects and other actions related to database management systems (DBMS).
- Grooper supports MS SQL Server connectivity with the "SQL Server" connection method.
- Grooper supports Oracle, PostgreSQL, Db2, and MySQL connectivity with the "ODBC" connection method.
Data Elements
Grooper.Core.DataElement
Data Elements are a class of node types used to collect data from a document. These include: data_table Data Models, insert_page_break Data Sections, variables Data Fields, table Data Tables, and view_column Data Columns.
Data Model
Grooper.Core.DataModel
data_table Data Models are leveraged during the Extract activity to collect data from documents (folder Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The configuration of the Data Model and its child Data Elements defines data extraction logic and settings for how data is reviewed in a Data Viewer.
Data Field
Grooper.Core.DataField
variables Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a data_table Data Model and/or insert_page_break Data Sections.
- Data Fields are frequently referred to simply as "fields".
Data Section
Grooper.Core.DataSection
A insert_page_break Data Section is a container for Data Elements in a data_table Data Model. They can contain variables Data Fields, table Data Tables, and even Data Sections as child nodes and add hierarchy to a Data Model. They serve two main purposes:
- They can simply act as organizational buckets for Data Elements in larger Data Models.
- By configuring its "Extract Method", a Data Section can subdivide larger and more complex documents into smaller parts to assist in extraction.
- "Single Instance" sections define a division (or "record") that appears only once on a document.
- "Multi-Instance" sections define a collection of repeating divisions (or "records").
Data Table
Grooper.Core.DataTable
A table Data Table is a Data Element specialized in extracting tabular data from documents (i.e. data formatted in rows and columns).
- The Data Table itself defines the "Table Extract Method". This is configured to determine the logic used to locate and return the table's rows.
- The table's columns are defined by adding view_column Data Column nodes to the Data Table (as its children).
Data Column
Grooper.Core.DataColumn
view_column Data Columns represent columns in a table extracted from a document. They are added as child nodes of a table Data Table. They define the type of data each column holds along with its data extraction properties.
- Data Columns are frequently referred to simply as "columns".
- In the context of reviewing data in a Data Viewer, a single Data Column instance in a single Data Table row is most frequently called a "cell".
Data Field Container and Data Element Container
Grooper.Core.DataFieldContainer
Grooper.Core.DataElementContainer
Data Field Container and Data Element Container are two base types in Grooper from which "container" Data Elements are derived. Container Data Elements (data_table Data Models, insert_page_break Data Sections, table Data Tables) serve an important function in organizing and defining behavior and extraction logic for the variables Data Fields and view_column Data Columns they contain.
- While "Data Field Container" and "Data Element Container" are distinct classes in the Grooper Object Model, they are closely related. While Grooper scripters/experts should know the difference, for most practical purposes, the terms are used interchangeably (or they're just called "containers" or "container elements"). See Object Model info for more.
Data Rule
Grooper.Core.DataRule
flowsheet Data Rules are used to normalize or otherwise prepare data collected in a data_table Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (folder Batch Folders) to ensure data conforms to expected formats or meets certain standards.
- Each Data Rule executes a "Data Action", which does things like compute a field's value, parse a field into other fields, perform lookups, and more.
- Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
- A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
- Data Rules can be applied by:
- The Apply Rules activity (must be done after data is collected by the Extract activity)
- The Extract activity (will run after the Data Model extraction)
- The Convert Data activity when converting a document to another Document Type
- They can be applied manually in a Data Viewer with the "Run Rule" command.
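The relationship between a Trigger, a Data Action and a hierarchy of rules can be illustrated with a short Python sketch. Nothing here is Grooper's API: `DataRule`, `trigger`, `action` and `children` are hypothetical names, Grooper's Trigger expressions are not Python callables, and the sketch assumes child rules are skipped when the parent's trigger is false.

```python
class DataRule:
    """Hypothetical model of a rule: an optional trigger, an optional action, child rules."""

    def __init__(self, trigger=None, action=None, children=()):
        self.trigger = trigger          # callable(record) -> bool, or None for "always run"
        self.action = action            # callable(record) that edits the record in place
        self.children = list(children)  # child rules, applied in order

    def apply(self, record):
        # Assumption: a false trigger skips this rule and all of its children.
        if self.trigger is not None and not self.trigger(record):
            return
        if self.action is not None:
            self.action(record)
        for child in self.children:
            child.apply(record)


def compute_total(record):
    """Example action: compute one field from two others."""
    record["total"] = record["quantity"] * record["unit_price"]


def flag_for_review(record):
    """Example action: mark the record for human review."""
    record["needs_review"] = True


# A parent rule with two children: always compute the total, then flag large totals.
invoice_rules = DataRule(children=[
    DataRule(action=compute_total),
    DataRule(trigger=lambda r: r["total"] > 100, action=flag_for_review),
])
```

Calling `invoice_rules.apply(record)` on a dictionary of extracted values runs the children in order, so the second rule's trigger can rely on the total computed by the first.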
Extractor Nodes
Grooper.Core.ExtractorNode
Data Type
Grooper.Extract.DataType
pin Data Types are nodes used to extract text data from a document. Data Types have more capabilities than quick_reference_all Value Readers. Data Types can collect results from multiple extractor sources, including a locally defined extractor, child extractor nodes, and referenced extractor nodes. Data Types can also collate results using Collation Providers to combine, sift and manipulate results further.
Value Reader
Grooper.Extract.ValueReader
quick_reference_all Value Reader nodes define a single data extraction operation. Each Value Reader executes a single Value Extractor configuration. The Value Extractor determines the logic for returning data from a text-based document or page. (Example: Pattern Match is a Value Extractor that returns data using regular expressions).
- Value Readers can be used on their own or in conjunction with pin Data Types for more complex data extraction and collation.
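To illustrate the idea of returning data with a regular expression, here is a minimal Python sketch. It does not use or reproduce Grooper's Pattern Match extractor: the invoice-number pattern and the `pattern_match` function are hypothetical, chosen only to show a single pattern locating values in document text.

```python
import re

# Hypothetical pattern for invoice numbers such as "INV-004217".
INVOICE_NUMBER = re.compile(r"\bINV-\d{6}\b")


def pattern_match(text):
    """Return each matched value together with its character offset in the text."""
    return [(match.group(0), match.start()) for match in INVOICE_NUMBER.finditer(text)]
```

Each result carries both the value and where it was found, which is the kind of information a downstream step would need to collate or validate several candidate results.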
Field Class
Grooper.Extract.FieldClass
input Field Classes are NLP (natural language processing) based extractor nodes. They find values based on some natural language context near that value. Values are positively or negatively associated with text-based "features" nearby by training the extractor. During extraction, the extractor collects values based on these training weightings.
- Field Classes are most useful when attempting to find values within the flow of natural language.
- Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "extractor nodes" like quick_reference_all Value Readers or pin Data Types.
- Advances in large language models (LLMs) have largely made Field Classes obsolete. LLM-based extraction methods in Grooper (such as AI Extract) can achieve similar results with far less setup.
File Store
Grooper.FileStore
hard_drive File Store nodes are a key part of Grooper's "database and file store" architecture. They define a storage location where file content associated with Grooper nodes is saved. This allows processing tasks to create, store and manipulate content related to documents, images, and other "files".
- Not every node in Grooper will have files associated with it, but if it does, those files are stored in the Windows folder location defined by the File Store node.
Folder
Grooper.Folder
Batches Folder
Grooper.Core.BatchesFolder
Projects Folder
Grooper.ProjectsFolder
Machines Folder
Grooper.MachinesFolder
Local Resources Folder
Grooper.Core.LocalResourcesFolder
IP Elements
Grooper.IP.IpElement
IP Group
Grooper.IP.IpGroup
gallery_thumbnail IP Groups are containers of image IP Steps and/or IP Groups that can be added to perm_media IP Profiles. IP Groups add hierarchy to IP Profiles. They serve two primary purposes:
- They can be used simply to organize IP Steps for IP Profiles with large numbers of steps.
- They are often used with "Should Execute Expressions" and "Next Step Expressions" to conditionally execute a sequence of IP Steps.
IP Profile
Grooper.IP.IpProfile
perm_media IP Profiles are a step-by-step list of image processing operations (IP Commands). They are used for several image processing related operations, but primarily for:
- Permanently enhancing an image during the Image Processing activity (usually to get rid of defects in a scanned image, such as skewing or borders).
- Cleaning up an image in-memory during the Recognize activity without altering the image to improve OCR accuracy.
- Computer vision operations that collect layout data (table line locations, OMR checkboxes, barcode values and more) utilized in data extraction.
IP Step
Grooper.IP.IpStep
image IP Steps are the basic units of an perm_media IP Profile. They define a single image processing operation, called an IP Command in Grooper.
Lexicon
Grooper.Core.Lexicon
dictionary Lexicons are dictionaries used throughout Grooper to store lists of words, phrases, weightings for Fuzzy RegEx, and more. Users can add entries to a Lexicon, Lexicons can import entries from other Lexicons by referencing them, and entries can be dynamically imported from a database using a database Data Connection. Lexicons are commonly used to aid in data extraction, with the "List Match" and "Word Match" extractors utilizing them most commonly.
Machine
Grooper.Machine
computer Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.
OCR Profile
Grooper.OCR.OcrProfile
library_books OCR Profiles store configuration settings for optical character recognition (OCR). They are used by the Recognize activity to convert images of text on contract Batch Pages into machine-encoded text. OCR Profiles are highly configurable, allowing fine-grained control over how OCR occurs, how pre-OCR image cleanup occurs, and how Grooper's OCR Synthesis occurs. All this works to the end goal of highly accurate OCR text data, which is used to classify documents, extract data and more.
Object Library
Grooper.ObjectLibrary
extension Object Library nodes are .NET libraries that contain code files for customizing Grooper's functionality. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.
- Examples include:
- Adding custom Activities that execute within Batch Processes
- Creating custom commands available during the Review activity and in the Design page.
- Defining custom methods that can be called from code expressions on Data Field and Batch Process Step objects.
- Creating custom Connection Types for CMIS Connections for import/export operations from/to CMS systems.
- Establishing custom Grooper Services that perform automated background tasks at regular intervals
Project
Grooper.Project
package_2 Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as stacks Content Models, settings Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.
Resource File
Grooper.ResourceFile
Resource Files are nodes you can add to a package_2 Project and store any kind of file. Each Resource File stores one file. While you can use Resource Files to store any kind of file in a Project, there are several areas in Grooper that can reference Resource Files to one end or another, including XML schema files used for Grooper's XML Schema Integration.
Root
Grooper.GrooperRoot
The Grooper database Root node is the topmost element of the Grooper Repository. All other nodes in a Grooper Repository are its children/descendants. The Grooper Root also stores several settings that apply to the Grooper Repository, including the license serial number or license service URL and Repository Options.
Scanner Profile
Grooper.Capture.ScannerProfile
scanner Scanner Profiles store configuration settings for operating a document scanner. Scanner Profiles provide users operating the Scan Viewer in the Review activity a quick way to select pre-saved scanner configurations.
Separation Profile
Grooper.Capture.SeparationProfile
insert_page_break Separation Profiles store settings that determine how contract Batch Pages are separated into folder Batch Folders. Separation Profiles can be referenced in two ways:
- In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
- In a Separate activity as an alternative to configuring separation settings locally.
Work Queue
Grooper.Core.WorkQueue
Processing Queue
Grooper.Core.ThreadPool
memory Processing Queues help automate "machine performed tasks" (Those are Code Activity tasks performed by computer Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).
- Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource-intensive activities, such as Recognize. That way, these compute-hungry tasks won't gobble up all available system resources.
- Processing Queues are also used to manage activities, such as Render, that can only have one activity instance running per machine (This is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
- Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.
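The effect of capping processing threads can be demonstrated with Python's standard thread pool. This is a general concurrency sketch, not a model of how Grooper schedules tasks: `run_queue` is a hypothetical function, the sleep stands in for real work, and a cap of 1 is used here as a loose analogy for "one at a time" processing.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_queue(tasks, max_threads):
    """Run tasks on a pool capped at max_threads; return results and peak concurrency."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def worker(task):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            time.sleep(0.01)  # stand-in for a resource-intensive task
            return task.upper()
        finally:
            with lock:
                running -= 1

    # The pool never runs more than max_threads workers at once.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(worker, tasks))  # map preserves input order
    return results, peak
```

With `max_threads=2`, five tasks still all complete, but no more than two are ever in flight at once. With `max_threads=1`, tasks run strictly one after another, which is the behavior you would want when a destination can only accept one document at a time.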
Review Queue
Grooper.Core.ReviewQueue
person_play Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set on either a settings Batch Process or a Review step. A user's membership in Review Queues affects how inventory_2 Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.
Core Configuration Types
In Grooper, nodes are configured by editing their property settings. The following are configurable items that are considered a "core" part of Grooper. These objects are designed to be part of a larger configuration.
- These "core configuration types" are found most commonly in the property settings on a node in the Grooper node tree.
- However, they may also be configured when configuring commands or as part of a larger property configuration.
This includes:
- Activities
- Behaviors
- Classify Methods
- IP Commands
- OCR Engines
- Repository Options
- Separation Providers
- Services
- Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.
Activity
Grooper.Core.BatchProcessingActivity
Grooper Activities define specific document processing operations done to a inventory_2 Batch, folder Batch Folder, or contract Batch Page. In a settings Batch Process, each edit_document Batch Process Step executes a single Activity (determined by the step's "Activity" property).
- Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".
Attended Activities
Grooper.Core.AttendedActivity
Attended Activities are a type of Activity in Grooper that require direct user interaction within a settings Batch Process workflow. Attended Activities are designed for steps where human review, validation or intervention is necessary (or automated processing is simply insufficient). The only current Attended Activity in Grooper is person_search Review.
Review
Grooper.Activities.Review
person_search Review is an Activity that allows user attended review of Grooper's results. This allows human operators to validate processed contract Batch Page and folder Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, data extraction and operating document scanners.
Code Activities
Grooper.Core.CodeActivity
AI Dialogue
BE AWARE: AI Analysts and AI Dialogue are obsolete as of version 2025. This Activity only exists in version 2024. network_intelligence_update AI Dialogue is an Activity that executes a scripted conversation with an psychology AI Analyst and saves the resulting conversation on the document as a JSON file.
Apply Rules
Grooper.Activities.ApplyRules
flowsheet Apply Rules is an Activity that runs flowsheet Data Rules on data that has previously been extracted from documents (folder Batch Folders).
- The Apply Rules activity will always need to run after an Extract activity runs (An Extract step must come before an Apply Rules step in the order of edit_document Batch Process Steps in a settings Batch Process).
Attach
Grooper.GPT.Attach
file_present Attach is an Activity that physically moves and nests documents within a folder Batch Folder based on attachment markers set by the attach_file_add Mark Attachments activity. It consolidates related documents—such as addenda or supporting documents—under their host documents, updating the inventory_2 Batch hierarchy for downstream processing.
Batch Transfer
Grooper.Activities.BatchTransfer
Batch Transfer is an Activity that
Burst Book
Grooper.Microform.BurstBook
auto_stories Burst Book is an Activity that
Classify
Grooper.Activities.ClassifyFolders
unknown_document Classify is an Activity that "classifies" folder Batch Folders in a inventory_2 Batch by assigning them a description Document Type.
- Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
- Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.
Clip Frames
view_module Clip Frames is a specialized Activity for processing microfiche in Grooper. It extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.
Convert Data
switch_access_2 Convert Data is an Activity that converts a document (folder Batch Folder) to another description Document Type using Data Actions to copy and convert Data Elements from the source Document Type to those in the target Document Type. Convert Data is a specialized Activity for use cases requiring a great deal of data transformation before export.
Correct
abc Correct is an Activity that performs spell correction. It can correct a folder Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data or otherwise enhance text data.
Deduplicate
Deduplicate is an Activity that
Detect Frames
view_module Detect Frames is a specialized Activity for processing microfiche in Grooper. It locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.
Detect Language
Grooper.GPT.DetectLanguage
travel_explore Detect Language is an Activity that uses a large language model (LLM) to determine the primary language (English, Spanish, French, etc.) of a document. Activities executed downstream, such as export_notes Extract, can use this information to apply language specific logic.
Execute
tv_options_edit_channels Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a settings Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.
Export
output Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.
Extract
export_notes Extract is an Activity that retrieves information from folder Batch Folder documents, as defined by Data Elements in a data_table Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.
Image Processing
wallpaper Image Processing is an Activity that enhances contract Batch Page images and optimizes them for better OCR text recognition and data extraction results.
Initialize Card
view_module Initialize Card is a specialized Activity for processing microfiche in Grooper. It prepares and configures microfiche card images for further processing.
Launch Process
Launch Process is an Activity that
Mark Attachments
Grooper.GPT.MarkAttachments
attach_file_add Mark Attachments is an Activity that analyzes documents (folder Batch Folders) to determine attachment relationships using configurable rules ("Attachment Rules"). It sets attachment markers on documents—indicating whether they should be attached to neighboring Batch Folders. These markers are then used by the Attach activity to group and nest related documents.
Merge
file_save Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.
Recognize
format_letter_spacing_wide Recognize is an Activity that obtains machine-readable text from contract Batch Pages and folder Batch Folders. When properly configured with an library_books OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an perm_media IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.
Redact
format_ink_highlighter Redact is an Activity that visibly obscures (or "redacts") text information on a page based on results returned from an extractor. Be aware, Redact does not alter the text data. It only alters the image.
Remove Level
account_tree Remove Level is an Activity that
Render
print Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the format_letter_spacing_wide Recognize Activity.
Route
alt_route Route is an Activity that
Send Mail
forward_to_inbox Send Mail is an Activity that automates email notifications from Grooper based on events and conditions set by a settings Batch Process. Optionally, documents in the inventory_2 Batch may be attached to the generated email.
Separate
insert_page_break Separate is an Activity that sorts contract Batch Pages into individual folder Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by unknown_document Classify, export_notes Extract, output Export and other Activities that need to run on the folder (i.e. document) level.
Spawn Batch
inventory_2 Spawn Batch is an Activity that
Split Pages
Multi-page PDF and TIF files come into Grooper as files attached to single folder Batch Folders. Split Pages is an Activity that creates child contract Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.
Split Text
receipt Split Text is an Activity that
Text Transform
insert_text Text Transform is an Activity that
Train Lexicon
book_2 Train Lexicon is an Activity that
Translate
translate Translate is an Activity that
XML Transform
code_blocks XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.
Behavior
A "Behavior" is one of several features applied to a Content Type (such as a description Document Type). Behaviors affect how certain Activities and Commands are executed, based on how a document (folder Batch Folder) is classified. In other words, documents are handled differently according to their Document Type. This includes how they are exported (how Export behaves), if and how they are added to a document search index (how the various indexing commands behave), and if and how Label Sets are used (how Classify and Extract behave in the presence of Label Sets).
- Each Behavior is enabled by adding it to a Content Type. They are configured in the Behaviors editor.
- Behaviors extend to descendant Content Types if the descendant Content Type has no Behavior configuration of its own.
- For example, all Document Types will inherit their parent Content Model's Behaviors.
- However, if a Document Type has its own Behavior configuration, it will be used instead.
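The inheritance rule in the bullets above amounts to walking up the Content Type hierarchy until a configuration is found. The Python sketch below illustrates that lookup. `ContentType` and `effective_behaviors` are hypothetical names, and the sketch assumes a Content Type's own Behavior configuration replaces the inherited one as a whole rather than merging with it.

```python
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category or Document Type."""

    def __init__(self, name, parent=None, behaviors=None):
        self.name = name
        self.parent = parent
        self.behaviors = behaviors  # None means "no Behavior configuration of its own"


def effective_behaviors(content_type):
    """Return the nearest Behavior configuration, starting at the node and walking up."""
    node = content_type
    while node is not None:
        if node.behaviors is not None:
            return node.behaviors
        node = node.parent
    return {}  # nothing configured anywhere in the hierarchy


# Example hierarchy: a model, one category, and two document types.
model = ContentType("Accounting Model", behaviors={"Export": "file system"})
category = ContentType("Payables", parent=model)
invoice = ContentType("Invoice", parent=category)
contract = ContentType("Contract", parent=category, behaviors={"Export": "database"})
```

In this example, `effective_behaviors(invoice)` resolves to the model's configuration because neither the Document Type nor its category defines one, while `effective_behaviors(contract)` resolves to the Document Type's own configuration.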
Export Behavior
An Export Behavior defines the parameters for exporting classified folder Batch Folder content from Grooper to other systems. This includes where they are exported to (what content management system, file system, database etc), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML etc), folder pathing, file naming and data mappings (for Data Export and CMIS Export).
Import Behavior
An Import Behavior defines how data is mapped from files in an external content management system to Batch Folders created on import when using CMIS Import.
Indexing Behavior
An Indexing Behavior allows documents (folder Batch Folders) to be indexed via AI Search. Once indexed, users can search for and retrieve documents from the Search Page.
Labeling Behavior
A Labeling Behavior extends "label set" functionality to description Document Types. This allows you to collect field labels and other labels present on a document and use them in a variety of ways. This includes functionality for classification, field extraction, table extraction, and section extraction.
PDF Data Mapping
PDF Data Mapping is a Behavior that enhances PDF files generated by the Merge or Export activities with metadata, bookmarks, annotations and/or different kinds of widgets.
Text Rendering
Text Rendering is a Behavior that causes text documents (e.g. TXT files) to be interpreted and displayed as paginated documents rather than a raw text stream.
- By default, this renders TXT files to an 8.5 by 11 inch page format, but this can be altered in the Text Rendering settings.
Classify Method
"Classify Methods" define classification logic used by stacks Content Models during the unknown_document Classify activity. Classify Methods organize document content in Grooper by assigning folder Batch Folders a description Document Type.
- Classify Methods analyze documents (Batch Folders) to determine what kind of document each one is.
- Each Classify Method analyzes documents according to a different methodology to organize documents accurately. This includes text-based pattern matching, computer vision, machine learning models, label sets and more.
- Classify Methods are configured by setting and configuring a Content Model's "Classification Method" property.
GPT Embeddings
BE AWARE: GPT Embeddings is obsolete as of version 2025. The LLM Classifier and Search Classifier methods are the new and improved AI-enabled classification methods. GPT Embeddings is a Classify Method that uses an OpenAI embeddings model and trained document samples to tell one document from another.
Labelset-Based
"Labelset-Based" is a Classify Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.
Lexical
"Lexical" is a Classify Method that classifies Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify Document Types.
LLM Classifier
"LLM Classifier" is a Classify Method that classifies documents (Batch Folders) by asking a large language model (LLM) to select its Document Type from a list.
Rules-Based
"Rules-Based" is a Classify Method that employs "rules" defined on each Document Type to classify Batch Folders. Positive Extractor and Negative Extractor properties are configured for each Document Type to positively or negatively associate a Batch Folder based on predefined criteria.
- Where the Positive and Negative Extractors will impact all Classify Method results, the Rules-Based method classifies using only these properties and nothing else.
Search Classifier
"Search Classifier" is a Classify Method that classifies documents (Batch Folders) by finding similar documents in a document search index. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same Document Type as its closest match in the search index.
Visual
"Visual" is a Classify Method that uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.
IP Command
IP Commands specify an image processing (IP) operation (such as image cleanup, format conversion or feature detection) and are used to construct IP Steps in an IP Profile. IP Commands are configured using an IP Step's Command property.
Barcode Detection
Barcode Detection is an IP Command that detects and reads barcode data. The detected barcode information is stored as part of the page's layout data.
Barcode Removal
Barcode Removal is an IP Command that detects, reads and digitally removes barcodes from an image. The detected barcode information is stored as part of the page's layout data.
Binarize
Binarize is an IP Command that converts a color or grayscale image to a bi-tonal (black and white) image using various thresholding methods.
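As a rough illustration of the simplest thresholding approach (one fixed, global threshold), the sketch below converts a grayscale image to a bi-tonal one. It is not Grooper's implementation: the function name binarize, the list-of-rows image representation and the default threshold of 128 are all assumptions made for this example.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 grayscale pixel to 0 (black) or 255 (white).

    Hypothetical helper for illustration only; real thresholding
    methods may be adaptive rather than one fixed cutoff.
    """
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


# A tiny 2x3 grayscale "image" represented as rows of pixel values.
image = [
    [12, 200, 130],
    [127, 128, 255],
]
bitonal = binarize(image)
```

Pixels at or above the threshold become white and all others become black, so the choice of threshold decides how much faint content survives.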
Box Detection
Box Detection is an IP Command that detects checkboxes and determines their check state (checked or unchecked). The detected checkbox information is stored as part of the page's layout data.
Box Removal
Box Removal is an IP Command that detects checkboxes, determines their check state (checked or unchecked) and digitally removes them from an image. The detected checkbox information is stored as part of the page's layout data.
Extract Page
Extract Page is an IP Command that removes an image from a carrier image while simultaneously removing any image warping or skewing.
Line Detection
Line Detection is an IP Command that locates horizontal and vertical lines on documents. The detected line locations are stored as part of the page's layout data.
Line Removal
Line Removal is an IP Command that locates and removes horizontal and vertical lines from documents. The detected line locations are stored as part of the page's layout data.
Scratch Removal
Scratch Removal is an IP Command that detects and removes or repairs scratches from film-based images.
Shape Detection
Shape Detection is an IP Command that locates shapes on a document that match one or more sample images. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.
Shape Removal
Shape Removal is an IP Command that detects and removes shapes from documents. Common shapes targeted by this command are stamps, seals, logos or other graphical marks that interfere with OCR and/or can serve as triggers for document separation or anchors for data extraction. The detected shapes' locations are stored as part of the page's layout data.
OCR Engine
An "OCR engine" is the part of OCR software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an OCR Profile's OCR Engine property.
Azure OCR
Azure OCR is an OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.
Repository Option
Repository Options are optional features that affect the entire repository. These optional features enable functionality that otherwise does not work without first establishing the connections these options provide. Repository Options are added to a Grooper Repository and configured using the Root node's Options property.
LLM Connector
LLM Connector is a Repository Option that enables large language model (LLM) powered AI features for a Grooper Repository.
AI Search
AI Search is a Repository Option that enables Grooper's document search and retrieval features in the Search page. Once enabled, Indexing Behaviors can be added to Content Types (such as Content Models), which will allow users to submit documents to a search index. Once indexed, documents can be retrieved by full text and metadata searches in the Search Page.
Separation Provider
The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.
Change in Value Separation
Change in Value Separation is a Separation Provider that creates a new folder every time an extracted value changes from one Batch Page to the next.
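The idea behind Change in Value Separation can be sketched in a few lines. This is an illustration only, not Grooper code: the page dictionaries, the "value" key (standing in for whatever the configured extractor returns on each page) and the function name are assumptions.

```python
def separate_on_value_change(pages):
    """Group pages into folders, starting a new folder whenever the
    extracted value differs from the previous page's value."""
    folders = []
    previous = object()  # sentinel that never equals a real value
    for page in pages:
        if page["value"] != previous:
            folders.append([])  # value changed, so open a new folder
        folders[-1].append(page["name"])
        previous = page["value"]
    return folders


pages = [
    {"name": "Page 1", "value": "INV-001"},
    {"name": "Page 2", "value": "INV-001"},
    {"name": "Page 3", "value": "INV-002"},
]
folders = separate_on_value_change(pages)
```

Here the first two pages share an invoice number and land in one folder, while the third page starts a new one.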
Control Sheet Separation
Control Sheet Separation is a Separation Provider that uses Grooper document_scanner Control Sheets to separate documents.
EPI Separation
EPI Separation is a Separation Provider that uses embedded page information ("EPI") to Separate loose pages into document folders. A Data Extractor is used to find page numbers from the text on a page and Grooper uses this information to separate the pages.
ESP Auto Separation
ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a Batch.
Event-Based Separation
Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.
Multi Separator
The Multi Separator Separation Provider performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.
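The fallback behavior described for Multi Separator resembles a simple "first provider that succeeds wins" chain. The sketch below is a simplified model under stated assumptions: each provider is a plain function that returns a list of folders, or None when it cannot separate the pages. The two providers shown (by_cover_sheet and fixed_page_count) are hypothetical and are not Grooper Separation Providers.

```python
def multi_separate(pages, providers):
    """Try each provider in order; return the first successful result."""
    for provider in providers:
        result = provider(pages)
        if result is not None:
            return result
    return None  # every provider failed


def by_cover_sheet(pages):
    """Hypothetical provider: start a folder at each page named 'COVER'."""
    if "COVER" not in pages:
        return None  # nothing to separate on, so report failure
    folders = []
    for page in pages:
        if page == "COVER" or not folders:
            folders.append([])
        folders[-1].append(page)
    return folders


def fixed_page_count(pages, size=2):
    """Hypothetical provider: split into folders of `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


providers = [by_cover_sheet, fixed_page_count]
with_covers = multi_separate(["COVER", "p1", "COVER", "p2"], providers)
without_covers = multi_separate(["p1", "p2", "p3"], providers)
```

When cover sheets are present the first provider handles the pages; otherwise the chain falls through to the second.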
Pattern-Based Separation
Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.
Undo Separation
Undo Separation is a Separation Provider. Instead of putting loose Batch Pages into Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.
Service
Grooper.ServiceInstance
Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).
Activity Processing
Grooper.Services.ActivityProcessing
Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.
API Services
Grooper.Services.ApiServices
You can perform Batch processing via REST API web calls by installing API Services.
- As of version 2025, the Grooper Web Services (GWS) web app hosts additional API endpoints. Some of these endpoints overlap with the API Services endpoints. Refer to the GWS documentation for more information on its endpoint offerings. You can locate the GWS documentation for your Grooper install at
https://{webserver-name-or-domain-name}/GWS
Grooper Licensing
Grooper.Services.LicenseService
Grooper Licensing is a Grooper Service that distributes licenses to multiple workstations running Grooper applications.
Import Watcher
Grooper.Services.ImportWatcher
An Import Watcher is a Grooper Service that schedules and runs Import Jobs. It uses an Import Provider to query files in a file system or content management system that meet specified criteria according to a defined schedule (every minute, every day, only on Sundays, etc.). These files are imported into Grooper as documents (Batch Folders) in a new Batch.
- Afterward, the imported files can be (and should be) moved, deleted, or modified to prevent repeat imports in the next polling cycle.
Indexing Service
Grooper.GPT.IndexingService
An Indexing Service is a Grooper Service that periodically polls the Grooper database to automate AI Search indexing. It checks to see if any documents in a Grooper Repository are classified as a Document Type that inherits from a Content Type configured with an Indexing Behavior. If there are any, and they need to be added to, updated in, or deleted from the search index, the Indexing Service will submit an "Indexing Job" to be picked up by an Activity Processing service.
Extraction Related Types
These are configuration objects in Grooper that relate to extracting data from documents. These objects include specialized items such as "Table Extract Methods" which pertain only to configuring Data Table nodes. These also include more general items such as Value Extractors which are used by various extractor related properties on a variety of node types in Grooper.
These "extraction related types" are always found when configuring properties of:
- Extractor Nodes (Data Type, Value Reader and Field Class)
- Data Elements (Data Model, Data Field, Data Section, Data Table and Data Column)
This includes:
- Value Extractors
- Collation Providers
- Fill Methods
- Lookup Specifications
- Section Extract Methods
- Table Extract Methods
- Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.
Collation Provider
The Collation property of a Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.
AND
AND is a Collation Provider option for Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical "AND" operator across multiple extractors.
Array
Array is a Collation Provider option for Data Type extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.
Combine
Combine is a Collation Provider option for Data Type extractors. Combine combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
Key-Value List
Key-Value List is a Collation Provider option for Data Type extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
Key-Value Pair
Key-Value Pair is a Collation Provider option for Data Type extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. Note: Key-Value Pair is an older technique in Grooper. In most cases, the Labeled Value extractor is preferable to Key-Value Pair collation.
Multi-Column
Multi-Column is a Collation Provider option for Data Type extractors. Multi-Column combines multiple columns on a page into a single column for extraction.
Ordered Array
Ordered Array is a Collation Provider option for Data Type extractors. Ordered Array finds sequences of values where one result is present for each extractor, in the order they appear, according to a specified horizontal, vertical or text-flow layout.
Pattern-Based
Pattern-Based is a Collation Provider option for Data Type extractors. Pattern-Based uses regular expressions to sequence returned results into a final result set.
Split
Split is a Collation Provider option for Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.
Fill Method
Fill Methods provide various mechanisms for populating child Data Elements of a Data Model, Data Section or Data Table. Fill Methods can be added to these nodes using their "Fill Methods" property and editor.
- Fill Methods are secondary extraction operations. They populate descendant Data Elements after normal extraction when the Extract activity runs.
AI Extract
Grooper.GPT.AIExtract
AI Extract is a Fill Method that leverages a Large Language Model (LLM) to return extraction results to Data Elements in a Data Model or Data Section. This mechanism provides powerful AI-based data extraction with minimal setup.
Fill Descendants
Grooper.GPT.FillDescendants
Fill Descendants is a Fill Method that executes any Fill Methods on child Data Elements in parallel. This has been shown to dramatically increase efficiency on larger Data Models with multiple Data Sections using AI Extract.
Run Child Extractors
Grooper.Core.RunChildExtractors
Run Child Extractors is a Fill Method that executes extraction for a subset of child Data Elements. This allows you to selectively run extraction logic for one or more Data Elements in a Data Model, Data Section, or Data Table.
Section Extract Method
The Extract Method property of a Data Section defines a "Section Extract Method" which specifies how section instances will be identified and extracted.
Clause Detection
Clause Detection is a Data Section Extract Method. It leverages LLM text embedding models to compare supplied samples of text against the text of a document to return what the AI determines is the "chunk" of text that most closely resembles the supplied samples.
Nested Table
Nested Table is a Data Section Extract Method. This method divides a document into sections by extracting table data within those sections. This gives Grooper users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections has the same table (or at least tabular data which can be extracted by a single Data Table object).
Transaction Detection
Transaction Detection is a Data Section Extract Method. This extraction method produces section instances by detecting repeating patterns of text around the Data Section's child Data Fields.
Lookup Specification
A Lookup Specification defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" (Data Models, Data Sections and Data Tables) using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root Data Model), but cannot be used to populate a field value on a parent of the container element.
CMIS Lookup
CMIS Lookup is a Lookup Specification that performs a lookup against a CMIS Repository via a "CMISQL query" (a specialized query language based on SQL database queries).
Database Lookup
Database Lookup is a Lookup Specification that performs a lookup against a Data Connection via a SQL query.
GPT Lookup
PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as AI Extract. If absolutely necessary, its functionality could also be replicated with a Web Service Lookup implementation. GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model.
Lexicon Lookup
Lexicon Lookup is a Lookup Specification that performs a lookup against a Lexicon.
Web Service Lookup
Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.
XML Lookup
XML Lookup is a Lookup Specification that performs a lookup against an XML file stored as a Resource File in the Project. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.
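To show what selecting a node with XPath and mapping its text to a field looks like in general, here is a minimal sketch using Python's standard library. The XML content, the vendor/id/name names and the lookup_vendor_name function are invented for this example and do not reflect how Grooper stores or configures an XML Lookup.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup data; in the scenario above this would be the XML file.
xml_text = """
<vendors>
  <vendor id="V100"><name>Acme Supply</name></vendor>
  <vendor id="V200"><name>Globex</name></vendor>
</vendors>
""".strip()


def lookup_vendor_name(xml_source, vendor_id):
    """Use the 'lookup field' (vendor_id) to select a node via an XPath
    expression and return its text as the 'target field' value."""
    root = ET.fromstring(xml_source)
    node = root.find(f"./vendor[@id='{vendor_id}']/name")
    return node.text if node is not None else None


vendor_name = lookup_vendor_name(xml_text, "V200")
```

The attribute predicate plays the role of the query criteria, and the selected element's text is the value that would populate the target field.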
Table Extract Method
A Table Extract Method defines the settings and logic for a Data Table to perform extraction. It is set by configuring the Extract Method property of the Data Table.
Delimited Extract
The Delimited Extract Table Extract Method extracts tabular data from a delimiter-separated text file, such as a CSV file.
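For readers unfamiliar with delimiter-separated data, the sketch below shows the general idea of turning such text into table rows keyed by column header. It uses Python's csv module purely as an illustration; the sample data and the extract_rows function are assumptions and say nothing about how Delimited Extract is configured.

```python
import csv
import io

# Hypothetical delimiter-separated content with a header row.
csv_text = "Item,Qty,Price\nWidget,2,9.99\nGadget,1,24.50\n"


def extract_rows(text, delimiter=","):
    """Return one dictionary per data row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows = extract_rows(csv_text)
```

Each dictionary corresponds to one table row, and each key corresponds to one column.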
Fluid Layout
The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.
Grid Layout
The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.
Row Match
The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extract cell data from each column.
Tabular Layout
The Tabular Layout Table Extract Method uses column header values determined by the Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled) as well as Data Column Value Extractor results to model a table's structure and return its values.
Value Extractor
Grooper.Core.ValueExtractor
Value Extractors define an operation that reads data from the text (and sometimes visual) content of a page or document. There are over 20 unique Value Extractors, each using specialized logic to return results. Value Extractors are consumed by multiple higher-level objects in Grooper (such as Data Elements, Extractor Nodes, various Activities and more) to perform a diverse set of document processing duties.
- Value Extractors return a list of one or more "data instances". Data instances contain both the value and its page location, which allows Grooper to highlight results in a Document Viewer.
Ask AI
Grooper.GPT.OpenAI.Chat.AskAI
Ask AI is a Value Extractor that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere a Value Extractor is referenced in Grooper. It can complete a wide array of tasks in Grooper with simple text prompts.
Detect Signature
Grooper.Extract.DetectSignature
Detect Signature is a Value Extractor that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).
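The "fill percentage" measurement can be illustrated with a small sketch. It assumes a bi-tonal page stored as rows of 0 (empty) and 1 (filled) pixels and a region given as (left, top, right, bottom) with exclusive right and bottom edges. The function names and the minimum-fill threshold of 10 percent are assumptions for this example, not Grooper settings.

```python
def fill_percentage(bitonal, region):
    """Percentage of filled (1) pixels inside a rectangular region."""
    left, top, right, bottom = region
    total = (right - left) * (bottom - top)
    filled = sum(
        1
        for y in range(top, bottom)
        for x in range(left, right)
        if bitonal[y][x] == 1
    )
    return 100.0 * filled / total


def signature_present(bitonal, region, min_fill=10.0):
    """Hypothetical decision rule: enough ink in the region means 'signed'."""
    return fill_percentage(bitonal, region) >= min_fill


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
whole_page = (0, 0, 5, 4)
percent = fill_percentage(page, whole_page)
signed = signature_present(page, whole_page)
```

A blank signature box yields a fill percentage near zero, while handwriting pushes it above the chosen minimum.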
Field Match
Grooper.Extract.FieldMatch
Field Match is a Value Extractor that matches the value stored in a previously-extracted Data Field or Data Column.
Find Barcode
Grooper.Extract.FindBarcode
Find Barcode is a Value Extractor that searches for and returns barcode values previously stored in a Batch Folder or Batch Page's layout data.
- Note: Find Barcode differs slightly from Read Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.
GPT Complete
Removed in version 2025
GPT Complete is a Value Extractor that leverages OpenAI's GPT models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.
- PLEASE NOTE: GPT Complete is a deprecated Value Extractor. It uses an outdated method to call the OpenAI API. Please use the Ask AI extractor going forward.
Highlight Zone
Grooper.Extract.HighlightZone
Highlight Zone is a Value Extractor that sets a highlight region on a document without performing any actual data extraction. This "extractor" is used to mark areas of interest or importance for Review users or for uncommon scenarios where a data instance location is needed with no actual value.
Label Match
Grooper.Extract.LabelMatch
Label Match is a Value Extractor that matches a list of one or more values using matching options defined by a Labeling Behavior. It is similar to List Match but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.
Labeled OMR
Grooper.Extract.LabeledOMR
Labeled OMR is a Value Extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.
Labeled Value
Grooper.Extract.LabeledValue
Labeled Value is a Value Extractor that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).
List Match
Grooper.Extract.ListMatch
List Match is a Value Extractor designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expression, but can be configured to utilize regular expression syntax.
Ordered OMR
Grooper.Extract.OrderedOMR
Ordered OMR is a Value Extractor used to return OMR check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).
Pattern Match
Grooper.Extract.PatternMatch
Pattern Match is a Value Extractor that extracts values from a document that match a specified regular expression, providing data collection following a known format or pattern.
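As a generic illustration of regular-expression extraction, the sketch below finds every date in MM/DD/YYYY format in a string and records each match's value and character position. It uses Python's re module; the pattern_match function and its dictionary output are assumptions for this example and only loosely mirror the idea of a "data instance" carrying a value and a location.

```python
import re


def pattern_match(text, pattern):
    """Return one record per regex match with its value and position."""
    return [
        {"value": m.group(0), "start": m.start(), "end": m.end()}
        for m in re.finditer(pattern, text)
    ]


text = "Invoice Date: 01/15/2024  Due Date: 02/14/2024"
hits = pattern_match(text, r"\d{2}/\d{2}/\d{4}")
values = [hit["value"] for hit in hits]
```

Because the pattern describes a format rather than a specific value, both dates are returned without knowing either in advance.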
Query HTML
Grooper.Messaging.QueryHTML
Query HTML is a Value Extractor specialized for HTML documents. It uses either CSS or XPath selectors to return the inner text or an attribute of an HTML element.
Read Barcode
Grooper.Extract.ReadBarcode
Read Barcode is a Value Extractor that uses barcode recognition technology to read and extract values from barcodes found in the document content.
- Note: Read Barcode differs slightly from Find Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.
Read Metadata
Grooper.Extract.ReadMetaData
Read Metadata is a Value Extractor that retrieves metadata values associated with a document. Read Metadata can return metadata from a Batch Folder's attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.
Read Zone
Grooper.Extract.ReadZone
Read Zone is a Value Extractor that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.
Reference
Grooper.Extract.ReferenceExtractor
Reference is a Value Extractor used to reference an Extractor Node. This allows users to create re-usable extractors and use the more complex Data Type and Field Class extractors throughout Grooper.
Word Match
Grooper.Extract.WordMatch
Word Match is a Value Extractor that extracts individual words or phrases from documents. It is used for n-gram extraction. Each gram may be optionally executed against a Lexicon to ensure words and phrases only match a set vocabulary.
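For readers new to the term, an n-gram is a run of n consecutive words. The sketch below generates n-grams from a sentence and optionally keeps only those found in a vocabulary list. The function names, the whitespace tokenization and the case-insensitive comparison are assumptions for this example and are not a description of Word Match internals.

```python
def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def word_match(text, n, lexicon=None):
    """Return n-grams, optionally restricted to a set vocabulary."""
    grams = ngrams(text, n)
    if lexicon is None:
        return grams
    allowed = {entry.lower() for entry in lexicon}
    return [gram for gram in grams if gram.lower() in allowed]


text = "Please remit payment to Acme Supply within thirty days"
bigrams = ngrams(text, 2)
matched = word_match(text, 2, lexicon=["acme supply", "thirty days"])
```

Without a vocabulary every bigram is returned; with one, only the phrases in the list survive.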
Zonal OMR
Grooper.Extract.ZonalOMR
Zonal OMR is a Value Extractor that reads one or more OMR checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).
BE AWARE: Zonal OMR is outdated compared to Labeled OMR and Ordered OMR. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.
Import and Export Related Types
These are configuration objects in Grooper that relate to importing documents into Grooper, exporting processed content (files and data) out of Grooper, and otherwise accessing document content linked in Grooper to external file systems and content management systems.
This includes:
- CMIS Bindings
- Content Links
- Export Definitions
- Import Providers
Please Note: Import Behavior and Export Behavior are obviously import and export related. Because their parent type is "Behavior", they are found in the Core Configuration Types portion of this Glossary.
- Scripting/Advanced user info: These objects inherit from a base class called "Embedded Object". This includes a large number of objects that exist as configurable properties.
CMIS Binding
CMIS Bindings are the platform connection types for CMIS Connections. The CMIS Binding establishes the communication protocols used to connect Grooper with content management systems (CMS) and file systems.
CMIS Bindings use the CMIS standard as a model to define connectivity. Even when connecting to CMS platforms that are not truly CMIS systems (such as a Windows file system), Grooper normalizes connection to them as if they were. This allows Grooper to use CMIS Import and CMIS Export for all storage platforms.
- You will commonly hear CMIS Binding referred to as a "CMIS connection type", "connection type", or just "connection", as in an "Exchange connection".
AppXtender
AppXtender is a connection option for CMIS Connections. It allows Grooper to connect to the AppEnhancer (formerly ApplicationXtender) content management system for import and export operations.
Box
Box is a connection option for CMIS Connections. It connects Grooper to the Box content management system for import and export operations.
CMIS
CMIS is a connection option for CMIS Connections. It connects Grooper to a CMIS 1.0 or CMIS 1.1 server for import and export operations. This can be used to connect to CMS platforms that implement the CMIS protocol.
Exchange
Exchange is a connection option for CMIS Connections. It connects Grooper to Microsoft Exchange email servers (including Outlook servers) for import and export operations.
FTP
FTP is a connection option for CMIS Connections. It connects Grooper to FTP directories for import and export operations.
IMAP
IMAP is a connection option for CMIS Connections. It connects Grooper to email messages and folders through an IMAP email server for import and export operations.
NTFS
NTFS is a connection option for CMIS Connections. It connects Grooper to files and folders in the Microsoft Windows NTFS file system for import and export operations.
OneDrive
OneDrive is a connection option for CMIS Connections. It connects Grooper to Microsoft OneDrive cloud services for import and export operations.
SFTP
SFTP is a connection option for CMIS Connections. It connects Grooper to SFTP directories for import and export operations.
SharePoint
SharePoint is a connection option for CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.
Content Link
Grooper.Core.ContentLink
Content Links define references to files or folders stored outside of Grooper, such as in a Windows folder or in a CMIS Repository.
- Content Link has two sub-types: Document Link and Folder Link. There are 9 types of "Document Link" and only 1 type of "Folder Link". Due to this, Document Link is a more common term than "Content Link".
Document Links
Grooper.Core.DocumentLink
CMIS Document Link
Grooper.CMIS.CmisLink
File System Link
Grooper.Core.FileSystemLink
FTP Link
Grooper.Messaging.FtpLink
HTTP Link
Grooper.Messaging.HTTPLink
Mail Link
Grooper.Messaging.MailLink
PST Link
Grooper.Office.PstLink
SFTP Link
Grooper.Messaging.SftpLink
Subfile Link
Grooper.Core.SubfileLink
ZIP Link
Grooper.Messaging.FtpLink
Folder Links
Grooper.Core.FolderLink
CMIS Folder Link
Grooper.CMIS.CmisFolderLink
Export Definition
Export Behaviors are defined by adding and configuring one or more Export Definitions (See Export Definition Types or the Export Definitions section of the Export article). An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.
CMIS Export
CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.
Data Export
Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC compliant database.
Import Provider
Grooper.Core.ImportProvider
Import Providers enable Grooper to import file-based content from numerous sources, including Windows file systems, SFTP file systems, mail servers and various content management systems (CMS). An Import Provider is selected and configured when configuring "Import Jobs". Import Jobs are submitted in one of two ways:
- By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
- From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.
In both cases, an Import Provider is selected and configured using the "Provider" property.
CMIS Import
Grooper.CMIS.CmisImportBase
CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).
Import Descendants
Grooper.CMIS.ImportDescendants
Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files from a CMIS Repository folder location, including any files in any sub-folders (i.e. all "descendant" files).
Import Query Results
Grooper.CMIS.ImportQueryResults
Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Query Results imports files from a CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).
File System Import
Grooper.Core.FileSystemImport
File System Import refers to a Legacy Import Provider used to import documents directly from your Windows File System into Grooper.
HTTP Import
Grooper.Messaging.HTTPImport
HTTP Import is an Import Provider used to import web-based content (web pages and files hosted on an HTTP server). HTTP Import can be used to ingest individual web pages, defined portions of a website or entire websites into Grooper.
Test Batch
Grooper.Core.TestBatchImport
"Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.
- Looking for information on "production" vs "test" Batches in Grooper? See here.
Misc Properties and Other Configuration Types
AI Generator/Generators
AI Generators create custom documents using the results of a Search Page query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file.
- AI Generators are added and configured using an Indexing Behavior's "Generators" property and editor. They are executed from the Search Page using the "Download" command and "Download Custom" format.
CMISQL Query/CMIS Query
Grooper.CMIS.CmisQuery
A CMISQL Query (aka CMIS Query) is Grooper's way of searching for documents in CMIS Repositories. Commonly, CMISQL Queries are used by Import Query Results to import documents from a CMIS Repository. CMISQL Queries are also used by CMIS Lookup to look up data from a CMIS Repository. CMISQL Queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.
- CMISQL Queries are configured using the "CMIS Query" property found in "Import Query Results" and "CMIS Lookup".
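As a rough illustration of the SQL-like syntax, the sketch below builds a simple CMISQL query string. `cmis:document` and `cmis:name` come from the CMIS standard; the helper `build_name_query` is hypothetical and exists only for this example. Which types and properties can actually be queried depends on the connected CMIS Repository.

```python
def build_name_query(name_pattern: str, object_type: str = "cmis:document") -> str:
    """Build a CMISQL query selecting objects whose name matches a LIKE pattern.

    Hypothetical helper for illustration only; it is not part of Grooper.
    """
    # Keep the sketch simple: refuse patterns that would need quote escaping.
    if "'" in name_pattern:
        raise ValueError("single quotes are not handled in this sketch")
    return f"SELECT * FROM {object_type} WHERE cmis:name LIKE '{name_pattern}'"


query = build_name_query("%.pdf")
print(query)
```

A query like this would be entered as text in the "CMIS Query" property; the Python wrapper is only a convenient way to show the query's shape.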
Paragraph Marker/Paragraph Marking
Grooper.Core.ParagraphMarker
Paragraph Marking is a component of Grooper's Text Preprocessor. It enables the "Paragraph Marker", which detects paragraph boundaries and marks them by altering the normal carriage return and line feed pairs at the end of each line. Instead of placing line breaks at the end of each line, the Paragraph Marker places them at the end of each paragraph. This produces a normalized text flow, making it easier to extract values that span lines.
- "Paragraph Marker" is the embedded object that actually performs paragraph detection and marking in Grooper. "Paragraph Marking" is the property that enables the Paragraph Marker and allows users to configure it.
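The effect on extraction can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: it assumes a blank line marks a paragraph boundary, whereas the Paragraph Marker's actual detection logic is not described here. The function name `mark_paragraphs` is hypothetical.

```python
import re


def mark_paragraphs(text: str) -> str:
    """Move CR/LF pairs from the end of each line to the end of each paragraph.

    Assumption for this sketch: paragraphs are separated by one or more blank lines.
    """
    paragraphs = re.split(r"(?:\r\n){2,}", text.strip("\r\n"))
    # Join the lines inside each paragraph with a space so the text flows.
    joined = [" ".join(p.split("\r\n")) for p in paragraphs]
    return "\r\n".join(joined) + "\r\n"


raw = (
    "Agreement dated January\r\n"
    "5, 2024 between the\r\n"
    "parties.\r\n"
    "\r\n"
    "Second paragraph.\r\n"
)
date_pattern = r"January \d{1,2}, \d{4}"

before = re.search(date_pattern, raw)                   # the date is split across lines
after = re.search(date_pattern, mark_paragraphs(raw))   # the date is now on one line
print(before, after.group(0))
```

In the raw text the pattern finds nothing because a line break sits between "January" and "5". After marking, the same pattern matches the full date.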
Preprocessing/Text Preprocessor
Grooper.Core.TextPreprocessor
Grooper's "Text Preprocessor" adjusts how raw text is formatted before extraction. It manipulates control characters (such as CR/LF pairs) to allow regular expression patterns to match (or ignore) structural elements, such as line breaks, paragraph boundaries and tab markers. The Text Preprocessor executes the following:
- Paragraph Marking
- Tab Marking
- Vertical Tab Marking
- Ignore Control Characters
- "Text Preprocessor" is the embedded object that actually performs text preprocessing in Grooper. The Text Preprocessor can be enabled and configured by various items (mostly extractors such as Pattern Match) using either a "Preprocessing" or "Preprocessing Options" property.
Permission Set/Permission Sets
Grooper.PermissionSet
Permission Sets define security permissions in a Grooper Repository for a user or group. This allows you to restrict user access to specified Grooper pages (such as the Design Page) and Grooper Commands.
- "Permission Set" is the embedded object that defines security principles. They are added to a Grooper Repository and configured using the "Permission Sets" property found on the Root node.
Quoting Method/Document Quoting
Grooper.GPT.QuotingMethod
Quoting Methods provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, allowing users to feed the AI only the context needed to respond, or to reduce costs by reducing the number of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of a document's text, data extracted from the document, layout data, or a combination of this data.
- "Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including AI Extract) using a "Document Quoting" property.
Variable Definition
Grooper.Core.VariableDefinition
Variable Definitions define a variable with a computed value that can be called by various code expressions. Variable Definitions are added to Data Models, Data Sections and Data Tables using their "Variables" property.
- Used By: Data Model, Data Section, Data Table
Vertical Wrap Detection/Vertical Wrap
Vertical Wrap Detection enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a Labeling Behavior and by the List Match and Label Match Value Extractors.
- "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.
Properties
A property is a configurable setting on a Grooper object that affects how the object performs its function.
Alignment
"Alignment" refers to how Grooper highlights text from an AI response on a document in a Document Viewer. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as AI Extract.
Confidence Multiplier and Output Confidence
Some results carry more weight than others. The Confidence Multiplier and Output Confidence properties allow you to manually adjust an extraction result's confidence.
Constrained Wrap
The Constrained Wrap property allows certain Value Extractors and the Labeling Behavior to match values which wrap from one line to the next inside a box (such as a table cell).
Content Type Filter
The Content Type Filter property restricts Activities to specific Content Categories and/or Document Types.
Import Mode
Import Mode is a configurable property for CMIS Import providers. This controls how file content is loaded into a Grooper Repository during an Import Job. This property is key to setting up a "Sparse" import in Grooper.
Output Extractor Key
The Output Extractor Key property is another weapon in the arsenal of powerful Grooper classification techniques. It allows Data Types to return results normalized in a way more beneficial to document classification.
Parameters
Parameters is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.
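The effect of Temperature and TopP can be sketched on a toy distribution over three candidate tokens. This is a generic illustration of how these parameters are commonly described for LLM sampling; it is not Grooper's code or any specific LLM provider's implementation, and the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens sum to 1.
    return {i: probs[i] / total for i in kept}


logits = [2.0, 1.0, 0.0]
sharp = apply_temperature(logits, 0.5)   # low temperature: top token dominates
flat = apply_temperature(logits, 2.0)    # high temperature: choices even out
nucleus = top_p_filter([0.6, 0.3, 0.1], top_p=0.8)
print(sharp, flat, nucleus)
```

Lower Temperature or TopP values make output more predictable, which is often the goal for extraction tasks. Higher values allow more varied text.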
Scope
The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.
Secondary Types
Secondary Types allow the application of multiple Content Types to a single Batch Folder.
Tab Marking
Tab Marking allows you to insert tab characters into a document's text data.
Misc Features and Functionality
CSS Data Viewer Styling
CSS Data Viewer Styling refers to using CSS to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a Data Model's appearance and layout during document review.
EDI Integration
EDI Integration refers to Grooper's ability to process EDI files.
Fine-Tuning for AI Extract
Fine-tuning is the process of further training a large language model (LLM) on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on.
In Grooper, you can easily start fine-tuning a model based on a Data Model that will facilitate better extraction when using AI Extract.
Footer Row
A "Footer Row" is a row at the bottom of a Data Table that displays sum totals for numerical Data Columns. This can help Data Viewer users validate data Grooper extracts for one or more Data Columns. The Data Column's "Footer Mode" controls whether a sum calculation is performed (and, if Tabular Layout's "Capture Footer Row" creates the Footer Row, if and how document data is used to capture and validate the footer value).
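The validation idea behind a footer total can be sketched as follows. This is a hypothetical illustration and not how Grooper computes or stores footer values: the function name and the string-based inputs are assumptions made for the example.

```python
from decimal import Decimal


def footer_sum_matches(column_values, captured_footer):
    """Sum a column's values and compare the result to a total captured from the document."""
    # Decimal avoids floating-point rounding surprises with currency amounts.
    total = sum((Decimal(value) for value in column_values), Decimal("0"))
    return total, total == Decimal(captured_footer)


total, is_valid = footer_sum_matches(["10.50", "20.25", "5.00"], "35.75")
print(total, is_valid)
```

If the calculated sum and the captured total disagree, at least one cell in the column (or the captured total itself) was likely extracted incorrectly.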
Label Sets
Label Sets are collections of label definitions used in Grooper to identify and extract information from documents. A label set maps document text—such as field names, headers, or column titles—to corresponding Data Field, Data Section, or Data Table elements in the Data Model. Label sets are essential for automating extraction and classification, especially in environments where document layouts and terminology may vary.
URL Endpoints for Review
Three different URL endpoints can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID, Batch Process name, Batch ID and more. This allows Grooper users to link directly to a Batch in Review with a URL.
XML Schema Integration
XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.
UI Element
A UI Element is a portion of the Grooper interface that allows users to interact with or otherwise receive information about the application.
Data Inspector
The Grooper Data Inspector is a UI Element that can be found anywhere there is a Document Viewer showing extraction results. This UI Element allows a user to inspect the Data Instance hierarchies of an extracted result.
Design Page
GrooperReview.Pages.Design.DesignPage
The Design Page is the primary user interface for Grooper configuration. It is the central workplace for Grooper designers and administrators. From the Design page, users create, test and administer nodes in a Grooper Repository.
Document Viewer
The Grooper Document Viewer is the portal to your documents. It is the UI that allows you to see a Batch Folder's (or a Batch Page's) image, text content, and more.
Node Tree
The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.
Overrides
Overrides is a tab provided to allow overriding of default properties set to a Data Element.
Search Page
The Search Page allows users to leverage AI Search indexes to query indexed documents. Both full text and metadata searches are supported, with feature-rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the Document Viewer, review documents' extracted data, create new Batches from the result set, submit processing jobs, start a conversation with an AI Assistant and more.
Scan Viewer
The Scan Viewer is a user interface that can be added to the user-attended Review step in a Batch Process. It is used to scan documents into Batches from one or more scanning workstations.
Summary Tabs
Content Models and Content Categories have a Summary tab where you can view "Descendant Node Types", Document Types, and Expressions.
Other
Concepts
There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use these objects and properties depends on understanding the underlying concepts that define what they are doing and why.
Activity Processing
Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.
CMIS+
CMIS+ is a conceptual term that refers to Grooper's connectivity architecture to external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the CMIS standard. This provides a standardized setup to allow Grooper to interoperate with both CMIS compliant systems and non-CMIS systems. It further provides normalized access to document content and metadata for import (CMIS Import) and export (CMIS Export) operations.
CMIS
CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.
Classification
Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity will assign a Content Type to a Batch Folder.
Code Expressions
Code Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.
Data Context
Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.
Data Extraction
Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.
Data Extractor
Data Extractor (or just "extractor") refers to all Value Extractors and Extractor Nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).
Data Instance
A Data Instance is an encapsulation of text data within a document returned by Grooper's extractors. Data Instances form the hierarchy of text data created by Grooper's extractors.
Expressions
Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.
Expressions Cookbook
The "Expressions Cookbook" is a reference list for commonly used Code Expressions in Grooper.
Field Mapping
Field Mapping refers to how logical connections are made between metadata content in Grooper and an external storage platform.
Five Phases of Grooper
The "Five Phases of Grooper" is a conceptual term that seeks to build understanding of how documents are processed through Grooper.
Flow Collation
"Flow Collation" refers to the text-flow based layout option used by various Collation Providers for Data Type extractors.
Fuzzy RegEx
Fuzzy RegEx is Grooper's use of fuzzy logic within Value Extractors that leverage regular expressions to match patterns. Fuzzy RegEx allows extractors to overcome defects in a document's OCR results to accurately return results. Fuzzy RegEx is enabled by enabling the Fuzzy Matching property.
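The general idea of fuzzy matching can be sketched with an edit-distance comparison. This is a generic illustration under stated assumptions, not Grooper's algorithm: the similarity formula, the 0.8 threshold, and the function names are all chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to a 0..1 score, where 1.0 is an exact match."""
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


def fuzzy_find(target: str, text: str, min_similarity: float = 0.8):
    """Return the best (window, score) of the target's length, or None if below the threshold."""
    best = None
    for start in range(len(text) - len(target) + 1):
        window = text[start:start + len(target)]
        score = similarity(target, window)
        if score >= min_similarity and (best is None or score > best[1]):
            best = (window, score)
    return best


# OCR misread the capital "I" as a lowercase "l".
ocr_text = "lnvoice Number: 12345"
result = fuzzy_find("Invoice Number", ocr_text)
print(result)
```

An exact search for "Invoice Number" fails on this text, while the fuzzy search still locates the label because only one of its fourteen characters is wrong.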
GPT Integration
Grooper's GPT Integration refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.
Grooper Infrastructure
Grooper Infrastructure refers to the computing underpinnings of what makes up a Grooper Repository and the software that allows the Grooper platform to automate tasks and users to interface with it.
Grooper Repository
A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.
Image Processing
"Image processing", as a general term, refers to software techniques that manipulate and enhance images. Image processing removes imperfections and adjusts images to improve OCR accuracy. In Grooper, images are processed primarily by two Activities:
- Image Processing - This Activity permanently adjusts the image. It is primarily used to compensate for defects produced by a document scanner (like border artifacts and skewed images). It does so by applying IP Commands in an IP Profile.
- Recognize - This Activity performs OCR. When an OCR Profile references an IP Profile, the image will be processed temporarily. A temporary image is handed to the OCR engine and discarded once characters are recognized.
- Grooper also has "computer vision" capabilities that analyze and interpret images. These capabilities are also executed during Grooper's image processing. For example, Grooper's "Line Removal" command will locate lines on an image (computer vision), remove those artifacts to improve OCR results during Recognize (image processing) and store that data for later use in Grooper (computer vision).
LINQ to Grooper Objects
LINQ is a Microsoft .NET component that provides data querying capabilities to the .NET framework. In Grooper, you can use the LINQ syntax in Code Expressions to "LINQ to Grooper Objects". This allows expressions to access information from collections of data, such as from multi-instance Data Sections or Data Tables.
Layout Data
Layout Data refers to visual information that certain Grooper IP Commands collect, such as lines, checkboxes, barcodes, and detected shapes. This data is stored in a "Grooper.Layout.json" file attached to Batch Pages. Layout data is used by certain extractors and other features that rely on the presence of that data to function.
Microfiche Processing
Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.
Microsoft Office Integration
Grooper's Microsoft Office Integration allows the platform to easily convert Microsoft Word and Microsoft Excel files into formats that Grooper can read natively (PDF and CSV).
Mixed Classification
"Mixed Classification" refers to leveraging a Classify Method and "rules" defined on a Document Type to overcome the shortcomings of an individual method.
OCR
OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.
OCR Synthesis
OCR Synthesis refers to a suite of OCR related functionality unique to Grooper. The OCR Synthesis suite will pre-process and re-process raw results from the OCR Engine and synthesize its results into a single, more accurate OCR result.
Object Nomenclature
The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.
PDF Page Types
PDF pages can be one of several PDF Page Types. "Page types" describe the kind of content in a PDF page. This informs Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize activity, whereas "text only" pages have their native text extracted by Recognize.
Prompt Engineering
"Prompt Engineering" is the process of designing and refining prompts to interact more effectively with large language models (LLMs) like GPT-4. The goal is to guide the model to produce desired outputs by carefully crafting the input queries.
Regular Expression
Regular Expression (or regex) is a standard syntax designed to parse text strings. This is a way of finding information in text. It is the primary method by which Grooper extracts and returns data from documents.
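A short example shows the idea using Python's `re` module. The sample text and patterns are made up for illustration, and Python's regex flavor may differ in places from the one used when configuring extractors in Grooper.

```python
import re

text = "Invoice Date: 03/15/2024\nDue Date: 04/14/2024"

# General pattern: any date written as MM/DD/YYYY.
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
all_dates = date_pattern.findall(text)

# Specific pattern: only the date that follows the "Due Date:" label.
due_match = re.search(r"Due Date:\s*(\d{2}/\d{2}/\d{4})", text)
due_date = due_match.group(1)

print(all_dates)  # ['03/15/2024', '04/14/2024']
print(due_date)   # 04/14/2024
```

The first pattern returns every date on the document. The second uses surrounding context (a label) to return one specific date.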
Separation
Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".
TF-IDF
TF-IDF stands for term frequency-inverse document frequency. It is a statistical calculation intended to reflect how important a word is to a document within a document set (or "corpus"). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
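The calculation can be sketched with one common textbook formulation: term frequency (a term's count divided by the document's length) multiplied by the natural log of the number of documents divided by the number of documents containing the term. Many TF-IDF variants exist, and this sketch does not claim to reproduce the exact weighting Grooper uses. The three-document corpus is made up.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dictionary per document in the corpus."""
    n_docs = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


corpus = [
    "invoice total due",
    "invoice number date",
    "lease agreement date",
]
scores = tf_idf(corpus)
print(scores[0])
```

In the first document, "total" scores higher than "invoice" because "invoice" also appears in another document, so it is less useful for telling the documents apart.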
Table Extraction
"Table Extraction" refers to Grooper's ability to extract data from cells in tables on documents. This is accomplished by configuring the Data Table and its child Data Column elements in a Data Model.
Thread
A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.
Training-Based Approaches to Document Classification
"Training-Based Approaches to Document Classification" refers to Grooper Classify Methods that classify Batch Folders using document examples for each Document Type. The Classify activity then assigns each unclassified Batch Folder a Document Type based on how similar it is to the Document Type's training data.
Training Batch
The Training Batch is a special Batch created when training document examples using the Lexical classification method. The Training Batch serves two purposes: (1) It is a Batch that holds all previously trained Batch Folders. Designers can go to this Batch to view these documents and copy and paste them into other Batches if needed. (2) Batch Folders in the Training Batch will be used to re-train the Content Model's classification data when the Rebuild Training command is executed.
UNC Path
UNC Path is a conceptual term that refers to UNC (Universal Naming Convention) which is a standard used in Microsoft Windows for accessing shared network folders.
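A UNC path has the form `\\server\share\folder\file`. The sketch below uses Python's standard `pathlib` module to split one into its parts; the server, share and file names are made up for illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical UNC path: \\<server>\<share>\<folder>\<file>
unc = PureWindowsPath(r"\\fileserver\scans\incoming\batch001.pdf")

share_root = unc.drive         # the server and share portion
relative_parts = unc.parts[1:] # folders and file beneath the share
file_name = unc.name

print(share_root)      # \\fileserver\scans
print(relative_parts)  # ('incoming', 'batch001.pdf')
print(file_name)       # batch001.pdf
```

Unlike a mapped drive letter, which is defined per user, a UNC path names the server and share directly.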
Waterfall Classification
Waterfall Classification is a classification technique in Grooper that prioritizes training similarity over classification "rules" set by a Document Type's Positive Extractor. This can be helpful in scenarios where Batch Folders get misclassified and simply retraining won't help.
Disambiguation
Repository
A "repository" is a general term in computer science referring to where files and/or data is stored and managed. In Grooper, the term "repository" may refer to:
- PRIMARILY a Grooper Repository. This is most commonly what people are referring to when they simply say "repository".
- Less commonly a CMIS Repository