Glossary

From Grooper Wiki

This glossary seeks to educate readers on various Grooper terms, objects and other entities. Glossary entries will be short paragraphs describing the topic. For each glossary entry, you will find links to a full article about the entry as well as articles on associated terms.

Entries are organized according to the major Grooper entity they belong to. For example, "Classify" is an Activity. It is found in the "Activity" section of the Glossary.

Activity

Activity is a property on Batch Process Steps. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify step.

AI Dialogue

AI Dialogue is an Activity that executes a scripted conversation with an AI Analyst and saves the resulting conversation on the document as a JSON file.

Apply Rules

Apply Rules is an Activity that runs Data Rules on data that has already been extracted from a Batch. A Batch Process Step configured with the Apply Rules Activity will always need to be preceded by a Batch Process Step configured with the Extract Activity.

Classify

Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Content Type (e.g. a Document Type) using patterns, lexical understanding, or rules as defined by a Content Model.

Clip Frames

Clip Frames is an Activity that extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Correct

Correct is an Activity that performs spell correction. It can correct a Batch Folder's text content or specific Data Element values to resolve OCR errors, deidentify data or otherwise enhance text data.

Detect Frames

Detect Frames is an Activity that locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Execute

Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export

Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Extract

Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Image Processing

Image Processing is an Activity that enhances Batch Page images and optimizes them for better OCR text recognition and data extraction results.

Initialize Card

Initialize Card is an Activity that prepares and configures microfiche card images for further processing.

Merge

Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.

Recognize

Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also be configured to collect "layout data" like lines, checkboxes, and barcodes. Various other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Redact

Redact is an Activity that visibly obscures (or "redacts") text information on a page based on results returned from an extractor. Be aware, Redact does not alter the text data. It only alters the image.

Render

Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

Review

Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Send Mail

Send Mail is an Activity that automates email notifications from Grooper based on events and conditions set by a Batch Process. Optionally, documents in the Batch may be attached to the generated email.

Separate

Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export and other Activities that need to run on the folder (i.e. document) level.

Split Pages

Multi-page PDF and TIF files come into Grooper as files attached to single Batch Folders. Split Pages is an Activity that creates child Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.

XML Transform

XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

Application

Grooper is an intelligent document processing platform that uses a wide array of sophisticated techniques to automate end-to-end content capture. From a technical standpoint, Grooper consists of a Grooper Repository and the applications that support the management and execution of configuration assets. A Grooper Repository consists of two things: (1) A series of tables in a SQL database (containing configuration nodes and their properties) and (2) a File Store (containing files associated to nodes in the database). The Grooper application is the interface by which a user can interact with that repository in an intuitive way.

Grooper Command Console

Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.

Grooper Web Client

The Grooper user interface is accessed using a web browser from a URL. The Grooper Web Client is the application that installs the Grooper website on a web server.

Behavior

Behaviors are groups of functionality configured using a Content Type's Behaviors property. Behaviors enable different features for how documents of a specific Content Type are processed and define their settings. This includes how they are exported, if Label Sets are used for the Document Type and more.

Export Behavior

An Export Behavior defines the parameters for exporting classified Batch Folder content from Grooper to other systems. This includes where they are exported to (what content management system, file system, database etc), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML etc), folder pathing, file naming and data mappings (for Data Export and CMIS Export).

Import Behavior

An Import Behavior defines how data is mapped from files in an external content management system to Batch Folders created on import when using CMIS Import.

Indexing Behavior

An Indexing Behavior is a Content Type Behavior that enables inherited Content Types to be indexed via the AI Search functionality.

Labeling Behavior

A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for classification, field extraction, table extraction, and section extraction.

PDF Data Mapping

PDF Data Mapping is a Content Type Behavior designed to enhance PDF files generated by the Merge or Export activities with metadata, bookmarks, annotations and/or different kinds of widgets.

CMIS Connection Type

A CMIS Connection Type is defined when creating a CMIS Connection. The CMIS Connection Type (formerly CMIS Binding) establishes the communication protocols used to connect Grooper with content management systems (CMS) adhering to the CMIS standard. Even when connecting to CMS platforms that are not true CMIS systems, Grooper normalizes connections to them as if they were. This allows Grooper to use CMIS Import and CMIS Export for all content management systems.

AppXtender

AppXtender is a CMIS Connection Type that connects Grooper to the AppEnhancer (formerly ApplicationXtender) content management system for import and export operations.

Box

Box is a CMIS Connection Type that connects Grooper to the Box content management system for import and export operations.

Exchange

Exchange is a CMIS Connection Type that connects Grooper to Microsoft Exchange email servers (including Outlook servers) for import and export operations.

FTP

FTP is a CMIS Connection Type that connects Grooper to FTP directories for import and export operations.

IMAP

IMAP is a CMIS Connection Type that connects Grooper to email messages and folders through an IMAP email server for import and export operations.

NTFS

NTFS is a CMIS Connection Type that connects Grooper to files and folders in the Microsoft Windows NTFS file system for import and export operations.

OneDrive

OneDrive is a CMIS Connection Type that connects Grooper to Microsoft OneDrive cloud services for import and export operations.

SFTP

SFTP is a CMIS Connection Type that connects Grooper to SFTP directories for import and export operations.

SharePoint

SharePoint is a CMIS Connection Type that connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

Classification Method

A Content Model's Classification Method property determines the technique used for document classification. Classification sorts Batch Folders into categories (called "Document Types"). Grooper's various Classification Methods can utilize text-based pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.

GPT Embeddings

GPT Embeddings is a Classification Method that uses an OpenAI embeddings model and trained document samples to tell one document from another.

Labelset-Based

Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.

Lexical

The Lexical Classification Method classifies Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify Document Types.

Rules-Based

The Rules-Based Classification Method employs "rules" defined on each Document Type to classify Batch Folders. Positive Extractor and Negative Extractor properties are configured for each Document Type to positively or negatively associate a Batch Folder based on predefined criteria.

Note: While the Positive and Negative Extractors impact all Classification Method results, the Rules-Based method classifies using only these properties and nothing else.
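
The positive/negative rule idea can be sketched in a few lines of Python. This is only an illustration, not how Grooper implements it: the `RULES` table, the regular expressions, and the "first matching Document Type wins" ordering are all assumptions made for the example.

```python
import re

# Hypothetical rules: one positive and one optional negative pattern per Document Type.
RULES = {
    "Invoice": {"positive": r"\binvoice\b", "negative": r"\bpurchase order\b"},
    "Purchase Order": {"positive": r"\bpurchase order\b", "negative": None},
}


def classify(text):
    """Return the first Document Type whose positive rule hits and whose negative rule does not."""
    for doc_type, rule in RULES.items():
        if not re.search(rule["positive"], text, re.IGNORECASE):
            continue  # no positive evidence for this type
        if rule["negative"] and re.search(rule["negative"], text, re.IGNORECASE):
            continue  # negative evidence rules this type out
        return doc_type
    return None  # nothing matched; the document stays unclassified
```

Here `classify("Purchase Order 55; invoice to follow")` skips "Invoice" because its negative pattern also hits, and falls through to "Purchase Order".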

Visual

The Visual Classification Method uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.

Collation Provider

The Collation property of a Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.

AND

AND is a Collation Provider option for Data Type extractors. AND returns results only when each of its referenced or child extractors gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
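
As a rough sketch of that all-or-nothing behavior, the Python below stands in each child extractor with a plain regular expression. The function name `and_collation` and the choice to return every hit in pattern order are assumptions for illustration; the source does not specify how Grooper shapes the final result.

```python
import re


def and_collation(text, patterns):
    """Return every hit only if each pattern produced at least one hit; otherwise return nothing."""
    hits_per_pattern = [re.findall(pattern, text) for pattern in patterns]
    if all(hits_per_pattern):  # an empty hit list is falsy, so one miss fails the whole set
        return [hit for hits in hits_per_pattern for hit in hits]
    return []
```

A document containing a date but no dollar amount would return nothing when both patterns are required.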

Array

Array is a Collation Provider option for Data Type extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.

Combine

Combine is a Collation Provider option for Data Type extractors. Combine combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.

Key-Value List

Key-Value List is a Collation Provider option for Data Type extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.

Key-Value Pair

Key-Value Pair is a Collation Provider option for Data Type extractors. Key-Value Pair matches instances where a key is paired with a value on the document in a specific layout. Note: Key-Value Pair is an older technique in Grooper. In most cases, the Labeled Value extractor type is preferable to Key-Value Pair collation.

Multi-Column

Multi-Column is a Collation Provider option for Data Type extractors. Multi-Column combines multiple columns on a page into a single column for extraction.

Ordered Array

Ordered Array is a Collation Provider option for Data Type extractors. Ordered Array finds sequences of values where one result is present for each extractor, in the order they appear, according to a specified horizontal, vertical or text-flow layout.

Pattern-Based

Pattern-Based is a Collation Provider option for Data Type extractors. Pattern-Based uses regular expressions to sequence returned results into a final result set.

Split

Split is a Collation Provider option for Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.
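
The anchor-point idea can be illustrated with a short Python sketch. The helper name `split_at_matches` is hypothetical, and keeping each anchor at the start of its part (rather than discarding it) is an assumption made for the example, not a statement about Grooper's behavior.

```python
import re


def split_at_matches(text, pattern):
    """Start a new part at every match of the pattern, using matches as anchor points."""
    starts = [match.start() for match in re.finditer(pattern, text)]
    if not starts:
        return [text]  # no anchors found, so nothing to split
    bounds = starts + [len(text)]
    parts = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Keep any text that precedes the first anchor as its own part.
    head = text[:starts[0]].strip()
    return ([head] if head else []) + parts
```

For example, splitting "Item 1 apples Item 2 pears" on the pattern `Item \d` yields one part per item.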

Concept

There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use them depends on understanding the underlying concepts that define what these objects and properties are doing and why.

Activity Processing

Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

CMIS+

CMIS+ is a conceptual term that refers to Grooper's connectivity architecture to external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the CMIS standard. This provides a standardized setup to allow Grooper to interoperate with both CMIS compliant systems and non-CMIS systems. It further provides normalized access to document content and metadata for import (CMIS Import) and export (CMIS Export) operations.

CMIS

CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.

CMIS Query

A CMIS Query (aka CMISQL Query) is Grooper's way of searching for documents in CMIS Repositories and filtering them upon import when using the Import Query Results Import Provider. CMIS queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.

CSS Data Viewer Styling

CSS Data Viewer Styling refers to using CSS to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a Data Model's appearance and layout during document review.

Classification

Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity will assign a Content Type to a Batch Folder.

Code Expressions

Code Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Content Type

Content Type refers to objects in Grooper used to classify Batch Folders. These include: Content Models, Content Categories, and Document Types.

Data Context

Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.

Data Element

Data Element refers to the objects in Grooper used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Extraction

Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.

Data Extractor

Data Extractor (or just "extractor") refers to all Extractor Types and extractor node objects. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Data Instance

A Data Instance is an encapsulation of text data within a document returned by Grooper's extractors. Data instances are the hierarchy of text data created by Grooper's extractors.

Expressions

Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Expressions Cookbook

The "Expressions Cookbook" is a reference list for commonly used Code Expressions in Grooper.

Field Mapping

Field Mapping refers to how logical connections are made between metadata content in Grooper and an external storage platform.

Five Phases of Grooper

The "Five Phases of Grooper" is a conceptual term that seeks to build understanding of how documents are processed through Grooper.

Flow Collation

"Flow Collation" refers to the text-flow based layout option used by various Collation Providers for Data Type extractors.

Footer Rows and Footer Modes

"Footer Rows and Footer Modes" refers to how a Data Table's "footer row" provides Grooper users a quick way to validate numerical data in a Data Column. The Data Column's Footer Mode property controls if and how a total is determined for numerical values in a Data Column.

Fuzzy RegEx

Fuzzy RegEx is Grooper's use of fuzzy logic within Extractor Types that leverage regular expressions to match patterns. Fuzzy RegEx allows extractors to overcome defects in a document's OCR results and still return accurate results. It is turned on by enabling the Fuzzy Matching property.
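
To show why approximate matching helps with OCR defects, here is a minimal Python sketch. It is not Grooper's algorithm: it substitutes the standard library's `difflib.SequenceMatcher` similarity ratio for whatever fuzzy logic Grooper uses, and the `fuzzy_find` helper and its 0.7 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def fuzzy_find(target, text, min_similarity=0.7):
    """Slide a window the length of the target over the text and return the
    first best-scoring (window, score) at or above the threshold, or None."""
    best = None
    width = len(target)
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        score = SequenceMatcher(None, target.lower(), window.lower()).ratio()
        if score >= min_similarity and (best is None or score > best[1]):
            best = (window, score)
    return best
```

An exact search for "invoice" would miss the OCR misread "lnvo1ce", while `fuzzy_find("invoice", "lnvo1ce Total")` still locates it.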

GPT Integration

Grooper's GPT Integration refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.

Grooper Infrastructure

Grooper Infrastructure refers to the computing underpinnings of what makes up a Grooper Repository and the software that allows the Grooper platform to automate tasks and users to interface with it.

Grooper Repository

A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Grooper Service

Grooper Services are various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or, in older Grooper versions, Grooper Config).

Image Processing

Image Processing, as a conceptual term, refers to how Grooper applies a variety of techniques to enhance scanned documents' quality. Image processing improves OCR accuracy by removing imperfections and adjusting visual characteristics to prepare images for data extraction and classification. Image processing may be applied permanently to an image by the Image Processing Activity or temporarily prior to OCR by the Recognize activity.

Import Mode and Document Linking

Import Mode and Document Linking refers to the usage of the Import Mode property. This affects whether an imported document maintains a link to its original file and whether a copy of the file is made on import.

LINQ to Grooper Objects

LINQ is a Microsoft .NET component that provides data querying capabilities to the .NET framework. In Grooper, you can use the LINQ syntax in Code Expressions to "LINQ to Grooper Objects". This allows expressions to access information from collections of data, such as from multi-instance Data Sections or Data Tables.

Layout Data

Layout Data refers to visual information such as line locations, OMR checkbox locations and states, barcode values, and detected shapes captured by certain image processing commands. This data is stored in a "Grooper.LayoutData.json" file and attached to a Batch Folder or Batch Page object. This data can later be recalled by Grooper extractors and other functionality that rely on the presence of that data to function.

Microfiche Processing

Microfiche Processing refers to Grooper's suite of specialized Activities and IP Commands that process microfiche documents.

Microsoft Office Integration

Grooper's Microsoft Office Integration allows the platform to easily convert Microsoft Word and Microsoft Excel files into formats that Grooper can read natively (PDF and CSV).

Mixed Classification

"Mixed Classification" refers to leveraging both a Classification Method and "rules" defined on a Document Type to overcome the shortcomings of an individual method.

OCR

OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.

OCR Synthesis

OCR Synthesis refers to a suite of OCR related functionality unique to Grooper. The OCR Synthesis suite will pre-process and re-process raw results from the OCR Engine and synthesize its results into a single, more accurate OCR result.

Object Nomenclature

The Grooper Wiki's Object Nomenclature defines how Grooper users categorize and refer to different types of Node Objects in a Grooper Repository. Knowing what objects can be added to the Grooper Node Tree and how they are related is a critical part of understanding Grooper itself.

PDF Page Types

PDF pages can be one of several PDF Page Types. "Page types" describe the kind of content in a PDF page. This informs Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize activity, whereas "text only" pages have their native text extracted by Recognize.

Prompt Engineering

"Prompt Engineering" is the process of designing and refining prompts to interact more effectively with large language models (LLMs) like GPT-4. The goal is to guide the model to produce desired outputs by carefully crafting the input queries.

Regular Expression

Regular Expression (or regex) is a standard syntax designed to parse text strings. This is a way of finding information in text. It is the primary method by which Grooper extracts and returns data from documents.
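
A small example may help make this concrete. The sketch below uses Python's `re` module purely for illustration (Grooper itself is a .NET application, and regex dialects differ in some details); the sample text and patterns are invented for the example.

```python
import re

# Invented sample text standing in for a document's recognized text.
text = "Invoice Date: 03/15/2024\nDue Date: 04/14/2024"

# A general pattern: any date written as MM/DD/YYYY.
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
dates = date_pattern.findall(text)

# A more specific pattern: only the date that follows the "Due Date:" label.
labeled = re.search(r"Due Date:\s*(\d{2}/\d{2}/\d{4})", text)
due_date = labeled.group(1) if labeled else None
```

The first pattern returns both dates on the page, while the second uses surrounding context to isolate one specific value.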

Separation

Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

TF-IDF

TF-IDF stands for term frequency-inverse document frequency. It is a statistical calculation intended to reflect how important a word is to a document within a document set (or "corpus"). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
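
For readers who want to see the calculation, here is a minimal Python sketch of one common textbook variant: term frequency multiplied by log(N / document frequency). Grooper's exact weighting and tokenization are not described here, so the `tf_idf` function, the whitespace tokenizer, and the sample corpus are all assumptions.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = ["invoice total due", "invoice number date", "purchase order number"]
weights = tf_idf(corpus)
```

In this toy corpus, "total" appears in only one document and so outweighs "invoice", which appears in two. A word that appears in every document receives a weight of zero.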

Table Extraction

"Table Extraction" refers to Grooper's ability to extract data from cells in tables on documents. This is accomplished by configuring the Data Table and its child Data Column elements in a Data Model.

Test Batch

A "Test Batch" refers to any Batch created in the Test folder of the Batches folder in the Node Tree. Test Batches are used to test various configurations in Grooper, such as Batch Process Step configurations and Data Model configurations.

Thread

A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.

Training-Based Approaches to Document Classification

"Training-Based Approaches to Document Classification" refers to Grooper Classification Methods that classify Batch Folders using document examples for each Document Type. The Classify activity then assigns each unclassified Batch Folder a Document Type based on how similar it is to the Document Type's training data.

Training Batch

The Training Batch is a special Batch created when training document examples using the Lexical classification method. The Training Batch serves two purposes: (1) It is a Batch that holds all previously trained Batch Folders. Designers can go to this Batch to view these documents and copy and paste them into other Batches if needed. (2) Batch Folders in the Training Batch will be used to re-train the Content Model's classification data when the Rebuild Training command is executed.

UNC Path

UNC Path is a conceptual term that refers to UNC (Universal Naming Convention), a standard used in Microsoft Windows for accessing shared network folders.
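
A UNC path takes the form `\\server\share\folder\file`. The Python sketch below uses the standard library's `pathlib.PureWindowsPath` to pull one apart; the server name `fileserver01`, the share `scans`, and the helper `split_unc` are made up for the example.

```python
from pathlib import PureWindowsPath


def split_unc(path_text):
    """Split a UNC path into its \\\\server\\share root and the remaining path parts."""
    path = PureWindowsPath(path_text)
    # For UNC paths, pathlib reports the \\server\share prefix as the "drive".
    return path.drive, path.parts[1:]


share, rest = split_unc(r"\\fileserver01\scans\incoming\batch001.pdf")
```

Here `share` holds the server and share name, and `rest` holds the folder and file names beneath it.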

Waterfall Classification

Waterfall Classification is a classification technique in Grooper that prioritizes training similarity over classification "rules" set by a Document Type's Positive Extractor. This can be helpful in scenarios where Batch Folders get misclassified and simply retraining won't help.

Export Definition

Export Behaviors are defined by adding and configuring one or more Export Definitions. An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.

CMIS Export

CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

Data Export

Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC compliant database.

Extractor Type

An Extractor Type (shorthand for Value Extractor Type) is configured for numerous properties on a wide array of Grooper objects. They are used to return "data instances" from documents for one purpose or another. The Extractor Type defines an operation that reads data from the text or visual content of a document and returns one or more results. Each different Extractor Type uses a specialized logic to return results. Extractor Types are consumed by higher-level objects such as Data Elements, extractor objects, Content Types and more.

Ask AI

Ask AI is an Extractor Type that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It uses a document's text content and user-defined instructions (a question about the document) in the chat prompt. Ask AI then returns the response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper that an Extractor Type is referenced. It can complete a wide array of tasks in Grooper with simple text prompts.

Detect Signature

Detect Signature is an Extractor Type that can detect if a handwritten signature is present on a document. It detects signatures within a specified rectangular region on a document page by measuring the "fill percentage" (what percentage of pixels are filled in the region).
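
The "fill percentage" measurement can be sketched in plain Python. Everything below is an illustrative assumption rather than Grooper's implementation: the image is modeled as a 2D list of grayscale values, a pixel darker than 128 counts as "filled", and the 2% to 40% acceptance window in `has_signature` is invented for the example.

```python
def fill_percentage(image, region, dark_threshold=128):
    """Percent of pixels in the region darker than the threshold.

    image: 2D list of grayscale values (0 = black, 255 = white).
    region: (top, left, bottom, right) with exclusive bottom/right bounds.
    """
    top, left, bottom, right = region
    pixels = [image[row][col] for row in range(top, bottom) for col in range(left, right)]
    if not pixels:
        return 0.0
    filled = sum(1 for value in pixels if value < dark_threshold)
    return 100.0 * filled / len(pixels)


def has_signature(image, region, min_fill=2.0, max_fill=40.0):
    """Treat the region as signed when its fill percentage falls inside the window."""
    return min_fill <= fill_percentage(image, region) <= max_fill
```

A blank region scores near 0% and a solid black box near 100%, so a window between the two is one simple way to separate pen strokes from empty space or heavy noise.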

Field Match

Field Match is an Extractor Type that matches the value stored in a previously-extracted Data Field or Data Column.

Find Barcode

Find Barcode is an Extractor Type that searches for and returns barcode values previously stored in a Batch Folder or Batch Page's layout data.

Note: Find Barcode differs slightly from Read Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

GPT Complete

GPT Complete is an Extractor Type that leverages OpenAI's GPT models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.

PLEASE NOTE: GPT Complete is a deprecated extractor type. It uses an outdated method to call the OpenAI API. Please use the Ask AI extractor type going forward.

Highlight Zone

Highlight Zone is an Extractor Type that sets a highlight region on a document without performing any actual data extraction. This "extractor" is used to mark areas of interest or importance for Review users or for uncommon scenarios where a data instance location is needed with no actual value.

Label Match

Label Match is an Extractor Type that matches a list of one or more values using matching options defined by a Labeling Behavior. It is similar to List Match but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.

Labeled OMR

Labeled OMR is an Extractor Type used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.

Labeled Value

Labeled Value is an Extractor Type that identifies and extracts a value next to a label. This is one of the most commonly used extractors to extract data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).

List Match

List Match is an Extractor Type designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions, but it can be configured to use regular expression syntax.

Ordered OMR

Ordered OMR is an Extractor Type used to return OMR check box information. Ordered OMR returns information for multiple check boxes within a defined zone based on their order and layout. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

Pattern Match

Pattern Match is an Extractor Type that extracts values from a document that match a specified regular expression, providing data collection following a known format or pattern.
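As a general illustration of regular expression matching (not Grooper's own pattern syntax or engine), the sketch below uses Python's `re` module. The date pattern, the sample text, and the `pattern_match` helper are hypothetical.

```python
import re

# Hypothetical pattern for dates written as MM/DD/YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def pattern_match(text, pattern=DATE_PATTERN):
    """Return (value, start_offset) for every match of the pattern in the text."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


sample = "Invoice Date: 03/15/2024  Due Date: 04/14/2024"
hits = pattern_match(sample)
```

Each hit carries both the matched value and its position in the text, which mirrors the idea of an extraction result that knows where on the document it was found.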

Query HTML

Query HTML is an Extractor Type specialized for HTML documents. It uses either CSS or XPath selectors to return the inner text or an attribute of an HTML element.
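The idea of selecting an element and returning its inner text or an attribute can be sketched with Python's standard library. This example assumes a well-formed HTML fragment and uses the limited XPath subset supported by `xml.etree.ElementTree`. The markup, class names, and attribute names are hypothetical, and this is not how Grooper evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed HTML fragment.
html = """
<html><body>
  <table>
    <tr>
      <td class="label">Total</td>
      <td class="total" data-currency="USD">118.50</td>
    </tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(html)

# XPath-style selector: any <td> whose class attribute equals "total".
cell = root.find(".//td[@class='total']")

inner_text = cell.text               # the element's inner text
currency = cell.get("data-currency")  # one of the element's attributes
```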

Read Barcode

Read Barcode is an Extractor Type that uses barcode recognition technology to read and extract values from barcodes found in the document content.

Note: Read Barcode differs slightly from Find Barcode. Read Barcode performs barcode recognition when the extractor executes. Find Barcode can only look up barcode data stored in the document or page's layout data. Find Barcode runs quicker than Read Barcode, but barcode values must have previously been collected in the Batch Process by the Image Processing or Recognize activities.

Read Meta Data

Read Meta Data is an Extractor Type that retrieves metadata values associated with a document. Read Meta Data can return metadata from a folder Batch Folder's attachment file based on its MIME type, such as PDF, Word and Mail Message ('message/rfc822' or 'application/vnd.ms-outlook'). It can also return data using a Document Link in Grooper, such as a File System Link or a CMIS Document Link.

Read Zone

Read Zone is an Extractor Type that allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to a text value (such as a label) or a shape location on the document.

Reference

Reference is an Extractor Type used to reference an external extractor object within a Grooper property configuration. This allows users to create re-usable extractors and use the more complex pin Data Type and input Field Class extractors throughout Grooper.

Word Match

Word Match is an Extractor Type that extracts individual words or phrases from documents. It is used for n-gram extraction. Each gram may be optionally executed against a dictionary Lexicon to ensure words and phrases only match a set vocabulary.
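N-gram extraction with an optional vocabulary check can be sketched as follows. The function names, whitespace tokenization, and case-insensitive lexicon comparison are assumptions made for illustration, not a description of Grooper's behavior.

```python
def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def word_match(text, max_n=2, lexicon=None):
    """Collect 1..max_n grams; keep only lexicon entries when a lexicon is given."""
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(ngrams(text, n))
    if lexicon is None:
        return grams
    allowed = {entry.lower() for entry in lexicon}
    return [gram for gram in grams if gram.lower() in allowed]


all_grams = word_match("Purchase Order Number", max_n=2)
vocabulary_hits = word_match(
    "Purchase Order Number",
    max_n=2,
    lexicon=["purchase order", "number"],  # hypothetical vocabulary
)
```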

Zonal OMR

Zonal OMR is an Extractor Type that reads one or more OMR checkboxes using manually-configured zones. The zone may be optionally fixed on the page or anchored to a static text value (such as a label).

BE AWARE: Zonal OMR is outdated compared to Labeled OMR and Ordered OMR. It requires the most manual setup of any OMR extractor to configure. Use this as a last resort when other OMR extractor options have been exhausted.

Fill Method

Fill Method is a configurable property for data_table Data Models, insert_page_break Data Sections, and table Data Tables (aka "container elements" or "containers"). Fill Methods provide various mechanisms for populating these containers' child Data Elements. Fill Methods are secondary extraction operations. They populate descendant Data Elements after normal extraction during the Extract step.

AI Extract

AI Extract is a Fill Method that leverages a Large Language Model (LLM) to return extraction results to Data Elements in a data_table Data Model or insert_page_break Data Section. This mechanism provides powerful AI-based data extraction with minimal setup.

Functionality

AI Search

AI Search enables Grooper's document search and retrieval features in the Search page. It provides the framework to create document search indexes by Content Type and submit documents to an index. Once indexed, documents can be retrieved by full text searches in the Search Page with feature rich querying and filtering capabilities. Once retrieved, users can view documents in the Search page, download the results, or submit documents for further processing in Grooper.

AI Generator

AI Generators create custom documents using the results of a Search Page query and a large language model (LLM). Both document content and instructions are fed to the LLM to produce a text-based file.

EDI Integration

EDI Integration refers to Grooper's ability to process EDI files.

XML Schema Integration

XML Schema Integration refers to Grooper's ability to use XML schemas to build Data Models, extract XML documents, and more.

IP Command

IP Commands specify an Image Processing (IP) operation (such as image cleanup, format conversion or feature detection) and are used to construct image IP Steps in an IP Profile. IP Commands are configured using an IP Step's Command property.

Barcode Detection

Barcode Detection is an IP Command that detects and reads barcode data. The detected barcode information is stored as part of the object's layout data.

Binarize

Binarize is an IP Command that converts a color or grayscale image to a bi-tonal (black and white) image using various thresholding methods.
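One of the simplest thresholding methods is a single global threshold, sketched below. The threshold of 128 and the 0/1 output convention are assumptions for illustration. The thresholding methods Grooper actually offers are not described here.

```python
def binarize(gray, threshold=128):
    """Map grayscale values (0-255) to 0 (black) or 1 (white) using one global threshold."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray]


# Hypothetical 2x3 grayscale image.
gray_image = [
    [12, 200, 130],
    [255, 90, 127],
]

bitonal = binarize(gray_image)
```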

Extract Page

Extract Page is an IP Command that removes an image from a carrier image while simultaneously removing any image warping or skewing.

Line Removal

Line Removal is an IP Command that removes horizontal and vertical lines from documents. The line locations are then stored as part of the object's layout data.

Scratch Removal

Scratch Removal is an IP Command that detects and removes or repairs scratches from film-based images.

Shape Detection

Shape Detection is an IP Command that detects shapes on a document matching sample images given by the user. The shape locations are then stored as part of the object's layout data.

Shape Removal

Shape Removal is an IP Command that detects and removes shapes from documents. The shape locations are then stored as part of the object's layout data.

Import Provider

Import Providers enable Grooper to import file-based content from a variety of sources, such as file systems, mail servers, and content repositories. An Import Provider is selected and configured when configuring Import Jobs. Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button. Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification. In all cases, the Import Provider is selected using the Provider property.

CMIS Import

CMIS Import refers to two Import Providers used to import content over a cloud CMIS Connection: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud based storage platforms.

Import Descendants

Import Descendants is one of two Import Providers that use cloud CMIS Connections to import document content into Grooper. Import Descendants imports files or folders in a settings_system_daydream CMIS Repository folder location, including any files or folders in any sub-folders (i.e. "descendant" files or folders).

Import Query Results

Import Query Results is one of two Import Providers that use cloud CMIS Connections to import document content into Grooper. Import Query Results imports files or folders in a settings_system_daydream CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

Lookup Specification

A Lookup Specification defines a "lookup operation", where existing Grooper fields (called "lookup fields") are used to query an external data source, such as a database. The results of the lookup can be used to validate or populate field values (called "target fields") in Grooper. Lookup Specifications are created on "container elements" (data_table Data Models, insert_page_break Data Sections and table Data Tables) using their Lookups property. Lookups may query using all single-instance fields relative to the container element (including those defined on parent elements up to the root Data Model), but cannot be used to populate a field value on a parent of the container element.

CMIS Lookup

CMIS Lookup is a Lookup Specification that performs a lookup against a settings_system_daydream CMIS Repository via a "CMISQL query" (a specialized query language based on SQL database queries).

Database Lookup

Database Lookup is a Lookup Specification that performs a lookup against a database Data Connection via a SQL query.
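The relationship between lookup fields, the SQL query, and target fields can be sketched with an in-memory SQLite database. The table, column names, field names, and the `database_lookup` helper are hypothetical. This shows the general pattern only, not how Grooper configures or executes a Database Lookup.

```python
import sqlite3

# Hypothetical external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, vendor_name TEXT)")
conn.execute("INSERT INTO vendors VALUES ('V-100', 'Acme Supply')")


def database_lookup(connection, fields):
    """Use the lookup field (vendor_id) to populate the target field (vendor_name)."""
    row = connection.execute(
        "SELECT vendor_name FROM vendors WHERE vendor_id = ?",
        (fields["vendor_id"],),
    ).fetchone()
    result = dict(fields)
    if row is not None:
        result["vendor_name"] = row[0]
    return result


extracted = {"vendor_id": "V-100", "vendor_name": None}
populated = database_lookup(conn, extracted)
```

When the query returns no row, the sketch leaves the target field untouched, which is one possible policy among several.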

GPT Lookup

GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model. PLEASE NOTE: GPT Lookup should be considered a "beta" feature. It was implemented as a prototype and has not been extensively tested.

Web Service Lookup

Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.

Object

In Grooper, objects are defined as configurable elements within its hierarchical tree structure. These include nodes and embedded objects that can be manipulated and edited to define the system's behavior, create workflows, and manage content.

AI Analyst

AI Analyst is an object in Grooper that facilitates the ability to interact with a document as you might with an AI chatbot.

AI Assistant

AI Assistants are Grooper's conversational AI personas. They answer questions about resources they can access (including content from documents, databases and/or web services). This greatly increases an AI's ability to answer domain-specific questions that require access to these resources.

Batch

inventory_2 Batch objects are fundamental in Grooper's architecture as they are the containers of documents that get moved through Grooper's workflow mechanisms known as settings Batch Processes.

Batch Folder

folder Batch Folder objects are defined as container objects within a inventory_2 Batch that are used to represent and organize both folders and pages. They can hold other Batch Folders or contract Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.

  • Batch Folders are frequently referred to simply as "documents".

Batch Page

contract Batch Page objects represent individual pages within a inventory_2 Batch. The Batch Page object is the most granular unit in the hierarchy of Batch Objects in Grooper.

  • Batch Pages are frequently referred to simply as "pages".

Batch Process

settings Batch Process objects are crucial components in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each inventory_2 Batch of documents is managed systematically and efficiently.

  • Batch Processes by themselves do nothing. Instead, the workflows they execute are designed by adding child edit_document Batch Process Steps.
  • A Batch Process is often referred to as simply a "process".

Batch Process Step

edit_document Batch Process Step objects are specific actions within the sequence defined by a settings Batch Process. A Batch Process Step plays a critical role in automating and managing the flow of documents through the various stages of processing within Grooper.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

CMIS Connection

cloud CMIS Connection node objects provide a standardized way of connecting to various content management systems (CMS). These objects allow Grooper to communicate with multiple external storage platforms, enabling access to documents and content that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Repository

settings_system_daydream CMIS Repository node objects in Grooper allow access to external documents through a cloud CMIS Connection. They allow managing and interacting with those documents within Grooper's framework as if they were local. They are created as a child object of a CMIS Connection and used for various Activities.

Content Category

collections_bookmark Content Category node objects are containers within a stacks Content Model that hold other Content Categories and description Document Type objects. They allow for further classification and grouping of Document Types within a taxonomy, aiding in the logical structuring of complex document sets. Besides grouping Document Types together, Content Categories also serve to create new branches in a Data Element hierarchy. In most cases Content Categories are used as organizational buckets to group like Document Types together.

Content Model

stacks Content Model node objects define the taxonomy of document sets in terms of the description Document Type they contain. They also house the Data Elements that appear on each collections_bookmark Content Category and Document Type within them. Content Models serve as the root of a Content Type hierarchy and are crucial for organizing the different types of documents that Grooper can recognize and process.

Data Column

view_column Data Column node objects are child objects of a table Data Table, representing individual columns and defining the type of data each column holds along with its data extraction properties.

Data Connection

database Data Connection node objects define the settings for connecting to and interacting with a database. These interactions may include conducting lookups, exports, or other actions that relate to database management systems (DBMS). Once configured, a Data Connection object can be referenced by other components in Grooper for various DBMS-related activities.

Data Field

variables Data Field node objects are created as child objects of a data_table Data Model. A Data Field is a representation of a single piece of data targeted for extraction on a document.

Data Fields are frequently referred to simply as "fields".

Data Model

data_table Data Model node objects serve as the top-tier structure defining the taxonomy for Data Elements and are leveraged during the Extract Activity to extract data from folder Batch Folders. They are a hierarchy of Data Elements that sets the stage for the extraction logic and review of data collected from documents.

Data Rule

flowsheet Data Rule objects define the logic for automated data manipulation which occurs after data has been extracted from folder Batch Folders. These rules are applied to normalize or otherwise prepare data collected in a data_table Data Model for downstream processes. Data Rules ensure that extracted data conforms to expected formats or meets certain quality standards.

Data Section

insert_page_break Data Section objects are grouping mechanisms for related variables Data Fields. Data Sections organize and segment child Data Elements into logical divisions of a document based on the structure and semantics of the information the documents contain.

Data Table

table Data Table objects are utilized for extracting repeating data that's formatted in rows and columns, allowing for complex multi-instance data organization that would be present in table-formatted content.

Data Type

pin Data Type objects hold a collection of child, referenced, and locally defined Data Extractors and settings that manage how multiple (even differing) matches from Data Extractors are consolidated (via Collation) into a result set.

Document Type

description Document Type objects represent a distinct type of document, like an invoice or contract. Document Types are created as children of a stacks Content Model or a collections_bookmark Content Category and are used to classify individual folder Batch Folders. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to Batch Folders of that specific classification.

Field Class

input Field Class node objects are used to find values based on some natural language context near that value. Values are positively or negatively associated with text-based "features" nearby by training the extractor. During extraction, the extractor collects values based on these training weightings.

  • Field Classes are most useful when attempting to find values within the flow of natural language.
  • Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "Extractor Objects" like quick_reference_all Value Readers or pin Data Types.

File Store

hard_drive File Store objects define a storage location within Grooper where file content associated with nodes is saved. They are crucial for managing the content that forms the basis of Grooper's processing tasks, allowing for the storage and retrieval of documents, images, and other "files". Not every object in Grooper will have files connected to it, but if it does, those files are stored in the location defined by this object.

Form Type

two_pager Form Type objects represent trained variations of a description Document Type. These objects store machine learning training data for Lexical and Visual document classification methods.

IP Group

gallery_thumbnail IP Group objects are child objects within perm_media IP Profiles that create a hierarchical structure for organizing image processing commands. IP Groups may contain other IP Groups or image IP Step objects.

IP Profile

perm_media IP Profile node objects detail the operations and parameters for image enhancement and cleanup. These operations improve the accuracy of further processing steps, like the Recognize and Classify Activities.

IP Step

image IP Step node objects are the basic units within an perm_media IP Profile that define a single image processing operation. IP Steps are performed sequentially within their parent gallery_thumbnail IP Group or IP Profile.

Lexicon

dictionary Lexicon node objects are dictionary objects that store a list of keys or key-value pairs. Lexicons can define local entries and/or import entries from other Lexicons and even import entries using a Data Connection. The entries in a Lexicon can be utilized in different areas of Grooper, such as data extraction, fuzzy matching, or OCR correction, providing a reference point that enhances the accuracy and consistency of the software's operations.

Machine

computer Machine node objects represent servers that have connected to the Grooper Repository. They allow for the management of Grooper Service instances and serve as connection points for processing jobs to be executed on the server hardware. Machine objects are essential for the scaling of processing capabilities and for distributing processing loads across multiple servers.

OCR Profile

library_books OCR Profile node objects configure the settings for optical character recognition (OCR) leveraged by the Recognize activity. OCR converts images of text into machine-encoded text. OCR Profile objects influence how effectively text content is recognized from contract Batch Pages.

Object Library

extension Object Library node objects are .NET libraries that contain code files for customizing the functionality of Grooper. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:

Processing Queue

memory Processing Queue node objects are designed for tasks performed by computer Machines and their Activity Processing services. Processing Queues are used to distribute Grooper "Code Activity" tasks among different servers and control the concurrency and/or processing rate of these tasks.

  • For example, activities such as Render or Export can be managed so that only one activity instance runs per machine or so multiple instances are processed concurrently, according to the queue configuration.

Project

package_2 Project node objects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as stacks Content Models, settings Batch Processes, profile objects, and more are organized and managed. It allows for the encapsulation and modularization of these resources for easier management and reusability.

Resource File

A Resource File node object in Grooper is essentially a file that is stored as part of a Grooper package_2 Project. It can include various types of files such as text files or XML schema files.

Review Queue

person_play Review Queue node objects are designated for human-performed tasks. They organize the Review tasks that require human attention and can distribute these tasks among different groups of users based on the queue's settings. Review Queues can be assigned at the settings Batch Process level to filter work by an entire process, or on Review Activities at the edit_document Batch Process Step level to filter tasks at a more granular step-based level.

Root

The database Root node object represents the topmost element of the Grooper Repository. It serves as the starting point from which all other objects branch out. It is the anchor point for all other structures within the repository and a necessary element for the organization and linkage of all other objects within Grooper.

Scanner Profile

scanner Scanner Profile node objects store configuration settings for operating a document scanner. Scanner Profiles provide users operating the "Scan Viewer" in Review a quick way to select pre-saved scanner configurations.

Separation Profile

insert_page_break Separation Profile objects contain rules and settings that determine how groupings of scanned pages are separated into individual folder Batch Folders, often using barcodes, blank pages, or patch codes as indicators for separation points.

Value Reader

quick_reference_all Value Reader objects define a single data extraction operation. You set the Extractor Type on the Value Reader that matches the specific data you're aiming to capture. For example, you would use the Pattern Match Extractor Type to return data using a regular expression. You would use a Value Reader when you need to extract a single result or list of simple results from a document.

OCR Engine

An "OCR engine" is the part of OCR software that recognizes text from images. OCR engines analyze the image's pixels to determine where text is on the page and what each character is. In Grooper, OCR engines are selected when configuring an OCR Profile's OCR Engine property.

Azure OCR

Azure OCR is an OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.

Property

A property is a mechanism by which an object in Grooper is configured that affects how the object performs its function.

Alignment

Alignment is a grouping of properties found on Fill Methods and Data Elements that manipulate the prompt provided to an LLM chatbot in an attempt to provide accurate highlighting of values displayed within the Document Viewer.

Confidence Multiplier and Output Confidence

Some results carry more weight than others. The Confidence Multiplier and Output Confidence properties allow you to manually adjust an extraction result's confidence.

Constrained Wrap

The Constrained Wrap property allows certain Extractor Types and the Labeling Behavior to match values which wrap from one line to the next inside a box (such as a table cell).

Content Type Filter

The Content Type Filter property restricts Activities to specific collections_bookmark Content Categories and/or description Document Types.

Document Quoting

Document Quoting is a property of the AI Extract Fill Method that limits the text fed to the AI to reduce the number of tokens consumed. Controlling specifically what is given can reduce not only the monetary cost of using the AI, but also the time cost of running the Fill Method.

Output Extractor Key

The Output Extractor Key property is another weapon in the arsenal of powerful Grooper classification techniques. It allows pin Data Types to return results normalized in a way more beneficial to document classification.

Paragraph Marking

Paragraph Marking alters the normal text data in a document by placing the carriage return and new line feed pairs at the end of each paragraph, instead of the end of each line. This allows users to break up a document's text flow into segments of paragraphs instead of segments of lines.

Parameters

Parameters is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.
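The general effect of Temperature and TopP on a probability distribution can be sketched in a few lines. This is a simplified model of how these parameters are commonly described for LLM sampling. It is not taken from Grooper or from any specific provider's implementation, the function names and sample values are hypothetical, and the penalty parameters are not shown.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities; a lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution.
    return {i: probs[i] / total for i in kept}


scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for three candidate tokens
default_probs = softmax_with_temperature(scores, temperature=1.0)
sharper_probs = softmax_with_temperature(scores, temperature=0.5)
nucleus = top_p_filter([0.5, 0.3, 0.2], top_p=0.7)
```

In this sketch, lowering the temperature concentrates probability on the highest-scoring token, and the TopP filter discards the least likely token before renormalizing.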

Permission Sets

A Permission Set is a property that allows you to restrict user access to repositories, pages, and certain activities. This helps prevent unauthorized individuals from editing or deleting information or inventory_2 Batches.

Preprocessing

The Preprocessing grouping of properties consists of settings that adjust how text is formatted and interpreted before any Data Extraction process begins. These properties are crucial for ensuring that the text data is in the most optimal format for subsequent extraction tasks, which could involve complex regular expressions or precise data parsing.

Scope

The Scope property of a edit_document Batch Process Step, as it relates to an Activity, determines at which level in a inventory_2 Batch hierarchy the Activity runs.

Secondary Types

Secondary Types allow the application of multiple Content Types to a single folder Batch Folder.

Tab Marking

Tab Marking allows you to insert tab characters into a document's text data.

Vertical Wrap

Vertical Wrap is a property of certain Extractor Types and a Content Type's Labeling Behavior used to provide simplified extraction of vertically wrapped text (typically stacked labels).

Repository Option

Repository Options are optional features that affect the entire repository. These optional features enable functionality that otherwise does not work without first establishing the connections these options provide. Repository Options are added to a Grooper Repository and configured using the database Root node's Options property.

LLM Connector

LLM Connector is a Repository Option that enables OpenAI-based functionality for the local Grooper repository.

AI Search

AI Search enables Grooper's document search and retrieval features in the Search page. It provides the framework to create document search indexes by Content Type and submit documents to an index. Once indexed, documents can be retrieved by full text searches in the Search Page with feature rich querying and filtering capabilities. Once retrieved, users can view documents in the Search page, download the results, or submit documents for further processing in Grooper.

Section Extract Method

The Extract Method property of a insert_page_break Data Section defines a "Section Extract Method" which specifies how section instances will be identified and extracted.

Clause Detection

Clause Detection is a insert_page_break Data Section Extract Method. It leverages LLM text embedding models to compare supplied samples of text against the text of a document to return what the AI determines is the "chunk" of text that most closely resembles the supplied samples.

Nested Table

Nested Table is a insert_page_break Data Section Extract Method. This method divides a document into sections by extracting table data within those sections. This gives Grooper users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections have the same table (or at least tabular data which can be extracted by a single table Data Table object).

Transaction Detection

Transaction Detection is a insert_page_break Data Section Extract Method. This extraction method produces section instances by detecting repeating patterns of text around the Data Section's child variables Data Fields.

Separation Provider

The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Change in Value Separation

The Change in Value Separation Provider creates a new folder and separates every time an extracted value changes from one contract Batch Page to another.
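The separation logic described here can be sketched as a single pass over per-page values. The function name and the sample values are hypothetical, and the sketch assumes a value has already been extracted for every page.

```python
def separate_on_value_change(page_values):
    """Group page indexes into folders, starting a new folder whenever the value changes."""
    folders = []
    previous = object()  # sentinel that never equals a real extracted value
    for index, value in enumerate(page_values):
        if value != previous:
            folders.append([])
            previous = value
        folders[-1].append(index)
    return folders


# Hypothetical invoice numbers extracted from five consecutive pages.
values = ["INV-1", "INV-1", "INV-2", "INV-3", "INV-3"]
folders = separate_on_value_change(values)
```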

Control Sheet Separation

Control Sheet Separation is a Separation Provider that uses Grooper document_scanner Control Sheets to separate documents.

EPI Separation

The EPI Separation Separation Provider uses embedded page information ("EPI") to Separate loose pages into document folders. A Data Extractor is used to find page numbers from the text on a page and Grooper uses this information to separate the pages.

ESP Auto Separation

ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a inventory_2 Batch.

Event-Based Separation

Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

Multi Separator

The Multi Separator Separation Provider performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.

Pattern-Based Separation

Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.

Undo Separation

Undo Separation is a Separation Provider. Instead of putting loose contract Batch Pages into folder Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

Service

Grooper Service is a conceptual term that refers to the various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, started and stopped using Grooper Command Console.

API Services

You can perform inventory_2 Batch processing via REST API web calls by installing API Services.

Activity Processing

Activity Processing is a Grooper Service that executes Activities assigned to edit_document Batch Process Steps in a settings Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

Grooper Licensing

Grooper Licensing is a Grooper Service that distributes licenses to multiple workstations running Grooper applications.

Import Watcher

An Import Watcher is a Grooper Service that schedules and runs import jobs. It periodically executes an Import Provider to query or poll for files in a file system or content management system that meet specified criteria. Then, these files are imported into Grooper as documents in a new Batch. Afterward, the imported files can be (and should be) moved, deleted, or modified to prevent repeat imports in the next polling cycle.
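
The reason imported files should be moved or deleted is easiest to see in a generic polling loop. The following Python sketch shows one polling cycle against a file system folder. It is an analogy only: `poll_once`, the folder arguments, and the `*.pdf` filter are hypothetical and do not correspond to Grooper settings.

```python
import shutil
from pathlib import Path
from typing import List


def poll_once(watch_dir: Path, processed_dir: Path, pattern: str = "*.pdf") -> List[str]:
    """Run one polling cycle and return the names of the files picked up."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    imported: List[str] = []
    for path in sorted(watch_dir.glob(pattern)):
        if not path.is_file():
            continue
        imported.append(path.name)  # stand-in for importing the file into a new Batch
        # Move the file out of the watched folder so the next cycle skips it
        shutil.move(str(path), str(processed_dir / path.name))
    return imported
```

Without the move step, every cycle would find the same files again and import duplicates.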

Indexing Service

An Indexing Service is a Grooper Service that periodically polls the Grooper database to automate AI Search indexing. It checks whether any documents in a Grooper Repository are classified as a Document Type that inherits from a Content Type configured with an Indexing Behavior. If any such documents need to be added to, updated in, or deleted from the search index, the Indexing Service will submit an "Indexing Job" to be picked up by an Activity Processing service.

Table Extract Method

A Table Extract Method defines the settings and logic for a Data Table to perform extraction. It is set by configuring the Extract Method property of the Data Table.

Delimited Extract

The Delimited Extract Table Extract Method extracts tabular data from a delimiter-separated text file, such as a CSV file.
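
As a rough analogy outside of Grooper, reading rows from delimiter-separated text looks like this in Python's standard `csv` module. The function name `extract_delimited` is hypothetical, and the sketch assumes the first line of the text holds the column headers.

```python
import csv
import io
from typing import Dict, List


def extract_delimited(text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Return one dict per data row, keyed by the header row's column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]
```

Passing `delimiter="\t"` would handle tab-separated text in the same way.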

Fluid Layout

The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.

Grid Layout

The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.

Row Match

The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extracts cell data from each column.
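
The general technique of matching each row with a regular expression and reading cells from its capture groups can be sketched in Python. This is not Grooper's extractor: the row pattern (quantity, description, price) and the function name `extract_rows` are assumptions made for the example.

```python
import re
from typing import Dict, List

# Hypothetical row pattern: quantity, free-text description, price with two decimals
ROW = re.compile(r"^(?P<qty>\d+)\s+(?P<description>.+?)\s+(?P<price>\d+\.\d{2})$")


def extract_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict of column values for every line that matches the row pattern."""
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())  # named groups act as the table's columns
    return rows
```

Lines that do not fit the pattern, such as a header or a totals line, are skipped, so the pattern itself defines what counts as a row.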

Tabular Layout

The Tabular Layout Table Extract Method uses column header values determined by the Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled) as well as Data Column Value Extractor results to model a table's structure and return its values.

UI Element

A UI Element is a portion of the Grooper interface that allows users to interact with or otherwise receive information about the application.

Document Viewer

The Grooper Document Viewer is the portal to your documents. It is the UI that allows you to see a Batch Folder's (or a Batch Page's) image, text content, and more.

Node Tree

The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Overrides

Overrides is a tab that allows users to override the default properties set on a Data Element.

Search Page

The Search Page allows users to leverage AI Search indexes to query indexed documents. After running a query, users can interact with the returned results in several ways, including creating new Batches, submitting processing jobs, or even starting a conversation with an AI assistant.

Scan Viewer

The Scan Viewer is a user interface that can be added to the user-attended Review step in a Batch Process. It is used to scan documents into Batches from one or more scanning workstations.

Summary Tabs

Content Models and Content Categories have a Summary tab where you can view "Descendant Node Types", Document Types, and Expressions.

Miscellaneous Features

URL Endpoints for Review

Three different URL endpoints can be used to open Review tasks in the Grooper Web Client, given certain information such as the Grooper Repository ID, Batch Process name, Batch ID, and more. This allows Grooper users to link directly to a Batch in Review with a URL.

Disambiguation

Repository

A "repository" is a general term in computer science referring to where files and/or data are stored and managed. In Grooper, the term "repository" may refer to: