Glossary

Revision as of 15:03, 24 July 2024

Activity

Activity is a property on Batch Process Step objects. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page.

Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify Step.

Apply Rules

Apply Rules is an Activity that runs Data Rules on data that has already been extracted from a Batch. A Batch Process Step configured with the Apply Rules Activity will always need to be preceded by a Batch Process Step configured with the Extract Activity.

Classify

Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Content Type using patterns, lexical understanding, or rules as defined by a Content Model.

Clip Frames

The Clip Frames Activity extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.

Correct

The Correct Activity performs spell correction on the textual content of Batch Folders or specific Data Elements, enhancing the accuracy of data extraction by resolving recognition errors.

Detect Frames

The Detect Frames Activity locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.

Execute

The Execute Activity runs a specified child command, allowing for the modular and controlled execution of tasks within a larger automated workflow.

Export

The Export Activity facilitates the transfer of documents and extracted information to external systems or formats, completing the data processing workflow.

Extract

The Extract Activity retrieves relevant information, defined by Data Elements, from Batch Folders, transforming unstructured or semi-structured content into structured, usable data.

Image Processing

The Image Processing Activity enhances and optimizes Batch Pages for better recognition and data extraction results.

Initialize Card

The Initialize Card Activity prepares and configures microfiche card images for further processing.

Merge

The Merge Activity creates a document from the Page objects in your Batch and saves it to a Batch Folder.

Recognize

The Recognize Activity interprets Batch Pages and Batch Folders, converting them into machine-readable text and capturing layout data for comprehensive analysis and data extraction. This will attach a text and/or layoutData file to the respective object.

Redact

The Redact Activity hides or "redacts" text information on the image or PDF of a document based on data returned from a configured Extractor. It does not alter the text data attached to the image or PDF.

Render

The Render Activity normalizes electronic document content from file formats Grooper cannot read innately to a PDF format. This allows Grooper to extract the text via the Recognize Activity.

Review

The Review Activity facilitates human evaluation and validation of processed Batch Folders and extracted data for accuracy and completeness.

Send Mail

The Send Mail Activity automates the dispatch of emails with or without attachments, based on Batch Process events and conditions.

Separate

The Separate Activity sorts Batch Pages into individual Batch Folders, distinguishing them for independent processing and organization.

Split Pages

Multi-page documents (typically PDFs and TIFFs) come into Grooper represented as single Batch Folders. The Split Pages Activity exposes Batch Pages as child objects of the Batch Folders for individualized processing and handling.

XML Transform

The XML Transform Activity applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

Application

A Grooper repository consists of a series of tables in a database, and a File Store containing relevant files associated to objects that exist within that database. A Grooper application is the interface by which a user can interact with that repository of information in an intuitive way.

Grooper Command Console

The Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.

Web Client

The Grooper Web Client allows users to connect to Grooper via a web browser using a URL. The URL points to a website hosted by a server on which Grooper is installed and Internet Information Services (IIS) is configured.

Behavior

Behaviors is a property of Content Types and Export Activities that defines configurable actions that automate processing tasks based on the identified Content Type of a Batch Folder.

Export Behavior

An Export Behavior defines the conditions and actions for exporting Batch Folders and their associated data from Grooper to other systems.

Labeling Behavior

A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for Classification and Extraction.

PDF Data Mapping

PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.

CMIS Connection Type

CMIS Connection Type, or "binding", establishes the communication protocols used to connect Grooper with content management systems adhering to the CMIS standard.

AppXtender

The AppXtender CMIS Connection Type, or "binding", connects Grooper to the ApplicationXtender content management system for import and export operations.

Box

The Box CMIS Connection Type, or "binding", connects Grooper to the Box content management system for import and export operations.

Exchange

The Exchange CMIS Connection Type, or "binding", connects Grooper to the Microsoft Exchange Server mail server for import and export operations.

FTP

The FTP CMIS Connection Type, or "binding", connects Grooper to FTP directories for import and export operations.

IMAP

The IMAP CMIS Connection Type, or "binding", connects Grooper to email messages and folders through an IMAP email server.

NTFS

The NTFS CMIS Connection Type, or "binding", connects Grooper to files and folders in the Microsoft Windows NTFS file system.

OneDrive

The OneDrive CMIS Connection Type, or "binding", connects Grooper to Microsoft OneDrive cloud services.

SFTP

The SFTP CMIS Connection Type, or "binding", connects Grooper to SFTP directories for import and export operations.

SharePoint

The SharePoint CMIS Connection Type, or "binding", connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries".

Classification Method

The Classification Method property determines the technique used for document classification within a Content Model, enabling the sorting of Batch Folders into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.

GPT Embeddings

The GPT Embeddings Classification Method is an OpenAI GPT training-based classification approach that uses "embeddings" to tell one document from another.

Labelset-Based

Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.

Lexical

The Lexical Classification Method classifies Batch Folders based on their text content by utilizing either pre-configured training or rules. This is achieved through the analysis of word frequencies or defined rules that identify document types.

Rules-Based

The Rules-Based Classification Method employs defined "rules" on Document Types to classify Batch Folders, utilizing Positive Extractor and Negative Extractor properties to accurately categorize them through rule application, thereby ensuring Batch Folders match predefined criteria.

Visual

The Visual Classification Method uses image data instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an IP Profile is used with an Extract Features IP Command to obtain data pertaining to a Batch Folder's image(s). Document samples are trained as examples of a Document Type.

Collation Provider

The Collation property of a Data Type defines the method for converting its raw results into a final result set, governing how lists of matches from the Data Type are combined and interpreted to produce the output data of the Data Type.

AND

The AND Collation Provider of a Data Type returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
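
The logical AND behavior described above can be sketched in a few lines of Python. This is only an illustration: the function name `and_collate` and the sample patterns are hypothetical and are not part of Grooper, and plain regular expressions stand in for configured extractors.

```python
import re

def and_collate(text, patterns):
    """Return all hits only if every pattern matches at least once; otherwise return []."""
    all_hits = []
    for pattern in patterns:
        hits = re.findall(pattern, text)
        if not hits:
            return []  # one stand-in extractor missed, so the AND fails
        all_hits.extend(hits)
    return all_hits

sample = "Invoice 1042 dated 07/24/2024"
# Both patterns hit, so results are returned.
both_hit = and_collate(sample, [r"\d{2}/\d{2}/\d{4}", r"Invoice \d+"])
# The second pattern never hits, so nothing is returned.
one_missed = and_collate(sample, [r"\d{2}/\d{2}/\d{4}", r"PO \d+"])
```

A single missing hit empties the whole result, which is the point of using AND rather than simply pooling extractor output.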

Array

The Array Collation Provider of a Data Type matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.

Combine

The Combine Collation Provider of a Data Type combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.

Key-Value List

The Key-Value List Collation Provider of a Data Type matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.

Key-Value Pair

The Key-Value Pair Collation Provider of a Data Type matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.

Multi-Column

The Multi-Column Collation Provider of a Data Type combines multiple columns on a page into a single column for extraction.

Ordered Array

The Ordered Array Collation Provider of a Data Type finds sequences of values where one result is present for each extractor, in the order they appear.

Pattern-Based

The Pattern-Based Collation Provider of a Data Type uses regular expressions to sequence returned results into a final result set.

Split

The Split Collation Provider of a Data Type separates a data instance at each match returned by the Data Type.

Concept

There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use them depends on understanding the underlying concepts that define what these objects and properties are doing and why.

Activity Processing

Activity Processing is a conceptual term that refers to the execution of a sequence of configured tasks, such as classification, extraction, or data enhancement on documents, which are performed within a Batch Process to transform raw data from documents into structured and actionable information.

CMIS+

CMIS+ is a conceptual term that refers to Grooper's architecture for providing standardized access to document content and metadata across a variety of external storage platforms.

CMIS

CMIS is a conceptual term that refers to Content Management Interoperability Services: an open standard allowing different content management systems to share information over the Internet.

CMIS Query

CMIS Query is a conceptual term that refers to the queries used to search documents in CMIS Repositories and to filter documents upon import when using the Import Query Results Import Provider.

CSS Data Viewer Styling

CSS Data Viewer Styling refers to using CSS to custom style the Review activity's Data Viewer interface. This gives you a great deal of control over a Data Model's appearance and layout during document review.

Classification

Classification is a conceptual term that refers to the process of identifying and organizing documents into categorical types based on their content or layout, often using machine learning, rules, or pattern recognition for efficient document management and data extraction workflows. Specifically, the Classify Activity will assign a Content Type to a Batch Folder.

Code Expressions

Code Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper's core functionality.

Combined Methods

Combined Methods is a conceptual term that refers to the idea that a user can leverage multiple Classification Methods to overcome the shortcomings of an individual method.

Content Type

Content Type is a conceptual term that refers to the grouping of three Grooper objects: Content Models, Content Categories, and Document Types.

Data Context

Data Context is a conceptual term that gives definition to data that, without it, is otherwise meaningless.

Data Element

Data Element is a conceptual term that refers to the grouping of five Grooper objects: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Extraction

Data Extraction is a conceptual term that involves identifying and capturing specific information from Batch Folders like forms or invoices using a set of configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.

Data Extractor

Data Extractor (or just "extractor") refers to all Extractor Types and extractor objects. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Data Instance

Data Instance is a conceptual term that refers to an encapsulation of text data within a document. Data instances are the hierarchy of text data that Grooper's extraction mechanisms create.

EDI Integration

EDI Integration is a conceptual term that refers to Grooper's ability to process EDI files.

Expressions

Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper's core functionality.

Expressions Cookbook

Expressions Cookbook is a conceptual term that refers to a reference list for commonly used expressions in Grooper.

Field Mapping

Field Mapping is a conceptual term that refers to how logical connections are made between metadata content in Grooper and an external storage platform.

Five Phases of Grooper

Five Phases of Grooper is a conceptual term that seeks to build understanding of how documents are processed through Grooper.

Flow Collation

Flow Collation is a conceptual term used to define a type of layout used in Collation Providers of Data Types.

Footer Rows and Footer Modes

Footer Rows and Footer Modes is a conceptual term that refers to how a "footer row" (enabled by the Generate Footer Row property of a Data Table) provides Grooper users a quick way to validate numerical data in a Data Column. The Data Column's Footer Mode property controls if and how a total is determined for numerical values in a Data Column.
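
As a rough sketch of the validation idea, the Python below sums a column of numerical values and compares the result to a total read from the document. The variable names and the comparison against a printed total are assumptions made for illustration; the available Footer Mode options are not listed in this glossary.

```python
from decimal import Decimal

# Hypothetical values extracted into one numerical Data Column.
line_amounts = [Decimal("19.99"), Decimal("5.01"), Decimal("75.00")]
# Hypothetical total printed elsewhere on the same document.
printed_total = Decimal("100.00")

# A "sum" style footer: add every value in the column.
footer_sum = sum(line_amounts, Decimal("0"))
# If the two disagree, at least one cell was probably extracted incorrectly.
totals_agree = footer_sum == printed_total
```

`Decimal` is used rather than `float` so that currency values add exactly.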

Fuzzy RegEx

Fuzzy RegEx is a conceptual term that refers to the use of fuzzy logic within Extractor Types that leverage regular expressions to match patterns, enabled via the Fuzzy Matching property.
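
To give a feel for what "fuzzy" means here, the sketch below scores how close an OCR'd string is to a target string. This glossary does not state which algorithm Grooper uses, so the example swaps in plain Levenshtein edit distance; the function names and the similarity formula are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]

def similarity(a, b):
    """Scale the distance to a 0.0-1.0 score, where 1.0 is an exact match."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

# A common OCR confusion: capital "I" read as lowercase "l".
score = similarity("Invoice", "lnvoice")
```

A fuzzy match would accept the OCR'd text when the score clears a chosen minimum, where an exact pattern would reject it outright.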

GPT Integration

GPT Integration is a conceptual term that refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.

Grooper Infrastructure

Grooper Infrastructure is a conceptual term that refers to the computing underpinnings of a Grooper repository and the software used to interface with it.

Grooper Repository

Grooper Repository is a conceptual term that refers to the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper.

Grooper Service

Grooper Services are various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, configured, started, and stopped using Grooper Config.

Image Processing

Image Processing is a conceptual term that refers to how Grooper applies a variety of techniques to enhance scanned documents' quality, improving OCR accuracy by removing imperfections and adjusting visual characteristics to prepare images for data extraction and classification.

Import Mode and Document Linking

Import Mode and Document Linking is a conceptual term that refers to the usage of the Import Mode property. This affects whether or not an imported document maintains a link to its original file and/or if a copy of the file is made on import or not.

LINQ to Grooper Objects

LINQ to Grooper Objects is a conceptual term that refers to the ability of Grooper to leverage LINQ syntax in expressions.

Layered OCR

Layered OCR is a conceptual term that refers to the usage of the Layered OCR setting of the OCR Engine property of an OCR Profile. The use of this setting enables the usage of secondary OCR Profiles on a single page. The OCR results from these secondary OCR Profiles are merged with (or layered on top of) the primary OCR Profile's results.

Layout Data

Layout Data is a conceptual term that refers to information such as line locations, OMR checkbox locations and states, barcode values, and detected shapes captured by certain image processing commands. This data is stored as an attached file on a Batch Folder or Batch Page object and can later be recalled by various functions within Grooper that rely on the presence of that data to function.

Microfiche Processing

Microfiche Processing is a conceptual term that refers to how Grooper leverages several IP Commands to accurately process microform documents.

Microsoft Office Integration

Microsoft Office Integration is a conceptual term that refers to Grooper's ability to convert Microsoft Word and Microsoft Excel files into formats that Grooper can read.

OCR

OCR is a conceptual term that stands for Optical Character Recognition. It allows text from paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine readable, encoded text.

OCR Synthesis

OCR Synthesis is a conceptual term that refers to Grooper's unique method of pre-processing and re-processing raw results from the OCR Engine to get better results out of it.

Object Nomenclature

Object Nomenclature is a conceptual term that refers to the idea that mastery of a Grooper environment is greatly enhanced by understanding the myriad of objects that can exist and how they are related.

PDF Page Types

PDF Page Types is a conceptual term that refers to specific types of PDF pages. Page types describe the kind of content in a PDF page and inform Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize Activity, whereas "text only" pages have their native text extracted.

Regular Expression

Regular Expression is a conceptual term that refers to a standard syntax designed to parse text strings. This is a way of finding information in a block of text. It is the primary method by which Grooper extracts and returns data from documents.
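
For readers new to the syntax, here is a minimal example using Python's standard `re` module. The sample text and the date pattern are made up for illustration, and Grooper's own regular expression dialect may differ in details.

```python
import re

text = "Agreement Date: 2024-07-24. Renewal Date: 2025-07-24."

# Four digits, a hyphen, two digits, a hyphen, two digits, on word boundaries.
date_pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# findall returns every non-overlapping match in reading order.
dates = date_pattern.findall(text)
```

The pattern describes the shape of the data rather than its exact value, which is why one expression can return both dates.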

Repository

Repository is a conceptual term that refers to a location where files and/or data are stored and managed.

Separation

Separation is a conceptual term that refers to the process of taking an unorganized Batch of loose Batch Pages and organizing them into document folders. This is done so Grooper can later assign a Document Type to each document folder in a process known as Classification.

TF-IDF

TF-IDF is a conceptual term that stands for "term frequency-inverse document frequency", a numerical statistic intended to reflect how important a word is to a document within a collection (also called a document set or corpus). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
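
The statistic itself is easy to compute by hand. The sketch below uses one common textbook formulation (term count divided by document length, multiplied by the natural log of total documents over documents containing the term). Grooper's exact weighting is not given in this glossary, so treat the formula, the function name `tf_idf`, and the sample documents as assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute per-document TF-IDF weights with tf = count/len and idf = ln(N/df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = ["invoice total amount due", "invoice number date", "lease agreement date"]
weights = tf_idf(docs)
```

In the first document, "total" outweighs "invoice" because "invoice" also appears in a second document and is therefore less distinctive.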

Table Extraction

Table Extraction is a conceptual term that refers to Grooper's functionality to extract data from cells in tables. This is accomplished by configuring the Data Table and its child Data Column Data Elements in a Data Model.

Test Batch

Test Batch is a conceptual term that refers to any Batch created in the Test folder of the Batches folder in the Node Tree.

Thread

Thread is a conceptual term that refers to the smallest unit of processing that can be performed within an operating system.

Training-Based Approaches to Document Classification

Training-Based Approaches to Document Classification is a conceptual term that refers to an approach to document classification that classifies Batch Folders according to the similarity of unclassified Batch Folders to trained examples of that kind of Document Type.

Training Batch

Training Batch is a conceptual term that refers to a more convenient way to work with all of the samples a Content Model has been trained against. You can still look at the Form Types underneath each Content Type, but the Training Set can show you all the samples in one place.

UNC Path

UNC Path is a conceptual term that refers to the Universal Naming Convention (UNC), a standard used in Microsoft Windows for accessing shared network folders.
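
A UNC path has the general shape `\\server\share\folder\file`. The sketch below takes one apart with Python's standard `pathlib`; the server, share, and file names are hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical path to a scanned file on a network share.
unc = PureWindowsPath(r"\\fileserver\scans\inbox\batch001.pdf")

# For UNC paths, pathlib reports the \\server\share portion as the "drive".
share_root = unc.drive
file_name = unc.name
```

`PureWindowsPath` parses Windows-style paths on any operating system, so the example does not need access to a real share.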

Waterfall Classification

Waterfall Classification is a conceptual term that refers to a classification notion in Grooper that manipulates the Positive Extractor property to prioritize training similarity in order to achieve a middle ground between high specificity and accuracy, and generality with minimal accuracy. This is helpful whenever Batch Folders get misclassified, and simply retraining won't help.

XML Schema Integration

XML Schema Integration is a conceptual term that refers to Grooper's ability to interact with XML schemas and the configuration required to do so.

Export Definition

Export Definitions is a property of Export Behaviors as defined on Content Types or Export Activities. It defines export connectivity to external systems such as file systems, content management repositories, databases, mail servers, etc.

CMIS Export

CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

Data Export

Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC compliant database.

Extractor Type

Extractor Type, or value extractor, is a property on a wide array of objects that goes by many different names. It defines a primitive operator which reads data values from the text or visual content of a document. Extractor Types are consumed by higher-level objects such as Data Elements, extractor objects, Content Types and more.

Detect Signature

The Detect Signature Extractor Type detects signatures within a specified rectangular region on a document page by measuring the fill percentage, providing a method to identify and validate the presence of handwritten signatures.
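
The fill-percentage idea can be illustrated on a tiny black-and-white image. This is a simplified sketch, not Grooper's implementation: the function name, the 1-means-dark pixel convention, and the 50 percent threshold are all assumptions.

```python
def fill_percentage(binary_rows, left, top, right, bottom):
    """Percentage of dark pixels (value 1) inside the half-open rectangle [left, right) x [top, bottom)."""
    region = [row[left:right] for row in binary_rows[top:bottom]]
    total = sum(len(row) for row in region)
    dark = sum(sum(row) for row in region)
    return 100.0 * dark / total if total else 0.0

# A 4x4 "page" where 1 is a dark (inked) pixel and 0 is background.
page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# Measure the 2x2 region in the middle of the page.
pct = fill_percentage(page, 1, 1, 3, 3)
has_signature = pct >= 50.0  # hypothetical threshold
```

An empty signature box yields a fill near zero, while handwriting pushes the percentage up, so a threshold separates the two cases.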

Field Match

The Field Match Extractor Type matches the value stored in a previously-extracted Data Field or Data Column, allowing for consistency and reference across different parts of a document or dataset.

Find Barcode

The Find Barcode Extractor Type searches the Batch Folder layout data for a barcode, capturing its value upon detection.

GPT Complete

The GPT Complete Extractor Type leverages OpenAI's GPT model to generate completions for inputs, returning one hit for each result choice provided by the model's response.

Highlight Zone

The Highlight Zone Extractor Type sets a highlight region on a document without performing any actual data extraction, effectively marking areas of interest or importance.

Label Match

The Label Match Extractor Type matches a list of one or more label values using matching options defined by a Labeling Behavior. It works similarly to List Match, but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.

Labeled OMR

The Labeled OMR Extractor Type is used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.

Labeled Value

The Labeled Value Extractor Type identifies and extracts information from a field presented as a label-value pair on a document, by matching a set of labels and a set of values, and determining pairs based on their geometric clustering on the document.

List Match

The List Match Extractor Type is designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions.

Ordered OMR

The Ordered OMR Extractor Type is similar to a Labeled OMR in that it is used to return OMR check box information. Rather than relying on a label for the extraction, the Ordered OMR returns information for multiple check boxes within a given zone based on their order and layout.

Pattern Match

The Pattern Match Extractor Type extracts values from a document that match a specified regular expression, allowing for the detection of data following a known format or pattern.

Query HTML

The Query HTML Extractor Type queries an HTML document using a CSS selector and returns the inner text of each matching element.

Read Barcode

The Read Barcode Extractor Type uses barcode recognition technology to read and extract values from barcodes found in the document content.

Read Meta Data

The Read Meta Data Extractor Type retrieves metadata values associated with a document.

Read Zone

The Read Zone Extractor Type allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to an extracted text anchor or shape location on the document.

Reference

The Reference Extractor Type allows for the referencing of an external extractor object to be used within a Grooper object's configuration, enabling consistent extraction logic across different objects.

Word Match

The Word Match Extractor Type extracts individual words or phrases containing multiple words from documents. It is designed to collect full words and is often used in n-gram extraction.

Zonal OMR

The Zonal OMR Extractor Type reads one or more checkboxes using manually-configured zones. It is mostly an outdated tool and should only be used if all other OMR extractor options have been exhausted. It requires the most manual setup of any OMR extractor to configure.

Fill Method

The Fill Method property on Data Models, Data Sections, and Data Tables is a collection of various mechanisms that allow for the population of descendant Data Elements of Data Models, Data Sections, and Data Tables (which can be referred to as "containers"). Fill Methods are secondary extraction operations which populate descendant Data Elements as they run after normal extraction.

AI Extract

AI Extract is a Fill Method that leverages a Large Language Model (LLM) to quickly and easily return extraction results to the child elements of Data Models, Data Sections, and Data Tables by using the .json structure of the relevant Data Elements as part of the instruction set to the LLM.

IP Command

The Command property of an IP Step object in Grooper specifies the Image Processing (IP) command to be executed for that specific step as part of an IP Profile.

Barcode Detection

The Barcode Detection IP Command detects and reads barcode data. The detected barcode information is stored as part of the object's layout data.

Binarize

The Binarize IP Command converts a color or grayscale image to black and white using various thresholding methods.
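
The simplest thresholding method is a fixed global threshold, sketched below on a tiny grayscale image. The function name and the default threshold of 128 are assumptions; the glossary notes that several thresholding methods exist, and this shows only one.

```python
def binarize(gray_rows, threshold=128):
    """Map each grayscale value (0-255) to 0 (black) or 255 (white) using one global threshold."""
    return [[255 if value >= threshold else 0 for value in row] for row in gray_rows]

# Two rows of grayscale pixel values.
gray = [[12, 200, 130], [127, 128, 255]]
black_and_white = binarize(gray)
```

A single global threshold struggles with uneven lighting, which is one reason to offer more than one thresholding method.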

Extract Page

The Extract Page IP Command removes an image from a carrier image while simultaneously removing any image warping or skewing.

Line Removal

The Line Removal IP Command removes horizontal and vertical lines from documents.

Scratch Removal

The Scratch Removal IP Command detects and removes or repairs scratches from film-based images.

Shape Detection

The Shape Detection IP Command detects shapes on a document matching sample images given by the user.

Shape Removal

The Shape Removal IP Command detects and removes shapes from documents.

Import Provider

The Provider property is a selection of Import Providers which enable import of file-based content from a variety of sources such as file systems, mail servers, and content repositories.

CMIS Import

The CMIS Import Import Provider is used to import content over a CMIS Connection, allowing users to import from various on-premise and cloud-based storage platforms.

Import Descendants

Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper.

Import Query Results

Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper.

Lookup

The Lookups property is a list of lookup operations to be performed on child elements of the associated container. Each Lookup specification defines a lookup operation, where the value of one or more Grooper fields will be used to query an external data source, such as a database. The results of the query can be used to validate existing field values or populate additional field values.

CMIS Lookup

CMIS Lookup is a Lookup Specification that performs a lookup against a CMIS Repository via a CMISQL Query.

Database Lookup

Database Lookup is a Lookup Specification that performs a lookup against a Data Connection via a SQL query.
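
The general pattern of a database lookup (use an extracted field value as a query parameter, then validate or populate other fields from the result) can be sketched with Python's built-in `sqlite3`. The table, column names, and values are hypothetical, and an in-memory SQLite database stands in for a real Data Connection.

```python
import sqlite3

# In-memory database standing in for an external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendors (vendor_id TEXT PRIMARY KEY, vendor_name TEXT)")
conn.execute("INSERT INTO vendors VALUES ('V-100', 'Acme Supply')")

def lookup_vendor_name(connection, vendor_id):
    """Query by an extracted field value; return the matching name or None."""
    row = connection.execute(
        "SELECT vendor_name FROM vendors WHERE vendor_id = ?", (vendor_id,)
    ).fetchone()
    return row[0] if row else None

found = lookup_vendor_name(conn, "V-100")    # could populate a vendor name field
missing = lookup_vendor_name(conn, "V-999")  # no match: the extracted ID is suspect
```

The `?` placeholder keeps the extracted value out of the SQL text itself, which matters when that value comes from OCR'd content.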

GPT Lookup

GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model.

Web Service Lookup

Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.

Object

In Grooper, objects are defined as configurable elements within its hierarchical tree structure. These include nodes and embedded objects that can be manipulated and edited to define the system's behavior, create workflows, and manage content.

Batch

Batch objects are fundamental in Grooper's architecture as they are the containers of documents that get moved through Grooper's workflow mechanisms known as Batch Processes.

Batch Folder

Batch Folder objects are defined as container objects within a Batch that are used to represent and organize both folders and pages. They can hold other Batch Folders or Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.

Batch Folders are frequently referred to simply as "documents".
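
The parent-child relationship described above can be modeled as a small tree. The class and attribute names below are illustrative only and do not reflect Grooper's actual object model.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class BatchPage:
    name: str

@dataclass
class BatchFolder:
    name: str
    # A folder may hold pages, other folders, or both.
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)

    def page_count(self):
        """Count pages at any depth beneath this folder."""
        total = 0
        for child in self.children:
            total += 1 if isinstance(child, BatchPage) else child.page_count()
        return total

# One two-page "document" and one empty folder under a root folder.
document = BatchFolder("Invoice", [BatchPage("Page 1"), BatchPage("Page 2")])
root = BatchFolder("Batch root", [document, BatchFolder("Empty folder")])
pages_in_batch = root.page_count()
```

Because folders can nest, any operation over "all pages" has to walk the tree recursively rather than read a flat list.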

Batch Page

Batch Page objects represent individual pages within a Batch. The Batch Page object is the most granular unit in the hierarchy of Batch Objects in Grooper.

Batch Pages are frequently referred to simply as "pages".

Batch Process

Batch Process objects are crucial components in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each Batch of documents is managed systematically and efficiently.

Batch Processes by themselves do nothing. Instead, the workflows they execute are designed by adding child Batch Process Steps.
A Batch Process is often referred to as simply a "process".

Batch Process Step

Batch Process Step objects are specific actions within the sequence defined by a Batch Process. A Batch Process Step plays a critical role in automating and managing the flow of documents through the various stages of processing within Grooper.

Batch Process Steps are frequently referred to as simply "steps".
Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

CMIS Connection

CMIS Connection objects provide a standardized way of connecting to various content management systems (CMS). These objects allow Grooper to communicate with multiple external storage platforms, enabling access to documents and content that reside outside of Grooper's immediate environment.

For systems that support the CMIS standard, the CMIS Connection connects to the CMS using that standard.
For systems that do not, the CMIS Connection normalizes the connection and transfer protocols so the system can be treated as if it were a CMIS platform.

CMIS Repository

CMIS Repository objects in Grooper allow access to external documents through a CMIS Connection. They allow users to manage and interact with those documents within Grooper's framework as if they were local. They are created as child objects of a CMIS Connection and are used by various Activities.

Content Category

Content Category objects are containers within a Content Model that hold other Content Categories and Document Type objects. They allow for further classification and grouping of Document Types within a taxonomy, aiding in the logical structuring of complex document sets. Besides grouping Document Types together, Content Categories also serve to create new branches in a Data Element hierarchy. In most cases Content Categories are used as organizational buckets to group like Document Types together.

Content Model

Content Model objects define the taxonomy of document sets in terms of the Document Types they contain. They also house the Data Elements that appear on each Content Category and Document Type within them. Content Models serve as the root of a Content Type hierarchy and are crucial for organizing the different types of documents that Grooper can recognize and process.

Data Column

Data Column objects are child objects of a Data Table, representing individual columns and defining the type of data each column holds along with its data extraction properties.

Data Connection

Data Connection objects define the settings for connecting to and interacting with a database. These interactions may include conducting lookups, exports, or other actions that relate to database management systems (DBMS). Once configured, a Data Connection object can be referenced by other components in Grooper for various DBMS-related activities.

Data Field

Data Field objects are created as child objects of a Data Model. A Data Field is a representation of a single piece of data targeted for extraction on a document.

Data Fields are frequently referred to simply as "fields".

Data Model

Data Model objects serve as the top-tier structure defining the taxonomy for Data Elements and are leveraged during the Extract Activity to extract data from Batch Folders. They are a hierarchy of Data Elements that sets the stage for the extraction logic and review of data collected from documents.

Data Rule

Data Rule objects define the logic for automated data manipulation which occurs after data has been extracted from Batch Folders. These rules are applied to normalize or otherwise prepare data collected in a Data Model for downstream processes. Data Rules ensure that extracted data conforms to expected formats or meets certain quality standards.

Data Section

Data Section objects are grouping mechanisms for related Data Fields. Data Sections organize and segment child Data Elements into logical divisions of a document based on the structure and semantics of the information the documents contain.

Data Table

Data Table objects are utilized for extracting repeating data that's formatted in rows and columns, allowing for complex multi-instance data organization that would be present in table-formatted content.

Data Type

Data Type objects hold a collection of child, referenced, and locally defined Data Extractors and settings that manage how multiple (even differing) matches from Data Extractors are consolidated (via Collation) into a result set.

Document Type

Document Type objects represent a distinct type of document, like an invoice or contract. Document Types are created as children of a Content Model or a Content Category and are used to classify individual Batch Folders. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to Batch Folders of that specific classification.

Field Class

Field Class objects are trainable extractors that distinguish between multiple instances of similar data within a document by understanding the context in which they occur. Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "Extractor Objects" like Value Readers or Data Types.

Field Classes are most useful when attempting to find values within the flow of natural language. This method involves training with positive and negative examples to distinguish the right context. You'd opt for a Field Class when the value you're after is an entire clause within a contract, or a specific value defined within the flow of text.

File Store

File Store objects define a storage location within Grooper where file content associated with nodes is saved. They are crucial for managing the content that forms the basis of Grooper's processing tasks, allowing for the storage and retrieval of documents, images, and other "files". Not every object in Grooper will have files connected to it, but if it does, those files are stored in the location defined by this object.

Form Type

Form Type objects represent trained variations of a Document Type. These objects store machine learning training data for Lexical and Visual document classification methods.

IP Group

IP Group objects are child objects within IP Profiles that create a hierarchical structure for organizing image processing commands. IP Groups may contain other IP Groups or IP Step objects.

IP Profile

IP Profile objects detail the operations and parameters for image enhancement and cleanup. These operations improve the accuracy of further processing steps, like the Recognize and Classify Activities.

IP Step

IP Step objects are the basic units within an IP Profile that define a single image processing operation. IP Steps are performed sequentially within their parent IP Group or IP Profile.

Lexicon

Lexicon objects are dictionary objects that store a list of keys or key-value pairs. Lexicons can define local entries and/or import entries from other Lexicons and even import entries using a Data Connection. The entries in a Lexicon can be utilized in different areas of Grooper, such as data extraction, Fuzzy Matching, or OCR Correction, providing a reference point that enhances the accuracy and consistency of the software's operations.

Machine

Machine objects represent servers that have connected to the Grooper repository. They allow for the management of Grooper Service instances and serve as connection points for processing jobs to be executed on the server hardware. Machine objects are essential for the scaling of processing capabilities and for distributing processing loads across multiple servers.

OCR Profile

OCR Profile objects configure the settings for optical character recognition (OCR) leveraged by the Recognize Activity. OCR converts images of text into machine-encoded text. OCR Profile objects influence how effectively textual content is recognized from Batch Pages.

Object Library

Object Library objects are .NET libraries that contain code files for customizing the functionality of Grooper. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities.

Examples include:

Adding custom activities that execute within Batch Processes
Creating custom commands available during the Review Activity and in the Design page
Defining custom methods that can be called from expressions on Data Field and Batch Process Step objects
Establishing custom services that perform automated background tasks at regular intervals

Processing Queue

Processing Queue objects are designed for tasks performed by Machines, which include automated steps in the document processing lifecycle. Processing Queues are used to distribute machine tasks among different servers and control the concurrency or processing rate of these tasks.

For example, activities such as Render or Export can be managed so that only one activity instance runs per machine or so multiple instances are processed concurrently, according to the queue configuration.

Project

Project objects are collections of resources and serve as the primary containers for design components within Grooper. The Project object is where various processing objects such as Content Models, Batch Processes, Profile Objects, and more are organized and managed. It allows for the encapsulation and modularization of these resources for easier management and reusability.

Resource File

A Resource File object in Grooper is essentially a file that is stored as part of a Grooper Project. It can include various types of files such as text files or XML schema files.

Review Queue

Review Queue objects are designated for human-performed tasks. They organize the Review tasks that require human attention and can distribute these tasks among different groups of users based on the queue's settings. Review Queues can be assigned at the Batch Process level to filter work by an entire process, or to Review Activities at the Batch Process Step level to filter tasks at a more granular, step-based level.

Root

The Root object represents the topmost element of the Grooper repository. It serves as the starting point from which all other objects branch out. It is the anchor point for all other structures within the repository and a necessary element for the organization and linkage of all other objects within Grooper.

Scanner Profile

Scanner Profile objects outline the specifications for scanning physical documents into digital forms. This includes settings like resolution, color mode, and any post-scan image processing or enhancement functions.

See Desktop Scanning in Grooper for more information.

Separation Profile

Separation Profile objects contain rules and settings that determine how groupings of scanned pages are separated into individual Batch Folders, often using barcodes, blank pages, or patch codes as indicators for separation points.

Value Reader

Value Reader objects define a single data extraction operation. You set the Extractor Type on the Value Reader that matches the specific data you're aiming to capture. For example, you would use the Pattern Match Extractor Type to return data using a regular expression. You would use a Value Reader when you need to extract a single result or a list of simple results from a document.
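
As a rough illustration of what regular-expression-based extraction does, the Python sketch below returns every match of a pattern in a block of text. It uses Python's `re` module rather than Grooper itself. The `read_values` function, the sample text, and the date pattern are all hypothetical.

```python
import re
from typing import List


def read_values(text: str, pattern: str) -> List[str]:
    """Return every non-overlapping match of `pattern` found in `text`."""
    return [match.group(0) for match in re.finditer(pattern, text)]


# Hypothetical document text containing two dates.
sample = "Invoice Date: 07/24/2024  Due Date: 08/23/2024"
dates = read_values(sample, r"\d{2}/\d{2}/\d{4}")
```

A single pattern can return one result or a list of simple results, depending on how many times it matches.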

Property

A property is a mechanism by which an object in Grooper is configured that affects how the object performs its function.

Confidence Multiplier and Output Confidence

Some results carry more weight than others. The Confidence Multiplier and Output Confidence properties allow you to manually adjust an extraction result's confidence.

Constrained Wrap

The Constrained Wrap property allows certain Extractor Types and the Labeling Behavior to match values which wrap from one line to the next inside a box (such as a table cell).

Content Type Filter

The Content Type Filter property restricts Activities to specific Content Categories and/or Document Types.

Document Quoting

Document Quoting is a property of the AI Extract Fill Method that limits the text fed to the AI to reduce the number of tokens consumed. Controlling exactly what text is given can reduce not only the monetary cost of using the AI, but also the time cost of running the Fill Method.

OCR Engine

An OCR Engine is the part of OCR software that does the actual character recognition, analyzing the pixels on an image and figuring out what characters they represent. This raw result can be further processed using Grooper's OCR Synthesis capabilities, producing the final OCR result used by Data Extractors to match text in a document and return the result.

Output Extractor Key

The Output Extractor Key property is one of Grooper's document classification techniques. It allows Data Types to return results normalized in a way more beneficial to document classification.

Paragraph Marking

Paragraph Marking alters the normal text data in a document by placing the carriage return and new line feed pairs at the end of each paragraph, instead of the end of each line. This allows users to break up a document's text flow into segments of paragraphs instead of segments of lines.

Parameters

Parameters is a collection of properties used in the configuration of LLM constructs. Temperature, TopP, Presence Penalty, and Frequency Penalty are parameters that influence text generation in models. Temperature and TopP control the diversity and probability distribution of generated text, while Presence Penalty and Frequency Penalty help manage repetition by discouraging the reuse of words or phrases.
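
To make the effect of Temperature and TopP more concrete, the Python sketch below applies the commonly described versions of both to a toy set of token scores. It is a generic illustration, not Grooper or OpenAI code. How a particular model applies these parameters is provider-specific, and the function names and scores are hypothetical. Presence Penalty and Frequency Penalty are not modeled.

```python
import math
from typing import Dict


def apply_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = {token: score / temperature for token, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {token: math.exp(value - peak) for token, value in scaled.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their combined probability reaches top_p."""
    kept: Dict[str, float] = {}
    running = 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        running += prob
        if running >= top_p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}  # renormalize


# Hypothetical scores for three candidate next tokens.
logits = {"invoice": 2.0, "receipt": 1.0, "memo": 0.0}
cool = apply_temperature(logits, 0.5)   # more deterministic
warm = apply_temperature(logits, 2.0)   # more diverse
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.9)
```

With these toy scores, the lower temperature concentrates probability on the top token, and the 0.9 TopP cutoff drops the least likely token before sampling.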

Permission Sets

A Permission Set is a property that allows you to restrict user access to repositories, pages, and certain activities. This helps prevent unauthorized individuals from editing or deleting information or Batches.

Preprocessing

The Preprocessing grouping of properties consists of settings that adjust how text is formatted and interpreted before any Data Extraction process begins. These properties are crucial for ensuring that the text data is in the most optimal format for subsequent extraction tasks, which could involve complex regular expressions or precise data parsing.

Scope

The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Secondary Types

Secondary Types allow the application of multiple Content Types to a single Batch Folder.

Tab Marking

Tab Marking allows you to insert tab characters into a document's text data.

Vertical Wrap

Vertical Wrap is a property of certain Extractor Types and a Content Type's Labeling Behavior used to provide simplified extraction of vertically wrapped text (typically stacked labels).

Repository Option

The Options property of the database Root object is a collection of optional features that affect the entire repository. These optional features enable entire collections of functionality that otherwise do not work without first establishing the connections these options provide.

LLM Connector

LLM Connector is a Repository Option that enables OpenAI-based functionality for the local Grooper repository.

Section Extract Method

The Extract Method property of a Data Section defines a "Section Extract Method" which specifies how section instances will be identified and extracted.

Nested Table

Nested Table is a "Section Extract Method" enabled for a Data Section using the Extract Method property. This method divides a document into sections by extracting table data within those sections. This gives Grooper users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections has the same table (or at least tabular data which can be extracted by a single Data Table object).

Transaction Detection

Transaction Detection is a Data Section Extract Method. This extraction method produces section instances by detecting repeating patterns of text around the Data Section's child Data Fields.

Separation Provider

The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Change in Value Separation

The Change in Value Separation Provider creates a new folder and separates every time an extracted value changes from one Batch Page to another.
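
The idea of separating on a change in value can be sketched in a few lines of Python. This is an illustrative outline only, not Grooper's implementation. The function name and sample values are hypothetical, and the sketch assumes every page yields a value (pages with no extracted value are not modeled).

```python
from typing import List


def separate_on_value_change(page_values: List[str]) -> List[List[int]]:
    """Group page indexes into folders, starting a new folder when the value changes."""
    folders: List[List[int]] = []
    previous = None
    for index, value in enumerate(page_values):
        if not folders or value != previous:
            folders.append([])  # the value changed, so begin a new folder
        folders[-1].append(index)
        previous = value
    return folders


# Hypothetical account numbers extracted from six consecutive pages.
values = ["A-100", "A-100", "B-200", "B-200", "B-200", "C-300"]
folders = separate_on_value_change(values)
```

Each inner list holds the page indexes that end up in the same folder.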

Control Sheet Separation

Control Sheet Separation is a Separation Provider that uses Grooper Control Sheets to separate documents.

EPI Separation

The EPI Separation Separation Provider uses embedded page information ("EPI") to Separate loose pages into document folders. A Data Extractor is used to find page numbers from the text on a page and Grooper uses this information to separate the pages.
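
One way to picture separation by embedded page information is shown in the Python sketch below. It assumes pages carry text of the form "Page X of Y" and that a page numbered 1 begins a new document. Both are assumptions made for illustration; the pattern, function name, and sample pages are hypothetical and do not describe Grooper's actual logic.

```python
import re
from typing import List

# Assumed page-number format; real documents may use other wording.
PAGE_NUMBER = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def separate_by_page_number(page_texts: List[str]) -> List[List[int]]:
    """Start a new folder whenever a page reports itself as page 1."""
    folders: List[List[int]] = []
    for index, text in enumerate(page_texts):
        match = PAGE_NUMBER.search(text)
        is_first_page = match is not None and int(match.group(1)) == 1
        if is_first_page or not folders:
            folders.append([])
        folders[-1].append(index)
    return folders


# Hypothetical text from five loose pages.
pages = [
    "Statement Page 1 of 2",
    "Statement Page 2 of 2",
    "Contract page 1 of 3",
    "Contract page 2 of 3",
    "Contract page 3 of 3",
]
documents = separate_by_page_number(pages)
```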

ESP Auto Separation

ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a Batch.

Event-Based Separation

Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

Multi Separator

The Multi Separator Separation Provider performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.
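
The fallback behavior described above can be sketched as a loop over an ordered list of separators. The Python below is a simplified illustration, not Grooper code. It applies each hypothetical separator to the whole set of pages rather than page by page, and `split_on_marker`, `multi_separate`, and the sample text are invented for the example.

```python
from typing import Callable, List, Optional

# A hypothetical separator returns folders of page indexes,
# or None when it cannot separate the pages.
Separator = Callable[[List[str]], Optional[List[List[int]]]]


def split_on_marker(marker: str) -> Separator:
    """Build a separator that starts a new folder at each page containing `marker`."""
    def separator(pages: List[str]) -> Optional[List[List[int]]]:
        if not any(marker in page for page in pages):
            return None  # this provider found nothing to separate on
        folders: List[List[int]] = []
        for index, page in enumerate(pages):
            if marker in page or not folders:
                folders.append([])
            folders[-1].append(index)
        return folders
    return separator


def multi_separate(pages: List[str], separators: List[Separator]) -> Optional[List[List[int]]]:
    """Try each separator in order and return the first successful result."""
    for separator in separators:
        result = separator(pages)
        if result is not None:
            return result
    return None


pages = ["COVER SHEET invoice", "line items", "COVER SHEET contract"]
providers = [split_on_marker("PATCH CODE"), split_on_marker("COVER SHEET")]
result = multi_separate(pages, providers)
```

In this example the first provider finds no match, so the second provider in the list performs the separation.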

Pattern-Based Separation

Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.

Undo Separation

Undo Separation is a Separation Provider. Instead of putting loose Batch Pages into Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

Service

Grooper Service is a conceptual term that refers to the various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, configured, started, and stopped using Grooper Config.

API Services

You can perform Batch processing via REST API web calls by installing API Services.

Activity Processing

Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

Grooper Licensing

Grooper Licensing is a Grooper Service that distributes licenses to multiple workstations running Grooper applications.

Import Watcher

An Import Watcher Service schedules and runs import jobs. It periodically executes an Import Provider to query or poll for documents that meet specific criteria. When matching documents are found, they are imported into Grooper. Afterward, the imported objects are moved, deleted, or modified to prevent repeated imports in the next polling cycle. This ensures that the same set of files is not imported over and over again.
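
The poll, import, then move-aside cycle can be sketched with ordinary file operations. The Python below is a generic illustration against temporary folders, not Grooper's Import Watcher. The `poll_once` function, the folder names, and the `.pdf` filter are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def poll_once(inbox: Path, processed: Path, suffix: str = ".pdf") -> List[str]:
    """Import matching files, then move them so the next poll skips them."""
    imported = []
    for path in sorted(inbox.glob(f"*{suffix}")):
        imported.append(path.name)  # stand-in for the real import work
        shutil.move(str(path), str(processed / path.name))
    return imported


# Demonstrate two polling cycles against temporary folders.
root = Path(tempfile.mkdtemp())
inbox, processed = root / "inbox", root / "processed"
inbox.mkdir()
processed.mkdir()
(inbox / "a.pdf").write_text("demo")
(inbox / "notes.txt").write_text("ignored")  # does not meet the criteria

first_cycle = poll_once(inbox, processed)
second_cycle = poll_once(inbox, processed)
```

Because the first cycle moves the file out of the inbox, the second cycle finds nothing new to import.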

Table Extract Method

The Extract Method property of a Data Table sets a Table Extract Method which defines the settings and logic for the Data Table to perform extraction.

Delimited Extract

The Delimited Extract Table Extract Method extracts tabular data from a delimiter-separated text file, such as a CSV file.
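
For readers unfamiliar with delimiter-separated data, the Python sketch below parses a small CSV string into rows keyed by column header using the standard `csv` module. It illustrates the general idea only and is not Grooper's implementation. The `read_delimited` function and the sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_delimited(text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Parse delimiter-separated text into one dict per row, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical CSV content: a header row followed by two data rows.
sample = "Item,Qty,Price\nWidget,2,4.50\nGadget,1,19.99\n"
rows = read_delimited(sample)
```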

Fluid Layout

The Fluid Layout Table Extract Method will choose between Tabular Layout and Flow Layout configurations, depending on how labels are collected for a Document Type.

Grid Layout

The Grid Layout Table Extract Method uses the positional location of row and column headers to interpret where a tabular grid would be around each value in a table and extract values from each cell in the interpreted grid.

Row Match

The Row Match Table Extract Method uses regular expression pattern matching to determine a table's structure based on the pattern of each row and extract cell data from each column.
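
The general technique of matching each row with a regular expression can be sketched as follows. This Python example uses named groups as stand-ins for columns. It is an illustration only and not how Grooper is configured; the row pattern, `match_rows` function, and sample text are hypothetical.

```python
import re
from typing import Dict, List

# One hypothetical row pattern: an item name, a quantity, and a price.
ROW = re.compile(r"^(?P<item>[A-Za-z ]+?)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")


def match_rows(text: str) -> List[Dict[str, str]]:
    """Return one dict of column values for every line that fits the row pattern."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


sample = """Item        Qty   Price
Widget      2     4.50
Gadget      1     19.99
Total due: 28.99"""
table = match_rows(sample)
```

Lines that do not fit the row pattern, such as the header and the total line, are simply skipped.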

Tabular Layout

The Tabular Layout Table Extract Method uses column header values determined by the Data Columns' Header Extractor results (or labels collected for the Data Columns when a Labeling Behavior is enabled) as well as the Data Columns' Value Extractor results to model a table's structure and return its values.

UI Element

A UI Element is a portion of the Grooper interface that allows users to interact with or otherwise receive information about the application.

Document Viewer

The Grooper Document Viewer is the portal to your documents. It is the UI that allows you to see a Batch Folder's (or a Batch Page's) image, text content, and more.

Node Tree

The Node Tree is the hierarchical list of objects found in the left panel in the "Design" page. It is the basis for navigation and creation in Design.

Overrides

Overrides is a tab that allows overriding the default properties set on a Data Element.

Summary Tabs

Content Models and Content Categories have a Summary tab where you can view "Descendant Node Types", Document Types, and Expressions.

Miscellaneous Features

URL Endpoints for Review

Three different URL endpoints can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID, Batch Process name, Batch ID, and more. This allows Grooper users to link directly to a Batch in Review with a URL.

@@ Line 3: / Line 3: @@
 '''Batch Process Steps''' configured with specific '''''Activities''''' are frequently referred by the name of the '''''Activity''''' followed by the word "step". For example: '''Classify Step'''.<section end="Activity" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Apply Rules ===
 <section begin="Apply Rules" />'''''[[Apply Rules (Activity)|Apply Rules]]''''' is an '''''[[Activity (Property)|Activity]]''''' that runs '''[[Data Rule (Object)|Data Rules]]''' on data that has already been extracted from a [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]'''. A '''[[Batch Process Step (Object)|Batch Process Step]]''' configured with the '''''Apply Rules Activity''''' will always need to be preceded by a '''Batch Process Step''' configured with the '''''Extract Activity'''''. <section end="Apply Rules" />
@@ Line 64: / Line 64: @@
 == Application ==
 <section begin="Application" />A '''Grooper''' [[Repository (Concept)|repository]] consists of a series of [https://en.wikipedia.org/wiki/Table_(information) tables] in a [https://en.wikipedia.org/wiki/Database database], and a '''[[File Store (Object)|File Store]]''' containing relevant files associated to objects that exist within that database. A '''Grooper''' [https://en.wikipedia.org/wiki/Application_software application] is the interface by which a user can interact with that repository of information in an intuitive way.<section end="Application" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Grooper Command Console ===
 <section begin="Grooper Command Console" />The '''[[Grooper Command Console (Application)|Grooper Command Console]]''' is a [https://en.wikipedia.org/wiki/Command-line_interface command-line interface] that performs system configuration and administration tasks within '''Grooper'''.<section end="Grooper Command Console" />
@@ Line 73: / Line 73: @@
 == Behavior ==
 <section begin="Behavior" />'''''[[Behaviors (Property)|Behaviors]]''''' is a property of '''[[Content Type (Concept)|Content Types]]''' and '''''[[Export (Activity)|Export]]''''' '''''[[Activity (Property)|Activities]]'''''  that defines configurable actions that automate processing tasks based on the identified '''Content Type''' of a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder (Object)|Batch Folder]]'''.<section end="Behavior" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Export Behavior ===
 <section begin="Export Behavior" />An '''''[[Export Behavior (Behavior)|Export Behavior]]''''' defines the conditions and actions for exporting [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder (Object)|Batch Folders]]''' and their associated data from '''Grooper''' to other systems.<section end="Export Behavior" />
@@ Line 85: / Line 85: @@
 == CMIS Connection Type ==
 <section begin="CMIS Connection Type" />'''''CMIS Connection Type''''', or "binding", establishes the communication protocols used to connect '''Grooper''' with content management systems adhering to the [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] standard.<section end="CMIS Connection Type" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === AppXtender ===
 <section begin="AppXtender" />The '''''[[AppXtender (CMIS Connection Type)|AppXtender]]''''' '''''CMIS Connection Type''''', or "binding", connects '''Grooper''' to the [https://en.wikipedia.org/wiki/OpenText#AppEnhancer_(formerly_ApplicationXtender) ApplicationXtender] [https://en.wikipedia.org/wiki/Content_management_system content management system] for import and export operations.<section end="AppXtender" />
@@ Line 115: / Line 115: @@
 == Classification Method ==
 <section begin="Classification Method" />The '''''[[Classification Method (Property)|Classification Method]]''''' property determines the technique used for document [[Classification (Concept)|classification]] within a [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Model]]''', enabling the sorting of [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder (Object)|Batch Folders]]''' into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.<section end="Classification Method" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === GPT Embeddings ===
 <section begin="GPT Embeddings" />The '''''[[GPT Embeddings (Classification Method)|GPT Embeddings]]''''' '''''[[Classification Method (Property)|Classification Method]]''''' is an [https://en.wikipedia.org/wiki/OpenAI OpenAI] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] training-based [[Classification (Concept)|classification]] approach that uses "embeddings" to tell one document from another.<section end="GPT Embeddings" />
@@ Line 133: / Line 133: @@
 == Collation Provider ==
 <section begin="Collation Provider" />The '''''[[Collation Provider (Property)|Collation]]''''' property of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' defines the method for converting its raw results into a final result set, governing how lists of matches from the '''Data Type''' are combined and interpreted to produce the output data of the '''Data Type'''.<section end="Collation Provider" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === AND ===
 <section begin="AND" />The '''''[[AND (Collation Provider)|AND]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.<section end="AND" />
@@ Line 163: / Line 163: @@
 == Concept ==
 <section begin="Concept" />There are many objects and properties a user can configure in '''Grooper''', however, gaining an understanding of how, why, and when to use these objects and properties is powered by one's understanding of the underlying concepts that define what these objects and properties are doing and why.<section end="Concept" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Activity Processing ===
 <section begin="Activity Processing Concept" />[[Activity Processing (Concept)|Activity Processing]] is a conceptual term that refers to the execution of a sequence of configured tasks, such as [[Classification (Concept)|classification]], [[Data Extraction (Concept)|extraction]], or data enhancement on documents, which are performed within a [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Process]]''' to transform raw data from documents into structured and actionable information.<section end="Activity Processing Concept" />
@@ Line 313: / Line 313: @@
 == Export Definition ==
 <section begin="Export Definition" />'''''[[Export Definitions (Property)|Export Definitions]]''''' is a property of '''''[[Export Behavior (Behavior)|Export Behaviors]]''''' as defined on '''[[Content Type (Concept)|Content Types]]''' or '''''[[Export (Activity)|Export]]''''' '''''[[Activity (Property)|Activities]]'''''. It defines export connectivity to external systems such as [https://en.wikipedia.org/wiki/File_system file systems], [https://en.wikipedia.org/wiki/Content_management_system content management repositories], [https://en.wikipedia.org/wiki/Database databases], [https://en.wikipedia.org/wiki/Message_transfer_agent mail servers], etc.<section end="Export Definition" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === CMIS Export ===
 <section begin="CMIS Export" />'''''[[CMIS Export (Export Definition)|CMIS Export]]''''' is an '''''[[Export Definitions (Property)|Export Definition]]''''' available when configuring an '''''[[Export Behavior (Behavior)|Export Behavior]]'''''.  It exports content over a [[image:GrooperIcon_CMISConnection.png]] '''[[CMIS Connection (Object)|CMIS Connection]]''', allowing users to export documents and their [https://en.wikipedia.org/wiki/Metadata metadata] to various [https://en.wikipedia.org/wiki/On-premises_software on-premise] and [https://en.wikipedia.org/wiki/Cloud_storage cloud-based storage platforms].<section end="CMIS Export" />
@@ Line 322: / Line 322: @@
 == Extractor Type ==
 <section begin="Extractor Type" />'''''[[Extractor Type (Property)|Extractor Type]]''''', or value extractor, is a property on a wide array of objects that goes by many different names. It defines a primitive operator which reads data values from the text or visual content of a document. '''''Extractor Types''''' are consumed by higher-level objects such as '''[[Data Element (Concept)|Data Elements]]''', [[Object Nomenclature#Extractor Objects|extractor objects]], '''[[Content Type (Concept)|Content Types]]''' and more.<section end="Extractor Type" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Detect Signature ===
 <section begin="Detect Signature" />The '''''[[Detect Signature (Extractor Type)|Detect Signature]]''''' '''''[[Extractor Type (Property)|Extractor Type]]''''' detects signatures within a specified rectangular region on a document page by measuring the fill percentage, providing a method to identify and validate the presence of handwritten signatures.<section end="Detect Signature" />
@@ Line 379: / Line 379: @@
 == Fill Method ==
 <section begin="Fill Method" />The '''''[[Fill Method (Property)|Fill Method]]''''' property on '''[[Data Model (Object)|Data Models]]''', '''[[Data Section (Object)|Data Sections]]''', and '''[[Data Table (Object)|Data Tables]]''' is a collection of various mechanisms that allow for the population of descendant '''[[Data Element (Concept)|Data Elements]]''' of '''Data Models''', '''Data Sections''', and '''Data Tables''' (which can be referred to as "containers"). '''''Fill Methods''''' are secondary extraction operations which populate descendant '''Data Elements''' as they run ''after'' normal extraction.<section end="Fill Method" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === AI Extract ===
 <section begin="AI Extract" />'''''[[AI Extract (Fill Method)|AI Extract]]''''' is a '''''[[Fill Method (Property)|Fill Method]]''''' that leverages a [https://en.wikipedia.org/wiki/Large_language_model Large Language Model (LLM)] to quickly and easily return extraction results to the child elements of '''[[Data Model (Object)|Data Models]]''', '''[[Data Section (Object)|Data Sections]]''', and '''[[Data Table (Object)|Data Tables]]''' by using the .json structure of the relevant '''[[Data Element (Concept)|Data Elements]]''' as part of the instruction set to the LLM.<section end="AI Extract" />
@@ Line 386: / Line 386: @@
 == IP Command ==
 <section begin="IP Command" />The '''''[[IP Command (Property)|Command]]''''' property of an [[image:GrooperIcon_IPStep.png]] '''[[IP Step (Object)|IP Step]]''' object in '''Grooper''' specifies the [[Image Processing (Concept)|Image Processing (IP)]] command to be executed for that specific step as part of an [[image:GrooperIcon_IPProfile.png]] '''[[IP Profile (Object)|IP Profile]]'''.<section end="IP Command" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Barcode Detection ===
 <section begin="Barcode Detection" />The '''''[[Barcode Detection (IP Command)|Barcode Detection]]''''' '''''[[IP Command (Property)|IP Command]]''''' detects and reads [https://en.wikipedia.org/wiki/Barcode barcode] data. The detected barcode information is stored as part of the object's [[Layout Data (Concept)|layout data]].<section end="Barcode Detection" />
@@ Line 410: / Line 410: @@
 == Import Provider ==
 <section begin="Import Provider" />The '''''[[Import Provider (Property)|Provider]]''''' property is a selection of '''''[[Import Provider (Property)|Import Providers]]''''' which enable import of file-based content from a variety of sources such as [https://en.wikipedia.org/wiki/File_system file systems], [https://en.wikipedia.org/wiki/Message_transfer_agent mail servers], and [https://en.wikipedia.org/wiki/Content_repository content repositories].<section end="Import Provider" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === CMIS Import ===
 <section begin="CMIS Import" />The '''''[[CMIS Import (Import Provider)|CMIS Import]]''''' '''''[[Import Provider (Property)|Import Provider]]''''' is used to import content over a [[image:GrooperIcon_CMISConnection.png]] '''[[CMIS Connection (Object)|CMIS Connection]]''', allowing users to import from various [https://en.wikipedia.org/wiki/On-premises_software on-premise] and [https://en.wikipedia.org/wiki/Cloud_storage cloud-based storage] platforms.<section end="CMIS Import" />
@@ Line 422: / Line 422: @@
 == Lookup ==
 <section begin="Lookup" />The '''''[[Lookups (Property)|Lookups]]''''' property is a list of lookup operations to be performed on child elements of the associated container. Each Lookup specification defines a lookup operation, where the value of one or more '''Grooper''' fields will be used to query an external [https://en.wikipedia.org/wiki/Datasource data source], such as a [https://en.wikipedia.org/wiki/Database database]. The results of the query can be used to validate existing field values or populate additional field values.<section end="Lookup" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === CMIS Lookup ===
 <section begin="CMIS Lookup" />'''''[[CMIS Lookup (Lookup)|CMIS Lookup]]''''' is a '''''[[Lookups (Property)|Lookup Specification]]''''' that performs a lookup against a [[image:GrooperIcon_CMISRepository.png]] '''[[CMIS Repository (Object)|CMIS Repository]]''' via a [[CMIS Query|CMISQL Query]].<section end="CMIS Lookup" />
@@ Line 437: / Line 437: @@
 == Object ==
 <section begin="Object" />In '''Grooper''', objects are defined as configurable elements within its hierarchical tree structure. These include nodes and embedded objects that can be manipulated and edited to define the system's behavior, create workflows, and manage content.<section end="Object" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Batch ===
 <section begin="Batch" />[[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' objects are fundamental in '''Grooper's''' architecture as they are the containers of documents that get moved through '''Grooper's''' workflow mechanisms known as [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Processes]]'''.<section end="Batch" />
@@ Line 568: / Line 568: @@
 == Property ==
 <section begin="Property" />A property is a configurable setting on an object in '''Grooper''' that affects how the object performs its function.<section end="Property" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Confidence Multiplier and Output Confidence ===
 <section begin="Confidence Multiplier and Output Confidence" />Some results carry more weight than others.  The '''''[[Confidence Multiplier and Output Confidence (Property)|Confidence Multiplier]]''''' and '''''[[Confidence Multiplier and Output Confidence (Property)|Output Confidence]]''''' properties allow you to manually adjust an [[Data Extraction (Concept)|extraction]] result's confidence.<section end="Confidence Multiplier and Output Confidence" />
@@ Line 611: / Line 611: @@
 <section begin="Vertical Wrap" />'''''[[Vertical Wrap (Property)|Vertical Wrap]]''''' is a property of certain '''''[[Extractor Type (Property)|Extractor Types]]''''' and a '''[[Content Type (Concept)|Content Type's]]''' ''[[Labeling Behavior (Behavior)|Labeling Behavior]]'' used to provide simplified [[Data Extraction (Concept)|extraction]] of vertically wrapped text (typically stacked labels).<section end="Vertical Wrap" />
 </div>
+== Repository Option ==
+<section begin="Repository Option" />The '''''[[Repository Option (Property)|Options]]''''' property of the {{GrooperRootIcon}} '''[[Root (Object)|Root]]''' object is a collection of optional features that affect the entire repository. These features enable entire collections of functionality that do not work until the connections these options provide have been established.<section end="Repository Option" />
+<div style="padding-left: 1.5em;">
+=== LLM Connector ===
+<section begin="LLM Connector" />'''''[[LLM Connector (Repository Option)|LLM Connector]]''''' is a '''''[[Repository Option (Property)|Repository Option]]''''' that enables [https://en.wikipedia.org/wiki/OpenAI OpenAI-based] functionality for the local '''Grooper''' repository.<section end="LLM Connector" />
+</div>
 == Section Extract Method ==
 <section begin="Section Extract Method" />The '''''Extract Method''''' property of a [[image:GrooperIcon_DataSection.png]] '''[[Data Section (Object)|Data Section]]''' defines a "Section Extract Method" which specifies how section instances will be identified and extracted.<section end="Section Extract Method" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Nested Table ===
 <section begin="Nested Table" />'''''[[Nested Table (Section Extract Method)|Nested Table]]''''' is a "Section Extract Method" enabled for a [[image:GrooperIcon_DataSection.png]] '''[[Data Section (Object)|Data Section]]''' using the '''''Extract Method''''' property. This method divides a document into sections by extracting table data within those sections. This gives '''Grooper''' users a method for extracting hierarchical tables as well as dividing up a document into sections where each of those sections has the same table (or at least tabular data which can be extracted by a single [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''' object).<section end="Nested Table" />
@@ Line 623: / Line 628: @@
 == Separation Provider ==
 <section begin="Separation Provider" />The '''''[[Separation Provider (Property)|Provider]]''''' property of the '''''[[Separate (Activity)|Separate]]''''' '''''[[Activity (Property)|Activity]]''''' defines the type of [[Separation (Concept)|separation]] to be performed at the designated '''''[[Scope (Property)|Scope]]'''''.<section end="Separation Provider" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Change in Value Separation ===
 <section begin="Change in Value Separation" />The '''''[[Change in Value Separation (Separation Provider)|Change in Value]]''''' '''''[[Separation Provider (Property)|Separation Provider]]''''' creates a new folder and separates every time an extracted value changes from one [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page (Object)|Batch Page]]''' to another.<section end="Change in Value Separation" />
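The change-in-value behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Grooper's implementation: each page is represented only by the value extracted from it, and the function name `separate_on_change` is hypothetical.

```python
def separate_on_change(page_values):
    """Group page indexes into folders, starting a new folder whenever the
    extracted value differs from the value on the previous page.

    `page_values` is assumed to hold one extracted value per page, in
    page order.
    """
    folders = []
    previous = None
    for index, value in enumerate(page_values):
        # The first page always opens a folder; later pages open one
        # only when their value differs from the preceding page's value.
        if not folders or value != previous:
            folders.append([])
        folders[-1].append(index)
        previous = value
    return folders


# Six pages whose extracted value changes twice.
print(separate_on_change(["A", "A", "B", "B", "B", "A"]))
```

Note that a value that reappears later (the final "A" above) still starts a new folder, because only adjacent pages are compared.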
@@ Line 650: / Line 655: @@
 == Service ==
 <section begin="Service" />[[Grooper Service (Concept)|Grooper Service]] is a conceptual term that refers to the various [https://en.wikipedia.org/wiki/Computer_program executable programs] that run as [https://en.wikipedia.org/wiki/Windows_service Windows Services] to facilitate '''Grooper''' processing. Service instances are installed, configured, started and stopped using [[Grooper Config (Application)|Grooper Config]].<section end="Service" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === API Services ===
 <section begin="API Services" />You can perform [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' processing via [https://en.wikipedia.org/wiki/REST REST] [https://en.wikipedia.org/wiki/API API] web calls by installing  '''''[[API Services (Service)|API Services]]'''''.<section end="API Services" />
@@ Line 664: / Line 669: @@
 <section end="Import Watcher Service" />
 </div>
 == Table Extract Method ==
 <section begin="Table Extract Method" />The '''''[[Table Extract Method (Property)|Extract Method]]''''' property of a [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''' sets a '''''Table Extract Method''''' which defines the settings and logic for the '''Data Table''' to perform [[Data Extraction (Concept)|extraction]].<section end="Table Extract Method" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Delimited Extract ===
 <section begin="Delimited Extract" />The '''''[[Delimited Extract (Table Extract Method)|Delimited Extract]]''''' '''''[[Table Extract Method (Property)|Table Extract Method]]''''' extracts tabular data from a [https://en.wikipedia.org/wiki/Delimiter-separated_values delimiter-separated] text file, such as a [https://en.wikipedia.org/wiki/Comma-separated_values CSV file].<section end="Delimited Extract" />
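To show what reading tabular data from a delimiter-separated file involves, here is a minimal sketch using Python's standard `csv` module. It is not how Grooper implements Delimited Extract; the function name `read_delimited_table` and the sample column names are assumptions made for illustration.

```python
import csv
import io


def read_delimited_table(text, delimiter=","):
    """Parse delimiter-separated text into a list of row dictionaries.

    The first line is assumed to be a header row naming the columns.
    All values are returned as strings.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


sample = "Item,Qty\nWidget,2\nGadget,5\n"
for row in read_delimited_table(sample):
    print(row["Item"], row["Qty"])
```

Passing a different `delimiter`, such as a tab or pipe character, handles other delimiter-separated formats with the same logic.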
@@ Line 685: / Line 689: @@
 == UI Element ==
 <section begin="UI Element" />A UI Element is a portion of the '''Grooper''' interface that allows users to interact with or otherwise receive information about the application.<section end="UI Element" />
-<div style="padding-left: 1.5em">
+<div style="padding-left: 1.5em;">
 === Document Viewer ===
 <section begin="Document Viewer" />The [[Document Viewer (UI Element)|Grooper Document Viewer]] is the portal to your documents. It is the UI that allows you to see a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder (Object)|Batch Folder's]]''' (or a [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page (Object)|Batch Page's]]''') image, text content, and more.<section end="Document Viewer" />