Glossary
Activity
Activity is a property on
Batch Process Step objects. Activities define specific document processing operations done to a
Batch,
Batch Folder, or
Batch Page.
Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify Step.
Classify
Classify is an Activity that "classifies"
Batch Folders in a
Batch by assigning them a Content Type using patterns, lexical understanding, or rules as defined by a
Content Model.
Clip Frames
The Clip Frames Activity extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.
Correct
The Correct Activity performs spell correction on the textual content of
Batch Folders or specific Data Elements, enhancing the accuracy of data extraction by resolving recognition errors.
Detect Frames
The Detect Frames Activity locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.
Execute
The Execute Activity runs a specified child command, allowing for the modular and controlled execution of tasks within a larger automated workflow.
Export
The Export Activity facilitates the transfer of documents and extracted information to external systems or formats, completing the data processing workflow.
Extract
The Extract Activity retrieves relevant information, defined by Data Elements, from
Batch Folders, transforming unstructured or semi-structured content into structured, usable data.
Image Processing
The Image Processing Activity enhances and optimizes
Batch Pages for better recognition and data extraction results.
Initialize Card
The Initialize Card Activity prepares and configures microfiche card images for further processing.
Recognize
The Recognize Activity interprets
Batch Pages and
Batch Folders, converting them into machine-readable text and capturing layout data for comprehensive analysis and data extraction. This will attach a text and/or layoutData file to the respective object.
Render
The Render Activity normalizes electronic document content from file formats Grooper cannot read innately to a PDF format. This allows Grooper to extract the text via the Recognize Activity.
Review
The Review Activity facilitates human evaluation and validation of processed
Batch Folders and extracted data for accuracy and completeness.
Send Mail
The Send Mail Activity automates the dispatch of emails with or without attachments, based on
Batch Process events and conditions.
Separate
The Separate Activity sorts
Batch Pages into individual
Batch Folders, distinguishing them for independent processing and organization.
Split Pages
Multi-page documents (typically PDFs and TIFFs) come into Grooper represented as single
Batch Folders. The Split Pages Activity exposes
Batch Pages as child objects of the
Batch Folders for individualized processing and handling.
XML Transform
The XML Transform Activity applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.
Application
A Grooper repository consists of a series of tables in a database, and a File Store containing relevant files associated with objects that exist within that database. A Grooper application is the interface by which a user can interact with that repository of information in an intuitive way.
Grooper Command Console
The Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.
Web Client
The Grooper Web Client allows users to connect to Grooper via a web browser using a URL. The URL points to a website hosted by a server on which Grooper is installed and Internet Information Services (IIS) is configured.
Behavior
Behaviors is a property of Content Types and Export Activities that defines configurable actions that automate processing tasks based on the identified Content Type of a
Batch Folder.
Export Behavior
An Export Behavior defines the conditions and actions for exporting
Batch Folders and their associated data from Grooper to other systems.
Labeling Behavior
A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for Classification and Extraction.
PDF Data Mapping
PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.
CMIS Connection Type
CMIS Connection Type, or "binding", establishes the communication protocols used to connect Grooper with content management systems adhering to the CMIS standard.
AppXtender
The AppXtender CMIS Connection Type, or "binding", connects Grooper to the ApplicationXtender content management system for import and export operations.
Box
The Box CMIS Connection Type, or "binding", connects Grooper to the Box content management system for import and export operations.
Exchange
The Exchange CMIS Connection Type, or "binding", connects Grooper to the Microsoft Exchange Server mail server for import and export operations.
FTP
The FTP CMIS Connection Type, or "binding", connects Grooper to FTP directories for import and export operations.
IMAP
The IMAP CMIS Connection Type, or "binding", connects Grooper to email messages and folders through an IMAP email server.
NTFS
The NTFS CMIS Connection Type, or "binding", connects Grooper to files and folders in the Microsoft Windows NTFS file system.
OneDrive
The OneDrive CMIS Connection Type, or "binding", connects Grooper to Microsoft OneDrive cloud services.
SFTP
The SFTP CMIS Connection Type, or "binding", connects Grooper to SFTP directories for import and export operations.
SharePoint
The SharePoint CMIS Connection Type, or "binding", connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries".
Classification Method
The Classification Method property determines the technique used for document classification within a
Content Model, enabling the sorting of
Batch Folders into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.
Labelset-Based
Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify
Batch Folders.
Lexical
The Lexical Classification Method classifies
Batch Folders based on their text content by utilizing either pre-configured training or rules. This is achieved through the analysis of word frequencies or defined rules that identify document types.
Rules-Based
The Rules-Based Classification Method employs defined "rules" on
Document Types to classify
Batch Folders, utilizing Positive Extractor and Negative Extractor properties to accurately categorize them through rule application, thereby ensuring
Batch Folders match predefined criteria.
Visual
The Visual Classification Method uses image data instead of text data to determine the
Document Type assigned to a
Batch Folder during classification. Instead of using text-based extractors, an
IP Profile is used with an Extract Features IP Command to obtain data pertaining to a
Batch Folder's image(s). Document samples are trained as examples of a Document Type.
Collation Provider
The Collation property of a
Data Type defines the method for converting its raw results into a final result set, governing how lists of matches from the Data Type are combined and interpreted to produce the output data of the Data Type.
AND
The AND Collation Provider of a
Data Type returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
Array
The Array Collation Provider of a
Data Type matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
Combine
The Combine Collation Provider of a
Data Type combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
Key-Value List
The Key-Value List Collation Provider of a
Data Type matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
Key-Value Pair
The Key-Value Pair Collation Provider of a
Data Type matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.
Multi-Column
The Multi-Column Collation Provider of a
Data Type combines multiple columns on a page into a single column for extraction.
Ordered Array
The Ordered Array Collation Provider of a
Data Type finds sequences of values where one result is present for each extractor, in the order they appear.
Pattern-Based
The Pattern-Based Collation Provider of a
Data Type uses regular expressions to sequence returned results into a final result set.
Split
The Split Collation Provider of a
Data Type separates a data instance at each match returned by the Data Type.
Concept
There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use them depends on understanding the underlying concepts that define what these objects and properties are doing and why.
Activity Processing
Activity Processing is a conceptual term that refers to the execution of a sequence of configured tasks, such as classification, extraction, or data enhancement on documents, which are performed within a
Batch Process to transform raw data from documents into structured and actionable information.
CMIS+
CMIS+ is a conceptual term that refers to Grooper's CMIS+ architecture, which provides standardized access to document content and metadata across a variety of external storage platforms.
CMIS
CMIS is a conceptual term that refers to Content Management Interoperability Services: an open standard allowing different content management systems to share information over the Internet.
CMIS Query
CMIS Query is a conceptual term that refers to the queries used to search documents in CMIS Repositories and to filter documents upon import when using the Import Query Results Import Provider.
CSS Data Viewer Styling
CSS Data Viewer Styling is a conceptual term that refers to the idea that the Grooper Web Client's Data View task view of the Review interface is styled using CSS. This gives you a great deal of control over a
Data Model's appearance and layout during document review.
Classification
Classification is a conceptual term that refers to the process of identifying and organizing documents into categorical types based on their content or layout, often using machine learning, rules, or pattern recognition for efficient document management and data extraction workflows. Specifically, the Classify Activity will assign a Content Type to a
Batch Folder.
Code Expressions
Code Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper’s core functionality.
Combined Methods
Combined Methods is a conceptual term that refers to the idea that a user can leverage multiple Classification Methods to overcome the shortcomings of an individual method.
Content Type
Content Type is a conceptual term that refers to the grouping of three Grooper objects:
Content Models,
Content Categories, and
Document Types.
Data Context
Data Context is a conceptual term that refers to the surrounding information that gives meaning to data that would otherwise be meaningless.
Data Element
Data Element is a conceptual term that refers to the grouping of five Grooper objects:
Data Models,
Data Sections,
Data Fields,
Data Tables, and
Data Columns.
Data Extraction
Data Extraction is a conceptual term that involves identifying and capturing specific information from
Batch Folders like forms or invoices using a set of configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.
Data Extractor
Data Extractor is a conceptual term that refers to the grouping of all Extractor Types and extractor objects.
Data Instance
Data Instance is a conceptual term that refers to an encapsulation of text data within a document. Data instances are the hierarchy of text data that Grooper's extraction mechanisms create.
EDI Integration
EDI Integration is a conceptual term that refers to Grooper's ability to process EDI files.
Expressions
Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper’s core functionality.
Expressions Cookbook
Expressions Cookbook is a conceptual term that refers to a reference list for commonly used expressions in Grooper.
Field Mapping
Field Mapping is a conceptual term that refers to how logical connections are made between metadata content in Grooper and an external storage platform.
Five Phases of Grooper
Five Phases of Grooper is a conceptual term that seeks to build understanding of how documents are processed through Grooper.
Flow Collation
Flow Collation is a conceptual term used to define a type of layout used in Collation Providers of
Data Types.
Footer Rows and Footer Modes
Footer Rows and Footer Modes is a conceptual term that refers to how a "footer row" (enabled by the Generate Footer Row property of a
Data Table) provides Grooper users a quick way to validate numerical data in a
Data Column. The Data Column's Footer Mode property controls if and how a total is determined for numerical values in a Data Column.
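As a rough illustration of the validation idea behind a footer row, the sketch below sums a column of extracted amounts and compares the result to a total printed on the document. It is not Grooper's implementation: the function name `column_total`, the sample values, and the string clean-up rules are all assumptions made for this example.

```python
from decimal import Decimal


def column_total(values):
    """Sum currency-like strings after stripping '$' and thousands separators."""
    total = Decimal("0")
    for value in values:
        cleaned = value.replace("$", "").replace(",", "").strip()
        total += Decimal(cleaned)
    return total


# Hypothetical values extracted into a single Data Column.
line_amounts = ["$1,200.00", "84.50", "15.25"]
footer_total = column_total(line_amounts)

# Hypothetical total printed on the document, used as the cross-check.
printed_total = Decimal("1299.75")
totals_agree = footer_total == printed_total
```

`Decimal` is used instead of `float` so that currency values add up exactly, which matters when the check is a strict equality.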
Fuzzy RegEx
Fuzzy RegEx is a conceptual term that refers to the usage of fuzzy logic within Extractor Types that leverage regular expressions to match patterns, enabled through the Fuzzy Matching property.
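The general idea of fuzzy matching can be sketched with a plain edit-distance comparison. The glossary does not describe the algorithm Grooper uses, so the example below assumes Levenshtein distance converted to a similarity score; the function names and the 0.8 threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def fuzzy_find(target, text, min_similarity=0.8):
    """Return the words in text that are close enough to target."""
    return [
        word for word in text.split()
        if similarity(target.lower(), word.lower()) >= min_similarity
    ]


# OCR often confuses 'I'/'l'/'1' and 'o'/'0'; one wrong character still matches.
matches = fuzzy_find("invoice", "lnvoice number 1nv0ice")
```

With these inputs, "lnvoice" (one substitution) clears the threshold, while "1nv0ice" (two substitutions in seven characters) does not. Lowering `min_similarity` trades precision for recall.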
GPT Integration
GPT Integration is a conceptual term that refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.
Grooper Infrastructure
Grooper Infrastructure is a conceptual term that refers to the computing underpinnings that make up a Grooper repository and the software that interfaces with it.
Grooper Repository
Grooper Repository is a conceptual term that refers to the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper.
Grooper Service
Grooper Service is a conceptual term that refers to the various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Config.
Image Processing
Image Processing is a conceptual term that refers to how Grooper applies a variety of techniques to enhance scanned documents' quality, improving OCR accuracy by removing imperfections and adjusting visual characteristics to prepare images for data extraction and classification.
Import Mode and Document Linking
Import Mode and Document Linking is a conceptual term that refers to the usage of the Import Mode property. This affects whether or not an imported document maintains a link to its original file and/or if a copy of the file is made on import or not.
LINQ to Grooper Objects
LINQ to Grooper Objects is a conceptual term that refers to the ability of Grooper to leverage LINQ syntax in expressions.
Layered OCR
Layered OCR is a conceptual term that refers to the usage of the Layered OCR setting of the OCR Engine property of an
OCR Profile. The use of this setting enables the usage of secondary OCR Profiles on a single page. The OCR results from these secondary OCR Profiles are merged with (or layered on top of) the primary OCR Profile's results.
Layout Data
Layout Data is a conceptual term that refers to information such as line locations, OMR checkbox locations and states, barcode values, and detected shapes captured by certain image processing commands. This data is stored as an attached file on a
Batch Folder or
Batch Page object and can later be recalled by various functions within Grooper that rely on the presence of that data to function.
Microfiche Processing
Microfiche Processing is a conceptual term that refers to how Grooper leverages several IP Commands to accurately process microform documents.
Microsoft Office Integration
Microsoft Office Integration is a conceptual term that refers to Grooper's ability to convert Microsoft Word and Microsoft Excel files into formats that Grooper can read.
OCR
OCR is a conceptual term that stands for Optical Character Recognition. It allows text from paper documents to be digitized, in order to be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.
OCR Synthesis
OCR Synthesis is a conceptual term that refers to Grooper's unique method of pre-processing and re-processing raw results from the OCR Engine to get better results out of it.
Object Nomenclature
Object Nomenclature is a conceptual term that refers to the idea that mastery of a Grooper environment is greatly enhanced by understanding the myriad of objects that can exist and how they are related.
PDF Page Types
PDF Page Types is a conceptual term that refers to specific types of PDF pages. Page types describe the kind of content in a PDF page and inform Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize Activity, whereas "text only" pages have their native text extracted.
Regular Expression
Regular Expression is a conceptual term that refers to a standard syntax designed to parse text strings. This is a way of finding information in a block of text. It is the primary method by which Grooper extracts and returns data from documents.
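A small example may make this concrete. The sketch below uses Python's `re` module purely for illustration (the glossary does not state which regex engine Grooper uses); the sample text and both patterns are made up for this example and avoid engine-specific syntax.

```python
import re

# Hypothetical line of recognized text from an invoice.
text = "Invoice No: INV-20417  Date: 03/15/2024  Total: $1,284.50"

# Two digits, slash, two digits, slash, four digits.
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

# Dollar sign, 1-3 digits, optional comma-separated thousands groups, two decimals.
amount_pattern = re.compile(r"\$\d{1,3}(?:,\d{3})*\.\d{2}")

dates = date_pattern.findall(text)      # ['03/15/2024']
amounts = amount_pattern.findall(text)  # ['$1,284.50']
```

Each pattern describes the shape of a value rather than its location, which is why the same pattern can find dates or amounts anywhere in a block of text.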
Repository
Repository is a conceptual term that refers to a location where files and/or data are stored and managed.
Separation
Separation is a conceptual term that refers to the process of taking an unorganized
Batch of loose
Batch Pages and organizing them into document folders. This is done so Grooper can later assign a Document Type to each document folder in a process known as Classification.
TF-IDF
TF-IDF is a conceptual term that stands for term frequency-inverse document frequency, a numerical statistic intended to reflect how important a word is to a document within a collection (or document set or corpus). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the
Field Class extractor).
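To show how the statistic behaves, here is a minimal sketch of one common textbook formulation: term frequency (count divided by document length) multiplied by the natural log of the inverse document frequency. The exact weighting Grooper applies is not described here, so treat the formula, the whitespace tokenizer, and the sample documents as assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical three-document corpus.
docs = [
    "invoice total amount due",
    "invoice number and invoice date",
    "purchase order number",
]
scores = tf_idf(docs)
```

In the first document, "total" scores higher than "invoice" because "invoice" also appears in another document and is therefore less distinctive. A term that appears in every document receives a weight of zero.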
Table Extraction
Table Extraction is a conceptual term that refers to Grooper's functionality to extract data from cells in tables. This is accomplished by configuring the
Data Table and its child
Data Column Data Elements in a
Data Model.
Test Batch
Test Batch is a conceptual term that refers to any
Batch created in the Test folder of the Batches folder in the Node Tree.
Thread
Thread is a conceptual term that refers to the smallest unit of processing that can be performed within an operating system.
Training-Based Approaches to Document Classification
Training-Based Approaches to Document Classification is a conceptual term that refers to classifying
Batch Folders according to how similar each unclassified Batch Folder is to trained examples of a Document Type.
Training Batch
Training Batch is a conceptual term that refers to a more convenient way to work with all of the samples a
Content Model has been trained against. You can also still look at the Form Types underneath each Content Type, but the Training Set can show you all the samples in one place.
UNC Path
UNC Path is a conceptual term that refers to the Universal Naming Convention (UNC), a standard used in Microsoft Windows for accessing shared network folders.
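A UNC path has the form `\\server\share\folder\file`. The short sketch below uses Python's standard `pathlib.PureWindowsPath` to pull the pieces apart; the server, share, and file names are hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical UNC path to a scanned file on a network share.
path = PureWindowsPath(r"\\fileserver01\scans\incoming\batch_001.pdf")

# For UNC paths, pathlib treats "\\server\share" as the drive.
server_and_share = path.drive
server, share = server_and_share.lstrip("\\").split("\\")

# Everything after the share is an ordinary folder/file hierarchy.
folders_and_file = path.parts[1:]
file_name = path.name
```

`PureWindowsPath` only parses the string and never touches the network, so the example runs on any operating system.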
URL Endpoints for Review
URL Endpoints for Review is a conceptual term that refers to three URL endpoints that can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID,
Batch Process name,
Batch Id and more.
Waterfall Classification
Waterfall Classification is a conceptual term that refers to a classification notion in Grooper that manipulates the Positive Extractor property to prioritize training similarity in order to achieve a middle ground between high specificity and accuracy, and generality with minimal accuracy. This is helpful whenever Batch Folders get misclassified, and simply retraining won't help.
XML Schema Integration
XML Schema Integration is a conceptual term that refers to Grooper's ability to interact with XML schemas and the configuration required to do so.
Export Definition
Export Definitions is a property of Export Behaviors as defined on Content Types or Export Activities. It defines export connectivity to external systems such as file systems, content management repositories, databases, mail servers, etc.
CMIS Export
CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a
CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.
Data Export
Data Export is an Export Definition available when configuring an Export Behavior. It exports extracted document data over a
Data Connection, allowing users to export data to a Microsoft SQL Server or ODBC compliant database.
Extractor Type
Extractor Type, or value extractor, is a property on a wide array of objects that goes by many different names. It defines a primitive operator which reads data values from the text or visual content of a document. Extractor Types are consumed by higher-level objects such as Data Elements, extractor objects, Content Types and more.
Detect Signature
The Detect Signature Extractor Type detects signatures within a specified rectangular region on a document page by measuring the fill percentage, providing a method to identify and validate the presence of handwritten signatures.
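The fill-percentage idea can be sketched as the share of dark pixels inside a rectangle. This is an illustration only, not Grooper's implementation: the image representation (rows of 0/1 values), the function names, and the 5% minimum threshold are assumptions.

```python
def fill_percentage(image, left, top, width, height):
    """Percentage of dark pixels in a zone; image rows hold 1 (ink) or 0 (blank)."""
    region = [row[left:left + width] for row in image[top:top + height]]
    total = sum(len(row) for row in region)
    if total == 0:
        return 0.0
    dark = sum(sum(row) for row in region)
    return 100.0 * dark / total


def has_signature(image, zone, min_fill=5.0):
    """Treat the zone as signed when enough of it is covered by ink."""
    return fill_percentage(image, *zone) >= min_fill


# Hypothetical 4x6 binarized page with a few ink pixels in the middle.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
signed_zone = (1, 1, 4, 2)  # left, top, width, height
blank_zone = (0, 3, 6, 1)
```

A real configuration would likely also want an upper bound, since a zone that is almost entirely dark is more likely a stamp or scanning artifact than a signature.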
Field Match
The Field Match Extractor Type matches the value stored in a previously-extracted
Data Field or
Data Column, allowing for consistency and reference across different parts of a document or dataset.
Find Barcode
The Find Barcode Extractor Type searches the
Batch Folder layout data for a barcode, capturing its value upon detection.
GPT Complete
The GPT Complete Extractor Type leverages OpenAI's GPT model to generate completions for inputs, returning one hit for each result choice provided by the model's response.
Highlight Zone
The Highlight Zone Extractor Type sets a highlight region on a document without performing any actual data extraction, effectively marking areas of interest or importance.
Label Match
The Label Match Extractor Type matches a list of one or more label values using matching options defined by a Labeling Behavior. It works similarly to List Match, but uses shared settings defined in a Labeling Behavior for Fuzzy Matching, Vertical Wrap, and Constrained Wrap.
Labeled OMR
The Labeled OMR Extractor Type is used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.
Labeled Value
The Labeled Value Extractor Type identifies and extracts information from a field presented as a label-value pair on a document, by matching a set of labels and a set of values, and determining pairs based on their geometric clustering on the document.
List Match
The List Match Extractor Type is designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions.
Ordered OMR
The Ordered OMR Extractor Type is similar to a Labeled OMR in that it is used to return OMR check box information. Rather than relying on a label for the extraction, the Ordered OMR returns information for multiple check boxes within a given zone based on their order and layout.
Pattern Match
The Pattern Match Extractor Type extracts values from a document that match a specified regular expression, allowing for the detection of data following a known format or pattern.
Query HTML
The Query HTML Extractor Type queries an HTML document using a CSS selector and returns the inner text of each matching element.
Read Barcode
The Read Barcode Extractor Type uses barcode recognition technology to read and extract values from barcodes found in the document content.
Read Meta Data
The Read Meta Data Extractor Type retrieves metadata values associated with a document.
Read Zone
The Read Zone Extractor Type allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to an extracted text anchor or shape location on the document.
Reference
The Reference Extractor Type allows for the referencing of an external extractor object to be used within a Grooper object's configuration, enabling consistent extraction logic across different objects.
Word Match
The Word Match Extractor Type extracts individual words or phrases containing multiple words from documents. It is designed to collect full words and is often used in n-gram extraction.
Zonal OMR
The Zonal OMR Extractor Type reads one or more checkboxes using manually-configured zones. It is mostly an outdated tool and should only be used if all other OMR extractor options have been exhausted. It requires the most manual setup of any OMR extractor to configure.
IP Command
The Command property of an
IP Step object in Grooper specifies the Image Processing (IP) command to be executed for that specific step as part of an
IP Profile.
Barcode Detection
The Barcode Detection IP Command detects and reads barcode data. The detected barcode information is stored as part of the object's layout data.
Binarize
The Binarize IP Command converts a color or grayscale image to black and white using various thresholding methods.
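Thresholding can be illustrated with two of the simplest approaches: a fixed global threshold and a threshold taken from the image's mean brightness. The glossary does not list which methods the command offers, so both functions below are generic sketches with hypothetical names, operating on a grayscale image stored as rows of 0-255 values.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel to 0 (black) or 255 (white)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def mean_threshold(gray):
    """Pick a threshold from the image itself: its average brightness."""
    pixels = [px for row in gray for px in row]
    return sum(pixels) / len(pixels)


# Hypothetical 2x2 grayscale image.
image = [
    [10, 200],
    [128, 127],
]

fixed = binarize(image)                            # fixed threshold of 128
adaptive = binarize(image, mean_threshold(image))  # threshold from the image mean
```

A fixed threshold is predictable but struggles with faded or dark scans; deriving the threshold from the image is one simple way to adapt to overall brightness.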
Extract Page
The Extract Page IP Command removes an image from a carrier image while simultaneously removing any image warping or skewing.
Line Removal
The Line Removal IP Command removes horizontal and vertical lines from documents.
Scratch Removal
The Scratch Removal IP Command detects and removes or repairs scratches from film-based images.
Shape Detection
The Shape Detection IP Command detects shapes on a document matching sample images given by the user.
Shape Removal
The Shape Removal IP Command detects and removes shapes from documents.
Import Provider
The Provider property is a selection of Import Providers which enable import of file-based content from a variety of sources such as file systems, mail servers, and content repositories.
CMIS Import
The CMIS Import Import Provider is used to import content over a
CMIS Connection, allowing users to import from various on-premise and cloud-based storage platforms.
Import Descendants
Import Descendants is one of two Import Providers that use
CMIS Connections to import document content into Grooper.
Import Query Results
Import Query Results is one of two Import Providers that use
CMIS Connections to import document content into Grooper.
Lookup
The Lookups property is a list of lookup operations to be performed on child elements of the associated container. Each Lookup specification defines a lookup operation, where the value of one or more Grooper fields will be used to query an external data source, such as a database. The results of the query can be used to validate existing field values or populate additional field values.
CMIS Lookup
CMIS Lookup is a Lookup Specification that performs a lookup against a
CMIS Repository via a CMISQL Query.
Database Lookup
Database Lookup is a Lookup Specification that performs a lookup against a
Data Connection via a SQL query.
Web Service Lookup
Web Service Lookup is a Lookup Specification that looks up external data at an API endpoint by calling a web service.
Object
In Grooper, objects are defined as configurable elements within its hierarchical tree structure. These include nodes and embedded objects that can be manipulated and edited to define the system's behavior, create workflows, and manage content.
Batch
Batch objects are fundamental in Grooper's architecture as they are the containers of documents that get moved through Grooper's workflow mechanisms known as
Batch Processes.
Batch Folder
Batch Folder objects are defined as container objects within a
Batch that are used to represent and organize both folders and pages. They can hold other Batch Folders or
Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.
Batch Folders are frequently referred to simply as "documents".
Batch Page
Batch Page objects represent individual pages within a
Batch. The Batch Page object is the most granular unit in the hierarchy of Batch Objects in Grooper.
Batch Pages are frequently referred to simply as "pages".
Batch Process
Batch Process objects are crucial components in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each
Batch of documents is managed systematically and efficiently.
Batch Processes by themselves do nothing. Instead, the workflows they execute are designed by adding child
Batch Process Steps.
Batch Process Step
Batch Process Step objects are specific actions within the sequence defined by a
Batch Process. A Batch Process Step plays a critical role in automating and managing the flow of documents through the various stages of processing within Grooper.
CMIS Connection
CMIS Connection objects provide a standardized way of connecting to various content management systems (CMS).
- For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
- For those that do not, the CMIS Connection normalizes connection and transfer protocols as if they were a CMIS platform.
These objects allow Grooper to communicate with multiple external storage platforms, enabling access to documents and content that reside outside of Grooper's immediate environment.
CMIS Repository
CMIS Repository objects in Grooper allow access to external documents through a
CMIS Connection. They allow users to manage and interact with those documents within Grooper's framework as if they were local. They are created as a child object of a CMIS Connection and used for various Activities.
Content Category
Content Category objects are containers within a
Content Model that hold other Content Categories and
Document Type objects. They allow for further classification and grouping of Document Types within a taxonomy, aiding in the logical structuring of complex document sets. Besides grouping Document Types together, Content Categories also serve to create new branches in a Data Element hierarchy.
In most cases Content Categories are used as organizational buckets to group like Document Types together.
Content Model
Content Model objects define the taxonomy of document sets in terms of the
Document Types they contain. They also house the Data Elements that appear on each
Content Category and Document Type within them. Content Models serve as the root of a Content Type hierarchy and are crucial for organizing the different types of documents that Grooper can recognize and process.
Data Column
Data Column objects are child objects of a
Data Table, representing individual columns and defining the type of data each column holds along with its data extraction properties.
Data Connection
Data Connection objects define the settings for connecting to and interacting with a database. These interactions may include conducting lookups, exports, or other actions that relate to database management systems (DBMS). Once configured, a Data Connection object can be referenced by other components in Grooper for various DBMS-related activities.
Data Field
Data Field objects are created as child objects of a
Data Model. A Data Field is a representation of a single piece of data targeted for extraction on a document.
Data Fields are frequently referred to simply as "fields".
Data Model
Data Model objects serve as the top-tier structure defining the taxonomy for Data Elements and are leveraged during the Extract Activity to extract data from
Batch Folders. They are a hierarchy of Data Elements that sets the stage for the extraction logic and review of data collected from documents.
Data Rule
Data Rule objects define the logic for automated data manipulation which occurs after data has been extracted from
Batch Folders. These rules are applied to normalize or otherwise prepare data collected in a
Data Model for downstream processes. Data Rules ensure that extracted data conforms to expected formats or meets certain quality standards.
Data Section
Data Section objects are grouping mechanisms for related
Data Fields. Data Sections organize and segment child Data Elements into logical divisions of a document based on the structure and semantics of the information the documents contain.
Data Table
Data Table objects are utilized for extracting repeating data that's formatted in rows and columns, allowing for complex multi-instance data organization that would be present in table-formatted content.
Data Type
Data Type objects hold a collection of child, referenced, and locally defined Data Extractors and settings that manage how multiple (even differing) matches from Data Extractors are consolidated (via Collation) into a result set.
Document Type
Document Type objects represent a distinct type of document, like an invoice or contract. Document Types are created as children of a
Content Model or a
Content Category and are used to classify individual
Batch Folders. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to Batch Folders of that specific classification.
Field Class
Field Class objects are trainable extractors that distinguish between multiple instances of similar data within a document by understanding the context in which they occur. Field Classes can be configured to distinguish values within highly structured documents, but this type of extraction is better suited to simpler "Extractor Objects" like
Value Readers or
Data Types.
Field Classes are most useful when attempting to find values within the flow of natural language. This method involves training with positive and negative examples to distinguish the right context. You'd opt for a Field Class when the value you're after is an entire clause within a contract, or a specific value defined within the flow of text.
File Store
File Store objects define a storage location within Grooper where file content associated with nodes is saved. They are crucial for managing the content that forms the basis of Grooper's processing tasks, allowing for the storage and retrieval of documents, images, and other "files". Not every object in Grooper has files connected to it, but when one does, those files are stored in the location defined by this object.
Form Type
Form Type objects represent trained variations of a
Document Type. These objects store machine learning training data for Lexical and Visual document classification methods.
IP Group
IP Group objects are child objects within
IP Profiles that create a hierarchical structure for organizing image processing commands. IP Groups may contain other IP Groups or
IP Step objects.
IP Profile
IP Profile objects detail the operations and parameters for image enhancement and cleanup. These operations improve the accuracy of further processing steps, like the Recognize and Classify Activities.
IP Step
IP Step objects are the basic units within an
IP Profile that define a single image processing operation. IP Steps are performed sequentially within their parent
IP Group or IP Profile.
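The nesting and ordering described for IP Profiles, IP Groups, and IP Steps can be pictured as an in-order walk over a tree. The following is a minimal Python sketch, not Grooper's implementation: the `IPStep` and `IPGroup` classes, the `run` method, and the use of a string to stand in for an image are hypothetical, chosen only to illustrate sequential execution.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class IPStep:
    """Hypothetical stand-in for a single image processing operation."""
    name: str
    operation: Callable[[str], str]

    def run(self, image: str) -> str:
        return self.operation(image)


@dataclass
class IPGroup:
    """Hypothetical container holding IP Steps and/or nested IP Groups."""
    name: str
    children: List[Union["IPGroup", IPStep]] = field(default_factory=list)

    def run(self, image: str) -> str:
        # Children run in order; each receives the previous child's output.
        for child in self.children:
            image = child.run(image)
        return image


# The profile is modeled here as the top-level group.
profile = IPGroup("Cleanup", [
    IPStep("Deskew", lambda img: img + ">deskew"),
    IPGroup("Noise", [
        IPStep("Despeckle", lambda img: img + ">despeckle"),
        IPStep("Line Removal", lambda img: img + ">lines"),
    ]),
    IPStep("Binarize", lambda img: img + ">binarize"),
])

result = profile.run("page")
```

Each hypothetical operation appends its name to the string, so `result` records the order in which the steps ran.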
Lexicon
Lexicon objects are dictionary objects that store a list of keys or key-value pairs. Lexicons can define local entries and/or import entries from other Lexicons and even import entries using a Data Connection. The entries in a Lexicon can be utilized in different areas of Grooper, such as data extraction, Fuzzy Matching, or OCR Correction, providing a reference point that enhances the accuracy and consistency of the software's operations.
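The combination of local and imported entries can be sketched as a dictionary merge. This is a minimal Python illustration, not Grooper's implementation: the `Lexicon` class, its `entries` method, and the sample data are hypothetical, and the rule that local entries override imported ones on a key collision is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lexicon:
    """Hypothetical model: local entries plus entries imported from other lexicons."""
    name: str
    # A value of None represents a key-only entry (no paired value).
    local_entries: Dict[str, Optional[str]] = field(default_factory=dict)
    imports: List["Lexicon"] = field(default_factory=list)

    def entries(self) -> Dict[str, Optional[str]]:
        merged: Dict[str, Optional[str]] = {}
        # Imported entries are gathered first ...
        for lexicon in self.imports:
            merged.update(lexicon.entries())
        # ... so local entries win on key collisions (an assumption of this sketch).
        merged.update(self.local_entries)
        return merged


states = Lexicon("US States", {"OK": "Oklahoma", "TX": "Texas"})
terms = Lexicon("Project Terms", {"TX": "Texas (HQ)", "invoice": None}, imports=[states])

resolved = terms.entries()
```

Here `resolved` contains the imported "OK" entry, the locally overridden "TX" entry, and the key-only "invoice" entry.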
Machine
Machine objects represent servers that have connected to the Grooper repository. They allow for the management of Grooper Service instances and serve as connection points for processing jobs to be executed on the server hardware. Machine objects are essential for scaling processing capabilities and for distributing processing loads across multiple servers.
OCR Profile
OCR Profile objects configure the settings for optical character recognition (OCR) leveraged by the Recognize Activity. OCR converts images of text into machine-encoded text. OCR Profile objects influence how effectively textual content is recognized from Batch Pages.
Object Library
Object Library objects are .NET libraries that contain code files for customizing the functionality of Grooper. These libraries are used for a range of customization and integration tasks, allowing users to extend Grooper's capabilities by adding:
- custom activities that execute within Batch Processes
- custom commands available during the Review Activity
- custom methods that can be called from expressions on Data Field and Batch Process Step objects
- custom services that perform automated background tasks at regular intervals
Processing Queue
Processing Queue objects are designed for tasks performed by
Machines, which include automated steps in the document processing lifecycle. Processing Queues are used to distribute machine tasks among different servers and control the concurrency or processing rate of these tasks.
- For example, activities such as Render or Export can be managed so that only one activity instance runs per machine or so multiple instances are processed concurrently, according to the queue configuration.
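The concurrency control described above can be sketched with a bounded worker pool. This is a minimal Python illustration, not Grooper's implementation: the `run_queue` function and its `task_count` and `max_concurrency` parameters are hypothetical, and a short sleep stands in for real work such as an export.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_queue(task_count: int, max_concurrency: int) -> int:
    """Run task_count dummy tasks and return the peak number running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task() -> None:
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real work such as an export
        with lock:
            running -= 1

    # max_workers plays the role of the queue's concurrency setting.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        for _ in range(task_count):
            pool.submit(task)
    # Leaving the with-block waits for every submitted task to finish.
    return peak


serial_peak = run_queue(task_count=5, max_concurrency=1)
parallel_peak = run_queue(task_count=8, max_concurrency=4)
```

With a limit of one, tasks never overlap; with a higher limit, the peak number of simultaneous tasks never exceeds the configured value.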
Project
Project objects are collections of resources and serve as the primary containers for design components within Grooper. The Project object is where various processing objects such as
Content Models,
Batch Processes, Profile Objects, and more are organized and managed. It allows for the encapsulation and modularization of these resources for easier management and reusability.
Resource File
A Resource File object in Grooper is essentially a file that is stored as part of a Grooper
Project. It can include various types of files such as text files or XML schema files.
Review Queue
Review Queue objects are designated for human-performed tasks. They organize the Review tasks that require human attention and can distribute these tasks among different groups of users based on the queue's settings. Review Queues can be assigned at the
Batch Process level to filter work by an entire process, or to Review Activities at the
Batch Process Step level to filter tasks at a more granular, step-based level.
Root
The
Root object represents the topmost element of the Grooper repository. It serves as the starting point from which all other objects branch out. It is the anchor point for all other structures within the repository and a necessary element for the organization and linkage of all other objects within Grooper.
Scanner Profile
Scanner Profile objects outline the specifications for scanning physical documents into digital forms. This includes settings like resolution, color mode, and any post-scan image processing or enhancement functions.
See Desktop Scanning in Grooper for more information.
Separation Profile
Separation Profile objects contain rules and settings that determine how groupings of scanned pages are separated into individual
Batch Folders, often using barcodes, blank pages, or patch codes as indicators for separation points.
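Separation on an indicator such as a blank page can be sketched as splitting a flat list of pages into groups. This is a minimal Python illustration, not Grooper's implementation: the `separate` function and its `is_separator` callback are hypothetical, pages are modeled as strings with an empty string standing in for a blank page, and discarding the separator pages themselves is an assumption of the sketch.

```python
from typing import Callable, List


def separate(pages: List[str], is_separator: Callable[[str], bool]) -> List[List[str]]:
    """Split a flat page list into folders at each separator page."""
    folders: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if is_separator(page):
            # Close the folder in progress; the separator page itself is
            # dropped (an assumption of this sketch).
            if current:
                folders.append(current)
            current = []
        else:
            current.append(page)
    # Keep any pages that follow the final separator.
    if current:
        folders.append(current)
    return folders


scanned = ["inv-1", "inv-2", "", "contract-1", "", "", "memo-1", "memo-2"]
folders = separate(scanned, is_separator=lambda page: page == "")
```

Consecutive separators do not produce empty folders in this sketch, because a folder is only emitted when it holds at least one page.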
Value Reader
Value Reader objects define a single data extraction operation. You set the Extractor Type on the Value Reader to match the specific data you're aiming to capture. For example, you would use the Pattern Match Extractor Type to return data using a regular expression. You would use a Value Reader when you need to extract a single result or a list of simple results from a document.
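The idea of returning a list of simple results from a regular expression can be sketched as follows. This is a minimal Python illustration, not Grooper's Pattern Match implementation: the `pattern_match` function and the sample text are hypothetical, and the pattern uses Python's `re` syntax, which may differ in detail from the regular expression flavor Grooper uses.

```python
import re
from typing import List


def pattern_match(text: str, pattern: str) -> List[str]:
    """Return every non-overlapping match of pattern in text, in document order."""
    return [match.group(0) for match in re.finditer(pattern, text)]


# Hypothetical page text, standing in for recognized document content.
document_text = "Invoice Date: 03/14/2024\nDue Date: 04/13/2024\nTotal: $1,250.00"

# Two digits, slash, two digits, slash, four digits.
dates = pattern_match(document_text, r"\b\d{2}/\d{2}/\d{4}\b")
```

Here `dates` holds both date strings in the order they appear in the text; a pattern with no matches yields an empty list.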
Property
Confidence Multiplier and Output Confidence
Constrained Wrap
Content Type Filter
OCR Engine
Output Extractor Key
Paragraph Marking
Permission Sets
Scope
Secondary Types
Tab Marking
Vertical Wrap
Section Extract Method
Nested Table
Transaction Detection
Separation Provider
Separation Provider
Change in Value Separation
Control Sheet Separation
EPI Separation
ESP Auto Separation
Event-Based Separation
Multi Separator
Pattern-Based Separation
Undo Separation
Service
API Services
Activity Processing
Grooper Licensing
Table Extract Method
Delimited Extract
Fluid Layout
Grid Layout
Row Match
Tabular Layout
UI Element
Document Viewer
Node Tree
Overrides
Summary Tabs