Glossary

From Grooper Wiki
Revision as of 10:46, 22 April 2024 by Randallkinard (added Collation Provider section // via Wikitext Extension for VSCode)

Activity

Activity is a property on Batch Process Step objects. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page.

Batch Process Steps configured with a specific Activity are frequently referred to by the name of the Activity followed by the word "step". For example, a step configured with the Classify Activity is called a Classify Step.


Classify

Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Content Type using patterns, lexical understanding, or rules as defined by a Content Model.


Clip Frames

The Clip Frames Activity extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.


Detect Frames

The Detect Frames Activity locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.


Execute

The Execute Activity runs a specified child command, allowing for the modular and controlled execution of tasks within a larger automated workflow.


Export

The Export Activity facilitates the transfer of documents and extracted information to external systems or formats, completing the data processing workflow.


Extract

The Extract Activity retrieves relevant information, defined by Data Elements, from Batch Folders, transforming unstructured or semi-structured content into structured, usable data.


Image Processing

The Image Processing Activity enhances and optimizes Batch Pages for better recognition and data extraction results.


Initialize Card

The Initialize Card Activity prepares and configures microfiche card images for further processing.


Recognize

The Recognize Activity interprets Batch Pages and Batch Folders, converting them into machine-readable text and capturing layout data for comprehensive analysis and data extraction. This will attach a text and/or layoutData file to the respective object.


Render

The Render Activity normalizes electronic document content from file formats Grooper cannot read natively into PDF format. This allows Grooper to extract the text via the Recognize Activity.


Review

The Review Activity facilitates human evaluation and validation of processed Batch Folders and extracted data for accuracy and completeness.


Send Mail

The Send Mail Activity automates the dispatch of emails with or without attachments, based on Batch Process events and conditions.


Separate

The Separate Activity sorts Batch Pages into individual Batch Folders, one per document, so that each document can be processed and organized independently.
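A minimal sketch of one simple separation strategy, assuming the text of each page is already available: start a new folder whenever a page matches a "first page" pattern. The function name and the regex approach are illustrative assumptions, not how the Separate Activity is implemented; in Grooper the decision is made by the configured Separation Provider (several are listed later in this glossary).

```python
import re

def separate(pages: list[str], first_page_pattern: str) -> list[list[int]]:
    """Hypothetical helper: group page indexes into folders, opening a new
    folder at every page whose text matches first_page_pattern."""
    folders: list[list[int]] = []
    for i, text in enumerate(pages):
        # The very first page always opens a folder; later pages open one only on a match.
        if not folders or re.search(first_page_pattern, text):
            folders.append([])
        folders[-1].append(i)
    return folders

pages = ["Invoice 1001 Page 1 of 2", "Page 2 of 2", "Invoice 1002 Page 1 of 1"]
print(separate(pages, r"Page 1 of \d+"))  # [[0, 1], [2]]
```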


Split Pages

Multi-page documents (typically PDFs and TIFFs) come into Grooper represented as single Batch Folders. The Split Pages Activity exposes Batch Pages as child objects of the Batch Folders for individualized processing and handling.


XML Transform

The XML Transform Activity applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.


Behavior

Content Type and Export Behaviors are configurable actions that automate processing tasks based on the identified Content Type of a Batch Folder.


Export Behavior

An Export Behavior defines the conditions and actions for exporting Batch Folders and their associated data from Grooper to other systems.


Labeling Behavior

A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for Classification and Extraction.


PDF Data Mapping

PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.


CMIS Connection Type

CMIS Connection Type, or "binding", establishes the communication protocols used to connect Grooper with content management systems adhering to the CMIS standard.


AppXtender

The AppXtender CMIS Connection Type, or "binding", connects Grooper to the ApplicationXtender content management system for import and export operations.


Box

The Box CMIS Connection Type, or "binding", connects Grooper to the Box content management system for import and export operations.


Exchange

The Exchange CMIS Connection Type, or "binding", connects Grooper to the Microsoft Exchange Server mail server for import and export operations.


FTP

The FTP CMIS Connection Type, or "binding", connects Grooper to FTP directories for import and export operations.


IMAP

The IMAP CMIS Connection Type, or "binding", connects Grooper to email messages and folders through an IMAP email server.


NTFS

The NTFS CMIS Connection Type, or "binding", connects Grooper to files and folders in the Microsoft Windows NTFS file system.


OneDrive

The OneDrive CMIS Connection Type, or "binding", connects Grooper to Microsoft OneDrive cloud services.


SFTP

The SFTP CMIS Connection Type, or "binding", connects Grooper to SFTP directories for import and export operations.


SharePoint

The SharePoint CMIS Connection Type, or "binding", connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries".


Classification Method

The Classification Method property determines the technique used for document Classification within a Content Model, enabling the sorting of Batch Folders into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.


Labelset-Based

Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.


Lexical

The Lexical Classification Method classifies Batch Folders based on their text content, using either trained examples or rules. It does this by analyzing word frequencies learned from training samples, or by applying defined rules that identify Document Types.
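To illustrate the word-frequency idea, here is a minimal sketch assuming plain-text input: it sums the word counts of the training samples for each Document Type, then assigns a new text the type whose counts are most similar by cosine similarity. This is a generic bag-of-words technique chosen for illustration, not Grooper's actual Lexical algorithm (which may weight words differently, for example with TF-IDF), and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def term_vector(text: str) -> Counter:
    """Lower-cased word counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def train(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """Hypothetical trainer: sum word counts of every sample per Document Type."""
    model: dict[str, Counter] = {}
    for doc_type, texts in samples.items():
        total = Counter()
        for text in texts:
            total.update(term_vector(text))
        model[doc_type] = total
    return model

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text: str, model: dict[str, Counter]) -> str:
    """Return the Document Type whose trained vector is closest to the text."""
    vec = term_vector(text)
    return max(model, key=lambda doc_type: cosine(vec, model[doc_type]))

model = train({
    "Invoice": ["invoice number total due remit payment", "invoice date amount due"],
    "Lease": ["lessor lessee lease term royalty", "oil and gas lease lessor"],
})
print(classify("Please remit the total amount due on this invoice", model))  # Invoice
```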


Rules-Based

The Rules-Based Classification Method classifies Batch Folders by applying "rules" defined on Document Types through their Positive Extractor and Negative Extractor properties, so that a Batch Folder is assigned a Document Type only when it matches that type's predefined criteria.
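A minimal sketch of rule-based assignment, assuming each Positive Extractor and Negative Extractor can be stood in for by a single regular expression: a type matches when its positive pattern hits and its negative pattern does not. The `DocTypeRule` class is hypothetical, and taking the first matching type is an assumption; how Grooper resolves several matching types is not modeled here.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class DocTypeRule:
    """Hypothetical stand-in for a Document Type with Positive/Negative Extractors."""
    name: str
    positive: str                    # regex that must match
    negative: Optional[str] = None   # regex that must NOT match

    def matches(self, text: str) -> bool:
        if not re.search(self.positive, text, re.IGNORECASE):
            return False
        # A hit on the negative pattern vetoes an otherwise positive match.
        return not (self.negative and re.search(self.negative, text, re.IGNORECASE))

def classify(text: str, rules: list[DocTypeRule]) -> Optional[str]:
    """Return the first Document Type whose rule is satisfied, else None (unclassified)."""
    for rule in rules:
        if rule.matches(text):
            return rule.name
    return None

rules = [
    DocTypeRule("Invoice", positive=r"\binvoice\b", negative=r"credit\s+memo"),
    DocTypeRule("Credit Memo", positive=r"credit\s+memo"),
]
print(classify("INVOICE #1001", rules))                 # Invoice
print(classify("Credit Memo for invoice 1001", rules))  # Credit Memo
print(classify("Packing slip", rules))                  # None
```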


Visual

The Visual Classification Method uses image data instead of text data to determine the Document Type assigned to a Batch Folder during Classification. Instead of using text-based extractors, an IP Profile is used with an Extract Features IP Command to obtain data pertaining to a Batch Folder's image(s). Document samples are trained as examples of a Document Type.


Collation Provider

The Collation Provider property of a Data Type defines how the Data Type's raw results are converted into a final result set. It governs how the lists of matches from the Data Type are combined and interpreted to produce the Data Type's output data.


AND

The AND Collation Provider of a Data Type returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
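A minimal sketch of AND collation, assuming each extractor is a plain regular expression without capture groups: if any extractor returns no hits, the result set is empty. Returning all hits together on success is an assumption made for illustration, and the function name is hypothetical.

```python
import re

def and_collate(text: str, extractors: list[str]) -> list[str]:
    """Hypothetical AND collation: all hits, but only if every extractor hits at least once."""
    hits_per_extractor = [re.findall(pattern, text) for pattern in extractors]
    if not all(hits_per_extractor):
        return []  # at least one extractor found nothing, so the AND fails
    return [hit for hits in hits_per_extractor for hit in hits]

text = "Invoice 1001 dated 04/22/2024"
print(and_collate(text, [r"Invoice \d+", r"\d{2}/\d{2}/\d{4}"]))  # ['Invoice 1001', '04/22/2024']
print(and_collate(text, [r"Invoice \d+", r"PO-\d+"]))             # []
```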


Array

The Array Collation Provider of a Data Type matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
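A minimal sketch of the horizontal case only, assuming each hit carries a line index and left/right edges (the `Hit` class, units, and thresholds are hypothetical): consecutive hits on the same line with a small enough gap are combined into one result. Vertical and flow order, and Grooper's actual property names, are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """Hypothetical extractor hit with a simplified position."""
    text: str
    line: int     # line index on the page
    left: float   # left edge, e.g. in inches
    right: float  # right edge

def horizontal_arrays(hits: list[Hit], max_gap: float = 0.5, min_items: int = 2) -> list[str]:
    """Combine hits on the same line separated by at most max_gap into single results."""
    groups: list[list[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h.line, h.left)):
        prev = groups[-1][-1] if groups else None
        if prev and prev.line == hit.line and hit.left - prev.right <= max_gap:
            groups[-1].append(hit)  # close enough: extend the current array
        else:
            groups.append([hit])    # start a new candidate array
    return [", ".join(h.text for h in g) for g in groups if len(g) >= min_items]

hits = [Hit("Red", 0, 1.0, 1.3), Hit("Green", 0, 1.6, 2.1),
        Hit("Blue", 0, 2.4, 2.8), Hit("Total", 3, 1.0, 1.5)]
print(horizontal_arrays(hits))  # ['Red, Green, Blue']
```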


Combine

The Combine Collation Provider of a Data Type combines instances from returned results based on a specified grouping, controlling how extractor results are assembled for output.


Key-Value List

The Key-Value List Collation Provider of a Data Type matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
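A minimal sketch of one possible key-value list layout, assuming the page is available as text lines and the values are listed one per line directly below the key. The function name, the line-based layout, and stopping at the first non-value line are all assumptions for illustration; Grooper works with positional layout data rather than plain lines.

```python
import re

def key_value_list(lines: list[str], key: str, value: str) -> list[str]:
    """Hypothetical helper: values listed on consecutive lines directly below `key`."""
    for i, line in enumerate(lines):
        if re.search(key, line):
            values = []
            for below in lines[i + 1:]:
                m = re.fullmatch(value, below.strip())
                if not m:
                    break  # the list ends at the first line that is not a value
                values.append(m.group())
            return values
    return []

page = ["Lot Numbers", "A-100", "A-101", "A-205", "Total: 3"]
print(key_value_list(page, r"Lot Numbers", r"[A-Z]-\d{3}"))  # ['A-100', 'A-101', 'A-205']
```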


Key-Value Pair

The Key-Value Pair Collation Provider of a Data Type matches instances where a key is paired with a value on the document in a specific layout, which makes it well suited to extracting label-value pairs.
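A minimal sketch of key-value pairing, assuming the page is available as text lines and that two layouts are accepted: the value to the right of the key on the same line, or on the line directly below. The function name and this simplified line-based layout are hypothetical; real layouts would be judged from positional data.

```python
import re
from typing import Optional

def key_value_pair(lines: list[str], key: str, value: str) -> Optional[str]:
    """Hypothetical helper: find `value` right of `key` on its line, else on the next line."""
    for i, line in enumerate(lines):
        k = re.search(key, line)
        if not k:
            continue
        right = re.search(value, line[k.end():])  # same line, after the key
        if right:
            return right.group()
        if i + 1 < len(lines):
            below = re.search(value, lines[i + 1])  # line directly below
            if below:
                return below.group()
    return None

page = ["Invoice No: 1001", "Invoice Date", "04/22/2024"]
print(key_value_pair(page, r"Invoice No:?", r"\d+"))                 # 1001
print(key_value_pair(page, r"Invoice Date", r"\d{2}/\d{2}/\d{4}"))  # 04/22/2024
```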


Ordered Array

The Ordered Array Collation Provider of a Data Type finds sequences of values where one result is present for each extractor, in the order they appear.
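A minimal sketch of ordered-array matching over plain text, assuming each extractor is a regular expression: it greedily looks for a hit of the first extractor, then the next extractor's hit after it, and so on, recording each complete sequence. The function name is hypothetical, and the greedy text scan is a simplification; layout considerations are not modeled.

```python
import re

def ordered_array(text: str, extractors: list[str]) -> list[list[str]]:
    """Hypothetical helper: sequences holding one hit per extractor, in extractor order."""
    compiled = [re.compile(p) for p in extractors]
    results: list[list[str]] = []
    pos = 0
    while True:
        seq, cursor = [], pos
        for pattern in compiled:
            m = pattern.search(text, cursor)  # next hit after the previous one
            if not m:
                return results  # no complete sequence remains
            seq.append(m.group())
            cursor = m.end()
        if cursor == pos:
            return results  # guard against looping on empty matches
        results.append(seq)
        pos = cursor

text = "Smith 04/22/2024 $120.00\nJones 05/01/2024 $75.50"
print(ordered_array(text, [r"[A-Z][a-z]+", r"\d{2}/\d{2}/\d{4}", r"\$\d+\.\d{2}"]))
```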


Pattern-Based

The Pattern-Based Collation Provider of a Data Type uses regular expressions to sequence returned results into a final result set.


Split

The Split Collation Provider of a Data Type separates a Data Instance at each match returned by the Data Type.
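A minimal sketch of splitting an instance's text at each match, assuming the match marks the start of a new piece (whether Grooper keeps the match at the start of each piece is an assumption here). The function name is hypothetical.

```python
import re

def split_instance(text: str, pattern: str) -> list[str]:
    """Hypothetical helper: cut `text` at the start of every match of `pattern`."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    bounds = sorted(set([0] + starts + [len(text)]))
    pieces = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [p for p in pieces if p]  # drop empty pieces

print(split_instance("Item 1 Widget Item 2 Gadget Item 3 Gizmo", r"Item \d+"))
# ['Item 1 Widget', 'Item 2 Gadget', 'Item 3 Gizmo']
```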


Concept


Activity Processing


Asset Management


Backup and Restore Grooper Repository


CMIS+


CMIS


CMIS Query


CSS Data Viewer Styling


Classification


Code Expressions


Combined Methods


Content Type


Data Context


Data Element


Data Extractor


Data Instance


Desktop Scanning in Grooper


Download or Upload Grooper Objects


EDI Integration


Expressions


Expressions Cookbook


Field Mapping


Five Phases of Grooper


Flow Collation


Footer Rows and Footer Modes


Fuzzy RegEx


GPT Integration


Grooper Azure AD Connector


Grooper Infrastructure


Grooper Repository


Grooper Service


Image Processing


Import Mode and Document Linking


Import or Export Grooper Objects


LINQ to Grooper Objects


Layered OCR


Layout Data


License Activation


Microfiche Processing


Microsoft Office Integration


OCR


OCR Synthesis


Object Nomenclature


Overrides


PDF Page Types


Regular Expression


Repository


Separation


TF-IDF


Table Extraction


Test Batch


Thread


Training-Based Approaches to Document Classification


Training Batch


UNC Path


URL Endpoints for Review


Waterfall Classification


XML Schema Integration


Export Type


CMIS Export


Data Export


Extractor Type


Detect Signature


Find Barcode


Highlight Zone


Labeled OMR


Labeled Value


List Match


Ordered OMR


Pattern Match


Read Barcode


Read Zone


Word Match


Zonal OMR


IP Command


Barcode Detection


Binarize


Extract Page


Line Removal


Scratch Removal


Shape Detection


Shape Removal


Import Provider


CMIS Import


Import Descendants


Import Query Results


Lookup


CMIS Lookup


Database Lookup


Web Service Lookup


Object


Batch


Batch Folder


Batch Page


Batch Process


CMIS Connection


CMIS Repository


Content Category


Content Model


Data Connection


Data Field


Data Model


Data Rule


Data Section


Data Table


Data Type


Document Type


Field Class


File Store


Form Type


IP Profile


Lexicon


Machine


OCR Profile


Object Library


Page Type


Processing Queue


Project


Review Queue


Scanner Profile


Separation Profile


Value Reader


Property


Confidence Multiplier and Output Confidence


Constrained Wrap


Content Type Filter


OCR Engine


Output Extractor Key


Paragraph Marking


Permission Sets


Scope


Secondary Types


Tab Marking


Vertical Wrap


Section Extract Method


Nested Table


Transaction Detection


Separation Provider


Separation Provider


Change in Value Separation


Control Sheet Separation


EPI Separation


ESP Auto Separation


Event-Based Separation


Multi Separator


Pattern-Based Separation


Undo Separation


Service


API Services


Activity Processing


Grooper Licensing


Table Extract Method


Delimited Extract


Fluid Layout


Grid Layout


Row Match


Tabular Layout


UI Element


Document Viewer


Node Tree


Summary Tabs