Glossary: Difference between revisions
added Classification Methods // via Wikitext Extension for VSCode
added Collation Provider section // via Wikitext Extension for VSCode
Line 183:
=== AND ===
<section begin="AND" />
The '''''[[AND (Collation Provider)|AND]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
<section end="AND" />
=== Array ===
<section begin="Array" />
The '''''[[Array (Collation Provider)|Array]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
<section end="Array" />
=== Combine ===
<section begin="Combine" />
The '''''[[Combine (Collation Provider)|Combine]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
<section end="Combine" />
=== Key-Value List ===
<section begin="Key-Value List" />
The '''''[[Key-Value List (Collation Provider)|Key-Value List]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
<section end="Key-Value List" />
=== Key-Value Pair ===
<section begin="Key-Value Pair" />
The '''''[[Key-Value Pair (Collation Provider)|Key-Value Pair]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.
<section end="Key-Value Pair" />
=== Ordered Array ===
<section begin="Ordered Array" />
The '''''[[Ordered Array (Collation Provider)|Ordered Array]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' finds sequences of values where one result is present for each extractor, in the order they appear.
<section end="Ordered Array" />
=== Pattern-Based ===
<section begin="Pattern-Based" />
The '''''[[Pattern-Based (Collation Provider)|Pattern-Based]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' uses regular expressions to sequence returned results into a final result set.
<section end="Pattern-Based" />
=== Split ===
<section begin="Split" />
The '''''[[Split (Collation Provider)|Split]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' separates a [[Data Instance (Concept)|Data Instance]] at each match returned by the '''Data Type'''.
<section end="Split" />
Revision as of 10:46, 22 April 2024
Activity
Activity is a property on Batch Process Step objects. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page.
Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify Step.
Classify
Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Content Type using patterns, lexical understanding, or rules as defined by a Content Model.
Clip Frames
The Clip Frames Activity extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.
Detect Frames
The Detect Frames Activity locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.
Execute
The Execute Activity runs a specified child command, allowing for the modular and controlled execution of tasks within a larger automated workflow.
Export
The Export Activity facilitates the transfer of documents and extracted information to external systems or formats, completing the data processing workflow.
Extract
The Extract Activity retrieves relevant information, defined by Data Elements, from Batch Folders, transforming unstructured or semi-structured content into structured, usable data.
Image Processing
The Image Processing Activity enhances and optimizes Batch Pages for better recognition and data extraction results.
Initialize Card
The Initialize Card Activity prepares and configures microfiche card images for further processing.
Recognize
The Recognize Activity interprets Batch Pages and Batch Folders, converting them into machine-readable text and capturing layout data for comprehensive analysis and data extraction. The Activity attaches a text and/or layoutData file to the respective object.
Render
The Render Activity normalizes electronic document content to PDF when it arrives in a file format Grooper cannot read natively. This allows Grooper to extract the text via the Recognize Activity.
Review
The Review Activity facilitates human evaluation and validation of processed Batch Folders and extracted data for accuracy and completeness.
Send Mail
The Send Mail Activity automates the dispatch of emails with or without attachments, based on Batch Process events and conditions.
Separate
The Separate Activity sorts Batch Pages into individual Batch Folders, distinguishing them for independent processing and organization.
Split Pages
Multi-page documents (typically PDFs and TIFFs) come into Grooper represented as single Batch Folders. The Split Pages Activity exposes Batch Pages as child objects of the Batch Folders for individualized processing and handling.
XML Transform
The XML Transform Activity applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.
Behavior
Content Type and Export Behaviors are configurable actions that automate processing tasks based on the identified Content Type of a Batch Folder.
Export Behavior
An Export Behavior defines the conditions and actions for exporting Batch Folders and their associated data from Grooper to other systems.
Labeling Behavior
A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for Classification and Extraction.
PDF Data Mapping
PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.
CMIS Connection Type
CMIS Connection Type, or "binding", establishes the communication protocols used to connect Grooper with content management systems adhering to the CMIS standard.
AppXtender
The AppXtender CMIS Connection Type, or "binding", connects Grooper to the ApplicationXtender content management system for import and export operations.
Box
The Box CMIS Connection Type, or "binding", connects Grooper to the Box content management system for import and export operations.
Exchange
The Exchange CMIS Connection Type, or "binding", connects Grooper to the Microsoft Exchange Server mail server for import and export operations.
FTP
The FTP CMIS Connection Type, or "binding", connects Grooper to FTP directories for import and export operations.
IMAP
The IMAP CMIS Connection Type, or "binding", connects Grooper to email messages and folders through an IMAP email server.
NTFS
The NTFS CMIS Connection Type, or "binding", connects Grooper to files and folders in the Microsoft Windows NTFS file system.
OneDrive
The OneDrive CMIS Connection Type, or "binding", connects Grooper to Microsoft OneDrive cloud services.
SFTP
The SFTP CMIS Connection Type, or "binding", connects Grooper to SFTP directories for import and export operations.
SharePoint
The SharePoint CMIS Connection Type, or "binding", connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries".
Classification Method
The Classification Method property determines the technique used for document Classification within a Content Model, enabling the sorting of Batch Folders into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.
Labelset-Based
Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.
Lexical
The Lexical Classification Method classifies Batch Folders based on their text content by utilizing either pre-configured training or rules. This is achieved through the analysis of word frequencies or defined rules that identify document types.
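The word-frequency idea behind a trained lexical classifier can be sketched in a few lines of Python. This is an illustration only, not Grooper's implementation: the function names (tokenize, train, classify) and the sample texts are hypothetical, and plain relative word frequency stands in for whatever weighting the product actually applies.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(samples):
    """Build one word-frequency profile per document type.

    samples maps a document type name to a list of example texts
    (hypothetical training data, not a Grooper structure).
    """
    return {
        doc_type: Counter(word for text in texts for word in tokenize(text))
        for doc_type, texts in samples.items()
    }

def classify(text, profiles):
    """Return the document type whose profile best overlaps the text's words."""
    words = Counter(tokenize(text))

    def score(profile):
        total = sum(profile.values())
        # Sum the relative frequency of each word, weighted by its count in the text.
        return sum(count * profile[word] / total for word, count in words.items())

    return max(profiles, key=lambda doc_type: score(profiles[doc_type]))

SAMPLES = {
    "Invoice": ["invoice total amount due", "invoice number amount"],
    "Resume": ["work experience education skills"],
}
PROFILES = train(SAMPLES)
```

Calling classify("amount due on this invoice", PROFILES) picks the "Invoice" profile because the text shares its most frequent words.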
Rules-Based
The Rules-Based Classification Method applies defined "rules" on Document Types to classify Batch Folders. The rules use Positive Extractor and Negative Extractor properties, ensuring that Batch Folders match predefined criteria before a Document Type is assigned.
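As a rough illustration of positive and negative rules, the Python sketch below assigns a document type when its positive pattern matches and its negative pattern does not. The function name, the RULES table, and the reading of the negative extractor as an exclusion are assumptions made for the example; Grooper itself is configured through its own property grids, not through code like this.

```python
import re

# Hypothetical rule table: document type -> (positive pattern, negative pattern or None).
RULES = {
    "Invoice": (r"(?i)\binvoice\b", r"(?i)\bpurchase order\b"),
    "Purchase Order": (r"(?i)\bpurchase order\b", None),
}

def classify_by_rules(text, rules):
    """Return the first document type whose positive pattern hits and whose
    negative pattern (if any) does not; return None when no rule applies."""
    for doc_type, (positive, negative) in rules.items():
        positive_hit = re.search(positive, text) is not None
        negative_hit = negative is not None and re.search(negative, text) is not None
        if positive_hit and not negative_hit:
            return doc_type
    return None
```

With these rules, a page that mentions both "invoice" and "purchase order" is not classified as an Invoice, because the negative pattern vetoes the match.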
Visual
The Visual Classification Method uses image data instead of text data to determine the Document Type assigned to a Batch Folder during Classification. Instead of using text-based extractors, an IP Profile is used with an Extract Features IP Command to obtain data pertaining to a Batch Folder's image(s). Document samples are trained as examples of a Document Type.
Collation Provider
The Collation Provider property of a Data Type defines the method for converting its raw results into a final result set, governing how lists of matches from the Data Type are combined and interpreted to produce the output data of the Data Type.
AND
The AND Collation Provider of a Data Type returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
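The all-or-nothing behavior can be sketched in Python with regular expressions standing in for extractors. The function name and_collation is hypothetical, and returning the hits as one flat list is an assumption made for the example, not a statement about Grooper's output format.

```python
import re

def and_collation(text, patterns):
    """Return every hit only if each pattern matches at least once.

    Each pattern plays the role of one extractor. Patterns are assumed to
    contain no capture groups, so re.findall returns whole matches.
    """
    hits_per_extractor = [re.findall(pattern, text) for pattern in patterns]
    if all(hits_per_extractor):  # an empty hit list is falsy, so one miss fails the AND
        return [hit for hits in hits_per_extractor for hit in hits]
    return []
```

If any one pattern finds nothing, the whole result is empty, even when the other patterns did match.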
Array
The Array Collation Provider of a Data Type matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
Combine
The Combine Collation Provider of a Data Type combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
Key-Value List
The Key-Value List Collation Provider of a Data Type matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
Key-Value Pair
The Key-Value Pair Collation Provider of a Data Type matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.
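A minimal Python sketch of label-value pairing, assuming the simplest layout: the value sits to the right of the key on the same line, optionally after a colon. The function name key_value_pairs and the sample text are hypothetical, and other layouts (value below the key, for example) are not covered here.

```python
import re

def key_value_pairs(text, key_pattern, value_pattern):
    """Find (key, value) tuples where the value follows the key on the same line."""
    pair = re.compile(
        rf"(?P<key>{key_pattern})[ \t]*:?[ \t]*(?P<value>{value_pattern})"
    )
    return [(m.group("key"), m.group("value")) for m in pair.finditer(text)]

SAMPLE = "Invoice No: 1001\nDate: 2024-04-22"
```

Only spaces and tabs are allowed between key and value, so a key at the end of one line is not paired with a value on the next.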
Ordered Array
The Ordered Array Collation Provider of a Data Type finds sequences of values where one result is present for each extractor, in the order they appear.
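The "one result per extractor, in order" requirement can be sketched in Python as follows. Regular expressions stand in for extractors, the function name ordered_array is hypothetical, and taking the first available hit for each pattern is a simplifying assumption.

```python
import re

def ordered_array(text, patterns):
    """Take one hit per pattern, each found after the previous hit ends.

    Returns the list of hits, or an empty list if any pattern has no hit
    in its required position.
    """
    results, position = [], 0
    for pattern in patterns:
        match = re.compile(pattern).search(text, position)
        if match is None:
            return []
        results.append(match.group())
        position = match.end()  # the next hit must start after this one
    return results

NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"
DATE = r"\d{4}-\d{2}-\d{2}"
CITY = r"[A-Z][a-z]+"
```

Because each search starts where the previous hit ended, the same values in a different order produce no result.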
Pattern-Based
The Pattern-Based Collation Provider of a Data Type uses regular expressions to sequence returned results into a final result set.
Split
The Split Collation Provider of a Data Type separates a Data Instance at each match returned by the Data Type.
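Splitting at matches can be sketched in Python with a plain string standing in for the Data Instance. The function name split_at_matches is hypothetical, and keeping each match at the start of the segment it opens is an assumption for the example; the glossary entry does not say where the matched text ends up.

```python
import re

def split_at_matches(text, pattern):
    """Start a new segment at each match of the pattern and drop empty segments."""
    starts = [m.start() for m in re.finditer(pattern, text)]
    if not starts:
        return [text]
    # Keep any leading text before the first match as its own segment.
    bounds = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    segments = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [segment for segment in segments if segment]
```

For example, splitting "Item 1 apples Item 2 pears" on the pattern `Item \d` yields one segment per item.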
Concept
Activity Processing
Asset Management
Backup and Restore Grooper Repository
CMIS+
CMIS
CMIS Query
CSS Data Viewer Styling
Classification
Code Expressions
Combined Methods
Content Type
Data Context
Data Element
Data Extractor
Data Instance
Desktop Scanning in Grooper
Download or Upload Grooper Objects
EDI Integration
Expressions
Expressions Cookbook
Field Mapping
Five Phases of Grooper
Flow Collation
Fuzzy RegEx
GPT Integration
Grooper Azure AD Connector
Grooper Infrastructure
Grooper Repository
Grooper Service
Image Processing
Import Mode and Document Linking
Import or Export Grooper Objects
LINQ to Grooper Objects
Layered OCR
Layout Data
License Activation
Microfiche Processing
Microsoft Office Integration
OCR
OCR Synthesis
Object Nomenclature
Overrides
PDF Page Types
Regular Expression
Repository
Separation
TF-IDF
Table Extraction
Test Batch
Thread
Training-Based Approaches to Document Classification
Training Batch
UNC Path
URL Endpoints for Review
Waterfall Classification
XML Schema Integration
Export Type
CMIS Export
Data Export
Extractor Type
Detect Signature
Find Barcode
Highlight Zone
Labeled OMR
Labeled Value
List Match
Ordered OMR
Pattern Match
Read Barcode
Read Zone
Word Match
Zonal OMR
IP Command
Barcode Detection
Binarize
Extract Page
Line Removal
Scratch Removal
Shape Detection
Shape Removal
Import Provider
CMIS Import
Import Descendants
Import Query Results
Lookup
CMIS Lookup
Database Lookup
Web Service Lookup
Object
Batch
Batch Folder
Batch Page
Batch Process
CMIS Connection
CMIS Repository
Content Category
Content Model
Data Connection
Data Field
Data Model
Data Rule
Data Section
Data Table
Data Type
Document Type
Field Class
File Store
Form Type
IP Profile
Lexicon
Machine
OCR Profile
Object Library
Page Type
Processing Queue
Project
Review Queue
Scanner Profile
Separation Profile
Value Reader
Property
Confidence Multiplier and Output Confidence
Constrained Wrap
Content Type Filter
OCR Engine
Output Extractor Key
Paragraph Marking
Permission Sets
Scope
Secondary Types
Tab Marking
Vertical Wrap
Section Extract Method
Nested Table
Transaction Detection
Separation Provider
Separation Provider
Change in Value Separation
Control Sheet Separation
EPI Separation
ESP Auto Separation
Event-Based Separation
Multi Separator
Pattern-Based Separation
Undo Separation
Service
API Services
Activity Processing
Grooper Licensing
Table Extract Method
Delimited Extract
Fluid Layout
Grid Layout
Row Match
Tabular Layout
UI Element
Document Viewer
Node Tree
Summary Tabs