Glossary
== Activity ==
<section begin="Activity" />
'''''[[Activity (Property)|Activity]]''''' is a property on [[image:GrooperIcon_BatchProcessStep.png]] '''[[Batch Process Step]]''' objects. '''''Activities''''' define specific document processing operations done to a [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''', [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]''', or [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page]]'''.
'''Batch Process Steps''' configured with specific '''''Activities''''' are frequently referred to by the name of the '''''Activity''''' followed by the word "step". For example: '''Classify Step'''.
=== Classify ===
<section begin="Classify" />
'''''[[Classify (Activity)|Classify]]''''' is an '''''[[Activity (Property)|Activity]]''''' that "classifies" [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' in a [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' by assigning them a '''[[Content Type (Concept)|Content Type]]''' using patterns, lexical understanding, or rules as defined by a [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Model]]'''.
<section end="Classify" />
The '''''[[XML Transform (Activity)|XML Transform]]''''' '''''[[Activity (Property)|Activity]]''''' applies [https://en.wikipedia.org/wiki/XSLT XSLT] stylesheets to [https://en.wikipedia.org/wiki/XML XML] data to modify or reformat the structure of the output.
<section end="XML Transform" />
== Application ==
A '''Grooper''' [[Repository (Concept)|repository]] consists of a series of [https://en.wikipedia.org/wiki/Table_(information) tables] in a [https://en.wikipedia.org/wiki/Database database] and a '''[[File Store (Object)|File Store]]''' containing the files associated with objects in that database. A '''Grooper''' [https://en.wikipedia.org/wiki/Application_software application] is the interface through which a user interacts with that repository in an intuitive way.
=== Grooper Command Console ===
The '''[[Grooper Command Console (Application)|Grooper Command Console]]''' is a [https://en.wikipedia.org/wiki/Command-line_interface command-line interface] that performs system configuration and administration tasks within '''Grooper'''.
=== Web Client ===
The '''[[Web Client (Application)|Grooper Web Client]]''' allows users to connect to '''Grooper''' via a [https://en.wikipedia.org/wiki/Web_browser web browser] using a [https://en.wikipedia.org/wiki/URL URL]. The URL points to a [https://en.wikipedia.org/wiki/Website website] hosted by a [https://en.wikipedia.org/wiki/Server_(computing) server] on which '''Grooper''' is installed and [https://en.wikipedia.org/wiki/Internet_Information_Services Internet Information Services] is configured.
== Behavior ==
== Classification Method ==
<section begin="Classification Method" />
The '''''[[Classification Method (Property)|Classification Method]]''''' property determines the technique used for document [[Classification (Concept)|classification]] within a [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Model]]''', enabling the sorting of [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.
<section end="Classification Method" />
=== Visual ===
<section begin="Visual" />
The '''''[[Visual (Classification Method)|Visual]]''''' '''''Classification Method''''' uses image data instead of text data to determine the [[image:GrooperIcon_DocumentType.png]] '''[[Document Type (Object)|Document Type]]''' assigned to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]''' during [[Classification (Concept)|classification]]. Instead of using text-based extractors, an [[image:GrooperIcon_IPProfile.png]] '''[[IP Profile (Object)|IP Profile]]''' is used with an '''''[[Extract Features (IP Command)|Extract Features]]''''' '''''[[IP Command (Property)|IP Command]]''''' to obtain data pertaining to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folder's]]''' image(s). Document samples are trained as examples of a '''Document Type'''.
<section end="Visual" />
=== Split ===
<section begin="Split" />
The '''''[[Split (Collation Provider)|Split]]''''' '''''[[Collation Provider (Property)|Collation Provider]]''''' of a [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Type]]''' separates a [[Data Instance (Concept)|data instance]] at each match returned by the '''Data Type'''.
<section end="Split" />
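To make "separates a data instance at each match" more concrete, here is a minimal Python sketch. It is a hypothetical model, not '''Grooper's''' implementation: a regular expression stands in for the matches a '''Data Type''' returns, `split_at_matches` is an illustrative name, and the assumption that each piece starts at a match (with any leading text kept as its own piece) may not reflect '''Grooper's''' exact behavior.

```python
import re


def split_at_matches(text, pattern):
    """Split text into segments that each start at a match of pattern.

    Any text before the first match is kept as a leading segment.
    If nothing matches, the whole text is returned as one segment.
    """
    starts = [m.start() for m in re.finditer(pattern, text)]
    if not starts:
        return [text]
    # Cut points: the start of the text (if it precedes the first match),
    # the start of every match, and the end of the text.
    cuts = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


sample = "Item 1: apples\nItem 2: pears\nItem 3: plums\n"
print(split_at_matches(sample, r"Item \d+:"))
# ['Item 1: apples\n', 'Item 2: pears\n', 'Item 3: plums\n']
```

Each resulting segment would correspond to one child data instance in this model.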
=== Activity Processing ===
<section begin="Activity Processing" />
[[Activity Processing (Concept)|Activity Processing]] is a conceptual term that refers to the execution of a sequence of configured tasks, such as [[Classification (Concept)|classification]], [[Extraction (Concept)|extraction]], or data enhancement on documents, which are performed within a [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Process]]''' to transform raw data from documents into structured and actionable information.
<section end="Activity Processing" />
=== Asset Management ===
<section begin="Asset Management" />
[[Asset Management (Concept)|Asset Management]] is a conceptual term that refers to best practices for keeping the organization of objects in a '''Grooper''' [[Repository (Concept)|repository]] clean and easy to follow. Adhering to a standard naming convention, especially if multiple users are designing in '''Grooper''', will reduce the time you spend configuring and troubleshooting issues.
<section end="Asset Management" />
=== CMIS+ ===
<section begin="CMIS+" />
[[CMIS+ (Concept)|CMIS+]] is a conceptual term that refers to '''Grooper's''' [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS]+ architecture, which provides standardized access to document content and metadata across a variety of external storage platforms.
<section end="CMIS+" />
=== CMIS ===
<section begin="CMIS" />
[[CMIS (Concept)|CMIS]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] (Content Management Interoperability Services): an open standard allowing different content management systems to share information over the [https://en.wikipedia.org/wiki/Internet Internet].
<section end="CMIS" />
=== CMIS Query ===
<section begin="CMIS Query" />
[[CMIS Query (Concept)|CMIS Query]] is a conceptual term that refers to the use of [https://en.wikipedia.org/wiki/Content_Management_Interoperability_Services CMIS] [https://en.wikipedia.org/wiki/Query#Computing_and_technology queries] to search documents in CMIS [https://en.wikipedia.org/wiki/Repository#Archives_and_online_databases repositories] and to filter documents on import when using the '''''[[Import Query Results (Import Provider)|Import Query Results]]''''' '''''[[Import Provider (Property)|Import Provider]]'''''.
<section end="CMIS Query" />
=== CSS Data Viewer Styling ===
<section begin="CSS Data Viewer Styling" />
[[CSS Data Viewer Styling (Concept)|CSS Data Viewer Styling]] is a conceptual term that refers to the idea that the '''[[Web Client (Application)|Grooper Web Client's]]''' '''[[Data View (Task View)|Data View]]''' task view of the '''[[Review (Activity)|Review]]''' interface is styled using [https://en.wikipedia.org/wiki/CSS CSS]. This gives you a great deal of control over a [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Model's]]''' appearance and layout during document review.
<section end="CSS Data Viewer Styling" />
=== Classification ===
<section begin="Classification" />
[[Classification (Concept)|Classification]] is a conceptual term that refers to the process of identifying and organizing documents into categorical types based on their content or layout, often using machine learning, rules, or pattern recognition for efficient document management and data extraction workflows. Specifically, the '''''[[Classify (Activity)|Classify]]''''' '''''[[Activity (Property)|Activity]]''''' will assign a '''[[Content Type (Concept)|Content Type]]''' to a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder (Object)|Batch Folder]]'''.
<section end="Classification" />
=== Code Expressions ===
<section begin="Code Expressions" />
[[Code Expressions (Concept)|Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) is a conceptual term that refers to snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.Net] code that expand '''Grooper’s''' core functionality.
<section end="Code Expressions" />
=== Combined Methods ===
<section begin="Combined Methods" />
[[Combined Methods (Concept)|Combining Methods]] is a conceptual term that refers to the idea that a user can leverage multiple [[Classification Method (Property)|Classification Methods]] to overcome the shortcomings of an individual method.
<section end="Combined Methods" />
=== Content Type ===
<section begin="Content Type" />
'''[[Content Type (Concept)|Content Type]]''' is a conceptual term that refers to the grouping of three '''Grooper''' objects: [[image:GrooperIcon_ContentModel.png]] '''[[Content Model (Object)|Content Models]]''', [[image:GrooperIcon_ContentCategory.png]] '''[[Content Category (Object)|Content Categories]]''', and [[image:GrooperIcon_DocumentType.png]] '''[[Document Type (Object)|Document Types]]'''.
<section end="Content Type" />
=== Data Context ===
<section begin="Data Context" />
[[Data Context (Concept)|Data Context]] is a conceptual term that refers to the surrounding information that gives definition to data that would otherwise be meaningless.
<section end="Data Context" />
=== Data Element ===
<section begin="Data Element" />
'''[[Data Element (Concept)|Data Element]]''' is a conceptual term that refers to the grouping of five '''Grooper''' objects: [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Models]]''', [[image:GrooperIcon_DataSection.png]] '''[[Data Section (Object)|Data Sections]]''', [[image:GrooperIcon_DataField.png]] '''[[Data Field (Object)|Data Fields]]''', [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Tables]]''', and [[image:GrooperIcon_DataColumn.png]] '''[[Data Column (Object)|Data Columns]]'''.
<section end="Data Element" />
=== Data Extractor ===
<section begin="Data Extractor" />
[[Data Extractor (Concept)|Data Extractor]] is a conceptual term that refers to the grouping of all [[Data Extractor (Concept)#Extractor_Types|extractor types]] and [[Object Nomenclature#Extractor Objects|extractor objects]].
<section end="Data Extractor" />
=== Data Instance ===
<section begin="Data Instance" />
[[Data Instance (Concept)|Data Instance]] is a conceptual term that refers to an encapsulation of text data within a document. Data instances form the hierarchy of text data that '''Grooper's''' extraction mechanisms create.
<section end="Data Instance" />
=== EDI Integration ===
<section begin="EDI Integration" />
[[EDI Integration (Concept)|EDI Integration]] is a conceptual term that refers to '''Grooper's''' ability to process [https://en.wikipedia.org/wiki/Electronic_data_interchange EDI] files.
<section end="EDI Integration" />
=== Expressions ===
<section begin="Expressions" />
[[Expressions (Concept)|Expressions]] (not to be confused with [https://en.wikipedia.org/wiki/Regular_expression regular expressions]) is a conceptual term that refers to snippets of [https://en.wikipedia.org/wiki/Visual_Basic_(.NET) VB.Net] code that expand '''Grooper’s''' core functionality.
<section end="Expressions" />
=== Expressions Cookbook ===
<section begin="Expressions Cookbook" />
[[Expressions Cookbook (Concept)|Expressions Cookbook]] is a conceptual term that refers to a reference list for commonly used [https://en.wikipedia.org/wiki/Expression_(computer_science) expressions] in '''Grooper'''.
<section end="Expressions Cookbook" />
=== Field Mapping ===
<section begin="Field Mapping" />
[[Field Mapping (Concept)|Field Mapping]] is a conceptual term that refers to how logical connections are made between [https://en.wikipedia.org/wiki/Metadata metadata] content in '''Grooper''' and an external storage platform.
<section end="Field Mapping" />
=== Five Phases of Grooper ===
<section begin="Five Phases of Grooper" />
[[Five Phases of Grooper (Concept)|Five Phases of Grooper]] is a conceptual term that seeks to build understanding of how documents are processed through '''Grooper'''.
<section end="Five Phases of Grooper" />
=== Flow Collation ===
<section begin="Flow Collation" />
[[Flow Collation (Concept)|Flow Collation]] is a conceptual term used to define a type of layout used in '''''[[Collation Provider (Property)|Collation Providers]]''''' of [[image:GrooperIcon_DataType.png]] '''[[Data Type (Object)|Data Types]]'''.
<section end="Flow Collation" />
=== Footer Rows and Footer Modes ===
<section begin="Footer Rows and Footer Modes" />
[[Footer Rows and Footer Modes (Concept)|Footer Rows and Footer Modes]] is a conceptual term that refers to how a "footer row" (enabled by the '''''Generate Footer Row''''' property of a [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''') provides '''Grooper''' users a quick way to validate numerical data in a [[image:GrooperIcon_DataColumn.png]] '''[[Data Column (Object)|Data Column]]'''. The '''Data Column's''' '''''Footer Mode''''' property controls whether and how a total is determined for numerical values in a '''Data Column'''.
<section end="Footer Rows and Footer Modes" />
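The kind of check a footer row supports can be sketched in a few lines of Python. This assumes a footer that simply sums a column and is compared against a total printed on the document; `footer_total` and `footer_matches` are hypothetical helpers, and the actual '''''Footer Mode''''' options in '''Grooper''' may behave differently.

```python
from decimal import Decimal


def footer_total(column_values):
    """Sum the numeric cells of a column, skipping blank cells."""
    return sum((Decimal(v) for v in column_values if v.strip()), Decimal("0"))


def footer_matches(column_values, document_total):
    """Compare the computed footer total with the total printed on the document."""
    return footer_total(column_values) == Decimal(document_total)


amounts = ["125.00", "74.50", "", "300.25"]
print(footer_total(amounts))               # 499.75
print(footer_matches(amounts, "499.75"))   # True
```

`Decimal` is used instead of `float` so that currency amounts add up exactly; in binary floating point, `0.1 + 0.2 == 0.3` is false.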
=== Fuzzy RegEx ===
<section begin="Fuzzy RegEx" />
[[Fuzzy Regex (Concept)|Fuzzy Regex]] is a conceptual term that refers to the use of [https://en.wikipedia.org/wiki/Fuzzy_logic fuzzy logic] within [[Data Extractor (Concept)#Extractor_Types|extractor types]] that use regular expressions to match patterns. It is enabled by the '''''Fuzzy Matching''''' property.
<section end="Fuzzy RegEx" />
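'''Grooper's''' fuzzy matching algorithm is not described here. As a stand-in, the Python sketch below uses plain Levenshtein edit distance on whole words to show the general idea of tolerating OCR errors. It is a fuzzy literal match rather than a full fuzzy regular expression, and `fuzzy_find` and its `max_edits` parameter are hypothetical names.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def fuzzy_find(word, text, max_edits=1):
    """Return whitespace-separated tokens within max_edits edits of word (case-insensitive)."""
    return [tok for tok in text.split()
            if levenshtein(word.lower(), tok.lower()) <= max_edits]


ocr_text = "lNVOICE Number: 12345  Inv0ice Date: 01/02/2024"
print(fuzzy_find("INVOICE", ocr_text))  # ['lNVOICE', 'Inv0ice']
```

Both OCR misreads ("l" for "I", "0" for "o") are one edit away from the target word, so they are accepted, while an exact match would find neither.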
=== GPT Integration ===
<section begin="GPT Integration" />
[[GPT Integration (Concept)|GPT Integration]] is a conceptual term that refers to the usage of [https://en.wikipedia.org/wiki/OpenAI OpenAI's] [https://en.wikipedia.org/wiki/Generative_pre-trained_transformer GPT] models within '''Grooper''' to enhance the capabilities of [[Data Extractor (Concept)|data extractors]], [[Classification (Concept)|classification]], and lookups.
<section end="GPT Integration" />
=== Grooper Infrastructure ===
<section begin="Grooper Infrastructure" />
[[Grooper Infrastructure (Concept)|Grooper Infrastructure]] is a conceptual term that refers to the computing underpinnings of a [[Grooper Repository (Concept)|Grooper repository]] and the [https://en.wikipedia.org/wiki/Software software] used to interface with it.
<section end="Grooper Infrastructure" />
=== Grooper Repository ===
<section begin="Grooper Repository" />
[[Grooper Repository (Concept)|Grooper Repository]] is a conceptual term that refers to the environment used to create, configure and execute objects in '''Grooper'''. It provides the framework to "do work" in '''Grooper'''.
<section end="Grooper Repository" />
=== Grooper Service ===
<section begin="Grooper Service" />
[[Grooper Services (Concept)|Grooper Services]] is a conceptual term that refers to the various executable programs that run as [https://en.wikipedia.org/wiki/Windows_service Windows services] to facilitate '''Grooper''' processing. Service instances are installed, configured, started, and stopped using [[Grooper Config (Application)|Grooper Config]].
<section end="Grooper Service" />
=== Image Processing ===
<section begin="Image Processing" />
[[Image Processing (Concept)|Image Processing]] is a conceptual term that refers to how '''Grooper''' applies a variety of techniques to enhance scanned documents' quality, improving [[OCR (Concept)|OCR]] accuracy by removing imperfections and adjusting visual characteristics to prepare images for data extraction and [[Classification (Concept)|classification]].
<section end="Image Processing" />
=== Import Mode and Document Linking ===
<section begin="Import Mode and Document Linking" />
[[Import Mode and Document Linking (Concept)|Import Mode and Document Linking]] is a conceptual term that refers to the usage of the '''''Import Mode''''' property. This property determines whether an imported document maintains a link to its original file and whether a copy of the file is made on import.
<section end="Import Mode and Document Linking" />
=== LINQ to Grooper Objects ===
<section begin="LINQ to Grooper Objects" />
[[LINQ to Grooper Objects (Concept)|LINQ to Grooper Objects]] is a conceptual term that refers to the ability of '''Grooper''' to leverage [https://en.wikipedia.org/wiki/Language_Integrated_Query LINQ] syntax in [[Expressions (Concept)|expressions]].
<section end="LINQ to Grooper Objects" />
=== Layered OCR ===
<section begin="Layered OCR" />
[[Layered OCR (Concept)|Layered OCR]] is a conceptual term that refers to the usage of the ''Layered OCR'' setting of the '''''OCR Engine''''' property of an [[image:GrooperIcon_OCRProfile.png]] '''[[OCR Profile (Object)|OCR Profile]]'''. This setting allows secondary '''OCR Profiles''' to be applied to a single page. The [[OCR (Concept)|OCR]] results from these secondary '''OCR Profiles''' are merged with (or ''layered'' on top of) the primary '''OCR Profile's''' results.
<section end="Layered OCR" />
=== Layout Data ===
<section begin="Layout Data" />
[[Layout Data (Concept)|Layout Data]] is a conceptual term that refers to information such as line locations, [https://en.wikipedia.org/wiki/Optical_mark_recognition OMR] checkbox locations and states, [https://en.wikipedia.org/wiki/Barcode barcode] values, and detected shapes captured by certain [[Image Processing (Concept)|image processing]] commands. This data is stored as an attached file on a [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder]]''' or [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page]]''' object and can later be recalled by various functions within '''Grooper''' that rely on the presence of that data to function.
<section end="Layout Data" />
=== Microfiche Processing ===
<section begin="Microfiche Processing" />
[[Microfiche Processing (Concept)|Microfiche Processing]] is a conceptual term that refers to how '''Grooper''' leverages several '''''[[IP Command (Property)|IP Commands]]''''' to accurately process [https://en.wikipedia.org/wiki/Microform microform] documents.
<section end="Microfiche Processing" />
=== Microsoft Office Integration ===
<section begin="Microsoft Office Integration" />
[[Microsoft Office Integration (Concept)|Microsoft Office Integration]] is a conceptual term that refers to '''Grooper's''' ability to convert [https://en.wikipedia.org/wiki/Microsoft_Word Microsoft Word] and [https://en.wikipedia.org/wiki/Microsoft_Excel Microsoft Excel] files into formats that '''Grooper''' can read.
<section end="Microsoft Office Integration" />
=== OCR ===
<section begin="OCR" />
[[OCR (Concept)|OCR]] is a conceptual term that stands for [https://en.wikipedia.org/wiki/Optical_character_recognition Optical Character Recognition]. It allows text from paper documents to be digitized so that it can be searched or edited by other [https://en.wikipedia.org/wiki/Software software applications]. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.
<section end="OCR" />
=== OCR Synthesis ===
<section begin="OCR Synthesis" />
[[OCR Synthesis (Concept)|OCR Synthesis]] is a conceptual term that refers to '''Grooper's''' unique method of pre-processing and re-processing raw results from the '''''[[OCR Engine (Property)|OCR Engine]]''''' to improve them.
<section end="OCR Synthesis" />
=== Object Nomenclature ===
<section begin="Object Nomenclature" />
[[Object Nomenclature (Concept)|Object Nomenclature]] is a conceptual term that refers to the idea that mastery of a '''Grooper''' environment is greatly enhanced by understanding the many objects that can exist and how they are related.
<section end="Object Nomenclature" />
=== PDF Page Types ===
<section begin="PDF Page Types" />
[[PDF Page Types (Concept)|PDF Page Types]] is a conceptual term that refers to specific types of [https://en.wikipedia.org/wiki/PDF PDF] pages. Page types describe the kind of content in a PDF page and inform '''Grooper''' how certain '''''[[Activity (Property)|Activities]]''''' should process the page. For example, "single image" pages are [[OCR (Concept)|OCR'd]] by the '''''[[Recognize (Activity)|Recognize]]''''' activity, whereas "text only" pages have their native text extracted.
<section end="PDF Page Types" />
=== Regular Expression ===
<section begin="Regular Expression" />
[[Regular Expression (Concept)|Regular Expression]] is a conceptual term that refers to a standard [https://en.wikipedia.org/wiki/Syntax syntax] designed to parse [https://en.wikipedia.org/wiki/String_(computer_science) text strings]. This is a way of finding information in a block of text. It is the primary method by which '''Grooper''' extracts and returns data from documents.
<section end="Regular Expression" />
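A short illustration of the idea using Python's `re` module. Regex flavors share most of this syntax but differ in details (named-group syntax, for example), so these patterns are illustrative only and are not taken from a '''Grooper''' configuration; the sample text is made up.

```python
import re

text = """Invoice Number: INV-00427
Invoice Date: 03/15/2024
Total Due: $1,284.50"""

# \d matches one digit, {n} repeats the previous token exactly n times,
# and (?P<name>...) is Python's syntax for a named capture group.
date_pattern = re.compile(r"(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})")
invoice_pattern = re.compile(r"INV-\d{5}")

m = date_pattern.search(text)
print(m.group(0), m.group("year"))    # 03/15/2024 2024
print(invoice_pattern.findall(text))  # ['INV-00427']
```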
=== Repository ===
<section begin="Repository" />
[[Repository (Concept)|Repository]] is a conceptual term that refers to a location where files and/or data are stored and managed.
<section end="Repository" />
=== Separation ===
<section begin="Separation" />
[[Separation (Concept)|Separation]] is a conceptual term that refers to the process of taking an unorganized [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' of loose [[image:GrooperIcon_BatchPage.png]] '''[[Batch Page|Batch Pages]]''' and organizing them into document folders. This is done so Grooper can later assign a Document Type to each document folder in a process known as Classification.
<section end="Separation" />
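A minimal Python sketch of the grouping step, assuming a hypothetical rule in which any page printed "Page 1 of N" begins a new document. Separation in '''Grooper''' is configured differently; `separate` and `is_first_page` are illustrative names only, and the page texts are made up.

```python
import re


def separate(pages, starts_document):
    """Group an ordered list of loose pages into documents.

    A new document begins whenever starts_document(page) is True.
    Pages before the first such page are kept together in a leading document.
    """
    documents = []
    for page in pages:
        if starts_document(page) or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


def is_first_page(text):
    """Hypothetical separation rule: the page says 'Page 1 of N'."""
    return re.search(r"\bPage 1 of \d+\b", text) is not None


batch = ["Invoice A ... Page 1 of 2", "Invoice A ... Page 2 of 2",
         "Letter ... Page 1 of 1",
         "Invoice B ... Page 1 of 3", "Invoice B ... Page 2 of 3", "Invoice B ... Page 3 of 3"]
for number, document in enumerate(separate(batch, is_first_page), 1):
    print(number, len(document))  # prints "1 2", then "2 1", then "3 3"
```

Keeping the rule as a separate function mirrors the idea that the grouping loop stays the same while the criterion for "this page starts a new document" can change.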
=== TF-IDF ===
<section begin="TF-IDF" />
[[TF-IDF (Concept)|TF-IDF]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Tf%E2%80%93idf term frequency-inverse document frequency], a numerical statistic intended to reflect how important a word is to a document within a collection (or document set or [https://en.wikipedia.org/wiki/Text_corpus corpus]). It is how '''Grooper''' uses [https://en.wikipedia.org/wiki/Machine_learning machine learning] for training-based document [[Classification (Concept)|classification]] (via the [[Lexical (Classification Method)|Lexical]] method) and data extraction (via the [[image:GrooperIcon_FieldClass.png]] '''[[Field Class (Object)|Field Class]]''' extractor).
<section end="TF-IDF" />
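One common form of the statistic is tf(t, d) * log(N / df(t)), where tf(t, d) is the share of words in document d that are term t, N is the number of documents, and df(t) is the number of documents containing t. Many weighting variants exist, and this Python sketch does not claim to reproduce the exact formula '''Grooper''' uses; `tf_idf` is an illustrative name and the sample documents are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of terms in the document
    idf = log(N / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights


docs = ["invoice total amount due",
        "invoice number and date",
        "lease agreement between lessor and lessee"]
w = tf_idf(docs)
print(round(w[0]["total"], 4))    # 0.2747
print(round(w[0]["invoice"], 4))  # 0.1014
```

"total" appears in only one document, so it outweighs "invoice", which appears in two; a term found in every document gets an idf of log(1) = 0 and therefore no weight at all.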
=== Table Extraction ===
<section begin="Table Extraction" />
[[Table Extraction (Concept)|Table Extraction]] is a conceptual term that refers to '''Grooper's''' functionality to extract data from [https://en.wikipedia.org/wiki/Table_cell cells] in [https://en.wikipedia.org/wiki/Table_(information) tables]. This is accomplished by configuring the [[image:GrooperIcon_DataTable.png]] '''[[Data Table (Object)|Data Table]]''' '''[[Data Element (Concept)|Data Element]]''' in a [[image:GrooperIcon_DataModel.png]] '''[[Data Model (Object)|Data Model]]'''.
<section end="Table Extraction" />
=== Test Batch ===
<section begin="Test Batch" />
[[Test Batch (Concept)|Test Batch]] is a conceptual term that refers to any [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' created in the '''Test''' folder of the '''Batches''' folder in the [[Node Tree (UI Element)|Node Tree]].
<section end="Test Batch" />
=== Thread ===
<section begin="Thread" />
[[Thread (Concept)|Thread]] is a conceptual term that refers to the smallest unit of processing that can be scheduled by an [https://en.wikipedia.org/wiki/Operating_system operating system].
<section end="Thread" />
=== Training-Based Approaches to Document Classification ===
<section begin="Training-Based Approaches to Document Classification" />
[[Training-Based Approaches to Document Classification (Concept)|Training-Based Approaches to Document Classification]] is a conceptual term that refers to an approach to document [[Classification (Concept)|classification]] that assigns a '''[[Document Type (Object)|Document Type]]''' to unclassified [[image:GrooperIcon_BatchFolder.png]] '''[[Batch Folder|Batch Folders]]''' according to their similarity to trained examples of each '''Document Type'''.
<section end="Training-Based Approaches to Document Classification" />
=== Training Batch ===
<section begin="Training Batch" />
[[Training Batch (Concept)|Training Batch]] is a conceptual term that refers to a more convenient way to work with all of the samples a [[image:GrooperIcon_ContentModel.png]] [[Content Model (Object)|Content Model]] has been trained against. You can still view the '''[[Form Type (Object)|Form Types]]''' underneath each '''[[Content Type (Concept)|Content Type]]''', but the '''Training Set''' shows all the samples in one place.
<section end="Training Batch" />
=== UNC Path ===
<section begin="UNC Path" />
[[UNC Path (Concept)|UNC Path]] is a conceptual term that refers to [https://en.wikipedia.org/wiki/Path_(computing)#UNC UNC (Universal Naming Convention)], a standard used in [https://en.wikipedia.org/wiki/Microsoft_Windows Microsoft Windows] for accessing [https://en.wikipedia.org/wiki/Shared_resource shared network folders].
<section end="UNC Path" />
=== URL Endpoints for Review ===
<section begin="URL Endpoints for Review" />
[[URL Endpoints for Review (Concept)|URL Endpoints for Review]] is a conceptual term that refers to three [https://en.wikipedia.org/wiki/URL URL] [https://en.wikipedia.org/wiki/Web_API#Endpoints endpoints] that can be used to open '''''[[Review (Activity)|Review]]''''' tasks in the '''[[Web Client (Application)|Grooper Web Client]]''', given certain information like the '''Grooper''' '''''Repository ID''''', [[image:GrooperIcon_BatchProcess.png]] '''[[Batch Process (Object)|Batch Process]]''' name, [[image:GrooperIcon_Batch.png]] '''[[Batch (Object)|Batch]]''' '''''Id''''' and more.
<section end="URL Endpoints for Review" />
=== Waterfall Classification ===
<section begin="Waterfall Classification" />
[[Waterfall Classification (Concept)|Waterfall Classification]] is a conceptual term that refers to a [[Classification (Concept)|classification]] notion in '''Grooper''' that manipulates the '''''Positive Extractor''''' property to prioritize training similarity in order to achieve a middle ground between high specificity and accuracy, and generality with minimal accuracy. This is helpful when '''[[Batch Folder|Batch Folders]]''' are misclassified and retraining alone won't fix the problem.
<section end="Waterfall Classification" />
=== XML Schema Integration ===
<section begin="XML Schema Integration" />
[[XML Schema Integration (Concept)|XML Schema Integration]] is a conceptual term that refers to '''Grooper's''' ability to interact with [https://en.wikipedia.org/wiki/XML_schema XML schemas] and the configuration required to do so.
<section end="XML Schema Integration" />
<section begin="Node Tree" />
<section end="Node Tree" />
=== Overrides === | |||
<section begin="Overrides" /> | |||
<section end="Overrides" /> | |||
=== Summary Tabs ===
<section begin="Summary Tabs" />
<section end="Summary Tabs" />
Revision as of 14:00, 23 April 2024
Activity
Activity is a property on Batch Process Step objects. Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page.
Batch Process Steps configured with specific Activities are frequently referred to by the name of the Activity followed by the word "step". For example: Classify Step.
Classify
Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Content Type using patterns, lexical understanding, or rules as defined by a Content Model.
Clip Frames
The Clip Frames Activity extracts defined areas from microfiche card images, creating new image frames or layers for focused analysis or processing.
Detect Frames
The Detect Frames Activity locates and identifies frame lines on microfiche card images, enabling the isolation of areas within the frames for further data extraction or processing.
Execute
The Execute Activity runs a specified child command, allowing for the modular and controlled execution of tasks within a larger automated workflow.
Export
The Export Activity facilitates the transfer of documents and extracted information to external systems or formats, completing the data processing workflow.
Extract
The Extract Activity retrieves relevant information, defined by Data Elements, from Batch Folders, transforming unstructured or semi-structured content into structured, usable data.
Image Processing
The Image Processing Activity enhances and optimizes Batch Pages for better recognition and data extraction results.
Initialize Card
The Initialize Card Activity prepares and configures microfiche card images for further processing.
Recognize
The Recognize Activity interprets Batch Pages and Batch Folders, converting them into machine-readable text and capturing layout data for comprehensive analysis and data extraction. This will attach a text and/or layoutData file to the respective object.
Render
The Render Activity normalizes electronic document content from file formats Grooper cannot read natively into PDF format. This allows Grooper to extract the text via the Recognize Activity.
Review
The Review Activity facilitates human evaluation and validation of processed Batch Folders and extracted data for accuracy and completeness.
Send Mail
The Send Mail Activity automates the dispatch of emails with or without attachments, based on Batch Process events and conditions.
Separate
The Separate Activity sorts Batch Pages into individual Batch Folders, distinguishing them for independent processing and organization.
Split Pages
Multi-page documents (typically PDFs and TIFFs) come into Grooper represented as single Batch Folders. The Split Pages Activity exposes Batch Pages as child objects of the Batch Folders for individualized processing and handling.
XML Transform
The XML Transform Activity applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.
Application
A Grooper repository consists of a series of tables in a database and a File Store containing relevant files associated with objects that exist within that database. A Grooper application is the interface through which a user can interact with that repository of information in an intuitive way.
Grooper Command Console
The Grooper Command Console is a command-line interface that performs system configuration and administration tasks within Grooper.
Web Client
The Grooper Web Client allows users to connect to Grooper via a web browser using a URL. The URL points to a website hosted by a server on which Grooper is installed and Internet Information Services (IIS) is configured.
Behavior
Content Type and Export Behaviors are configurable actions that automate processing tasks based on the identified Content Type of a Batch Folder.
Export Behavior
An Export Behavior defines the conditions and actions for exporting Batch Folders and their associated data from Grooper to other systems.
Labeling Behavior
A Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for Classification and Extraction.
PDF Data Mapping
PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.
CMIS Connection Type
CMIS Connection Type, or "binding", establishes the communication protocols used to connect Grooper with content management systems adhering to the CMIS standard.
AppXtender
The AppXtender CMIS Connection Type, or "binding", connects Grooper to the ApplicationXtender content management system for import and export operations.
Box
The Box CMIS Connection Type, or "binding", connects Grooper to the Box content management system for import and export operations.
Exchange
The Exchange CMIS Connection Type, or "binding", connects Grooper to the Microsoft Exchange Server mail server for import and export operations.
FTP
The FTP CMIS Connection Type, or "binding", connects Grooper to FTP directories for import and export operations.
IMAP
The IMAP CMIS Connection Type, or "binding", connects Grooper to email messages and folders through an IMAP email server.
NTFS
The NTFS CMIS Connection Type, or "binding", connects Grooper to files and folders in the Microsoft Windows NTFS file system.
OneDrive
The OneDrive CMIS Connection Type, or "binding", connects Grooper to Microsoft OneDrive cloud services.
SFTP
The SFTP CMIS Connection Type, or "binding", connects Grooper to SFTP directories for import and export operations.
SharePoint
The SharePoint CMIS Connection Type, or "binding", connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries".
Classification Method
The Classification Method property determines the technique used for document classification within a Content Model, enabling the sorting of Batch Folders into categories based on their content or structure. It can utilize pattern matching, machine learning models, or other methodologies to identify and organize documents accurately.
Labelset-Based
Labelset-Based is a Classification Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.
Lexical
The Lexical Classification Method classifies Batch Folders based on their text content by utilizing either pre-configured training or rules. This is achieved through the analysis of word frequencies or defined rules that identify document types.
Rules-Based
The Rules-Based Classification Method employs "rules" defined on Document Types to classify Batch Folders. These rules use the Positive Extractor and Negative Extractor properties to ensure that Batch Folders match predefined criteria before they are assigned a Document Type.
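Here is a minimal Python sketch of the positive/negative rule idea. Regular expressions stand in for the Positive Extractor and Negative Extractor properties, and the `RULES` table and `rules_classify` function are hypothetical. The sketch also assumes Document Types are checked in a fixed order and the first qualifying type wins. That is a simplification, not a description of how Grooper resolves several qualifying types.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical rule table: Document Type -> (positive pattern, optional negative pattern).
RULES: Dict[str, Tuple[str, Optional[str]]] = {
    "Invoice": (r"\bINVOICE\b", r"\bCREDIT MEMO\b"),
    "Credit Memo": (r"\bCREDIT MEMO\b", None),
}

def rules_classify(text: str, rules: Dict[str, Tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the first Document Type whose positive pattern hits and negative pattern misses."""
    for doc_type, (positive, negative) in rules.items():
        if re.search(positive, text) is None:
            continue  # the positive rule must match
        if negative is not None and re.search(negative, text) is not None:
            continue  # a negative hit vetoes this Document Type
        return doc_type
    return None  # no rule matched: the folder stays unclassified

print(rules_classify("INVOICE #1042", RULES))                         # Invoice
print(rules_classify("CREDIT MEMO applied to INVOICE #1042", RULES))  # Credit Memo
print(rules_classify("Monthly bank statement", RULES))                # None
```

The second call shows why a negative rule is useful. The text contains "INVOICE", but the negative pattern keeps it from being labeled an Invoice.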
Visual
The Visual Classification Method uses image data instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an IP Profile is used with an Extract Features IP Command to obtain data pertaining to a Batch Folder's image(s). Document samples are trained as examples of a Document Type.
Collation Provider
The Collation Provider property of a Data Type defines the method for converting its raw results into a final result set, governing how lists of matches from the Data Type are combined and interpreted to produce the output data of the Data Type.
AND
The AND Collation Provider of a Data Type returns results only when each individual extractor specified within it gets at least one hit, thus acting as a logical “AND” operator across multiple extractors.
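Here is a minimal Python sketch of the AND logic. It assumes each child extractor can be approximated by a regular expression. `and_collate` is a hypothetical helper, not part of Grooper.

```python
import re
from typing import Dict, List

def and_collate(text: str, patterns: Dict[str, str]) -> Dict[str, List[str]]:
    """Return every extractor's hits, or an empty dict if any extractor has none."""
    hits = {name: re.findall(pattern, text) for name, pattern in patterns.items()}
    # Logical AND: a single extractor with zero hits discards the whole result.
    if any(len(found) == 0 for found in hits.values()):
        return {}
    return hits

doc = "Invoice No: INV-1042  Date: 04/23/2024"
extractors = {
    "invoice": r"INV-\d+",
    "date": r"\d{2}/\d{2}/\d{4}",
}
print(and_collate(doc, extractors))                         # {'invoice': ['INV-1042'], 'date': ['04/23/2024']}
print(and_collate(doc, {**extractors, "po": r"PO-\d+"}))   # {}
```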
Array
The Array Collation Provider of a Data Type matches a list of values arranged in horizontal, vertical, or flow order, combining instances that qualify into a single result.
Combine
The Combine Collation Provider of a Data Type combines instances from returned results based on a specified grouping, controlling how extractor results are assembled together for output.
Key-Value List
The Key-Value List Collation Provider of a Data Type matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.
Key-Value Pair
The Key-Value Pair Collation Provider of a Data Type matches instances where a key is paired with a value on the document in a specific layout, essential for extracting label-value pairs.
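The sketch below approximates a horizontal key-value layout in plain text: the value must appear after the key on the same line. This is a simplification. Grooper's provider works with positions on the page, which plain lines of text do not capture. `key_value_pair` is a hypothetical helper, and the form text is made up.

```python
import re
from typing import Optional

def key_value_pair(text: str, key: str, value: str) -> Optional[str]:
    """Return the first value found to the right of `key` on the same line."""
    for line in text.splitlines():
        key_match = re.search(key, line)
        if key_match is None:
            continue
        # Only search the part of the line that follows the key.
        value_match = re.search(value, line[key_match.end():])
        if value_match is not None:
            return value_match.group(0)
    return None

form = "Customer: ACME Corp\nInvoice Date: 04/23/2024\nTotal Due: $1,250.00"
print(key_value_pair(form, r"Invoice Date:?", r"\d{2}/\d{2}/\d{4}"))  # 04/23/2024
print(key_value_pair(form, r"Total Due:?", r"\$[\d,]+\.\d{2}"))      # $1,250.00
```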
Ordered Array
The Ordered Array Collation Provider of a Data Type finds sequences of values where one result is present for each extractor, in the order they appear.
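Here is a minimal sketch of the "one result per extractor, in order" requirement. It assumes each extractor is a regular expression and that the values sit on one line of text. `ordered_array` and the sample row are hypothetical, and the sketch ignores the layout handling a real Data Type performs.

```python
import re
from typing import List, Optional

def ordered_array(line: str, patterns: List[str]) -> Optional[List[str]]:
    """Return one hit per pattern, in pattern order, or None if the sequence breaks."""
    position = 0
    results: List[str] = []
    for pattern in patterns:
        match = re.compile(pattern).search(line, position)
        if match is None:
            return None  # a value is missing or out of order
        results.append(match.group(0))
        position = match.end()  # the next value must start after this one
    return results

row = "WIDGET-01   3   $4.50"
code, qty, price = r"[A-Z]+-\d+", r"\b\d+\b", r"\$\d+\.\d{2}"
print(ordered_array(row, [code, qty, price]))  # ['WIDGET-01', '3', '$4.50']
print(ordered_array(row, [price, code]))       # None: no code follows the price
```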
Pattern-Based
The Pattern-Based Collation Provider of a Data Type uses regular expressions to sequence returned results into a final result set.
Split
The Split Collation Provider of a Data Type separates a data instance at each match returned by the Data Type.
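Python's `re.split` performs a comparable operation on plain text, with the separator pattern playing the role of the Data Type's matches. `split_instance` is a hypothetical helper and the line-item text is made up. Keep the separator free of capturing groups, because `re.split` returns captured text as extra list items.

```python
import re
from typing import List

def split_instance(text: str, separator: str) -> List[str]:
    """Split one text instance at every match of `separator`, dropping empty pieces."""
    pieces = re.split(separator, text)
    return [piece.strip() for piece in pieces if piece.strip()]

line_items = "Widget A $10.00; Widget B $12.50; Widget C $7.25"
print(split_instance(line_items, r";"))  # ['Widget A $10.00', 'Widget B $12.50', 'Widget C $7.25']
```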
Concept
There are many objects and properties a user can configure in Grooper. However, knowing how, why, and when to use these objects and properties depends on understanding the underlying concepts that define what these objects and properties are doing and why.
Activity Processing
Activity Processing is a conceptual term that refers to the execution of a sequence of configured tasks, such as classification, extraction, or data enhancement on documents, which are performed within a Batch Process to transform raw data from documents into structured and actionable information.
Asset Management
Asset Management is a conceptual term that refers to best practices for keeping the organization of objects in a Grooper repository clean and easy to follow. Adhering to a standard naming convention, especially if multiple users are designing in Grooper, will reduce the time you spend configuring and troubleshooting issues.
CMIS+
CMIS+ is a conceptual term that refers to Grooper's CMIS+ architecture, which provides standardized access to document content and metadata across a variety of external storage platforms.
CMIS
CMIS is a conceptual term that refers to CMIS (Content Management Interoperability Services): an open standard allowing different content management systems to share information over the Internet.
CMIS Query
CMIS Query is a conceptual term that refers to the queries used to search documents in CMIS Repositories and to filter documents upon import when using the Import Query Results Import Provider.
CSS Data Viewer Styling
CSS Data Viewer Styling is a conceptual term that refers to the idea that the Grooper Web Client's Data View task view of the Review interface is styled using CSS. This gives you a great deal of control over a Data Model's appearance and layout during document review.
Classification
Classification is a conceptual term that refers to the process of identifying and organizing documents into categorical types based on their content or layout, often using machine learning, rules, or pattern recognition for efficient document management and data extraction workflows. Specifically, the Classify Activity will assign a Content Type to a Batch Folder.
Code Expressions
Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper’s core functionality.
Combined Methods
Combined Methods is a conceptual term that refers to the idea that a user can leverage multiple Classification Methods to overcome the shortcomings of an individual method.
Content Type
Content Type is a conceptual term that refers to the grouping of three Grooper objects: Content Models, Content Categories, and Document Types.
Data Context
Data Context is a conceptual term that refers to the surrounding information that gives definition to data that would otherwise be meaningless.
Data Element
Data Element is a conceptual term that refers to the grouping of five Grooper objects: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.
Data Extractor
Data Extractor is a conceptual term that refers to the grouping of all extractor types and extractor objects.
Data Instance
Data Instance is a conceptual term that refers to an encapsulation of text data within a document. Data Instances are the hierarchy of text data that Grooper's extraction mechanisms create.
EDI Integration
EDI Integration is a conceptual term that refers to Grooper's ability to process EDI files.
Expressions
Expressions (not to be confused with regular expressions) is a conceptual term that refers to snippets of VB.Net code that expand Grooper’s core functionality.
Expressions Cookbook
Expressions Cookbook is a conceptual term that refers to a reference list for commonly used expressions in Grooper.
Field Mapping
Field Mapping is a conceptual term that refers to how logical connections are made between metadata content in Grooper and an external storage platform.
Five Phases of Grooper
Five Phases of Grooper is a conceptual term that seeks to build understanding of how documents are processed through Grooper.
Flow Collation
Flow Collation is a conceptual term used to define a type of layout used in Collation Providers of Data Types.
Footer Rows and Footer Modes
Footer Rows and Footer Modes is a conceptual term that refers to how a "footer row" (enabled by the Generate Footer Row property of a Data Table) provides Grooper users a quick way to validate numerical data in a Data Column. The Data Column's Footer Mode property controls whether and how a total is determined for numerical values in a Data Column.
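A footer total is essentially a sum that can be checked against a total printed on the document. The sketch below uses a hypothetical column of currency strings and a hypothetical `footer_total` helper. It models only a sum-style total and does not reproduce the actual Footer Mode options. `Decimal` is used instead of `float` so cents add up exactly.

```python
from decimal import Decimal
from typing import List

def footer_total(column: List[str]) -> Decimal:
    """Sum a column of currency strings such as '$1,250.00'."""
    return sum((Decimal(cell.replace("$", "").replace(",", "")) for cell in column), Decimal("0"))

amounts = ["$1,250.00", "$300.50", "$49.50"]
total = footer_total(amounts)
print(total)                        # 1600.00
print(total == Decimal("1600.00"))  # True: agrees with the (made-up) total printed on the document
```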
Fuzzy RegEx
Fuzzy Regex is a conceptual term that refers to the usage of fuzzy logic within extractor types that leverage regular expressions to match patterns via the enabling of the Fuzzy Matching property.
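Grooper's fuzzy matching applies to regular expressions. The sketch below swaps in a simpler, related technique to show why an OCR error such as "lnvoice" can still match "Invoice": approximate matching of a literal string by edit distance, using Sellers' variant of Levenshtein distance. `fuzzy_distance` is a hypothetical helper, and the threshold of 2 edits is an arbitrary example value.

```python
def fuzzy_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Sellers' variant of Levenshtein distance: the match may start anywhere in `text`.
    """
    previous = [0] * (len(text) + 1)  # an empty pattern matches anywhere for free
    for i, p_char in enumerate(pattern, start=1):
        current = [i] + [0] * len(text)
        for j, t_char in enumerate(text, start=1):
            current[j] = min(
                previous[j - 1] + (p_char != t_char),  # match or substitution
                previous[j] + 1,                        # pattern char missing from text
                current[j - 1] + 1,                     # extra char in text
            )
        previous = current
    return min(previous)

ocr_text = "lnvoice Nunber: 1042"  # OCR misread "I" as "l" and "m" as "n"
print(fuzzy_distance("Invoice Number", ocr_text))       # 2
print(fuzzy_distance("Invoice Number", ocr_text) <= 2)  # True: accepted with 2 allowed edits
```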
GPT Integration
GPT Integration is a conceptual term that refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.
Grooper Infrastructure
Grooper Infrastructure is a conceptual term that refers to the computing underpinnings of a Grooper repository and the software that allows users to interface with it.
Grooper Repository
Grooper Repository is a conceptual term that refers to the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper.
Grooper Service
Grooper Services is a conceptual term that refers to the various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, configured, started, and stopped using Grooper Config.
Image Processing
Image Processing is a conceptual term that refers to how Grooper applies a variety of techniques to enhance scanned documents' quality, improving OCR accuracy by removing imperfections and adjusting visual characteristics to prepare images for data extraction and classification.
Import Mode and Document Linking
Import Mode and Document Linking is a conceptual term that refers to the usage of the Import Mode property. This affects whether or not an imported document maintains a link to its original file and/or if a copy of the file is made on import or not.
LINQ to Grooper Objects
LINQ to Grooper Objects is a conceptual term that refers to the ability of Grooper to leverage LINQ syntax in expressions.
Layered OCR
Layered OCR is a conceptual term that refers to the usage of the Layered OCR setting of the OCR Engine property of an OCR Profile. The use of this setting enables the usage of secondary OCR Profiles on a single page. The OCR results from these secondary OCR Profiles are merged with (or layered on top of) the primary OCR Profile's results.
Layout Data
Layout Data is a conceptual term that refers to information such as line locations, OMR checkbox locations and states, barcode values, and detected shapes captured by certain image processing commands. This data is stored as an attached file on a Batch Folder or Batch Page object and can later be recalled by various functions within Grooper that rely on the presence of that data to function.
Microfiche Processing
Microfiche Processing is a conceptual term that refers to how Grooper leverages several IP Commands to accurately process microform documents.
Microsoft Office Integration
Microsoft Office Integration is a conceptual term that refers to Grooper's ability to convert Microsoft Word and Microsoft Excel files into formats that Grooper can read.
OCR
OCR is a conceptual term that stands for Optical Character Recognition. It allows text from paper documents to be digitized so it can be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.
OCR Synthesis
OCR Synthesis is a conceptual term that refers to Grooper's unique method of pre-processing and re-processing raw results from the OCR Engine to get better results out of it.
Object Nomenclature
Object Nomenclature is a conceptual term that refers to the idea that mastery of a Grooper environment is greatly enhanced by understanding the many objects that can exist and how they are related.
PDF Page Types
PDF Page Types is a conceptual term that refers to specific types of PDF pages. Page types describe the kind of content in a PDF page and inform Grooper how certain Activities should process the page. For example, "single image" pages are OCR'd by the Recognize Activity, whereas "text only" pages have their native text extracted.
Regular Expression
Regular Expression is a conceptual term that refers to a standard syntax designed to parse text strings. This is a way of finding information in a block of text. It is the primary method by which Grooper extracts and returns data from documents.
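Here is a short illustration of the syntax using Python's `re` module on made-up document text. Python's flavor is used only for convenience. The constructs shown (character classes, quantifiers, capture groups) are common to most regex flavors, but flavors differ in details, so a pattern tested here may need adjusting in Grooper.

```python
import re

text = "Invoice Number: INV-1042\nInvoice Date: 04/23/2024\nTotal Due: $1,250.00"

# \d{2}/\d{2}/\d{4} matches dates written as MM/DD/YYYY.
dates = re.findall(r"\d{2}/\d{2}/\d{4}", text)

# A capture group returns only the part of the match inside the parentheses.
invoice = re.search(r"Invoice Number:\s*(INV-\d+)", text).group(1)

print(dates)    # ['04/23/2024']
print(invoice)  # INV-1042
```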
Repository
Repository is a conceptual term that refers to a location where files and/or data are stored and managed.
Separation
Separation is a conceptual term that refers to the process of taking an unorganized Batch of loose Batch Pages and organizing them into document folders. This is done so Grooper can later assign a Document Type to each document folder in a process known as Classification.
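One simple separation strategy is to start a new document folder whenever a page matches a "first page" pattern. The sketch below assumes page text is already available. The `separate` function and the page strings are hypothetical, and the code is not the implementation of any particular Separation Provider.

```python
import re
from typing import List

def separate(pages: List[str], first_page_pattern: str) -> List[List[str]]:
    """Group loose pages into documents, starting a new one at each matching page."""
    documents: List[List[str]] = []
    for page in pages:
        if not documents or re.search(first_page_pattern, page):
            documents.append([page])    # this page begins a new document folder
        else:
            documents[-1].append(page)  # continuation page joins the current folder
    return documents

pages = ["INVOICE Page 1 of 2", "Page 2 of 2", "INVOICE Page 1 of 1", "STATEMENT Page 1 of 3"]
docs = separate(pages, r"Page 1 of \d+")
print([len(d) for d in docs])  # [2, 1, 1]
```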
TF-IDF
TF-IDF is a conceptual term that refers to term frequency-inverse document frequency, a numerical statistic intended to reflect how important a word is to a document within a collection (a document set or corpus). It is how Grooper uses machine learning for training-based document classification (via the Lexical method) and data extraction (via the Field Class extractor).
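The textbook formula multiplies a word's term frequency (its share of the words in a document) by the logarithm of the total number of documents divided by the number of documents that contain the word. The sketch below implements that plain variant on three made-up documents. Grooper's Lexical method may use different weighting or smoothing.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each word by term frequency times inverse document frequency."""
    n_docs = len(documents)
    # Document frequency: how many documents contain each word at least once.
    df = Counter(word for doc in documents for word in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

docs = [
    "invoice number invoice total".split(),
    "statement balance total".split(),
    "invoice remit total".split(),
]
w = tf_idf(docs)
print({word: round(value, 3) for word, value in w[0].items()})
# {'invoice': 0.203, 'number': 0.275, 'total': 0.0}
```

The word `total` appears in every document, so it gets a weight of zero. The word `number` appears only in the first document, so it outweighs the more common `invoice`.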
Table Extraction
Table Extraction is a conceptual term that refers to Grooper's functionality to extract data from cells in tables. This is accomplished by configuring the Data Table Data Element in a Data Model.
Test Batch
Test Batch is a conceptual term that refers to any Batch created in the Test folder of the Batches folder in the Node Tree.
Thread
Thread is a conceptual term that refers to the smallest unit of processing that can be scheduled by an operating system.
Training-Based Approaches to Document Classification
Training-Based Approaches to Document Classification is a conceptual term that refers to an approach to document classification that assigns a Document Type to unclassified Batch Folders according to their similarity to trained examples of each Document Type.
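Here is a minimal sketch of classification by similarity to trained examples. Each Document Type has one made-up training sample, and an unknown document gets the type whose sample is most similar. The sketch assumes raw word counts and cosine similarity. Per the TF-IDF entry, Grooper's Lexical method weights words by TF-IDF, and its actual scoring is not reproduced here. `cosine` and `classify` are hypothetical helpers.

```python
import math
from collections import Counter
from typing import Dict

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(text: str, trained: Dict[str, str]) -> str:
    """Return the Document Type whose trained sample is most similar to `text`."""
    unknown = Counter(text.lower().split())
    return max(trained, key=lambda doc_type: cosine(unknown, Counter(trained[doc_type].lower().split())))

samples = {
    "Invoice": "invoice number invoice date amount due remit to",
    "Bank Statement": "statement period beginning balance ending balance deposits",
}
print(classify("Invoice date 04/23/2024 amount due $300", samples))  # Invoice
print(classify("ending balance deposits statement", samples))        # Bank Statement
```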
Training Batch
Training Batch is a conceptual term that refers to a more convenient way to work with all of the samples a Content Model has been trained against. You can still view the Form Types underneath each Content Type, but the Training Set shows all the samples in one place.
UNC Path
UNC Path is a conceptual term that refers to UNC (Universal Naming Convention), a standard used in Microsoft Windows for accessing shared network folders.
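A UNC path has the form `\\server\share\path`. Python's `pathlib.PureWindowsPath` parses this syntax on any operating system, which makes it a convenient way to show the parts. The server, share, and file names below are made up.

```python
from pathlib import PureWindowsPath

# PureWindowsPath treats "\\server\share" as the drive of a UNC path.
path = PureWindowsPath(r"\\fileserver\scans\2024\batch_001.tif")
print(path.drive)      # \\fileserver\scans
print(path.parts[1:])  # ('2024', 'batch_001.tif')
print(path.name)       # batch_001.tif
```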
URL Endpoints for Review
URL Endpoints for Review is a conceptual term that refers to three URL endpoints that can be used to open Review tasks in the Grooper Web Client, given certain information like the Grooper Repository ID, Batch Process name, Batch Id and more.
Waterfall Classification
Waterfall Classification is a conceptual term that refers to a classification notion in Grooper that manipulates the Positive Extractor property to prioritize training similarity in order to achieve a middle ground between high specificity and accuracy, and generality with minimal accuracy. This is helpful when Batch Folders are misclassified and retraining alone won't fix the problem.
XML Schema Integration
XML Schema Integration is a conceptual term that refers to Grooper's ability to interact with XML schemas and the configuration required to do so.
Export Type
CMIS Export
Data Export
Extractor Type
Detect Signature
Find Barcode
Highlight Zone
Labeled OMR
Labeled Value
List Match
Ordered OMR
Pattern Match
Read Barcode
Read Zone
Word Match
Zonal OMR
IP Command
Barcode Detection
Binarize
Extract Page
Line Removal
Scratch Removal
Shape Detection
Shape Removal
Import Provider
CMIS Import
Import Descendants
Import Query Results
Lookup
CMIS Lookup
Database Lookup
Web Service Lookup
Object
Batch
Batch Folder
Batch Page
Batch Process
CMIS Connection
CMIS Repository
Content Category
Content Model
Data Connection
Data Field
Data Model
Data Rule
Data Section
Data Table
Data Type
Document Type
Field Class
File Store
Form Type
IP Profile
Lexicon
Machine
OCR Profile
Object Library
Page Type
Processing Queue
Project
Review Queue
Scanner Profile
Separation Profile
Value Reader
Property
Confidence Multiplier and Output Confidence
Constrained Wrap
Content Type Filter
OCR Engine
Output Extractor Key
Paragraph Marking
Permission Sets
Scope
Secondary Types
Tab Marking
Vertical Wrap
Section Extract Method
Nested Table
Transaction Detection
Separation Provider
Separation Provider
Change in Value Separation
Control Sheet Separation
EPI Separation
ESP Auto Separation
Event-Based Separation
Multi Separator
Pattern-Based Separation
Undo Separation
Service
API Services
Activity Processing
Grooper Licensing
Table Extract Method
Delimited Extract
Fluid Layout
Grid Layout
Row Match
Tabular Layout
UI Element
Document Viewer
Node Tree
Overrides
Summary Tabs