Classify Method: Difference between revisions

From Grooper Wiki
Created page with "{{AutoVersion}} {{stubs}} <blockquote>{{#lst:Glossary|Classification Method}}</blockquote> Please see the following articles for more information: * Classification * Labelset-Based * Lexical * Rules-Based * Visual"
 
* [[Rules-Based (Classification Method)|Rules-Based]]
* [[Visual (Classification Method)|Visual]]
== Glossary ==
<u><big>'''Batch Folder'''</big></u>: {{#lst:Glossary|Batch Folder}}
<u><big>'''Classification Method'''</big></u>: {{#lst:Glossary|Classification Method}}
<u><big>'''Classification'''</big></u>: {{#lst:Glossary|Classification}}
<u><big>'''Content Model'''</big></u>: {{#lst:Glossary|Content Model}}
<u><big>'''Labelset-Based'''</big></u>: {{#lst:Glossary|Labelset-Based}}
<u><big>'''Lexical'''</big></u>: {{#lst:Glossary|Lexical}}
<u><big>'''Rules-Based'''</big></u>: {{#lst:Glossary|Rules-Based}}
<u><big>'''Visual'''</big></u>: {{#lst:Glossary|Visual}}

Revision as of 10:30, 3 May 2024

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

STUB

This article is a stub. It contains minimal information on the topic and should be expanded.

Please see the following articles for more information:

  • Classification
  • Labelset-Based
  • Lexical
  • Rules-Based
  • Visual

Glossary

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Classification Method:

Classification: Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper offers several methods for classifying documents, including methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity assigns a Content Type to a Batch Folder.

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Labelset-Based: "Labelset-Based" is a Classify Method that leverages the labels defined via a Labeling Behavior to classify Batch Folders.

Lexical: "Lexical" is a Classify Method that classifies Batch Folders based on the text content of trained document examples. This is achieved through statistical analysis of the word frequencies that identify each Document Type.
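The idea of classifying by word frequencies can be illustrated with a small, generic sketch. This is not Grooper's implementation: the functions `tokenize`, `train`, `cosine`, and `classify`, the sample texts, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Build one word-frequency profile per Document Type.

    `examples` maps a (hypothetical) Document Type name to sample texts.
    """
    return {
        doc_type: Counter(word for text in texts for word in tokenize(text))
        for doc_type, texts in examples.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text, profiles):
    """Return (best Document Type, similarity score) for the given text."""
    counts = Counter(tokenize(text))
    scores = {doc_type: cosine(counts, profile) for doc_type, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative training data, not taken from any real Content Model.
profiles = train({
    "Invoice": ["Invoice number 1001 total amount due", "Invoice date and amount due"],
    "Resume": ["Work experience and education history", "Skills education references"],
})
doc_type, score = classify("Please pay the total amount due on this invoice", profiles)
```

Here the unseen text shares words such as "invoice", "amount", and "due" with the Invoice examples and none with the Resume examples, so the Invoice profile scores highest.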

Rules-Based: "Rules-Based" is a Classify Method that employs "rules" defined on each Document Type to classify Batch Folders. Positive Extractor and Negative Extractor properties are configured for each Document Type to positively or negatively associate a Batch Folder based on predefined criteria.

  • While the Positive and Negative Extractors affect the results of every Classify Method, the Rules-Based method classifies using only these properties and nothing else.
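A rough way to picture positive and negative rules is shown below. It is a simplified sketch, not Grooper's extractor engine: the `DocumentTypeRule` class, the regular expressions standing in for extractors, and the assumption that a negative match simply vetoes a Document Type are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentTypeRule:
    """Hypothetical stand-in for a Document Type and its two rule properties."""
    name: str
    positive: str                   # regex standing in for the Positive Extractor
    negative: Optional[str] = None  # regex standing in for the Negative Extractor


def classify(text, rules):
    """Return the first Document Type name whose rules accept the text, else None."""
    for rule in rules:
        if rule.negative and re.search(rule.negative, text, re.IGNORECASE):
            continue  # assumed behavior: a negative match rules this type out
        if re.search(rule.positive, text, re.IGNORECASE):
            return rule.name
    return None


# Illustrative rules only.
rules = [
    DocumentTypeRule("Invoice", positive=r"\binvoice\b", negative=r"\bcredit memo\b"),
    DocumentTypeRule("Credit Memo", positive=r"\bcredit memo\b"),
]
```

With these rules, a document reading "Credit Memo for invoice 42" is kept out of the Invoice type by the negative rule and falls through to Credit Memo, while text matching no positive rule is left unclassified.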

Visual: "Visual" is a Classify Method that uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.
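Comparing image-based data against trained examples can be sketched as a nearest-neighbor lookup over simple image features. This is only a toy illustration: the grid-average features, the Euclidean distance, and the names `extract_features` and `classify` are assumptions, and they do not describe what Grooper's "Extract Features" command actually computes.

```python
import math


def extract_features(image, grid=2):
    """Average pixel intensity in each cell of a grid x grid layout.

    `image` is a list of equal-length rows of grayscale values (0-255) and is
    assumed to be at least `grid` pixels wide and tall.
    """
    rows, cols = len(image), len(image[0])
    features = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [
                image[y][x]
                for y in range(gy * rows // grid, (gy + 1) * rows // grid)
                for x in range(gx * cols // grid, (gx + 1) * cols // grid)
            ]
            features.append(sum(cell) / len(cell))
    return features


def classify(image, trained):
    """Return the Document Type of the trained example nearest in feature space."""
    target = extract_features(image)
    best_type, best_dist = None, math.inf
    for doc_type, example in trained:
        dist = math.dist(target, extract_features(example))
        if dist < best_dist:
            best_type, best_dist = doc_type, dist
    return best_type


# Two tiny hypothetical training images: one dark on top, one dark on the left.
trained = [
    ("Dark Header Form", [[0, 0, 0, 0], [0, 0, 0, 0], [255] * 4, [255] * 4]),
    ("Dark Sidebar Form", [[0, 0, 255, 255]] * 4),
]
scan = [
    [10, 20, 5, 0],
    [0, 15, 10, 5],
    [250, 240, 255, 245],
    [255, 250, 245, 240],
]
result = classify(scan, trained)
```

The noisy scan is dark across its top half, so its features sit closest to the "Dark Header Form" example even though no text was read from it.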