Classify Method

From Grooper Wiki


This article is a stub. It contains minimal information on the topic and should be expanded.

"Classify Methods" define the classification logic used by Content Models during the Classify activity. Classify Methods organize document content in Grooper by assigning Batch Folders a Document Type.

  • Classify Methods analyze documents (Batch Folders) to determine what kind of document each one is.
  • Each Classify Method analyzes documents using a different methodology. These include text-based pattern matching, computer vision, machine learning models, label sets and more.
  • A Classify Method is selected and configured through a Content Model's "Classification Method" property.

About

There are currently 5 Classify Methods in Grooper:

  • Rules-Based - Classifies documents using simple "rules" defined by each Document Type's Positive Extractor and Negative Extractor properties.
  • Lexical - Classifies documents using text-based features learned from trained examples of each Document Type in the Content Model.
    • The Lexical method can be used to classify already separated, unclassified Batch Folders during the Classify activity. It also can be used with ESP Auto Separation to separate loose pages into classified Batch Folders during the Separate activity.
  • Labelset-Based - Classifies documents based on the presence of text-based labels defined in each Document Type's "label set".
  • Search Classifier - Classifies documents by finding similar documents in a search index. The Search Classifier method compares large language model (LLM) embeddings on unclassified documents to embeddings already collected for documents in the search index.
  • Visual - Classifies documents with computer vision, using visual features learned from trained examples of each Document Type in the Content Model.
    • This is a less common Classify Method. It is suitable only for highly-structured documents like forms whose general visual appearance does not change from document to document.
    • This is the only Classify Method that does not rely on text data. It can be used at scan-time when combined with "Event-Based Separation" using the "Content Type" event.
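
To make the Rules-Based idea more concrete, here is a minimal conceptual sketch in Python. It is not Grooper code and does not use any Grooper API. The DocumentType class, the classify_rules_based function and the regular expressions are hypothetical stand-ins for a Document Type's Positive Extractor and Negative Extractor. The sketch also assumes a simple rule, which may differ from Grooper's actual behavior: a Document Type is assigned when its positive pattern matches and its negative pattern (if any) does not.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentType:
    """Hypothetical stand-in for a Document Type with two rule patterns."""
    name: str
    positive_pattern: str                    # stand-in for a Positive Extractor
    negative_pattern: Optional[str] = None   # stand-in for a Negative Extractor


def classify_rules_based(text, document_types):
    """Return the name of the first Document Type whose rules accept the text.

    Assumed rule (not confirmed Grooper behavior): the positive pattern must
    match and the negative pattern, if one is set, must not match.
    Returns None when no Document Type qualifies.
    """
    for doc_type in document_types:
        has_positive = re.search(doc_type.positive_pattern, text, re.IGNORECASE)
        has_negative = (
            doc_type.negative_pattern is not None
            and re.search(doc_type.negative_pattern, text, re.IGNORECASE)
        )
        if has_positive and not has_negative:
            return doc_type.name
    return None


# Illustrative Document Types; the names and patterns are made up.
DOCUMENT_TYPES = [
    DocumentType("Invoice", r"\binvoice\b", negative_pattern=r"\bpurchase order\b"),
    DocumentType("Purchase Order", r"\bpurchase order\b"),
]
```

Calling classify_rules_based(page_text, DOCUMENT_TYPES) on the text of a document returns the first matching type name, or None if the document stays unclassified. The negative pattern here keeps a purchase order that merely mentions an invoice from being labeled as an Invoice.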