Page Type (Node Type)

From Grooper Wiki

This article is a stub. It contains minimal information on the topic and should be expanded.

Page Types are Content Types in a Content Model representing individual pages of a Form Type. These objects store machine learning training data for Lexical and Visual classification methods.

A Page Type is created automatically for each page of a trained document, as a child of that document's Form Type. If the trained document has five pages, the Form Type will have five Page Type children. Page Types store sample pages and their training data (the weightings of classification features).
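The parent-child relationship between a Form Type and its Page Types can be pictured with a small Python sketch. This is only an illustrative model, not Grooper's object model or API: the class names `FormType` and `PageType`, the `train_document` helper, and the `feature_weights` field are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageType:
    """Hypothetical stand-in for a Page Type: one page of a trained document."""
    page_number: int
    # Assumed shape of the training data: classification feature -> weighting.
    feature_weights: dict[str, float] = field(default_factory=dict)


@dataclass
class FormType:
    """Hypothetical stand-in for a Form Type that owns Page Type children."""
    name: str
    page_types: list[PageType] = field(default_factory=list)


def train_document(form_name: str, page_count: int) -> FormType:
    """Create one Page Type child per page of the trained document."""
    form = FormType(name=form_name)
    for number in range(1, page_count + 1):
        form.page_types.append(PageType(page_number=number))
    return form


# A five-page trained document yields five Page Type children.
form = train_document("Invoice (Vendor A)", page_count=5)
print(len(form.page_types))  # 5
```

In Grooper itself these children are created for you during training; the sketch only mirrors the resulting shape of the hierarchy.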

Page Type objects are critical for the page-level classification performed by ESP Auto Separation, a Separation Provider.
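To show why page-level classification is useful for separation, here is a minimal Python sketch. It assumes each loose page has already been classified as a (Form Type name, page number) pair, and it starts a new document whenever a page is classified as page 1 or the Form Type changes. That boundary rule and the `separate` function are assumptions made for illustration; they are not a description of how ESP Auto Separation is actually implemented.

```python
def separate(page_labels):
    """Group classified pages into documents.

    page_labels: list of (form_type_name, page_number) tuples in batch order.
    Returns a list of (form_type_name, [page indexes]) tuples, one per document.
    """
    documents = []
    previous_form = None
    for index, (form_name, page_number) in enumerate(page_labels):
        # Assumed rule: a "page 1" match or a change of Form Type opens a new folder.
        starts_new = page_number == 1 or form_name != previous_form
        if starts_new:
            documents.append((form_name, [index]))
        else:
            documents[-1][1].append(index)
        previous_form = form_name
    return documents


labels = [
    ("Invoice", 1), ("Invoice", 2),
    ("W-2", 1),
    ("Invoice", 1), ("Invoice", 2), ("Invoice", 3),
]
for form_name, pages in separate(labels):
    print(form_name, pages)
# Invoice [0, 1]
# W-2 [2]
# Invoice [3, 4, 5]
```

Because each page carries both a document type and a page position, one pass over the pages separates and classifies them at the same time.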

Glossary

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

ESP Auto Separation: ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a Batch.

Form Type: Form Types represent trained variations of a Document Type. These nodes store machine learning training data for Lexical and Visual document classification methods.

Lexical: "Lexical" is a Classify Method that classifies Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify Document Types.

Separation Provider: The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

Visual: "Visual" is a Classify Method that uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.
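As a rough illustration of the word-frequency idea behind the Lexical entry in this glossary, the Python sketch below compares a document's word counts against those of trained examples using cosine similarity. This is a generic text-classification technique chosen for illustration, not Grooper's actual weighting scheme; the function names and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, trained):
    """Return the name of the trained example whose word counts are most similar."""
    counts = word_counts(text)
    return max(trained, key=lambda name: cosine(counts, word_counts(trained[name])))


# Hypothetical trained examples, keyed by Document Type name.
trained = {
    "Invoice": "invoice number total amount due remit payment",
    "Resume": "experience education skills references employment",
}
print(classify("Total amount due on this invoice", trained))  # Invoice
```

The same comparison could be run per page rather than per document, which is where the training data stored on Page Types would come into play.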