Content Type

From Grooper Wiki
Revision as of 16:29, 6 August 2025 by Dgreenwood (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Content Types are a class of node types used used to classify folder Batch Folders. They represent categories of documents (stacks Content Models and collections_bookmark Content Categories) or distinct types of documents (description Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

About

Types of Content Types

In Grooper, the "Content Type" nodes consist of:

stacks Content Model ...
collections_bookmark Content Category and ...
description Document Type nodes.

These nodes create a classification taxonomy in Grooper. They define how documents are classified, what data to collect from a document, how different kinds of documents are related, and even how certain activities like Export should behave based on how a document is classified.

Content Types work together in Grooper to enable sophisticated document processing workflows. With different types of documents properly classified, they can have their data extracted and are handled according to the rules and behaviors defined by the Document Types within a Content Model.

The relationship between these Content Types is established through a hierarchical inheritance system. Content Categories and Document Types are building blocks within a Content Model seen as the "tree". Content Categories act as the "branches". Document Types are the "leaves" of the hierarchy.

Content Types and document classification

Documents are classified by having a Content Type (usually a Document Type) assigned either by the Classify activity, manually by a user, or other mechanisms in Grooper.

The Content Model plays a special role in defining the "Classify Method" used to classify documents. Classify Methods define the logic for

Content Types and data extraction

"Data Elements" represent information written on the document and contain instructions on how to collect it.

Data Elements can be defined for each Content Type by adding a Data Model. Data Elements (including Data Fields, Data Sections and Data Tables) are added these Data Models. Data Elements are inherited down the "tree" of the Content Type hierarchy.

  • Data Elements defined at the Content Model level are applied to all Content Types within the Content Model and will apply to the whole "tree".
  • Data Elements defined at the Content Category level are applied to all Content Types that exist within that specific "branch".
  • Data Elements defined on a Document Type will apply to that specific "leaf".


  • This is why documents must be "classified" in order to have their data extracted. It is the Content Type that determines which Data Model is used to collect data when the Extract activity runs.

Content Types and "Behaviors"

"Behaviors" are a set of different configurations that affect certain Activities and other areas of Grooper based on how a document is classified. They include:

  • Import Behaviors - Defining how documents and metadata are imported from CMIS Repositories based on their classification.
  • Export Behaviors - Defining how documents and data are exported based on their classification.
  • Labeling Behaviors - Defining how Label Sets are used for documents based on their classification.
  • PDF Data Mapping - Defining several PDF generation capabilities for documents based on their classification.
  • Indexing Behavior - Defining how documents are added to a Grooper search index based on their classification.

Behaviors also respect the Content Type hierarchy.

  • Behaviors defined at the Content Model level are applied to all Content Types within the Content Model, unless a child Content Type has its own Behavior configured. Content Category and Document Type Behavior configurations will override the Content Model configuration.
  • Behaviors defined at the Content Category level are applied to all Content Types within that branch, unless a child Content Type has its own Behavior configured. Child Content Category and Document Type Behavior configurations will override a parent Content Category configuration.
  • Behaviors defined at the Document Type level are applied to that Document Type only. Document Type Behavior configurations will override all parent Content Category and/or Content Model configurations.


Related Node Types

Content Model

stacks Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the collections_bookmark Content Categories and description Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Category

collections_bookmark A Content Category is a container for other Content Category or description Document Type nodes in a stacks Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.

Document Type

description Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a stacks Content Model or a collections_bookmark Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the folder Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's data_table Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

What about Form Types and Page Types?

Technically speaking, Form Types and Page Types are also Content Types, but they aren't typically used in the same way. Form Types and Page Types are created automatically when training example documents for classification. They hold the feature weighting data for documents.

  • Form Types
    • When a Document Type is trained for classification, the training samples are created as Form Types.
    • Form Types are generated automatically when training documents for Lexical classification (and less commonly for Visual classification).
  • Page Types
    • The Page Types are the individual pages of a Form Type. All training weightings are stored on the Page Types for each page of the training document.
    • Page Types are generated automatically when training documents for Lexical classification (and less commonly for Visual classification).


Object Model info

Grooper Type Name

Grooper.Core.ContentType

Inheritance

Grooper Object (Grooper.GrooperObject)
Connected Object (Grooper.ConnectedObject)
Node (Grooper.GrooperNode)
Content Type (Grooper.Core.ContentType)

Derived Types

Content Type (Grooper.Core.ContentType)
Content Category (Grooper.Core.ContentCategory)
Content Model (Grooper.Core.ContentModel)
Document Type (Grooper.Core.DocumentType)
Form Type (Grooper.Core.FormType)
Page Type (Grooper.Core.PageType)