Batch Folder (Node Type)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch: (1) primarily, they represent "documents" in Grooper; (2) they can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Overview: Batches and their purpose

A Batch in Grooper is a container for a set of documents and pages that are processed together through a defined workflow. Batches are central to Grooper’s document processing system, enabling users to collect, organize, and process large volumes of documents efficiently. Each Batch moves through a series of steps, known as a Batch Process, which orchestrates activities such as scanning, image processing, classification, data extraction, review, and export.

Batches are typically created when documents are ingested into Grooper—whether by scanning, importing files, or other means. Once created, a Batch provides a structured environment for managing the lifecycle of its contents, from initial capture to final export.

What is a Batch Folder?

A Batch Folder is a hierarchical organizational unit within a Batch. Batch Folders serve as the primary means of grouping and representing documents and subfolders inside a Batch. Each Batch contains a root Batch Folder, and this folder can contain additional Batch Folders (subfolders) and Batch Pages (individual pages).

Batch Folders are essential for structuring the contents of a Batch. They allow users to:

  • Represent individual documents within a Batch.
  • Organize documents into logical groups or sections.
  • Attach files, manage document-level metadata, and track processing status.

Batch Folders as document containers

In Grooper, each document within a Batch is represented by a Batch Folder. This means that when Grooper separates or classifies pages into documents, it creates a new Batch Folder for each document. These folders contain the pages that make up the document, as well as any associated data, attachments, or metadata.

Batch Folders can be nested, allowing for complex document structures (such as documents with appendices or grouped document sets). The hierarchy of Batch Folders and Batch Pages within a Batch mirrors the logical structure of the documents being processed.

Key properties and capabilities of Batch Folders

  • "Content Type": Each Batch Folder can be assigned a Content Type, such as a Document Type defined in the Content Model. This determines the Data Model used for data extraction and validation.
  • "Data Is Valid": Indicates whether the data extracted or entered for the folder is valid according to the assigned Data Model.
  • "Attachment File Name", "MIME Type", and "Size": Batch Folders can store attachments (such as native files or PDFs) and provide information about these files.
  • "Has Local Attachment", "Has PDF Version": Flags indicating the presence of local or PDF attachments.
  • "Links": Batch Folders can reference external content or related documents via Content Links.
  • "Total Pages", "Total Folders": Provide counts of pages and subfolders contained within the folder.
  • "All Pages", "All Folders": Enumerate all descendant pages and folders, respectively.

Batch Folder hierarchy and navigation

Batch Folders are organized in a tree structure:

  • The root Batch Folder represents the entire Batch.
  • Each Batch Folder can contain other Batch Folders (subfolders) and Batch Pages.
  • This structure allows for flexible organization, such as grouping documents by type, section, or other criteria.

Users can navigate the Batch Folder hierarchy to view, process, or review documents at any level.
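The tree described above can be modeled as a small recursive data structure. The sketch below is a hypothetical Python model, not Grooper's API. The class and method names (`BatchFolder`, `BatchPage`, `all_pages`, `all_folders`) are illustrative assumptions that loosely mirror the "All Pages" and "All Folders" properties, which enumerate every descendant rather than only direct children.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass
class BatchPage:
    """Hypothetical stand-in for a Batch Page: a single page in the Batch."""
    name: str


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder; children are subfolders or pages, in order."""
    name: str
    children: list[Union[BatchFolder, BatchPage]] = field(default_factory=list)

    def all_pages(self) -> Iterator[BatchPage]:
        # Depth-first walk yielding every descendant page, not just direct children.
        for child in self.children:
            if isinstance(child, BatchPage):
                yield child
            else:
                yield from child.all_pages()

    def all_folders(self) -> Iterator[BatchFolder]:
        # Every descendant folder, in depth-first order.
        for child in self.children:
            if isinstance(child, BatchFolder):
                yield child
                yield from child.all_folders()


# The root folder stands for the whole Batch; "Appendix" is a nested subfolder.
root = BatchFolder("Batch root", [
    BatchFolder("Invoice", [BatchPage("p1"), BatchPage("p2")]),
    BatchFolder("Contract", [
        BatchPage("p3"),
        BatchFolder("Appendix", [BatchPage("p4")]),
    ]),
])
print([p.name for p in root.all_pages()])    # ['p1', 'p2', 'p3', 'p4']
print([f.name for f in root.all_folders()])  # ['Invoice', 'Contract', 'Appendix']
```

Note how the appendix nests inside the contract document, matching the "documents with appendices" case mentioned earlier.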

Batch Folders and the Batch Process

The Batch Process defines the workflow that a Batch and its contents follow. Batch Folders are the primary units of work for many activities in the Batch Process, especially those related to document-level processing. Many steps in the Batch Process can operate on Batch Folders, performing activities such as classification, data extraction, review, and export.

Batch Process Steps are set to process Batch Folders by configuring each step's "Scope" property.
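Conceptually, scoping a step to Batch Folders means the step's activity runs once for each folder at a chosen depth of the Batch tree. The sketch below only illustrates that idea. `Node`, `folders_at_level`, `run_step`, and the numeric `level` parameter are hypothetical names and are not Grooper's configuration model.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical minimal tree node; is_folder distinguishes Batch Folders from Batch Pages.
    name: str
    is_folder: bool = True
    children: list = field(default_factory=list)


def folders_at_level(root: Node, level: int) -> list:
    """Return the folders exactly `level` steps below the root (the root is level 0)."""
    current = [root]
    for _ in range(level):
        # Descend one level, keeping folders and skipping pages.
        current = [c for node in current for c in node.children if c.is_folder]
    return current


def run_step(root: Node, level: int, activity) -> list:
    # Apply the activity once per folder at the chosen level and collect the results.
    return [activity(folder) for folder in folders_at_level(root, level)]


batch = Node("Batch", children=[
    Node("Doc A", children=[Node("p1", is_folder=False)]),
    Node("Doc B"),
    Node("loose page", is_folder=False),
])
print(run_step(batch, 1, lambda f: f.name))  # ['Doc A', 'Doc B']
```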

Batch Folders and Content Models

Batch Folders play a central role in Grooper’s document classification and data extraction processes, acting as the bridge between the physical organization of documents and the logical structure defined by the Content Model.

The Content Model: Defining Document Types and Data Models

A Content Model is the root object that organizes and describes the types of documents and data elements present in a document set. It establishes a hierarchy of Content Categories and Document Types, each of which may have its own Data Model for extraction. The Content Model provides the configuration and training data required for automated classification and extraction.

Assigning Content Types to Batch Folders

Each Batch Folder can be assigned a Content Type, which is typically a Document Type defined in the Content Model. This assignment is crucial for two main reasons:

  • Document Classification: When Grooper performs classification, it analyzes the contents of each Batch Folder (such as its pages or attachments) and assigns the most appropriate Content Type. This determines what kind of document the folder represents (e.g., Invoice, Contract, Application). Users may also classify a document manually by assigning the Batch Folder a Content Type.
  • Data Extraction: The assigned Content Type determines which Data Model is used for extracting structured data from the document. The Data Model defines the Data Fields, Data Sections, and Data Tables that Grooper will extract and validate.

The "Content Type" property on a Batch Folder reflects this assignment. Changing the Content Type can affect the available data fields, extraction logic, and downstream processing for that folder.

How Classification Works with Batch Folders

During the classification step in a Batch Process, Grooper evaluates each Batch Folder to determine its document type. This can be done using various classification methods configured in the Content Model, such as machine learning, rules-based, or manual assignment. The result of classification is the assignment of a Content Type to the Batch Folder.

  • If classification is confident, the folder is assigned the detected Content Type.
  • If classification is ambiguous or not configured, the folder may be assigned a "Default Content Type" or left unclassified for manual review.

This classification enables Grooper to treat each Batch Folder as a specific document type, unlocking the appropriate extraction and validation logic.
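The confident/ambiguous decision above can be sketched as a threshold rule. This is a simplified, hypothetical model: the scores, the 0.7 threshold, and the function name `assign_content_type` are assumptions for illustration, and "ambiguous" is reduced here to "the best score is below the threshold." Grooper's actual classification methods are configured in the Content Model and may decide differently.

```python
from typing import Optional


def assign_content_type(scores: dict, threshold: float = 0.7,
                        default: Optional[str] = None) -> Optional[str]:
    """Pick the highest-scoring Document Type if it clears the threshold.

    Otherwise return the default Content Type (if one is configured) or None,
    meaning the folder stays unclassified for manual review.
    """
    if scores:
        best_type, best_score = max(scores.items(), key=lambda item: item[1])
        if best_score >= threshold:
            return best_type
    return default


print(assign_content_type({"Invoice": 0.92, "Contract": 0.40}))  # Invoice
print(assign_content_type({"Invoice": 0.55, "Contract": 0.50}))  # None
print(assign_content_type({"Invoice": 0.55}, default="Unknown Document"))  # Unknown Document
```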

Data Extraction and Validation

Once a Batch Folder has a Content Type, Grooper uses the associated Data Model to extract data from the folder’s contents (pages, attachments, etc.). The Data Model specifies:

  • Which Data Fields to extract (e.g., Invoice Number, Date, Total).
  • How to extract and validate each field.
  • Any Data Sections or Data Tables relevant to the document type.

The extracted data is stored as index data on the Batch Folder. The "Data Is Valid" property indicates whether the extracted data meets the validation rules defined in the Data Model.
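To make the relationship between extracted values and "Data Is Valid" concrete, the sketch below checks a folder's index data against per-field rules. It is a hypothetical model: `DataField`, `validate`, and the regular-expression rules for Invoice Number, Date, and Total are invented for illustration and are not Grooper's Data Model API.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataField:
    # Hypothetical field definition: a name, a validation rule, and whether a value is required.
    name: str
    is_valid: Callable[[str], bool]
    required: bool = True


def validate(fields: list, index_data: dict) -> tuple:
    """Return (data_is_valid, errors) for one folder's extracted values."""
    errors = []
    for f in fields:
        value = index_data.get(f.name, "")
        if not value:
            if f.required:
                errors.append(f"{f.name}: required value missing")
        elif not f.is_valid(value):
            errors.append(f"{f.name}: invalid value {value!r}")
    # The folder's data is valid only when no field produced an error.
    return (not errors, errors)


invoice_model = [
    DataField("Invoice Number", lambda v: re.fullmatch(r"INV-\d{5}", v) is not None),
    DataField("Date", lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None),
    DataField("Total", lambda v: re.fullmatch(r"\d+\.\d{2}", v) is not None),
]

ok, errs = validate(invoice_model, {"Invoice Number": "INV-00042",
                                    "Date": "2025-01-15", "Total": "1250.00"})
bad_ok, bad_errs = validate(invoice_model, {"Invoice Number": "INV-42", "Date": "2025-01-15"})
print(ok, bad_ok)  # True False
print(bad_errs)
```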

Inheritance and Flexibility

Batch Folders support both primary and secondary Content Types, allowing for multi-classification scenarios. This is useful when a document may belong to multiple categories or require multiple schemas for extraction. Batch Folders also inherit data elements and behaviors from the parent Content Types of their assigned Content Type, so extraction logic defined once on a parent can be shared across related document types.
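Inheritance of data elements can be pictured as walking up a chain of parent Content Types and merging their fields. The sketch below is a hypothetical model: `ContentType`, `effective_fields`, and the example category and field names are assumptions, and the rule that a child field with an inherited name is not duplicated is a simplification chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    # Hypothetical Content Type with an optional parent (e.g., a Content Category).
    name: str
    own_fields: list[str] = field(default_factory=list)
    parent: Optional[ContentType] = None

    def effective_fields(self) -> list[str]:
        # Parent fields come first; a child field with an inherited name is not duplicated.
        inherited = self.parent.effective_fields() if self.parent else []
        return inherited + [f for f in self.own_fields if f not in inherited]


financial = ContentType("Financial Documents", ["Vendor Name", "Date"])
invoice = ContentType("Invoice", ["Invoice Number", "Total"], parent=financial)
print(invoice.effective_fields())  # ['Vendor Name', 'Date', 'Invoice Number', 'Total']
```

A Batch Folder assigned the "Invoice" type in this model would see all four fields, even though only two are defined on the Invoice type itself.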

Summary of Batch Folder Roles in Classification and Extraction

  • Batch Folders are the units of classification—each folder is evaluated and assigned a Content Type.
  • The assigned Content Type (from the Content Model) determines the Data Model used for extraction.
  • Data extraction and validation are performed at the Batch Folder level, with results stored as index data.
  • Batch Folders support complex scenarios, including multi-classification, inheritance, and flexible document structures.

By linking the physical organization of documents (Batch Folders) with the logical structure defined in the Content Model, Grooper enables powerful, automated document processing workflows that are both accurate and adaptable to a wide range of business needs.

Batch Pages: The Building Blocks of Batch Folders

A Batch Page is the atomic unit of content in Grooper, representing a single scanned or imported page within a Batch. Batch Pages are always contained within Batch Folders, and together, these two elements form the hierarchical structure of a Batch.

Creation of Batch Pages

Batch Pages are created in two primary ways:

  • Scanning: When physical documents are scanned into Grooper, each scanned image becomes a Batch Page.
  • Digital Import: When importing digital files (such as PDFs or multi-page TIFFs), the Split Pages activity is used to convert each page or image frame into an individual Batch Page.

The Split Pages Activity

The Split Pages activity is a foundational step in most Grooper workflows that start by importing files. Its purpose is to take multi-page documents—such as PDFs, TIFFs, or other supported formats—and split them into individual Batch Pages within a Batch Folder. This enables page-level processing, parallelization, and granular document management.

Key features of the Split Pages activity include:

  • Supports a wide range of file types, including PDFs, TIFFs, and other image formats.
  • Creates a separate Batch Page for each page or image frame in the input file.
  • Offers options to filter which pages are extracted, limit the maximum number of pages, and control overwrite behavior.
  • Can replicate PDF bookmark hierarchies as subfolders, preserving logical document structure.
  • Optionally deletes the original attachment after splitting to conserve storage.

By splitting documents into pages, Grooper enables downstream activities—such as Image Processing and Recognize—to operate at the page level, improving performance and flexibility.
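The page filter and page limit options can be sketched as a loop that creates one page record per page or frame of the source file. This is a hypothetical model only: `Page`, `split_pages`, `page_filter`, and `max_pages` are illustrative names, no real file is read, and applying the limit after the filter is an assumption rather than documented Split Pages behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    source_file: str
    page_number: int  # 1-based index of the page or frame in the source file


def split_pages(source_file: str, page_count: int,
                page_filter: Optional[Callable[[int], bool]] = None,
                max_pages: Optional[int] = None) -> list:
    """Create one Page per page/frame of a multi-page file (hypothetical model)."""
    pages = []
    for n in range(1, page_count + 1):
        if page_filter is not None and not page_filter(n):
            continue  # skip pages the filter rejects
        if max_pages is not None and len(pages) >= max_pages:
            break  # stop once the page limit is reached
        pages.append(Page(source_file, n))
    return pages


odd_pages = split_pages("scan.pdf", 10, page_filter=lambda n: n % 2 == 1, max_pages=3)
print([p.page_number for p in odd_pages])  # [1, 3, 5]
```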

Document Separation: Organizing Pages into Documents

After Batch Pages are created, they are often "loose"—not yet organized into documents. The process of grouping these pages into logical documents is called document separation.

The Separate activity is responsible for this step. It analyzes a sequence of loose Batch Pages (either within a Batch Folder or at the Batch root) and groups them into new Batch Folders, each representing a distinct document. Separation logic can be based on:

  • Barcodes or patch codes
  • Blank pages
  • Classification results
  • Fixed page counts
  • Custom rules or profiles

For example, after scanning a stack of documents, all pages will initially appear as loose pages in a Batch. Running the Separate activity will group these Batch Pages into Batch Folders, each folder representing a single document.
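Two of the separation strategies listed above, fixed page counts and blank pages, are simple enough to sketch directly. The functions below are hypothetical illustrations, not the Separate activity's implementation: `separate_fixed` and `separate_on_blank` are invented names, and discarding the blank separator pages is an assumption made for the example.

```python
def separate_fixed(pages: list, pages_per_doc: int) -> list:
    """Group loose pages into documents of a fixed size (the last one may be shorter)."""
    if pages_per_doc < 1:
        raise ValueError("pages_per_doc must be at least 1")
    return [pages[i:i + pages_per_doc] for i in range(0, len(pages), pages_per_doc)]


def separate_on_blank(pages: list, is_blank) -> list:
    """Start a new document at each blank separator page; blanks are dropped here."""
    docs, current = [], []
    for page in pages:
        if is_blank(page):
            if current:
                docs.append(current)
                current = []
        else:
            current.append(page)
    if current:
        docs.append(current)
    return docs


print(separate_fixed(["p1", "p2", "p3", "p4", "p5"], 2))
# [['p1', 'p2'], ['p3', 'p4'], ['p5']]
print(separate_on_blank(["a1", "a2", "BLANK", "b1", "BLANK", "c1"], lambda p: p == "BLANK"))
# [['a1', 'a2'], ['b1'], ['c1']]
```

Each inner list corresponds to one new Batch Folder in the terms used above.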

Batch Pages in Classification and Extraction

Once Batch Pages are organized into Batch Folders (documents), Grooper can perform classification and data extraction at the document level. Each Batch Folder (containing its Batch Pages) is evaluated and assigned a Content Type (such as a Document Type from the Content Model).

Batch Pages still play an important role in this process. They provide the raw content (images, OCR results, and layout data) that classification logic uses to assign a Document Type and that extraction logic uses to populate the Data Fields, Data Sections, and Data Tables defined in the Data Model.

Summary: Batch Pages

  • Batch Pages are the fundamental building blocks of documents in Grooper.
  • The Split Pages activity creates Batch Pages from multi-page files, enabling granular processing.
  • The Separate activity organizes loose Batch Pages into documents (Batch Folders).
  • Batch Pages provide the page-level content that is used in classification, data extraction, review, and export.

Summary

  • A Batch is a container for documents and pages processed together in Grooper.
  • A Batch Folder represents a document or group of documents within a Batch.
  • Batch Folders organize pages, store attachments, and manage document-level data and metadata.
  • The hierarchy of Batch Folders and Batch Pages mirrors the logical structure of the documents being processed.
  • Batch Folders are central to document separation, classification, data extraction, review, and export in Grooper.

For more information, see the documentation for Batch, Batch Folder, Batch Page, Batch Process, and Content Model.