2023.1:Batch (Node Type)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). They contain one or more Batches of sample documents.

Glossary

Activity Processing: Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

Activity Processing: Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page: Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) When images are scanned into a Batch using the Scan Viewer. (2) Or, when split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Classify: Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Document Type.

  • Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
  • Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

OCR: OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized so that it can be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.

Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Review Queue: Review Queues help organize and filter human-performed Review activity tasks. User groups are assigned to each Review Queue, which is then set either on a Batch Process or a Review step. A user's membership in Review Queues affects how Batches are distributed in the Batches page and how Review tasks are distributed in the Tasks page.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Scope: The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

Service: Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

Test Batch: "Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.

  • Looking for information on "production" vs "test" Batches in Grooper? See here.

About

What is a Batch?

A Batch is an object in Grooper that contains the documents brought into Grooper via scanning or import.

There are three components to a Batch:

  1. The Batch itself
  2. Batch Folders
  3. Batch Pages

Batch objects in Grooper contain two child objects:

  • The root Batch Folder, containing a hierarchy of Batch Folders and Batch Pages.
  • A read-only Batch Process, containing the list of processing instructions for the Batch Folders and Batch Pages.
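
The structure described above can be pictured as a small tree. The following Python sketch is only an illustration: the class names, fields, and the `count_pages` helper are hypothetical and are not part of any Grooper API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class BatchPage:
    """Hypothetical stand-in for a single page."""
    name: str


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a folder; holds folders and/or pages."""
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)


@dataclass
class Batch:
    """A Batch with its two children: the root folder and a read-only process."""
    name: str
    root_folder: BatchFolder
    process_steps: Tuple[str, ...] = ()  # tuple models the read-only step list


def count_pages(folder: BatchFolder) -> int:
    """Count every page under a folder, however deeply it is nested."""
    total = 0
    for child in folder.children:
        if isinstance(child, BatchPage):
            total += 1
        else:
            total += count_pages(child)
    return total


invoice = BatchFolder("Invoice", [BatchPage("Page 1"), BatchPage("Page 2")])
root = BatchFolder("Root", [invoice, BatchPage("Page 3")])
batch = Batch("Test Batch", root, ("Recognize", "Classify"))
print(count_pages(batch.root_folder))  # 3
```

A tuple is used for the steps only to suggest that the Batch's copy of the process is read-only; how Grooper actually enforces this is not covered here.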

Below is an example of a Batch.

  1. The Test Batch is located here on the node tree.
  2. The Test Batch has two child objects, the root Batch Folder and a read-only Batch Process.


  1. If we open up the root Batch Folder...
  2. We can see the Batch Pages. We can also view the hierarchy of Batch Folders and Batch Pages here after Separation.


  1. If we click on the Batch object in the node tree...
  2. We can click on the "Viewer" tab to see the "Batch Viewer".
  3. Here we can see the contents of the Batch.


  1. At the top level is the Batch itself.
  2. Here we have the Batch Pages in the Batch.
  3. Here we can see the Batch and Batch Pages as objects in the node tree.


  1. Through the process of separation, Batch Pages will be separated into document folders.
  2. These Batch Pages are at the Scope of Page.
  3. Here we can see the hierarchy of Batch Folders and Batch Pages within the Batch in the node tree.

Folder Levels

When scanning paper into Grooper, the Batch Pages come in one at a time and there is no differentiation between one document and another. As part of Grooper's workflow, Batch Pages are normally separated into Batch Folders (each folder containing one complete document) so that Grooper knows where each document begins and ends.

A Batch can be as simple as a series of Batch Pages. A Batch may also consist of a complex hierarchy of Batch Folders.
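
As a rough illustration of what separation accomplishes, the sketch below groups a flat run of pages into documents. It assumes a hypothetical rule (`starts_document`) that recognizes the first page of each document; how Grooper actually decides where a document starts is configured in the product and is not shown here.

```python
from typing import Callable, List


def separate(pages: List[str], starts_document: Callable[[str], bool]) -> List[List[str]]:
    """Group loose pages into folders, opening a new folder at each start page."""
    folders: List[List[str]] = []
    for page in pages:
        # Open a new folder on a start page, or if no folder exists yet.
        if starts_document(page) or not folders:
            folders.append([])
        folders[-1].append(page)
    return folders


# Hypothetical page labels; pages beginning with "cover" start a new document.
loose_pages = ["cover A", "A-2", "A-3", "cover B", "B-2"]
documents = separate(loose_pages, lambda p: p.startswith("cover"))
print(documents)  # [['cover A', 'A-2', 'A-3'], ['cover B', 'B-2']]
```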

For certain activities, it is important to tell Grooper which Folder Level the Activity needs to be executed on.

  • The Scope of Batch refers to the topmost Batch Folder. All Batch Folders and Batch Pages exist within this main Batch Folder. It is never referred to as "Level 0" anywhere in Grooper, but with zero-based indexing in mind, it may be easy to think of it as such.
  • The first set of Batch Folders under the main Batch Folder is considered Folder Level 1.
  • A Batch Folder that is a child of a Batch Folder at Folder Level 1 is considered at Folder Level 2. A Batch Folder that is a child of a Batch Folder at Folder Level 2 is considered at Folder Level 3, etc.
  • A Batch Page is always considered to be at the Scope of Page.
    • Sometimes you will have Batch Pages inside Batch Folders at different Folder Levels in the Batch, but you still want to run certain activities on all Batch Pages. You would set those activities to a Scope of Page.

For example, OCR text is obtained from images by running a Recognize Activity at the Scope of Page. Document classification is done by running a Classify Activity at the Scope of Folder. Export is an example of an Activity that could possibly be run at either a Scope of Batch or Folder.
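
To make the relationship between Scope and Folder Level concrete, here is a minimal Python sketch that walks a toy hierarchy and lists which nodes would be visited for a given scope. The `Node` class and `select` function are hypothetical and exist only for this illustration; they do not reflect how Grooper implements scoping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    kind: str  # "folder" or "page" (hypothetical labels)
    children: List["Node"] = field(default_factory=list)


def select(root: Node, scope: str, folder_level: int = 1) -> List[str]:
    """Return the names of the nodes visited for a scope of Batch, Folder, or Page."""
    if scope == "Batch":
        return [root.name]  # the topmost Batch Folder only
    found: List[str] = []

    def walk(node: Node, level: int) -> None:
        for child in node.children:
            if child.kind == "page":
                if scope == "Page":
                    found.append(child.name)  # pages match at any depth
            else:
                if scope == "Folder" and level + 1 == folder_level:
                    found.append(child.name)
                walk(child, level + 1)

    walk(root, 0)  # the root folder is treated as level 0
    return found


application = Node("Application", "folder", [Node("p1", "page"), Node("p2", "page")])
loan_file = Node("Loan File", "folder", [application, Node("p3", "page")])
root = Node("Root", "folder", [loan_file, Node("p4", "page")])

print(select(root, "Page"))       # ['p1', 'p2', 'p3', 'p4']
print(select(root, "Folder", 1))  # ['Loan File']
print(select(root, "Folder", 2))  # ['Application']
```

Note that the Page scope reaches every page regardless of how deeply it is nested, while the Folder scope only reaches folders at the requested level.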




Production vs Test Batches

Batches exist in two environments:

  • "Production"
    • Stored in the "Batches > Production" branch of the Grooper node tree.
    • The Batch is contained in a folder according to the Batch Process being applied to the Batch.
  • "Test"
    • Stored in the "Batches > Test" branch of the Grooper node tree.
  1. In the image below you can see a Batch within the "Production" folder.
  2. A Batch is also seen existing within the "Test" folder.


So, what are the differences between a "Test" and "Production" Batch?

  • "Test Batches": These are only visible to Grooper Design users. They are used to test extraction and Batch Process steps being designed. These Batches are not exposed to Activity Processing Services.
  • "Production Batches": These Batches are visible from the "Batches" page. Production Batches are "visible" to Activity Processing Services and are actively run through a Batch Process that has previously been designed and published.

Both "Test" and "Production" Batches can be created and processed from the Grooper Design page by Design users. However, typically, production Batches are created and processed using the "Batches" page. This also means that different users that are part of different Review Queues can affect Batch workflow.

Test Batches, however, will only be seen by "Design" users.
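
The visibility rules above can be summarized as a small lookup. This is only a restatement of the text in Python; the viewer labels and the `visible_to` function are hypothetical and do not correspond to Grooper's actual permission model.

```python
# Who can see each kind of Batch, as described in this section.
VISIBILITY = {
    "Test": {"Design user"},
    "Production": {"Design user", "Batches page user", "Activity Processing"},
}


def visible_to(environment: str, viewer: str) -> bool:
    """Return True if the viewer can see Batches in the given environment."""
    return viewer in VISIBILITY[environment]


print(visible_to("Test", "Activity Processing"))        # False
print(visible_to("Production", "Activity Processing"))  # True
```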

  1. This is the "Design" page icon
  2. This is the "Batches" page icon

How To

Creating a Test Batch

Creating a Test Batch is relatively simple. First, you must create an empty Batch. Then you can just drag and drop the files from your computer into the Batch.

  1. To add a Test Batch, right-click on the "Test" folder in the node tree.
  2. Hover over "Add" and then click on "Batch..."


  1. In the "Add" dialog box give your Batch a name.
  2. Click "EXECUTE" to create the Batch.


  1. Select the newly created Batch.
  2. Select the "Viewer" tab.
  3. Now you can see that we have an empty Batch. All you need to do now is to drag and drop a file from your computer to this area and your file(s) will be added to Grooper.


  1. A PDF file has been "drag-and-dropped" onto the Batch Folder of the Batch, thus creating a Batch Folder with the PDF as an attachment.

Creating a Production Batch

There are two ways to create a Production Batch in Grooper:

  1. Scanned content: Scanned documents are brought into Grooper from the "Batches" page. For more information on scanning documents into Grooper, see our Desktop Scanning in Grooper article.
  2. Imported content: Importing digital content into Grooper happens in the "Imports" Page. An article detailing how to import Batches via the "Imports" page will be coming soon.