2023.1:Control Sheet Separation (Separation Provider)

Revision as of 08:18, 28 August 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Control Sheet Separation is a Separation Provider that uses Grooper Control Sheets to separate documents.

Grooper Control Sheets are special pages that can be printed out and placed in between documents before scanning. These sheets use patch code barcodes to direct Grooper to perform certain actions, such as creating new Batch Folders in a Batch.

During separation, the Control Sheet Separation provider creates a new Batch Folder (and thus a new document) every time it encounters a Control Sheet that is configured to create one. All subsequent Batch Pages are placed in that folder until the provider encounters the next Control Sheet, at which point it creates a new folder. The process repeats until the end of the Batch.
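The folder-creation rule described above can be modeled in a few lines. This is a minimal, hypothetical Python sketch, not Grooper code: Grooper is configured through its UI, and the `Page`, `Folder`, and `separate` names are invented for illustration. It also assumes two behaviors the article does not specify: Control Sheet pages are not kept inside the resulting folders, and any pages scanned before the first Control Sheet remain loose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a scanned Batch Page."""
    name: str
    is_control_sheet: bool = False  # a patch code was read on this page
    create_folder: bool = False     # the sheet's Create Folder setting


@dataclass
class Folder:
    """Hypothetical stand-in for a Batch Folder (one document)."""
    pages: list[str] = field(default_factory=list)


def separate(stack: list[Page]) -> tuple[list[Folder], list[str]]:
    """Start a new folder at each folder-creating Control Sheet.

    Returns (folders, loose_pages). Assumptions: Control Sheet pages are not
    kept inside the folders, and pages before the first sheet stay loose.
    """
    folders: list[Folder] = []
    loose: list[str] = []
    for page in stack:
        if page.is_control_sheet:
            if page.create_folder:
                folders.append(Folder())  # open a new document
            continue  # the sheet itself is not added to any folder
        if folders:
            folders[-1].pages.append(page.name)  # add to the current document
        else:
            loose.append(page.name)  # no Control Sheet seen yet
    return folders, loose


stack = [
    Page("sheet-1", is_control_sheet=True, create_folder=True),
    Page("p1"), Page("p2"),
    Page("sheet-2", is_control_sheet=True, create_folder=True),
    Page("p3"),
]
folders, loose = separate(stack)
print([f.pages for f in folders], loose)  # [['p1', 'p2'], ['p3']] []
```

A sheet whose Create Folder setting is off is simply skipped here, so the pages after it continue into the current folder.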

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains a Project with the resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

Control Sheet Separation utilizes Control Sheets to determine how folders are created during document separation.

Control Sheets are printable pages used to automate document separation. They are printed and placed in a stack of loose pages, at the beginning of each new batch or document, before the pages are scanned into Grooper. They contain specialized barcodes called "patch codes", which Grooper reads to determine which instructions to carry out.

Below is an example of a Grooper Control Sheet:

Control Sheets can perform three main actions:

Create a Batch Folder in a Batch and add subsequent Batch Pages to that folder.
Assign that Batch Folder's folder level in the Batch's folder hierarchy.
Assign that Batch Folder's Content Type property.
- Effectively, this can classify the document, assigning that folder a Document Type of a Content Model.
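The three actions listed above map naturally onto three settings per sheet. As a hypothetical illustration only (the class name and the `describe` wording are invented, not part of Grooper), a Control Sheet's instructions could be modeled like this:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlSheetSettings:
    """Hypothetical model of the three actions a Control Sheet can request."""
    create_folder: bool = False         # 1. create a Batch Folder
    folder_level: int = 1               # 2. its level in the folder hierarchy
    content_type: Optional[str] = None  # 3. optional Content Type (e.g. a Document Type)

    def describe(self) -> str:
        """Summarize the instruction in plain words (wording is illustrative)."""
        if not self.create_folder:
            return "no new folder"
        text = f"new folder Level {self.folder_level}"
        if self.content_type is not None:
            text += f", classified as {self.content_type}"
        return text


print(ControlSheetSettings(create_folder=True).describe())  # new folder Level 1
print(ControlSheetSettings(True, 1, "Invoice").describe())   # new folder Level 1, classified as Invoice
```

The Content Type is optional in this model because a sheet can separate documents without classifying them.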

How it works: Basic Separation

So, how does Control Sheet Separation actually work?

The Control Sheet Separation provider and Control Sheets in general are designed to be used while scanning physical paper documents. These sheets are printed from Grooper and placed before the first page of each new document.

In the screenshot below, you can see a Batch that was scanned into Grooper and processed no further. There are just loose pages in the Batch. Document separation has not occurred yet.

Imagine the Batch as a physical stack of papers. We have placed a Control Sheet at the beginning of each new document in the stack. This will tell Grooper where to separate.

After separation, Grooper will have created a new folder, and thus started a new document, wherever it found a Control Sheet (see the screenshot below).

How it works: Separation and Classification

When using Control Sheet Separation, you can also classify your documents while separating them. All it takes is configuring a few settings and making sure your printed Control Sheets are placed in the correct order in your stack of documents before scanning.

When setting up Control Sheets to separate and classify documents, we need to create multiple Control Sheets: one for each Document Type in our Content Model.
In addition to the Create Folder and Folder Level properties, we need to set the Content Type property. Click the hamburger icon to open the drop-down menu.
In the drop-down menu, navigate through the Content Models and select the Document Type you want to apply to the Control Sheet.
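The "one Control Sheet per Document Type" rule above can be expressed as a simple planning step. This is a hypothetical Python sketch: the Document Type names and the "<type> Control Sheet" naming convention are invented for illustration. In Grooper itself, each sheet is created and configured in the UI as described in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSheetPlan:
    """Hypothetical record of the properties to set on one Control Sheet."""
    name: str
    create_folder: bool
    folder_level: int
    content_type: str


def plan_sheets(document_types: list[str], folder_level: int = 1) -> list[ControlSheetPlan]:
    """Plan one folder-creating, classifying Control Sheet per Document Type."""
    return [
        ControlSheetPlan(
            name=f"{doc_type} Control Sheet",  # hypothetical naming convention
            create_folder=True,
            folder_level=folder_level,
            content_type=doc_type,
        )
        for doc_type in document_types
    ]


# Hypothetical Document Types in a Content Model
for sheet in plan_sheets(["Invoice", "Purchase Order", "Receipt"]):
    print(sheet.name, "->", sheet.content_type)
```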

In the screenshot below, we have placed a Control Sheet before the start of each document in the Batch according to that document's intended Document Type. You will need to print out your Control Sheets after they have been created and place them in your physical stack of documents in a similar fashion.
The Document Type should appear on the Control Sheet beneath the folder level indicator, which shows where Grooper will create the new folder.

Now when the documents are separated, they are not only split into different folders; each folder is also automatically classified according to the Control Sheet placed before its document.
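Separation and classification together amount to opening a folder at each Control Sheet and stamping it with that sheet's Content Type. The following is a minimal, hypothetical Python model of that behavior, not Grooper's implementation. All class and function names are invented, and it assumes that pages scanned before the first Control Sheet are simply left out of every folder.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class ControlSheet:
    content_type: Optional[str] = None  # Document Type set on the sheet, if any


@dataclass
class Page:
    name: str


@dataclass
class Folder:
    content_type: Optional[str]  # None means the folder is still unclassified
    pages: list[str] = field(default_factory=list)


def separate_and_classify(stack: list[Union[ControlSheet, Page]]) -> list[Folder]:
    """Open a folder at each Control Sheet and stamp it with the sheet's Content Type."""
    folders: list[Folder] = []
    for item in stack:
        if isinstance(item, ControlSheet):
            folders.append(Folder(content_type=item.content_type))
        elif folders:
            folders[-1].pages.append(item.name)
        # pages before the first Control Sheet are ignored in this sketch
    return folders


stack = [ControlSheet("Invoice"), Page("p1"), Page("p2"),
         ControlSheet("Receipt"), Page("p3")]
for folder in separate_and_classify(stack):
    print(folder.content_type, folder.pages)
# Invoice ['p1', 'p2']
# Receipt ['p3']
```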

How To

Creating a New Control Sheet

In your Project's node tree, right-click the folder where you want to add a Control Sheet.
Hover over "Add" in the menu that pops up, then click on "Control Sheet...".

In the "Add" window that pops up, enter a name for your Control Sheet.
Click the "EXECUTE" button at the top of the "Add" window to create the Control Sheet.

You should now see the new Control Sheet as an object in the node tree.
With the Control Sheet object selected, you can see a preview of the sheet on the right.

You can edit the Control Sheet properties to suit your needs. In this example, we have set the Create Folder property to "True" and the Folder Level to 1.
On the Control Sheet preview, we can see there is now an icon with the words "new folder Level 1" to tell us (the human) what Grooper will do when it encounters this Control Sheet.
At the bottom of the Control Sheet preview, the barcode has also changed, although the difference can be hard to see. The barcode is what Grooper actually reads to determine what to do when it encounters the Control Sheet.

Once you are satisfied with your Control Sheet, save your changes by clicking the save icon at the top of the "Properties" section.
Now you can click on the printer icon located in the top right of the preview panel to print the Control Sheet. You can then place the paper copies of the Control Sheets within the batch of documents you will be scanning into Grooper based on where you want Separation to occur.
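Where the printed sheets go determines where separation occurs: each sheet sits directly before the first page of the document it starts. As a hypothetical illustration (the function name and sheet labels are invented, and the pages are just names), the physical scanning order can be written out like this:

```python
from __future__ import annotations


def assemble_scan_stack(documents: list[tuple[str, list[str]]]) -> list[str]:
    """Return the physical order to scan: each document's Control Sheet, then its pages.

    `documents` pairs a hypothetical sheet label (e.g. its Document Type) with the
    names of that document's pages, in scanning order.
    """
    stack: list[str] = []
    for sheet_label, pages in documents:
        stack.append(f"[Control Sheet: {sheet_label}]")  # the printed sheet goes first
        stack.extend(pages)                                # then the document's pages
    return stack


print(assemble_scan_stack([("Invoice", ["inv-1", "inv-2"]), ("Receipt", ["rec-1"])]))
# ['[Control Sheet: Invoice]', 'inv-1', 'inv-2', '[Control Sheet: Receipt]', 'rec-1']
```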

Glossary

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch: (1) primarily, they represent "documents" in Grooper; (2) they can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page: Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) when images are scanned into a Batch using the Scan Viewer, or (2) when they are split from a PDF or TIFF file using the Split Pages activity.

Batch Pages are frequently referred to simply as "pages".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Classification: Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify activity assigns a Content Type to a Batch Folder.

Classify: Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Document Type.

Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Control Sheet Separation: Control Sheet Separation is a Separation Provider that uses Grooper Control Sheets to separate documents.

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects, such as Content Models, Batch Processes, and profile objects, are stored. This makes resources easier to manage and save, and simplifies how node references are made in a Grooper Repository.

Separate: Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export, and other Activities that need to run on the folder (i.e. document) level.

Separation Provider: Separation Providers divide a sequence of Batch Pages into logical documents. They define the rules and criteria for grouping pages and determining document boundaries. Separation is a foundational step in document workflows, transforming a continuous stream of scanned or imported pages into discrete, classified documents ready for extraction, validation, and export.

The Separate activity's Provider setting specifies which Separation Provider is applied to the selected Batch Scope.
One Batch Folder is created for each span of pages identified as a document by the Separation Provider.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".