Separation Provider

From Grooper Wiki

Separation Providers are the methods Grooper provides for separating pages into document folders.

Each provider has its own configurable properties. Changing these properties changes the criteria used to separate pages into documents.


About

Separation Providers establish the logic used to create "separation points" or "binding points" between loose pages. Grooper offers many methods for separating pages into document folders. Each Separation Provider has its own criteria for determining where these separation points occur within a Batch. However, the basic operation is the same for all of them.

  1. Determine what page is the first page of a document.
    • This is the "separation point" or "binding point".
    • Generally, the first page in a Batch is the first separation point.
  2. Insert a Batch Folder into the Batch.
  3. Move pages into that folder until another first page of a document is encountered.
  4. Insert a new Batch Folder into the Batch.
    • This is the next "separation point" or "binding point".
  5. Move pages into that folder until another first page of a document is encountered.
  6. Repeat until the end of the Batch.
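
The steps above can be sketched in a few lines of Python. This is only an illustration of the general loop, not Grooper's implementation: the `separate` function, the `is_first_page` callback, and the use of plain strings as pages are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def separate(
    pages: Sequence[Page],
    is_first_page: Callable[[Page], bool],
) -> List[List[Page]]:
    """Group loose pages into folders, opening a new folder at each separation point.

    `is_first_page` is a hypothetical stand-in for whatever criteria a
    Separation Provider uses to recognize the first page of a document.
    """
    folders: List[List[Page]] = []
    for index, page in enumerate(pages):
        # The first page in the batch is treated as the first separation point.
        if index == 0 or is_first_page(page):
            folders.append([])       # insert a new folder
        folders[-1].append(page)     # move the page into the current folder
    return folders


# Example: any page whose text starts with "INVOICE" begins a new document.
loose_pages = ["INVOICE p1", "p2", "p3", "INVOICE p1", "p2"]
print(separate(loose_pages, lambda text: text.startswith("INVOICE")))
# prints [['INVOICE p1', 'p2', 'p3'], ['INVOICE p1', 'p2']]
```

In this sketch, the only thing that differs from one provider to another is the `is_first_page` test; the folder-building loop stays the same.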


Separation-provider-07.png


The Separation Provider is selected and configured using the Provider property of the Separate activity or a Separation Profile.

In a Batch Process, you will set the Separation Provider using the Provider property of a Separate step.

  1. Select a Batch Process.
  2. Add a Batch Step and assign it the Separate activity type (or select the Separate step in the Batch Process if already present).
  3. Use the Provider property to select a Separation Provider.

Separation-provider-01.png

A Separation Profile is a way to configure a Separation Provider and save it as an object that can be reused in multiple Batch Processes. Instead of configuring the provider on the Separate step itself, you can reference a Separation Profile with those configurations already set. Either way, the separation configuration is the same. Separation Profiles just allow you to save these settings outside of a single Batch Process.

  1. Add Separation Profiles to the Separation Profiles folder of the Global Resources folder.
  2. Select a Separation Profile.
  3. Use the Provider property to select a Separation Provider.

Separation-provider-02.png

Provider Types

There are eight total Separation Providers.

  • Control Sheet Separation - New folders are created using Grooper Control Sheets.
  • Event-Based Separation - The Batch is separated using one or more "Separation Events". Each Separation Event triggers the creation of a new folder. The events are as follows:
    • Blank Page - A blank page will trigger a new folder.
    • Barcode - A scanned barcode will trigger a new folder.
    • Content Type - This Separation Event uses Lexical or Visual training examples to trigger folder creation. Whenever a page confidently matches a trained example document's first page, a new folder is created.
    • Page Count - This is for fixed page separation. A new folder is created each time a set number of pages is reached.
    • Shape - A new folder is created every time a "shape feature" is detected. Shape features are detected using a Shape Detection IP Command from an IP Profile.
  • Pattern-Based Separation - Folder creation is determined by an extractor. If the extractor returns a result on a page, a new folder is created. Subsequent pages are placed in that folder until another page produces a result.
  • Change in Value Separation - This provider is similar to Pattern-Based Separation in that an extractor also determines folder creation. However, folders are only created when the extractor's result changes.
  • EPI Separation - Separation occurs using embedded page information (EPI) supplied by an extractor. This provider is helpful for separating documents whose page numbers are extractable.
  • ESP Auto Separation - ESP automatic separation performs document separation with multiple operations working together, using Lexical training examples in a Content Model, the Separation properties of Document Types, embedded page information, and merging designated "attachment" Document Types to "host" Document Types.
    • Furthermore, since ESP Auto Separation uses a Content Model's training data (as well as classification rules set on its Document Types), it both separates and classifies documents during the Separate activity.
  • Multi Separator - Performs separation using multiple separation providers.
  • Undo Separation - The anti-separator! As its name implies, this provider "undoes" separation, removing all Batch Folders in a Batch or at a Batch Folder level in the folder hierarchy, leaving only loose pages.
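
The difference between Pattern-Based Separation and Change in Value Separation is easiest to see side by side. The sketch below is a rough Python illustration, not Grooper's API: the function names are hypothetical, a regular expression stands in for the extractor, and pages are represented by their text. It also assumes that, under change-in-value logic, a page with no extractor result stays in the current folder, which the description above does not specify.

```python
import re
from typing import Callable, List, Optional


def pattern_based(pattern: str) -> Callable[[str], bool]:
    """Separation point whenever the extractor returns any result on a page."""
    regex = re.compile(pattern)

    def is_separation_point(page_text: str) -> bool:
        return regex.search(page_text) is not None

    return is_separation_point


def change_in_value(pattern: str) -> Callable[[str], bool]:
    """Separation point only when the extracted value differs from the last one seen."""
    regex = re.compile(pattern)
    previous: Optional[str] = None

    def is_separation_point(page_text: str) -> bool:
        nonlocal previous
        match = regex.search(page_text)
        if match is None:
            return False  # assumption: no result keeps the page in the current folder
        changed = match.group(0) != previous
        previous = match.group(0)
        return changed

    return is_separation_point


def split(pages: List[str], is_separation_point: Callable[[str], bool]) -> List[List[str]]:
    folders: List[List[str]] = []
    for index, text in enumerate(pages):
        hit = is_separation_point(text)  # evaluate on every page so state stays current
        if index == 0 or hit:
            folders.append([])
        folders[-1].append(text)
    return folders


pages = ["Account: 100 page A", "Account: 100 page B", "Account: 200 page C"]
print(len(split(pages, pattern_based(r"Account: \d+"))))    # prints 3
print(len(split(pages, change_in_value(r"Account: \d+"))))  # prints 2
```

With the same extractor, the pattern-based test starts a folder on every page that carries an account number, while the change-in-value test keeps consecutive pages with the same account number together.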

Real Time vs Lexical Providers

There are two different categories these Separation Providers can be placed in:

  • Real Time
  • Lexical

The main distinction between these two is that the "Lexical" providers require machine-readable text data. They use data extractors (using regular expression pattern matching) to determine the separation points in a Batch. For scanned page images, OCR obtains this data. Digital documents, such as PDFs, have machine-readable text encoded in the file, but it needs to be extracted in a form Grooper can use. Either way, the documents need to be conditioned with a Recognize step in a Batch Process to obtain this text data.

The "Real Time" providers do not require text data in order to separate documents. They use visual page information or fixed page numbers to find the separation points in a Batch. This means these providers can separate documents in real time during scanning. Since no extra document conditioning is required, there is no need for a Separate step in a Batch Process.

  1. Instead, a Separation Profile can be assigned from the Scan client.
  2. After pressing the "Scan" button to bring pages into Grooper...

Separation-provider-03.png

As long as the Separation Provider used is a Real Time provider, the documents will separate as they are scanned in. Folders will be inserted according to the Separation Profile's configuration. In this example, the Control Sheet Separation provider is used.

  • Note: This does not mean you can't use Real Time Separation Providers in a Separate step. You just have the option of performing separation during scanning using them.

Separation-provider-04.png

The following Separation Providers are "Real Time" providers:

The following Separation Providers are "Lexical" providers: