2.90:Separation Provider (Property): Difference between revisions

m Dgreenwood moved page 2.90:Separation Provider to 2.90:Separation Provider (Property) without leaving a redirect
 
{{AutoVersion}}
<section begin="glossary" />
<blockquote>
'''''Separation Providers''''' are the methods Grooper provides to [[Separation|separate]] pages into document folders.
</blockquote>
<section end="glossary" />
Each provider has its own configurable properties.  Changing these properties changes the criteria used to separate pages into documents.


== About ==

'''''Separation Providers''''' establish the logic used to create "separation points" or "binding points" between loose pages.  Grooper offers many methods to separate pages into document folders.  Each '''''Separation Provider''''' has its own criteria for determining where these separation points occur within a batch.  However, the basic operation is the same for all of them.

# Determine which page is the first page of a document.
#* This is the "separation point" or "binding point".
#* Generally, the first page in a batch is always the first separation point.
# Insert a '''Batch Folder''' into the '''Batch'''.
# Move pages into that folder until another first page of a document is encountered.
# Insert a new '''Batch Folder''' into the '''Batch'''.
#* This is the next "separation point" or "binding point".
# Move pages into that folder until another first page of a document is encountered.
# Repeat until the end of the '''Batch'''.
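
The loop described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Grooper's implementation: the function <code>separate</code>, the <code>is_first_page</code> callback, and the sample page strings are all hypothetical names chosen for this example. The callback stands in for whatever rule a given provider uses to recognize a separation point.

```python
from typing import Callable, List


def separate(pages: List[str], is_first_page: Callable[[str], bool]) -> List[List[str]]:
    """Group loose pages into folders, opening a new folder at each separation point.

    Hypothetical sketch: `is_first_page` stands in for a provider's own rule.
    """
    folders: List[List[str]] = []
    for index, page in enumerate(pages):
        # The first page in the batch always opens a folder. After that, a new
        # folder opens whenever the rule flags the page as a first page.
        if index == 0 or is_first_page(page):
            folders.append([])
        # Every page goes into the most recently opened folder.
        folders[-1].append(page)
    return folders


# Made-up sample batch: a page is a "first page" if it starts with "INVOICE".
pages = ["INVOICE p1", "p2", "INVOICE p1", "p2", "p3"]
folders = separate(pages, lambda page: page.startswith("INVOICE"))
print(folders)
```

With the sample pages above, the result is two folders: one holding the first two pages and one holding the remaining three. Swapping in a different callback changes where the separation points fall without changing the loop itself, which mirrors how each provider differs only in its criteria.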
 
 
[[File:Separation-provider-07.png|center|1200px]]
 
 
The '''''Separation Provider''''' is selected and configured using the '''''Provider''''' property of the '''Separate''' activity or a '''Separation Profile'''.
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
In a '''Batch Process''', you will set the '''''Separation Provider''''' using the '''''Provider''''' property of a '''Separate''' step.
 
# Select a '''Batch Process'''.
# Add a '''Batch Step''' and assign it the '''Separate''' activity type (or select the '''Separate''' step in the '''Batch Process''' if already present).
# Use the '''''Provider''''' property to select a '''''Separation Provider'''''.
|
[[File:Separation-provider-01.png]]
|}
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
A '''Separation Profile''' is a way to configure a '''''Separation Provider''''' and save it to an object that can be reused in multiple '''Batch Processes'''.  Instead of configuring the provider on the '''Separate''' step itself, you can reference a '''Separation Profile''' with those configurations already set.  Either way, the separation configuration is the same.  '''Separation Profiles''' just allow you to save these settings outside of a single '''Batch Process'''.
 
# Add a '''Separation Profile''' to the '''Separation Profiles''' folder of the '''Global Resources''' folder.
# Select the '''Separation Profile'''.
# Use the '''''Provider''''' property to select a '''''Separation Provider'''''.
|
[[File:Separation-provider-02.png]]
|}
 
== Provider Types ==
 
There are eight total '''''Separation Providers'''''.
 
* ''[[Control Sheet Separation]]'' - New folders are created using Grooper [[Control Sheet]]s.
* ''[[Event-Based Separation]]'' - The '''Batch''' is separated using one or more "'''''Separation Events'''''".  Each '''''Separation Event''''' triggers the creation of a new folder.  The events are as follows:
** ''Blank Page'' - A blank page will trigger a new folder.
** ''Barcode'' - A scanned barcode will trigger a new folder.
** ''Content Type'' - This '''''Separation Event''''' uses [[Lexical]] or [[Visual (Classification Method)|Visual]] training examples to trigger folder creation.  Whenever a page confidently matches a trained example document's first page, a new folder is created.
** ''Page Count'' - This is for fixed page separation.  A new folder is created after a set number of pages.
** ''Shape'' - A new folder is created every time a "shape feature" is detected.  Shape features are detected using a '''[[Shape Detection]]''' IP Command from an '''IP Profile'''.
* ''[[Pattern-Based Separation]]'' - Folder creation is determined by an extractor.  If the extractor returns a result on a page, a new folder is created.  Subsequent pages are placed in that folder until another page produces a result.
* ''[[Change in Value Separation]]'' - This provider is similar to ''Pattern-Based Separation'' in that an extractor also determines folder creation.  However, folders are ''only'' created when the extractor's result ''changes''.
* ''[[EPI Separation]]'' - Separation occurs using embedded page information (EPI) supplied by an extractor.  This provider is helpful for separating documents whose page numbers are extractable.
* ''[[ESP Auto Separation]]'' - ESP automatic separation combines several operations to separate documents: [[Lexical]] training examples in a '''Content Model''', the '''''Separation''''' properties of '''Document Types''', embedded page information, and the merging of designated "attachment" '''Document Types''' to "host" '''Document Types'''.
** Furthermore, since ''ESP Auto Separation'' uses a '''Content Model's''' training data (as well as classification [[Rules Based (Classification Method)|rules]] set on its '''Document Types'''), it both separates ''and'' classifies documents during the '''Separate''' activity.
* ''[[Multi Separator]]'' - Performs separation using multiple separation providers.
* ''[[Undo Separation]]'' - The anti-separator!  As its name implies, this provider "undoes" separation, removing all '''Batch Folders''' at the '''Batch''' level or at a '''Batch Folder''' level of the folder hierarchy, leaving only loose pages.
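
The difference between ''Pattern-Based Separation'' and ''Change in Value Separation'' is easier to see side by side. The following Python sketch is an illustration under stated assumptions, not Grooper code: the function names, the sample page text, and the regular expression are hypothetical, and a plain regex search stands in for a Grooper extractor. It also assumes that a page with no extractor result stays in the current folder, and that the first page always opens a folder.

```python
import re
from typing import List, Optional


def extract(page_text: str, pattern: str) -> Optional[str]:
    """Stand-in for an extractor: return the first regex match, or None."""
    match = re.search(pattern, page_text)
    return match.group(0) if match else None


def pattern_based(pages: List[str], pattern: str) -> List[List[str]]:
    """Open a new folder on every page where the extractor returns a result."""
    folders: List[List[str]] = []
    for index, text in enumerate(pages):
        if index == 0 or extract(text, pattern) is not None:
            folders.append([])
        folders[-1].append(text)
    return folders


def change_in_value(pages: List[str], pattern: str) -> List[List[str]]:
    """Open a new folder only when the extracted value differs from the last one seen."""
    folders: List[List[str]] = []
    current: Optional[str] = None
    for index, text in enumerate(pages):
        value = extract(text, pattern)
        # Assumption: a page with no result does not count as a change.
        changed = value is not None and value != current
        if index == 0 or changed:
            folders.append([])
        if value is not None:
            current = value
        folders[-1].append(text)
    return folders


# Made-up sample batch: every page carries an account number.
pages = [
    "Acct 1001 summary",
    "Acct 1001 detail",
    "Acct 2002 summary",
    "Acct 2002 detail",
]
pattern = r"Acct \d+"

by_pattern = pattern_based(pages, pattern)
by_change = change_in_value(pages, pattern)
print(len(by_pattern), len(by_change))
```

Because every sample page matches the pattern, the pattern-based sketch opens a folder on every page, while the change-in-value sketch opens a folder only where the account number changes. This is why a value that repeats on every page of a document suits ''Change in Value Separation'', and a value that appears only on first pages suits ''Pattern-Based Separation''.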
 
== Real Time vs Lexical Providers ==
 
There are two different categories these '''''Separation Providers''''' can be placed in:
* Real Time
* Lexical
 
The main distinction between these two is that the "Lexical" providers require machine-readable text data.  They use data extractors (using regular expression pattern matching) to determine the separation points in a '''Batch'''.  For scanned page images, [[OCR]] obtains this data.  Digital documents, such as PDFs, have machine-readable text encoded in the file, but it needs to be extracted in a form Grooper can use.  Either way, the documents need to be conditioned with a '''Recognize''' step in a '''Batch Process''' to obtain this text data.
 
The "Real Time" providers do ''not'' require text data in order to separate documents.  They use visual page information or fixed page counts to find the separation points in a '''Batch'''.  This means these providers can separate documents in real time during scanning.  Because no extra document conditioning is required, a '''Separate''' step in a '''Batch Process''' is not needed.
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
# Instead, a '''''Separation Profile''''' can be assigned from the '''Scan''' client.
# After pressing the "Scan" button to bring pages into Grooper...
|
[[File:Separation-provider-03.png]]
|-
|valign=top|
As long as the '''''Separation Provider''''' used is a Real Time provider, the documents will separate as they are scanned in.  Folders will be inserted according to the '''Separation Profile's''' configuration.  In this example, the ''Control Sheet Separation'' provider is used.
* Note: This does not mean you ''can't'' use Real Time '''''Separation Providers''''' in a '''Separate''' step.  You just have the option of performing separation during scanning using them.
|
[[File:Separation-provider-04.png]]
|}
 
The following '''''Separation Providers''''' are "Real Time" providers:
* ''[[Control Sheet Separation]]''
* ''[[Event-Based Separation]]''
 
The following '''''Separation Providers''''' are "Lexical" providers:
* ''[[Pattern-Based Separation]]''
* ''[[Change in Value Separation]]''
* ''[[EPI Separation]]''
* ''[[ESP Auto Separation]]''
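
As a planning aid, the two lists above can be expressed as a simple lookup. The Python sketch below is hypothetical and is not part of Grooper: the set names and the function <code>requires_text_data</code> are made up for illustration. Providers that this article does not place in either category (''Multi Separator'' and ''Undo Separation'') are deliberately left unclassified rather than guessed at.

```python
# Categories exactly as listed in this article.
REAL_TIME = {
    "Control Sheet Separation",
    "Event-Based Separation",
}
LEXICAL = {
    "Pattern-Based Separation",
    "Change in Value Separation",
    "EPI Separation",
    "ESP Auto Separation",
}


def requires_text_data(provider: str) -> bool:
    """Return True if the provider needs text from a Recognize step first.

    Hypothetical helper: raises ValueError for providers that the article
    does not place in either category.
    """
    if provider in LEXICAL:
        return True
    if provider in REAL_TIME:
        return False
    raise ValueError(f"{provider!r} is not listed as Real Time or Lexical")


print(requires_text_data("EPI Separation"))
print(requires_text_data("Control Sheet Separation"))
```

A check like this makes the practical consequence explicit: a Lexical provider only works after the pages have been through a '''Recognize''' step, while a Real Time provider can be used either during scanning or in a '''Separate''' step.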
 
[[Category:Articles]]
[[Category:Stub]]

Latest revision as of 15:17, 18 April 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

