2.90:Separation Provider (Property)
Page created by Configadmin; edited by Dgreenwood.
<blockquote style="font-size:14pt">
'''''Separation Providers''''' are the methods Grooper uses to [[Separation|separate]] pages into document folders. Each provider has its own configurable properties, and changing those properties changes the criteria used to separate pages into documents.
</blockquote>
There are many ways to separate pages into document folders in Grooper. How documents are separated is controlled by the '''''Separation Provider''''' property of the '''Separate''' activity or of a '''Separation Profile'''.
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
In a '''Batch Process''', you set the '''''Separation Provider''''' using the '''''Provider''''' property of a '''Separate''' step.

# Select a '''Batch Process'''.
# Add a '''Batch Step''' and assign it the '''Separate''' activity type (or select the existing '''Separate''' step if the '''Batch Process''' already has one).
# Use the '''''Provider''''' property to select a '''''Separation Provider'''''.
|
[[File:Separation-provider-01.png]]
|}
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
A '''Separation Profile''' stores a configured '''''Separation Provider''''' in an object that can be reused in any number of '''Batch Processes'''. Instead of configuring the provider on the '''Separate''' step itself, you can reference a '''Separation Profile''' that already has those settings. The separation configuration is the same either way. '''Separation Profiles''' simply let you save those settings outside of a single '''Batch Process'''.
# Add or locate a '''Separation Profile''' in the '''Separation Profiles''' folder of the '''Global Resources''' folder.
# Select the '''Separation Profile'''.
# Use the '''''Provider''''' property to select a '''''Separation Provider'''''.
|
[[File:Separation-provider-02.png]]
|}
=== Provider Types ===
There are eight '''''Separation Providers''''' in total.
* ''[[Control Sheet Separation]]'' - New folders are created using Grooper [[Control Sheet]]s.
* ''[[Event-Based Separation]]'' - The '''Batch''' is separated using one or more "'''''Separation Events'''''". Each '''''Separation Event''''' triggers the creation of a new folder. The events are as follows:
** ''Blank Page'' - A blank page triggers a new folder.
** ''Barcode'' - A scanned barcode triggers a new folder.
** ''Content Type'' - This '''''Separation Event''''' uses [[Lexical]] training examples to trigger folder creation. Whenever a page confidently matches the first page of a trained example document, a new folder is created.
** ''Page Count'' - This event performs fixed page separation. A new folder is created after every set number of pages, where that number is the page count configured for a document.
** ''Shape'' - A new folder is created every time a "shape feature" is detected. Shape features are detected using a '''[[Shape Detection]]''' IP Command from an '''IP Profile'''.
* ''[[Pattern-Based Separation]]'' - Folder creation is determined by an extractor. If the extractor returns a result on a page, a new folder is created. Subsequent pages are placed in that folder until another page produces a result.
* ''[[Change in Value Separation]]'' - This provider is similar to ''Pattern-Based Separation'' in that an extractor also determines folder creation. However, folders are ''only'' created when the extractor's result ''changes''.
* ''[[EPI Separation]]'' - Separation occurs using embedded page information (EPI) supplied by an extractor. This provider is helpful for separating documents whose page numbers are extractable.
* ''[[ESP Auto Separation]]'' - ESP automatic separation combines several operations. It uses [[Lexical]] training examples in a '''Content Model''', the '''''Separation''''' properties of '''Document Types''', and embedded page information. It also merges designated "attachment" '''Document Types''' into "host" '''Document Types'''.
** Because ''ESP Auto Separation'' uses a '''Content Model's''' training data (as well as classification [[Rules Based (Classification Method)|rules]] set on its '''Document Types'''), it both separates ''and'' classifies documents during the '''Separate''' activity.
* ''[[Multi Separator]]'' - Performs separation using multiple '''''Separation Providers'''''.
* ''[[Undo Separation]]'' - The anti-separator! As its name implies, this provider "undoes" separation. It removes all '''Batch Folders''' at a '''Batch''' or '''Batch Folder''' level of the folder hierarchy and leaves only loose pages.
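The folder-building rules for two of these providers are simple enough to model in a few lines. The Python below is a minimal conceptual sketch. It is not Grooper's API or implementation. A page is reduced to its page number and a folder to a list of pages. The names `separate_by_page_count` and `undo_separation` are hypothetical.

* The first function mimics the ''Page Count'' event of ''Event-Based Separation''.
* The second function mimics ''Undo Separation''.

The sketch assumes that the last folder comes up short when the page total is not an exact multiple of the page count.

```python
from typing import List, Sequence

Page = int           # hypothetical stand-in for a Batch Page: just its page number
Folder = List[Page]  # hypothetical stand-in for a Batch Folder


def separate_by_page_count(pages: Sequence[Page], pages_per_document: int) -> List[Folder]:
    """Start a new folder every `pages_per_document` pages (fixed page separation).

    Assumption of this sketch: the last folder may be shorter when the page
    total is not an exact multiple of `pages_per_document`.
    """
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        list(pages[start:start + pages_per_document])
        for start in range(0, len(pages), pages_per_document)
    ]


def undo_separation(folders: Sequence[Folder]) -> List[Page]:
    """Remove the folder level and return its pages as loose pages, in order."""
    return [page for folder in folders for page in folder]


if __name__ == "__main__":
    batch = [1, 2, 3, 4, 5, 6, 7]
    folders = separate_by_page_count(batch, 3)
    print(folders)                   # [[1, 2, 3], [4, 5, 6], [7]]
    print(undo_separation(folders))  # [1, 2, 3, 4, 5, 6, 7]
```

Running one function after the other returns the original page order. This mirrors the description of ''Undo Separation'' as the anti-separator.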
=== Real Time vs Lexical Providers ===
'''''Separation Providers''''' fall into two categories:
* Real Time
* Lexical
The main distinction between the two is that "Lexical" providers require machine-readable text data. They use data extractors (regular expression pattern matching) to determine the separation points in a '''Batch'''.

For scanned page images, [[OCR]] produces this text. Digital documents, such as PDFs, already have machine-readable text encoded in the file. That text still has to be extracted in a form Grooper can use. Either way, the documents must be conditioned with a '''Recognize''' step in the '''Batch Process''' to obtain this text data.
The "Real Time" providers do ''not'' require text data to separate documents. They use visual page information or fixed page counts to find the separation points in a '''Batch'''. This means these providers can separate documents in real time, during scanning. Because no extra document conditioning is required, a '''Separate''' step in the '''Batch Process''' is not needed.
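The following Python is a hedged sketch of how the two extractor-driven Lexical providers could turn page text into folder boundaries. These are ''Pattern-Based Separation'' and ''Change in Value Separation''. The sketch is a conceptual model, not Grooper's implementation, and rests on these assumptions:

* The extractor is a plain Python `re` pattern.
* Each page is represented by its text. `None` stands for a page that has not been through a '''Recognize''' step.
* All function names are hypothetical.
* Pages that come before the first extractor result are returned as loose pages.

```python
import re
from typing import List, Optional, Sequence, Tuple

Folder = List[int]  # 1-based page numbers belonging to one document folder


def extract(text: Optional[str], pattern: re.Pattern) -> Optional[str]:
    """Return the first match (or its first capture group), or None if nothing matches."""
    if text is None:
        # A Lexical provider has nothing to work with until the page has text data.
        raise ValueError("page has no text data; run a Recognize step first")
    match = pattern.search(text)
    if match is None:
        return None
    return match.group(1) if pattern.groups else match.group(0)


def pattern_based(pages: Sequence[Optional[str]],
                  pattern: re.Pattern) -> Tuple[Folder, List[Folder]]:
    """Start a new folder on every page where the extractor returns a result."""
    loose: Folder = []
    folders: List[Folder] = []
    for number, text in enumerate(pages, start=1):
        if extract(text, pattern) is not None:
            folders.append([number])
        elif folders:
            folders[-1].append(number)
        else:
            loose.append(number)  # assumption: pages before the first hit stay loose
    return loose, folders


def change_in_value(pages: Sequence[Optional[str]],
                    pattern: re.Pattern) -> Tuple[Folder, List[Folder]]:
    """Start a new folder only when the extracted value differs from the last one seen."""
    loose: Folder = []
    folders: List[Folder] = []
    current: Optional[str] = None
    for number, text in enumerate(pages, start=1):
        value = extract(text, pattern)
        if value is not None and value != current:
            folders.append([number])
            current = value
        elif folders:
            folders[-1].append(number)
        else:
            loose.append(number)  # assumption: pages before the first value stay loose
    return loose, folders


if __name__ == "__main__":
    pages = [
        "Invoice No: 1001\nWidgets",
        "Terms and conditions",
        "Invoice No: 1001\nPage 2",
        "Invoice No: 1002\nGadgets",
    ]
    invoice_number = re.compile(r"Invoice No:\s*(\d+)")  # hypothetical extractor
    print(pattern_based(pages, invoice_number))    # ([], [[1, 2], [3], [4]])
    print(change_in_value(pages, invoice_number))  # ([], [[1, 2, 3], [4]])
```

In the sample, page 3 repeats invoice number 1001. Under ''Pattern-Based Separation'' it starts a new folder, but under ''Change in Value Separation'' it does not, because only a different result triggers a new folder.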
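Real Time providers need no text data, so they can place folder boundaries as pages arrive rather than waiting for the whole '''Batch'''. The Python generator below is a hypothetical sketch of that idea, not Grooper code. `ScannedPage` and its `barcode` field are assumptions that stand in for whatever visual feature a provider detects. In the spirit of the ''Barcode'' event, a page with a barcode is assumed to start a new folder.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class ScannedPage:
    number: int
    barcode: Optional[str] = None  # hypothetical: decoded barcode value, if one was read


def separate_while_scanning(pages: Iterable[ScannedPage]) -> Iterator[List[int]]:
    """Yield each completed folder (as page numbers) as soon as the next one starts."""
    current: List[int] = []
    for page in pages:  # pages arrive one at a time, as from a scanner feed
        if page.barcode is not None and current:
            yield current  # the previous document is complete
            current = []
        current.append(page.number)
    if current:
        yield current  # flush the last document when scanning ends


if __name__ == "__main__":
    feed = [ScannedPage(1, "DOC"), ScannedPage(2), ScannedPage(3, "DOC"), ScannedPage(4)]
    for folder in separate_while_scanning(feed):
        print(folder)  # prints [1, 2] and then [3, 4]
```

Using a generator keeps the decision local to each incoming page, so no folder ever waits on text conditioning.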
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
# Instead, a '''Separation Profile''' can be assigned from the '''Scan''' client.
# After pressing the "Scan" button to bring pages into Grooper...
|
[[File:Separation-provider-03.png]]
|-
|valign=top|
As long as the '''''Separation Provider''''' used is a Real Time provider, documents are separated as they are scanned in. Folders are inserted according to the '''Separation Profile's''' configuration. This example uses the ''Control Sheet Separation'' provider.
* Note: This does not mean you ''can't'' use Real Time '''''Separation Providers''''' in a '''Separate''' step. They simply give you the option of performing separation during scanning.
|
[[File:Separation-provider-04.png]]
|}
The following '''''Separation Providers''''' are "Real Time" providers:
* ''[[Control Sheet Separation]]''
* ''[[Event-Based Separation]]''
The following '''''Separation Providers''''' are "Lexical" providers:
* ''[[Pattern-Based Separation]]''
* ''[[Change in Value Separation]]''
* ''[[EPI Separation]]''
* ''[[ESP Auto Separation]]''
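The page-number idea behind ''EPI Separation'' can be illustrated with a short Python sketch. This is an assumption-laden example, not Grooper's documented algorithm:

* The regex stands in for an extractor that reads markers such as "Page 2 of 3".
* The function name is hypothetical.
* The boundary rule is one plausible reading of how extracted page numbers could drive separation. A new folder starts when the page number is 1 or breaks sequence.

```python
import re
from typing import List, Optional, Sequence

# Hypothetical EPI extractor: captures the page number from text such as "Page 2 of 3".
PAGE_MARKER = re.compile(r"Page\s+(\d+)\s+of\s+\d+", re.IGNORECASE)


def epi_separate(page_texts: Sequence[str]) -> List[List[int]]:
    """Group 1-based page positions into folders using extracted page numbers.

    Assumed rule: a page numbered 1, or one that does not follow the last
    extracted number, starts a new folder. Pages without a marker stay in
    the current folder.
    """
    folders: List[List[int]] = []
    previous: Optional[int] = None
    for position, text in enumerate(page_texts, start=1):
        match = PAGE_MARKER.search(text)
        number = int(match.group(1)) if match else None
        out_of_sequence = (
            number is not None and previous is not None and number != previous + 1
        )
        if not folders or number == 1 or out_of_sequence:
            folders.append([position])
        else:
            folders[-1].append(position)
        if number is not None:
            previous = number
    return folders


if __name__ == "__main__":
    pages = ["Page 1 of 2", "Page 2 of 2", "Page 1 of 3",
             "Page 2 of 3", "Page 3 of 3", "Page 1 of 1"]
    print(epi_separate(pages))  # [[1, 2], [3, 4, 5], [6]]
```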
Revision as of 10:41, 15 October 2020