2023.1:Separation Provider (Property): Difference between revisions

Revision as of 10:23, 27 August 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Each provider has its own configurable properties. Changing these properties will change the criteria to separate pages into documents.

You may download the ZIP below and upload it into your own Grooper environment (version 2023.1). It contains one or more Projects with resources used in examples throughout this article.

  • 2023.1 Wiki Separation-Provider Project.zip

About

Separation Providers establish the logic used to create "separation points" between loose pages. Each Separation Provider has its own criteria for determining where these separation points occur within a Batch. However, the basic operation is the same for all of them.

  1. Determine what page is the first page of a document.
    • This is the "separation point".
    • Generally, the first page in a Batch is the first separation point.
  2. Insert a Batch Folder into the Batch.
  3. Move pages into that folder until another first page of a document is encountered.
  4. Insert a new Batch Folder into the Batch.
    • This is the next "separation point".
  5. Move pages into that folder until another first page of a document is encountered.
  6. Repeat until the end of the Batch.
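
The steps above can be sketched in a few lines of Python. This is only an illustrative model, not Grooper code: pages are plain strings, folders are lists, and the `separate` function and its `is_first_page` callback are hypothetical stand-ins for whatever criteria a given Separation Provider applies. The sketch also assumes the first page in the Batch always opens the first folder.

```python
from typing import Callable, List, Sequence


def separate(pages: Sequence[str], is_first_page: Callable[[str], bool]) -> List[List[str]]:
    """Group loose pages into folders, opening a new folder at each separation point."""
    folders: List[List[str]] = []
    for index, page in enumerate(pages):
        # Assumption: the first page of the batch always opens a folder.
        # After that, a new folder opens whenever the criteria flag a first page.
        if index == 0 or is_first_page(page):
            folders.append([])
        # Every page is moved into the most recently opened folder.
        folders[-1].append(page)
    return folders


# Hypothetical batch: a page whose text starts with "Invoice" begins a document.
batch = ["Invoice p1", "p2", "p3", "Invoice p1", "p2"]
documents = separate(batch, lambda page: page.startswith("Invoice"))
print(documents)
```

Only the `is_first_page` test changes from one provider to the next in this model; the folder-building loop stays the same.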

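List of Providers

  • Change in Value Separation
  • Control Sheet Separation
  • EPI Separation
  • ESP Auto Separation
  • Event-Based Separation
  • Multi Separator
  • Pattern-Based Separation
  • Undo Separation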

How To

The Separation Provider is selected and configured using the Provider property of the Separate Activity or a Separation Profile.

Setting up the Separate Step

For more information on setting up a Separation Provider, see the List of Providers section in this article and click the Provider you would like to configure. This opens an article with a tutorial on its configuration.

Glossary

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a "Review" activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Change in Value Separation: The Change in Value Separation Separation Provider creates a new folder and separates every time an extracted value changes from one Batch Page to another.
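
As a rough illustration of this rule, the Python sketch below starts a new folder whenever the value attached to a page differs from the value on the previous page. It is not Grooper code: the `(page name, extracted value)` tuples and the `separate_on_value_change` function are hypothetical, and the sketch assumes every page yields a value.

```python
from typing import List, Sequence, Tuple


def separate_on_value_change(pages: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Open a new folder each time the extracted value differs from the previous page's."""
    folders: List[List[str]] = []
    previous_value = None
    for index, (name, value) in enumerate(pages):
        # The first page always opens a folder; later pages do so only on a change.
        if index == 0 or value != previous_value:
            folders.append([])
        folders[-1].append(name)
        previous_value = value
    return folders


# Hypothetical pages paired with an account number extracted from each one.
pages = [("p1", "A-100"), ("p2", "A-100"), ("p3", "B-200"), ("p4", "B-200"), ("p5", "A-100")]
folders = separate_on_value_change(pages)
print(folders)
```

Note that a value returning later in the batch (the last page here) still opens a new folder, because only adjacent pages are compared.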

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Control Sheet Separation: Control Sheet Separation is a Separation Provider that uses Grooper Control Sheets to separate documents.

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

EPI Separation: The EPI Separation Separation Provider uses embedded page information ("EPI") to Separate loose pages into document folders. A Data Extractor is used to find page numbers from the text on a page and Grooper uses this information to separate the pages.
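
A minimal sketch of the idea in Python, under stated assumptions: page text carries a "Page X of Y" marker, a plain regular expression stands in for the Data Extractor, and a page numbered 1 starts a new document. The `PAGE_OF` pattern and `separate_by_page_numbers` function are hypothetical, not Grooper's actual extractor or API.

```python
import re
from typing import List, Sequence

# Hypothetical stand-in for a Data Extractor that finds "Page X of Y" markers.
PAGE_OF = re.compile(r"Page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def separate_by_page_numbers(page_texts: Sequence[str]) -> List[List[int]]:
    """Return folders of page indexes, opening a folder at each page numbered 1."""
    folders: List[List[int]] = []
    for index, text in enumerate(page_texts):
        match = PAGE_OF.search(text)
        starts_document = match is not None and int(match.group(1)) == 1
        # The first page of the batch always opens a folder.
        if index == 0 or starts_document:
            folders.append([])
        folders[-1].append(index)
    return folders


texts = [
    "Report\nPage 1 of 3",
    "Page 2 of 3",
    "Page 3 of 3",
    "Summary\nPage 1 of 2",
    "Page 2 of 2",
]
groups = separate_by_page_numbers(texts)
print(groups)
```

In this sketch, pages without a recognizable marker simply stay in the current folder.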

ESP Auto Separation: ESP Auto Separation is a Separation Provider used for document separation. It is unique in that it both separates and classifies documents at the same time. It uses page-level classification training examples (among other things) to determine where to insert document folders in a Batch.

Event-Based Separation: Event-Based Separation is a Separation Provider that Separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Lexical: "Lexical" is a Classify Method that classifies Batch Folders based on the text content of trained document examples. This is achieved through the statistical analysis of word frequencies that identify Document Types.

Multi Separator: The Multi Separator Separation Provider performs separation using multiple Separation Providers. It allows users to create a list of any of the other Separation Providers. If the first provider on the list fails to separate a page (or, as more often is the case, a series of pages), the next one will be applied. If that fails, the next, and so on.
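
The fallback behavior can be modeled roughly as follows. This Python sketch is an assumption-laden illustration, not Grooper's implementation: each hypothetical provider is a function that returns `True` (page starts a document), `False` (page continues the current document), or `None` (the provider could not decide), and an undecided page is assumed to stay in the current folder.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical provider shape: True = starts a document, False = continues it,
# None = this provider could not decide, so the next one in the list is tried.
Provider = Callable[[str], Optional[bool]]


def starts_document(page: str, providers: Sequence[Provider]) -> bool:
    """Ask each provider in order and return the first decision that is not None."""
    for provider in providers:
        decision = provider(page)
        if decision is not None:
            return decision
    return False  # Assumption: an undecided page continues the current document.


def by_cover_sheet(page: str) -> Optional[bool]:
    return True if page == "COVER" else None


def by_keyword(page: str) -> Optional[bool]:
    return True if page.startswith("Invoice") else None


def multi_separate(pages: Sequence[str], providers: Sequence[Provider]) -> List[List[str]]:
    folders: List[List[str]] = []
    for index, page in enumerate(pages):
        if index == 0 or starts_document(page, providers):
            folders.append([])
        folders[-1].append(page)
    return folders


batch = ["COVER", "a", "Invoice 7", "b", "c"]
result = multi_separate(batch, [by_cover_sheet, by_keyword])
print(result)
```

The order of the list matters in this model: a later provider is consulted only when every earlier one returns no decision.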

OCR: OCR stands for Optical Character Recognition. It allows text on paper documents to be digitized so that it can be searched or edited by other software applications. OCR converts typed or printed text from digital images of physical documents into machine-readable, encoded text.

Pattern-Based Separation: Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.

Pattern-Based: Pattern-Based is a Collation Provider option for Data Type extractors. Pattern-Based uses regular expressions to sequence returned results into a final result set.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Scope: The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Separate: Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export and other Activities that need to run on the folder (i.e. document) level.

Separation Profile: Separation Profiles store settings that determine how Batch Pages are separated into Batch Folders. Separation Profiles can be referenced in two ways:

  • In a Review activity's Scan Viewer settings to control how pages are separated in real time during scanning.
  • In a Separate activity as an alternative to configuring separation settings locally.

Separation Provider: The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

Undo Separation: Undo Separation is a Separation Provider. Instead of putting loose Batch Pages into Batch Folders, this Separation Provider removes Batch Folders, leaving only loose pages.

Visual: "Visual" is a Classify Method that uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile is used to collect image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.