2.72:What Is Separation - DSmith: Difference between revisions

From Grooper Wiki

Revision as of 11:04, 12 January 2024

Overview

Put simply, Separation is an organizational action in Grooper that takes a loose collection of pages and, as the name states, separates them into their own folders. It does so by finding where one document begins and ends, placing that document in its own folder, and repeating the process until every page has been sorted. To use an analogy, imagine your boss walks into your office with a box full of assorted documents and tells you he wants them sorted and filed.

How would you first approach the task? Simple: You would separate the documents, finding where each document begins and ends, setting it aside from the rest, and repeating until all documents were separated. Separation in Grooper works much the same way.

How Separation works in Grooper.

Before Separation

Before separation begins, however, you must first assess whether separation is necessary. Are your documents being physically scanned into Grooper? Then separation is a must. Are your documents being digitally migrated into Grooper? In that case, it depends: are the documents discrete or packeted? Discrete documents (sometimes called individual documents) don't need to be separated; they come into Grooper already organized into Batch Folders. Packeted documents, on the other hand, are a different story. A packeted file might contain multiple documents. Therefore, to get each individual document into its own Batch Folder, separation must be performed.

To illustrate this, we've included a helpful graphic:

Separation Methods

Once the need for separation has been determined, the next question is how the documents are to be separated. Grooper offers various separation methods. These methods can be set on either a Separation Profile or a Separation Provider, detailed here: [1]

As a quick synopsis, a Separation Profile is where the Separation step can be configured outside of a Batch Process, so that it can be reused across multiple Batch Processes. A Separation Provider is where Separation is configured directly, either on the Batch Process Step itself or on a Separation Profile. Either way, the Separation Provider is where the "separation point" for the documents is established. This "separation point" can be determined by any of the separation methods that Grooper provides. These methods are divided into two categories: scan-supported and text-based.

Scan-Supported

Scan-Supported Separation Providers perform the separation step during scanning. Unlike Text-Based Providers, they do not require OCR, as there is no need to recognize text at the scan step.

The Scan-Supported Separation Providers are:

  • Control Sheet Separation: Separates pages in a Batch through the use of a Control Sheet that is placed between the end of one set of pages and the beginning of another. During scanning, Grooper reads the barcode on the Control Sheet, recognizing it as such, and performs separation accordingly.
  • Event-Based Separation: Separation dictated by one of five events. They are:
    • Barcode Detected: Separates when Grooper detects a barcode.
    • Blank Page Detected: Separates when a blank page is detected.
    • Content Type Detected: Separates based upon detection of Content Types set up by the user. [[<-- Confirm that this is accurate.]]
    • Shape Detected: Separates when a shape, such as a stamp, is detected during scanning.
    • Page Count: Separates by counting the pages as they are being scanned.
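The general idea behind event-driven and count-driven separation can be sketched in a few lines of Python. This is only an illustration of the concept, not Grooper's implementation or API: the page records, the function names, and the choice to discard the separator page (as one might with a blank page or a Control Sheet) are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical page record for this sketch, e.g. {"id": 1, "blank": False}.
Page = Dict[str, object]


def separate_on_event(
    pages: List[Page],
    is_separator: Callable[[Page], bool],
    keep_separator: bool = False,
) -> List[List[Page]]:
    """Start a new document folder each time is_separator(page) is true.

    keep_separator=False drops the triggering page (assumed behavior for a
    blank page or control sheet); True makes it the first page of the new folder.
    """
    folders: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # Close the folder in progress, if it has any pages.
            if current:
                folders.append(current)
            current = [page] if keep_separator else []
        else:
            current.append(page)
    if current:
        folders.append(current)
    return folders


def separate_by_page_count(pages: list, pages_per_document: int) -> List[list]:
    """Start a new document folder after every pages_per_document pages."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        pages[start:start + pages_per_document]
        for start in range(0, len(pages), pages_per_document)
    ]
```

Passing a different `is_separator` test (a detected barcode, a detected shape, and so on) changes the event without changing the loop, which mirrors how the events above differ only in what triggers the separation point.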

Text-Based

The other kind of Separation Provider is Text-Based. Unlike Scan-Supported Providers, Text-Based Providers do require OCR. After all, you can't expect Grooper to perform separation based on the text of the documents when Grooper can't recognize that text! The Text-Based Providers are:

  • Pattern-Based Separation: Creates a new Document Folder each time the extractor returns a result that matches the pattern input by the user.
  • Change in Value Separation: Creates a new Document Folder each time the extracted value changes.
  • EPI Separation: Creates a new Document Folder using embedded page information such as page numbers.
  • ESP Auto Separation: Creates a new Document Folder using page data, such as page number, the page's structure, or information related to the Document Type.
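To make the difference between the first two providers concrete, here is a minimal Python sketch of both ideas applied to already-recognized page text. It is not Grooper's implementation: the function names, the use of plain regular expressions in place of a Grooper extractor, and the rule that a page with no extracted value stays with the current document are all assumptions made for illustration.

```python
import re
from typing import List, Optional


def separate_by_pattern(page_texts: List[str], pattern: str) -> List[List[int]]:
    """Start a new folder at every page whose text matches the pattern.

    Returns folders as lists of page indexes.
    """
    regex = re.compile(pattern)
    folders: List[List[int]] = []
    for index, text in enumerate(page_texts):
        # The very first page always opens a folder, match or not.
        if regex.search(text) or not folders:
            folders.append([])
        folders[-1].append(index)
    return folders


def separate_by_change_in_value(page_texts: List[str], pattern: str) -> List[List[int]]:
    """Start a new folder whenever the value captured by group 1 changes.

    Assumption: a page with no match inherits the previous page's value.
    """
    regex = re.compile(pattern)
    folders: List[List[int]] = []
    previous: Optional[str] = None
    for index, text in enumerate(page_texts):
        match = regex.search(text)
        value = match.group(1) if match else previous
        if not folders or value != previous:
            folders.append([])
        folders[-1].append(index)
        previous = value
    return folders


texts = [
    "Invoice No: 100",
    "continued",
    "Invoice No: 101",
    "Invoice No: 101 (page 2)",
]
print(separate_by_pattern(texts, r"Invoice No:"))                # [[0, 1], [2], [3]]
print(separate_by_change_in_value(texts, r"Invoice No: (\d+)"))  # [[0, 1], [2, 3]]
```

The sample shows why the choice of provider matters: when the same value repeats on every page of a document, matching the pattern alone splits too often, while separating on a change in value keeps those pages together.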



For more information on Separation Providers, see the Separation Provider article.