2.72:What Is Separation - DSmith
Overview
To put it bluntly, Separation is an organizational action in Grooper that takes a loose collection of pages and, as the name states, separates them into their own Document Folders. It does so by finding where one set of pages begins and ends, placing it within its own Document Folder, and repeating the process until everything has been properly sorted. To use an analogy, imagine your boss walks into your office with a box full of assorted pages and tells you he wants the documents sorted and filed.
How would you first approach the task? Simple: You would separate the pages, finding where each set begins and ends, setting it aside from the rest, and repeating until all pages were separated. Separation in Grooper works much the same way.
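The sorting routine from the analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: the `separate` function, the `starts_document` test, and the sample page names are all hypothetical.

```python
from typing import Callable, List


def separate(pages: List[str],
             starts_document: Callable[[str], bool]) -> List[List[str]]:
    """Group a loose run of pages into folders, one folder per document.

    `starts_document` is a hypothetical test that returns True when a page
    is the first page of a new document.
    """
    folders: List[List[str]] = []
    for page in pages:
        # Open a new folder at each "first page", and also for the very
        # first page so that no page is ever left outside a folder.
        if starts_document(page) or not folders:
            folders.append([])
        folders[-1].append(page)
    return folders


# Hypothetical box of pages; a page ending in "p1" begins a new document.
pages = ["Invoice p1", "Invoice p2", "Letter p1", "Invoice p1"]
folders = separate(pages, lambda page: page.endswith("p1"))
```

Everything interesting lives in the `starts_document` test. Deciding how to recognize where one document ends and the next begins is the part you configure in Grooper.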
Before Separation
However, before separation begins, you must first assess whether or not separation is necessary. Are the pages being physically scanned into Grooper? Then separation is a must. Are the pages being digitally migrated into Grooper? Well, in that case it depends. Are the documents discrete or packeted? Discrete documents (sometimes called individual documents) don't need to be separated; they come into Grooper already organized into Batch Folders. Packeted documents, on the other hand, are a different story. With packeted documents, there might be multiple documents within one file. Therefore, to get each individual document within its own Batch Folder, separation must be performed.
To illustrate this, we've included a helpful graphic:
Why Separation?
Before we discuss how to separate, you might be wondering why we're bothering to separate in the first place. Simple: Grooper can't classify until separation has been performed. Think about how your Batch begins in Grooper. It's just a collection of loose pages. Grooper can't classify them. However, if we separate those loose pages into Document Folders, then they can be classified.
In summary, separation is necessary for classification.
Separation Methods
Once the need for separation has been determined, the next question to ask is how documents are to be separated. Grooper has various separation methods that one can use. These methods are called Separation Providers. There are six providers in total; which one you choose depends on both how the pages come into Grooper and your own preference. For a detailed description of Separation Providers, click here: [1]
However, for a quick synopsis, a Separation Provider is where Separation is configured directly. This can either be configured on the Batch Process Step itself or on a Separation Profile. Either way, the Separation Provider is where the "separation point" for the documents is established. This "separation point" can be determined by any of the separation methods that Grooper provides. These methods are divided into two categories: scan-supported and text-based.
Scan-Supported
Scan-supported Separation Providers perform the separation step during scanning. Unlike Text-Based Providers, they do not require OCR, because there is no need to recognize text at the scan step.
The Scan-Supported Separation Providers are:
- Control Sheet Separation: Separates pages in a Batch through the use of a Control Sheet that is placed between the end of one set of pages and the beginning of another. During scanning, Grooper reads the barcode on the Control Sheet, recognizing it as such, and performs separation accordingly.
- Event-Based Separation: Separation dictated by one of five events. They are:
  - Barcode Detected: Separates when Grooper detects a barcode.
  - Blank Page Detected: Separates when a blank page is detected.
  - Content Type Detected: Separates based upon detection of page one of a Content Type.
  - Shape Detected: Separates when a shape, such as a stamp, is detected during scanning.
  - Page Count: Separates by counting the pages as they are being scanned.
FYI: Barcode, Blank Page, Content Type, and Shape will need to be configured before scanning. Grooper can't determine the barcode type and location, what makes a page blank, which Content Type you want, or what shape you're looking for unless you tell it. With Page Count, however, Grooper will count the pages as they are being scanned in and separate automatically.
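To make two of these events concrete, here is a rough Python sketch of the logic behind Page Count and Blank Page Detected. It is not Grooper code. The function names are hypothetical, and the choice to discard the blank separator pages is an assumption made for this example only.

```python
from typing import Callable, List


def separate_by_page_count(pages: List[str],
                           pages_per_document: int) -> List[List[str]]:
    """Start a new folder every `pages_per_document` pages."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [pages[i:i + pages_per_document]
            for i in range(0, len(pages), pages_per_document)]


def separate_on_blank(pages: List[str],
                      is_blank: Callable[[str], bool]) -> List[List[str]]:
    """Close the current folder whenever a blank page is seen.

    Assumption for this sketch: blank pages act only as separators and
    are not kept in any folder.
    """
    folders: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if is_blank(page):
            if current:  # ignore repeated or leading blank pages
                folders.append(current)
                current = []
        else:
            current.append(page)
    if current:  # keep the trailing document
        folders.append(current)
    return folders


by_count = separate_by_page_count(["a", "b", "c", "d", "e"], 2)
by_blank = separate_on_blank(["a", "b", "", "c", "", "", "d"],
                             lambda page: page == "")
```

Notice that `separate_on_blank` needs an `is_blank` rule handed to it, while `separate_by_page_count` needs only a number. That mirrors the point above: most events must be told what to look for before scanning.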
Text-Based
The other kind of Separation Provider is Text-Based. Unlike Scan-Supported Providers, Text-Based Providers do require OCR. After all, you can't expect Grooper to perform separation based on whatever text the documents have when Grooper can't recognize the text! The Text-Based Providers are:
- Pattern-Based Separation: Creates a new Document Folder each time the extractor returns a result that matches the pattern input by the user.
- Change in Value Separation: Creates a new Document Folder each time the extracted value changes.
- EPI Separation: Creates a new Document Folder using embedded page information such as page numbers.
- ESP Auto Separation: Creates a new Document Folder using page data, such as page number, the page's structure, or information related to the Document Type.
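As an illustration of the Change in Value idea, the Python sketch below starts a new folder whenever a value pulled from the page text changes. It is a simplified stand-in, not Grooper's extractor. The `Invoice #` pattern, the function names, and the rule that a page with no value stays with the current document are all assumptions made for this example.

```python
import re
from typing import List, Optional

# Hypothetical pattern: the value we track is an invoice number.
INVOICE_NUMBER = re.compile(r"Invoice\s+#(\d+)")


def extract_value(page_text: str) -> Optional[str]:
    """Return the invoice number found in the page text, if any."""
    match = INVOICE_NUMBER.search(page_text)
    return match.group(1) if match else None


def separate_on_change(page_texts: List[str]) -> List[List[str]]:
    """Start a new folder each time the extracted value changes.

    Assumption for this sketch: a page with no extracted value stays
    with the document currently being built.
    """
    folders: List[List[str]] = []
    last_value: Optional[str] = None
    for text in page_texts:
        value = extract_value(text)
        changed = value is not None and value != last_value
        if changed or not folders:
            folders.append([])
        if value is not None:
            last_value = value
        folders[-1].append(text)
    return folders


ocr_text = [
    "Invoice #100 page 1",
    "Invoice #100 page 2",
    "continued totals",
    "Invoice #101 page 1",
]
documents = separate_on_change(ocr_text)
```

The input here is already text, which is why OCR has to run first: without recognized text there is nothing for the pattern to match.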
Profiles vs. Providers
Now that you know how separation can be configured, the next question is where to configure it. Separation can be configured either right on the Separate Batch Process Step via the Separation Provider, or on a Separation Profile which you can then reference on the Separate step in your Batch Process.
What is a Separation Profile?
As stated earlier, a Separation Profile can be used to configure separation. A Separation Profile is an object in the Grooper Node Tree that can be created within a Project, and is separate from the steps within a Batch Process. Once separation is configured on the Separation Provider within the Separation Profile, you can reference that profile on the Separate step.
While it is more convenient to configure separation on the provider of the Separate step, a Separation Profile does have its purpose. What if you wanted to create multiple Batch Processes that involve separation? Without a profile to reference, you would have to configure the separation step every single time on each of the multiple Batch Processes. With a Separation Profile, you can just configure separation once, then reference the profile whenever you need to perform separation.
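The benefit of referencing one profile from several Batch Processes can be pictured with a small Python model. This is only an analogy for the configure-once, reference-many idea. The class names, fields, and setting keys are hypothetical and do not correspond to Grooper's object model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SeparationProfile:
    """Hypothetical stand-in for a shared separation configuration."""
    name: str
    provider: str
    settings: Dict[str, object] = field(default_factory=dict)


@dataclass
class SeparateStep:
    """Hypothetical Separate step that points at a profile, not a copy."""
    profile: SeparationProfile


# Configure separation once...
shared = SeparationProfile(
    name="Shared separation",
    provider="Event-Based Separation",
    settings={"event": "Blank Page Detected"},
)

# ...then reference it from as many processes as needed.
scan_process_step = SeparateStep(profile=shared)
import_process_step = SeparateStep(profile=shared)

# A single change to the profile is seen by every step that references it.
shared.settings["event"] = "Page Count"
```

If each step carried its own copy of the settings instead, the same change would have to be repeated on every Batch Process.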