2023.1:Event-Based Separation (Separation Provider)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Event-Based Separation is a Separation Provider that separates documents using one or more "Separation Events". Each Separation Event triggers the creation of a new folder.

About

The Event-Based Separation Provider separates documents according to a specific Separation Event that occurs within the Batch. Whenever Grooper encounters that event in the Batch, it creates a new Document Folder and separates out those pages.


The Separation Events are as follows:

  • Blank Page - A blank page will trigger a new folder.
  • Barcode - A scanned barcode will trigger a new folder.
  • Page Count - This is for fixed page separation. A new folder is created every time a set number of pages is reached.
  • Shape - A new folder is created every time a "shape feature" is detected. Shape features are detected using a Shape Detection IP Command from an IP Profile.
  • Content Type - This Separation Event uses Lexical or Visual training examples to trigger folder creation. Whenever a page confidently matches a trained example document's first page, a new folder is created.


When the event is triggered, one of two things can happen:

  1. A new folder is created and subsequent pages are appended until a new event is triggered (resulting in a new document)
  2. The page triggering the event may be deleted
    • Sometimes, a cover sheet may be placed before a document. That page does not necessarily carry any useful data once the document is separated. If it is truly "junk" this can automate its deletion.

How it Works

Take a Batch of loose pages. The Event-Based Separation provider analyzes each page, looking for a configured Separation Event, one page after the next.

Let's say the yellow pages in this Batch meet the Separation Event's requirements.

Every time a Separation Event occurs, a "Separation Point" (or "Binding Point") is established.

This indicates the start of a new document.

Separation occurs when the Event-Based Separation provider executes these Separation Points, creating a new Batch Folder.

All subsequent Batch Pages are placed in the created Batch Folder, separating the loose pages in the Batch into document folders.

Configured Separation Events are called Trigger Events or Triggers because they "trigger" document folder creation. For example, the Blank Page event uses blank pages to trigger folder creation. Every time a blank page is found in a Batch, a new document folder is created.

Optionally, the Separation Event may be configured to delete the triggering page. Often, the trigger page is simply a separator sheet indicating the start of a new document. In that case, the page carries no meaningful data.

If it's truly just junk, the Event-Based Provider can automatically delete the page during document separation.

This can quickly clean up the documents in your Batch, leaving the document folders with only the important pages of the document.

But again, this is optional functionality. Sometimes the triggering page is part of the document itself and should be kept in the document folder.
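
The flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Grooper's implementation: the `SeparationEvent` class, the `separate` function and the dictionary-based page records are hypothetical names made up for this example. How pages that appear before the first trigger are handled is also an assumption (here they are simply left loose).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Page = dict  # hypothetical page record, e.g. {"id": 3, "blank": True}


@dataclass
class SeparationEvent:
    """Hypothetical stand-in for one configured Separation Event."""
    is_triggered: Callable[[Page], bool]  # True when the page meets the event
    delete_page: bool = False             # mirrors the Delete Page option


def separate(pages: List[Page],
             events: List[SeparationEvent]) -> Tuple[List[Page], List[List[Page]]]:
    """Walk the pages in order and open a new folder at every trigger page."""
    loose: List[Page] = []          # assumption: pages before the first trigger stay loose
    folders: List[List[Page]] = []
    for page in pages:
        # The first configured event that fires on this page is used.
        fired = next((e for e in events if e.is_triggered(page)), None)
        if fired is not None:
            folders.append([])      # separation point: a new folder starts here
            if fired.delete_page:
                continue            # separator sheet is dropped, not filed
        if folders:
            folders[-1].append(page)
        else:
            loose.append(page)
    return loose, folders


batch = [
    {"id": 1, "blank": True},
    {"id": 2, "blank": False},
    {"id": 3, "blank": False},
    {"id": 4, "blank": True},
    {"id": 5, "blank": False},
]
blank_event = SeparationEvent(is_triggered=lambda p: p["blank"], delete_page=True)
loose, folders = separate(batch, [blank_event])
print([[p["id"] for p in f] for f in folders])  # [[2, 3], [5]]
```

With `delete_page` left at `False`, the same Batch would produce the folders `[[1, 2, 3], [4, 5]]`, since each trigger page becomes the first page of its folder.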

How To

Setting the Event on the Provider

  1. In the node tree, select the Separate Step of a Batch Process. You can also set a Separation Provider on a Separation Profile.
  2. On the right side under the "Activity Properties" you will need to select "Event-Based" as your Provider.
  3. Then, click on the ellipsis button to the right of the Events property to set up your Events.


  1. When the "Events" window pops up, click on the "+" button at the top of the new window. This will bring up a drop down menu.
  2. Select one of the Events from the drop down list you wish to add. You can add more than one event if you wish.


  1. The Events you choose will show up in a list on the left side of the Events window.
  2. After you finish configuring your Events, click the "OK" button to save your changes (see further in the article for instructions on configuring Events).


  1. Each Event that you add will have at least two properties: Folder Level and Delete Page. These two properties dictate what happens when the Event is triggered on a page.

Configuring Separation Events

Blank Page

The Blank Page Separation Event acts just like you might expect. Grooper will go through your Batch and separate pages into a new Document Folder every time it comes across a blank page.

Most of the time you can just use the default settings on your Blank Page Event, but there are times when you may have to tweak some of the settings to get the results you want. We will start by looking at how to set up the Blank Page Detected Separation Event on your Provider.

Setting Up Blank Page Separation
  1. In the below screenshot, we have a Batch. Each document has a blank page before the first page of the actual document.


  1. We have set up a Separate Step in the "Blank Page Separation Tester" Batch Process with an Event-Based Separation Provider. We have added the "Blank Page Detected" Event to the list of Separation Events.
  2. Right now we are using all the default settings for the "Blank Page Detected" Event.


  1. After setting up our Event, we can go to the "Activity Tester" tab to test our Separate Step.
  2. If we click the play button icon at the top-right of the BATCH panel, we can test the activity on our Batch.
  3. Now the pages are separated inside folders.
  4. Grooper separated the pages when it came across a blank page, just as we expected.


  1. If we look in Folder (8) in the Batch though, we see that Grooper did not separate appropriately. It looks like we have several blank pages in this Document Folder and separation did not occur.
  2. If we take a closer look at the blank page in the Batch Viewer, we can see that this page is not completely blank. It has the words "THIS PAGE INTENTIONALLY LEFT BLANK" printed at the top. Since Grooper detected pixels on this page, it did not detect this as a blank page. We can fix this.

Editing the Detection Settings
  1. Back in the Events window where we set the Blank Page Detected event, we see that we have these Detection Settings properties.
    • We could go through these properties using trial and error and then test each time to see if we get the result we want, but there is an easier way to figure out what settings to use.


  1. In the screenshot below, we have added an IP Profile to the Node Tree. We are going to use this to determine our settings for the Blank Page event.
  2. Right-Click on the IP Profile.
  3. Hover over "Add Command", then hover over "Feature Detection". Then click on "Blank Page Detection".


  1. When the "Add Command" window pops up, click "EXECUTE" to add the IP Step.


  1. Select the newly created Blank Page Detection IP Step in the node tree.
  2. Click on the "Tester" tab.
  3. In the Batch Viewer, select the page in your Batch that you want Grooper to detect as blank.
  4. Click the play button icon located in the top right of the "STEP PROPERTIES" panel.
  5. In the Diagnostics panel, click on the "Execution Log".
  6. This will show you what the "Detection Limits" are and whether Grooper detects this as a blank page.
    • In the screenshot below, "IsBlank" is False, so Grooper is not seeing this as a blank page.


  1. In the screenshot below we adjusted the Detection Limits properties. We have raised the Maximum to 30pt.
  2. After making any changes to properties, you will need to re-test the IP Profile by clicking the play button icon again.
  3. Looking at the Execution Log, we can see the Detection Limits section has changed to reflect our edits, but this is not enough for Grooper to detect this as a Blank Page. IsBlank is still coming up as False.


  1. We can also adjust the Speck Threshold to improve our results. In the screenshot below, we have changed the Speck Threshold to 6pt.
  2. With both the Maximum and the Speck Threshold properties adjusted, Grooper is now detecting the page as blank (IsBlank is "True").


  1. Now, back in the properties for the Blank Page Detected event, we can take the settings from the IP Profile test earlier and put them in the Detection Settings here.
  2. After updating your settings, click "OK" to save the changes.


  1. On the "Activity Tester" tab, we can test the separation again.
  2. Now the Batch is being separated appropriately. The page is now being seen as a blank page and Grooper is separating when it finds the blank page.

Deleting Blank Pages
  1. If you want Grooper to delete the blank page after separation, you can go back into the Events List and set the Delete Page property to True.


  1. After running separation on this Batch, we can see that the blank pages at the beginning of each document have been deleted since we turned the Delete Page property to True.


  1. For this next part of the tutorial, we have turned the Delete Page property back to False.


  1. In the below screenshot, we have a Batch where we have three blank pages in a row. Grooper has separated at each blank page so we ended up with two Document Folders with a single blank page in each.


  1. If we turn the Delete Page property back to True and run separation on the Batch of loose pages, we get something a little different. Grooper deletes the pages AFTER the pages have been separated, so we still get Document Folders, but the blank pages have been deleted from them, leaving them empty.
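
The empty folders follow directly from the order of operations: folders are created first, and trigger pages are removed afterwards. A minimal sketch of that two-phase behavior is shown below. The function name `separate_then_delete` and the string-based pages are made up for illustration and are not part of Grooper.

```python
def separate_then_delete(pages, is_trigger, delete_page):
    """Phase 1: open a folder at every trigger page.
    Phase 2: optionally remove the trigger pages from the folders."""
    folders = []
    for page in pages:
        if is_trigger(page):
            folders.append([])
        if folders:  # assumption: pages before the first trigger are not modeled here
            folders[-1].append(page)
    if delete_page:
        # Deletion happens after separation, so folders may end up empty.
        folders = [[p for p in folder if not is_trigger(p)] for folder in folders]
    return folders


pages = ["blank", "blank", "blank", "page 1", "page 2"]
is_blank = lambda p: p == "blank"

print(separate_then_delete(pages, is_blank, delete_page=False))
# [['blank'], ['blank'], ['blank', 'page 1', 'page 2']]
print(separate_then_delete(pages, is_blank, delete_page=True))
# [[], [], ['page 1', 'page 2']]
```

Three blank pages in a row produce three separation points, so the first two folders contain nothing but a blank page, and deleting that page leaves them empty.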


Barcode

Virtually scanned barcodes are used to trigger folder creation. Grooper will read barcodes on a page during separation. If a barcode is found (according to the property settings configured on the event), a new folder will be created.

You set the barcode symbology (or symbologies) used on your documents in the property panel. You may also set the barcode value the event should watch for. If no value is set, the presence of any barcode of the assigned symbology will trigger the event.


  1. In this example, we have added the Barcode Detected event to our list of Separation Events.
  2. The first property we need to look at is Symbology. This is the type of barcode you have on your documents.


  1. Opening the Symbology property we find a long list of different types of codes. You will need to select the one that applies to your documents by clicking the checkbox to the right of the code.


  1. In our example in the screenshot below, the first document is a Federal W-4. On the first page we have a barcode. Grooper will detect the barcode and separate when it encounters any page with a barcode.
  2. In the Batch we are using in this example, each document has a barcode on its first page.


  1. After testing separation, a folder was created every time Grooper came to a barcode. In our Batch the first three documents were separated appropriately.


  1. The last two Document Folders in our Batch were not separated appropriately. Two folders were made and one page was inserted into each folder, but both pages are actually part of one document.
  2. If we look closer, we see that the second page also has a barcode on it, so when Grooper detected the barcode, separation was triggered. There is a way we can fix this.


  1. Barcodes have an encoded value. The barcode on the first page of the W-4 document translates to "W-4". The barcodes on the other documents also have an encoded value that is specific to that type of document.


  1. We can actually tell Grooper to separate only when it finds barcodes with a certain value. In the List of Separation Events window for the Event-Based Provider, we just need to add the value of the barcode in the Value property for the Barcode Detected event.


  1. We will need to add a Barcode Detected event for each different barcode value where we want Grooper to separate.
  2. Each event has a different Value according to the barcode values on the first page of each document.


  1. If we run separation again, Grooper will separate appropriately. Only the barcodes with the values we specified will trigger separation. All other barcodes will be ignored.
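
The difference between separating on any barcode and separating on specific values can be sketched as follows. The `Barcode` and `BarcodeEvent` classes are hypothetical, and the symbology name and the values other than "W-4" are invented for the example. Comparing values by exact equality is also an assumption about how matching works.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Barcode:
    symbology: str  # illustrative, e.g. "Code 39"
    value: str      # decoded text of the barcode


@dataclass(frozen=True)
class BarcodeEvent:
    """Hypothetical model of a Barcode Detected event."""
    symbologies: FrozenSet[str]
    value: Optional[str] = None  # None: any barcode of an accepted symbology triggers

    def is_triggered(self, barcodes: Iterable[Barcode]) -> bool:
        return any(
            b.symbology in self.symbologies
            and (self.value is None or b.value == self.value)
            for b in barcodes
        )


# Made-up pages: the second page of "Form X" also carries a barcode.
pages = {
    "W-4 page 1": [Barcode("Code 39", "W-4")],
    "Form X page 1": [Barcode("Code 39", "FORM-X")],
    "Form X page 2": [Barcode("Code 39", "0042")],
}

any_barcode = BarcodeEvent(frozenset({"Code 39"}))
by_value = [
    BarcodeEvent(frozenset({"Code 39"}), "W-4"),
    BarcodeEvent(frozenset({"Code 39"}), "FORM-X"),
]

for name, barcodes in pages.items():
    loose_match = any_barcode.is_triggered(barcodes)
    value_match = any(e.is_triggered(barcodes) for e in by_value)
    print(f"{name}: any barcode={loose_match}, by value={value_match}")
```

Only the value-based events ignore the barcode on the second page of "Form X", which is why one event per expected value is needed.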


Page Count

The Page Count event is used for fixed page separation. For example, if you expect a new document to exist in a Batch every four pages, you can set the Page Count of this event to "4". A new Batch Folder will be created every four pages, with four Batch Pages placed in the created folder.


  1. To use the Page Count event, we need to add it to the List of Separation Events.
  2. Then we simply need to enter how many pages each document in the Batch contains.


  1. In our example we have a Batch where each document is exactly four pages. So we set the Page Count property to 4.


  1. After running separation, folders are created every four pages and the documents are separated into them.
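
Fixed page separation amounts to cutting the page list into equal chunks. A minimal sketch follows, with a hypothetical function name. How a trailing group with fewer pages than the Page Count is handled is an assumption here (it is kept as a shorter final folder).

```python
def separate_by_page_count(pages, page_count):
    """Group pages into consecutive folders of `page_count` pages each."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    # Assumption: leftover pages at the end form a shorter final folder.
    return [pages[i:i + page_count] for i in range(0, len(pages), page_count)]


print(separate_by_page_count(list(range(1, 9)), 4))
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```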

Shape

The Shape event allows images on a page, such as a stamp or a logo, to be used to separate documents. "Shape features" are used as the trigger event for Event-Based Separation. First, shape features must be saved to the page using a Shape Detection IP Command from an IP Profile. This allows Grooper to "see" whether or not a shape is on a page. Once the feature is encountered on the page, the event is triggered, allowing a new document folder to be created.

Adding your Sample Image
  1. In this example, we have an IP Profile in our node tree. We need to add an IP Step. To add the IP Step, right-click on the IP Profile.
  2. Hover over "Add Command", then hover over "Feature Detection". Finally, click on "Shape Detection..." to add the Shape Detection IP Step.


  1. When the "Add" window pops up, feel free to change the name of the IP Step, or just leave it as the default. In the below example, we have changed the name of the step to "Stamp".
  2. After you are satisfied with the name, click "EXECUTE" to add the IP Step.


  1. After creating the IP Step, go to the "Tester" tab.
  2. In this example, the shape we are going to be using to separate documents is a "FILED" stamp.


  1. In the top left of the Document Viewer panel, click the marquee tool. This icon should look like a square surrounded by dots.
  2. Draw a box around the shape on the document you want to use for separation.
  3. Click the copy icon located to the right of the marquee tool to copy the selection to the clipboard.


  1. With the shape copied to the clipboard, under the Command properties, click on the ellipsis button to the right of Sample Images.


  1. When the "Sample Image Editor" pops up, click on the clipboard icon above the Sample Name list.


  1. When the "Paste Sample Image" window pops up, type in a name for your sample image.
  2. Click "OK" to save.


  1. Any samples you add should appear in a list under "# Sample Name" on the left of the Sample Image Editor.
  2. When you select a sample from the list, you should see a preview of the sample to the right.
  3. To close the Sample Image Editor, click "OK".


  1. Now that we have our Sample Image set, click the play icon at the top of the screen in the "STEP PROPERTIES" panel to test the IP Profile.
  2. Under the Diagnostics panel inside of the IP Step's folder, there should be a "Shape Locations.jpg" if Grooper detected a shape on the page.
  3. If we click on the "Shape Locations.jpg" we can see where Grooper found the shape on the page. The areas will be highlighted in orange. In this example, we see that Grooper found shapes all over the page, even where the shape is not present. We are getting a lot of false positives.


We will go over some ways to improve our results in the following sections.


Background Differencing
  1. Our shape included a lot of white space. This can make it difficult for Grooper to differentiate the shape from areas on the document with a lot of white space. By turning Background Differencing to True, Grooper is better able to differentiate the shape from the background white space.
  2. If you retest the IP Profile, the "Shape Locations.jpg" in the Diagnostics panel now shows that Grooper is detecting only one shape, and it is the correct one.

Minimum Confidence and Dilation Factor
  1. In the screenshot below, we have selected the next page in the Batch that contains a stamp.
  2. In the Diagnostics panel, we do not have a "Shape Locations.jpg". We can look at the Execution Log though.
  3. In the Execution Log, we can see whether or not Grooper found any shape matches. In this case, Grooper did not find any matches on the page.


  1. We can adjust the Minimum Confidence property lower, but we don't want to go too low and get false positives. Here we have adjusted it to 60%.
  2. After testing the IP Profile again, we see that lowering the Minimum Confidence percentage wasn't quite enough on its own to get us what we want.


  1. We can look at the "Stamp Similarity.jpg" in the Diagnostics panel to see what Grooper is using for a comparison of the shape we are looking for.
  2. In the bottom right hand corner of the Document Viewer, you can see what Grooper is currently using to look for a shape on the page.
  3. We can adjust the image Grooper is looking for using certain properties such as the Dilation Factor.


  1. If we set the Dilation Factor to 10, it will dilate the pixels in the image, making the lines bolder.
  2. After testing the IP Profile again, we can see how the image of our shape has changed in the "Stamp Similarity.jpg". Grooper will now use this new version.


  1. Now if we look at the "Shape Locations.jpg"...
  2. ... we see Grooper is accurately detecting the stamp that we want on this page.


Binarization
  1. In the screenshot below, we have selected the next page in the Batch with a stamp.
  2. If we look at the Execution Log in the Diagnostics panel...
  3. ... We see that Grooper is not detecting the stamp that is on this page.


  1. By clicking on the "Input Image.jpg" in the Diagnostics, we can see a preview of the page.
  2. The page we are currently looking at is a different color than the other pages we have seen so far. For Grooper to detect the stamp, we need to turn the page to a black and white image.
  3. Turn the Binarization property to Enabled and test the IP Profile again.


  1. Now if we look at the Execution Log after testing the IP Profile...
  2. ... we see that just by turning Binarization to Enabled, Grooper is now detecting the stamp.


  1. If we take a look at the "Shape Locations.jpg" in Diagnostics...
  2. ... we can see where Grooper is detecting the shape. It is now accurately detecting the stamp.
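
To build intuition for why Binarization and Minimum Confidence matter, here is a toy sketch of sample matching on tiny pixel grids. It is not Grooper's Shape Detection algorithm, and every name in it is hypothetical. It binarizes a grayscale page, slides a binary sample over it, and accepts the best position only if its similarity score reaches a minimum confidence. Dilation, background differencing and rotation are deliberately left out.

```python
def binarize(gray, threshold=128):
    """Map a grayscale grid (0-255) to 1 for dark pixels and 0 for light ones."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def best_match(page, sample):
    """Slide the sample over the page and return (score, row, col) of the best fit.
    The score is the fraction of pixels that agree at that position."""
    ph, pw = len(page), len(page[0])
    sh, sw = len(sample), len(sample[0])
    best = (0.0, -1, -1)
    for r in range(ph - sh + 1):
        for c in range(pw - sw + 1):
            same = sum(
                page[r + i][c + j] == sample[i][j]
                for i in range(sh)
                for j in range(sw)
            )
            score = same / (sh * sw)
            if score > best[0]:
                best = (score, r, c)
    return best


def shape_detected(gray_page, sample, min_confidence=0.8):
    """True when the best match on the binarized page meets the confidence bar."""
    score, _, _ = best_match(binarize(gray_page), sample)
    return score >= min_confidence


sample = [[1, 1],
          [1, 0]]

page_with_shape = [[250, 250, 250, 250],
                   [250,  40,  30, 250],
                   [250,  20, 250, 250]]

page_with_speck = [[250, 250, 250],
                   [250,  40, 250],
                   [250, 250, 250]]

print(shape_detected(page_with_shape, sample))                      # True
print(shape_detected(page_with_speck, sample))                      # False
print(shape_detected(page_with_speck, sample, min_confidence=0.5))  # True
```

The last call shows the trade-off mentioned earlier: lowering the confidence bar far enough turns a stray speck into a false positive.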


Maximum Angle
  1. Now we are going to look at the next page with a stamp in the Batch in the screenshot below. Test the IP Profile.
  2. The stamp on this page is skewed at an angle. Grooper will not be able to find the skewed shape.


  1. Click on the "Execution Log.txt".
  2. As expected, the Execution Log shows that Grooper is not detecting the stamp.
  3. We can adjust the Maximum Angle property to try and get the result we are looking for.


  1. In the screenshot below, we have set the Maximum Angle property to 10 degrees.
  2. That was enough for Grooper to now detect the angled stamp.


Applying the IP Profile

Right now, the shape is only being detected as part of the IP Profile. We need to apply the IP Profile to the Batch for Grooper to be able to recognize the shape and properly separate the documents during the Separate Batch Process Step. We do this through the Image Processing Activity.


  1. In any Batch Viewer, select all the pages in your Batch and right-click.
  2. Hover over "Activities" and then hover over "Cleanup & Recognition."
  3. Then click on "Image Processing..."


  1. When the Image Processing window pops up, click the hamburger icon next to the IP Profile property and select the IP Profile from the dropdown.
  2. Click "EXECUTE" to apply the IP Profile to the Batch.


  1. Now, if we go back to our Separate Step in our List of Separation Events, we can add the Shape Detected Event.
  2. In the Shape Name property, make sure to enter the name you gave the shape for the IP Profile. We named our shape "Stamp".
  3. After finishing configuring the Shape Detected Event properties, including the Folder Level and Delete Page, click "OK".


  1. Now on your Separate Batch Process Step, you should see that 1 Separation Event has been added.


  1. Go to the "Activity Tester" tab to test the separation.
  2. Click the play button to test.


  1. We see in the screenshot below that after testing the Separate Step, folders have been created and each document separated appropriately.

Content Type

This Separation Event uses trained examples of documents to establish the separation points between them. If the first page of every document in a Batch can be matched to the first page of a trained example of a Document Type in a Content Model, separation can start when a page matches the first page of a Document Type and stop when another page matches the first page of a Document Type. Furthermore, the created folder can be classified as the Document Type that the page matched.

Any page classified as Page 1 of a Document Type in a Content Model will trigger the event. A new Batch Folder will be created and the page will be placed inside. Subsequent pages will be included in the folder until a new Page 1 of a Document Type is found.

Training data from both Lexical and Visual Classification Methods can be used. The Content Type event works particularly well when using Visual classification. This event can allow Visual classification and separation of documents within a single Separate step in a Batch Process, and even separation and classification in real time during scanning.
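
The "separate on Page 1 and classify the folder" behavior can be sketched as below. This is an illustration only: `PageResult`, `separate_by_content_type` and the 0.8 confidence cutoff are hypothetical, and the classification results are hard-coded rather than produced by any real Lexical or Visual classifier.

```python
from typing import List, NamedTuple, Optional


class PageResult(NamedTuple):
    """Hypothetical per-page classification result."""
    page_id: int
    doc_type: Optional[str]     # best-matching Document Type, if any
    page_number: Optional[int]  # which trained page of that type it matched
    confidence: float


def separate_by_content_type(results: List[PageResult], min_confidence: float = 0.8):
    """Open a new, pre-classified folder at each confident 'Page 1' match."""
    folders = []
    for res in results:
        is_first_page = (
            res.doc_type is not None
            and res.page_number == 1
            and res.confidence >= min_confidence
        )
        if is_first_page:
            folders.append({"doc_type": res.doc_type, "pages": []})
        if folders:  # assumption: pages before the first match are not modeled here
            folders[-1]["pages"].append(res.page_id)
    return folders


results = [
    PageResult(1, "Invoice", 1, 0.95),
    PageResult(2, "Invoice", 2, 0.90),
    PageResult(3, "W-4", 1, 0.92),
    PageResult(4, "W-4", 1, 0.40),  # weak match: does not start a new folder
]
print(separate_by_content_type(results))
# [{'doc_type': 'Invoice', 'pages': [1, 2]}, {'doc_type': 'W-4', 'pages': [3, 4]}]
```

Because each folder is created from a matched Document Type, separation and classification happen in the same pass.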