2023.1:Pattern-Based Separation (Separation Provider)

From Grooper Wiki
Revision as of 09:38, 20 March 2024

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Pattern-Based Separation is a Separation Provider that creates a new document folder every time a defined pattern returns a value on a page.

About

The Pattern-Based Separation Provider separates documents based on whether or not a defined pattern returns a value from a page in your Batch.

A Data Extractor is used to find a value on a page. When the extractor returns a result on a page, that page is placed in a new folder, creating a new document. If the extractor does not return a result on the following page, that page is added behind the previous page in the newly created folder. When the extractor returns a result on a later page (even if it is the same result as on an earlier page), that page is placed in a new folder, creating another new document.
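The grouping rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Grooper's implementation: pages are plain strings, and the extractor is modeled as a hypothetical function that returns a value or None. As a simplifying assumption, pages that appear before the first match start the first group; this article does not say how Grooper handles such pages.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a Data Extractor: returns the value found in a
# page's text, or None when the page produces no result.
Extractor = Callable[[str], Optional[str]]


def separate(pages: List[str], value_extractor: Extractor) -> List[List[str]]:
    """Group pages into documents, opening a new document on every result."""
    documents: List[List[str]] = []
    for page in pages:
        found = value_extractor(page) is not None
        if found or not documents:
            # A result (even one identical to an earlier page's) opens a new
            # folder. Assumption: a leading page with no result also opens one.
            documents.append([page])
        else:
            # No result: the page is added behind the previous page.
            documents[-1].append(page)
    return documents


def find_invoice(text: str) -> Optional[str]:
    """Hypothetical extractor that looks for the literal title "INVOICE"."""
    return "INVOICE" if "INVOICE" in text else None


pages = ["INVOICE 1", "details", "INVOICE 2", "INVOICE 2"]
print(separate(pages, find_invoice))
# [['INVOICE 1', 'details'], ['INVOICE 2'], ['INVOICE 2']]
```

The last two pages both return "INVOICE", so each starts its own document, which mirrors the rule that a repeated result still triggers separation.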



How To

Pattern-Based Separation is achieved by setting up an extractor that returns results ONLY on the pages where you want separation to occur. One of the simplest ways to do this is to have the extractor match document titles. In the simple example below, we will walk you through setting up the Pattern-Based Separation Provider using a List Match.
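Conceptually, a title-based List Match searches each page for any one of several literal strings. The Python sketch below is a rough approximation, not Grooper's List Match: it escapes each listed title, joins them into one regular expression, and ignores case as an assumed default. The function names and titles are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable, Optional


def build_list_match(titles: Iterable[str], ignore_case: bool = True) -> re.Pattern[str]:
    """Compile literal titles into one regex that matches any of them."""
    # Longer titles first, so a long title is preferred over a shorter prefix.
    ordered = sorted(titles, key=len, reverse=True)
    if not ordered:
        raise ValueError("at least one title is required")
    # re.escape makes characters such as "(" in "Form W-4 (2015)" literal.
    alternation = "|".join(re.escape(title) for title in ordered)
    return re.compile(alternation, re.IGNORECASE if ignore_case else 0)


def first_title(pattern: re.Pattern[str], page_text: str) -> Optional[str]:
    """Return the first listed title found in the page text, or None."""
    match = pattern.search(page_text)
    return match.group(0) if match else None


titles = build_list_match(["Invoice", "Form W-4 (2015)"])  # hypothetical titles
print(first_title(titles, "form w-4 (2015) Employee's Withholding"))
print(first_title(titles, "Page 3 of 3"))
```

Only pages that contain one of the titles produce a result, which is what makes a title list a good trigger for separation.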

Simple Example

  1. Add a Separate Batch Process Step to your Batch Process.
  2. Set the Provider property to Pattern-Based Separation.


  1. When the "Provider" window pops up, click the hamburger icon next to the Value Extractor property to access the drop-down menu and select a value extractor.
  2. For this tutorial we are going to use a List Match, but you can use any value extractor you wish.


  1. Once your extractor is selected, click the ellipsis button to the right of the property.


  1. When the "Value Extractor" window pops up, configure your extractor. Here we have entered the titles located on the first page of each document.
  2. Click "OK" in the top right of the window when you are finished configuring your extractor.


  1. Click "OK" on the "Provider" window to save your changes.


  1. Click over to the "Activity Tester" tab to test separation.
  2. Select the Batch Folder in the Batch Viewer.
  3. Click the play button in the top right corner of the Batch Viewer to test.


  1. In the screenshot below, you can see that Grooper has created folders and separated the documents appropriately.

Practical Example

In this next example, we are going to walk through the steps of setting up the Pattern-Based Separation Provider again, but using a more practical real-world example. We will also find that when we test separation, we run into a slight issue.

  1. For this tutorial, we started by following the same steps as in the previous simple example. For our List Match, we have entered titles that can be found on the first page of each document.


  1. With the Separation Provider set, click on the "Activity Tester" tab to test separation.
  2. Select the Batch Folder in the Batch Viewer containing the pages you want to separate.
  3. Then click the play button in the top right corner of the Batch Viewer to test separation.


  1. At first glance, it may look like Grooper did a good job of separating the Batch.


  1. Upon closer inspection, we see that the second page of the W-4 document was incorrectly separated into its own document.
  2. This is because the title of the document also appears on the second page of the W-4.


  1. Undo the separation by selecting all of the folders in the Batch at the level you want to remove, then right-click.
  2. Hover over "Foldering".
  3. Click on "Remove Level".


  1. Click "EXECUTE" to apply changes.

In the next section we will discuss how to fix the issue with our W-4 document using an Exclusion Extractor.


Exclusion Extractor

The Exclusion Extractor works in a simple way, but the concept can be difficult to grasp at first.

In our practical example, the Value Extractor for our Pattern-Based Separation Provider returned a result on two separate pages of the same document. In the tutorial below, we will configure an Exclusion Extractor that returns a value from the second page. If the Value Extractor and the Exclusion Extractor both return a value on the same page, Grooper excludes that page from separation: it does not start a new document there.
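The exclusion rule can be added to the same kind of sketch. This is again a hypothetical Python model rather than Grooper's code: both extractors are compiled regular expressions, the page texts are invented placeholders resembling the W-4 case, and a page on which both patterns match is assumed to stay with the previous document, consistent with the corrected result at the end of this tutorial.

```python
from __future__ import annotations

import re
from typing import List, Optional


def separate_with_exclusion(
    pages: List[str],
    value_pattern: re.Pattern[str],
    exclusion_pattern: Optional[re.Pattern[str]] = None,
) -> List[List[str]]:
    """Open a new document on each value hit, unless the exclusion also hits."""
    documents: List[List[str]] = []
    for page in pages:
        starts_document = value_pattern.search(page) is not None
        if starts_document and exclusion_pattern is not None:
            # Both extractors returned something: ignore this page for separation.
            if exclusion_pattern.search(page) is not None:
                starts_document = False
        if starts_document or not documents:
            documents.append([page])
        else:
            # Excluded pages and pages without a result follow the previous page.
            documents[-1].append(page)
    return documents


# Hypothetical page text modeled on the practical example.
pages = [
    "Form W-4 (2015) first-page text",
    "Form W-4 (2015) Page 2 second-page text",
    "Invoice 1001",
]
value = re.compile(r"Form W-4 \(2015\)|Invoice")
exclusion = re.compile(r"Form W-4 \(2015\) Page 2")

print(len(separate_with_exclusion(pages, value)))             # 3: page 2 split off
print(len(separate_with_exclusion(pages, value, exclusion)))  # 2: W-4 kept together
```

Without the exclusion pattern, the second W-4 page matches the title and is split off; with it, that page matches both patterns and stays in the W-4 document.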

  1. Back in our Provider window, we have access to an Exclusion Extractor property. We will set this to a Pattern Match and use it to tell Grooper to exclude from separation any page that returns a result for both the Value Extractor and the Exclusion Extractor.


  1. Click the ellipsis button for the Pattern Match. When the "Exclusion Extractor" window pops up, select the second page of the W-4 document in the TEST BATCH window.
  2. In the top right corner of the Document Viewer, click the drop-down and change to the Text View. You should then be able to see the text Grooper recognized on the page.
  3. At the top of the page, we can see the title "Form W-4 (2015)" followed immediately by "Page 2". We can use this to improve our separation.


  1. In the screenshot below, we have added a Value Pattern that returns the title and "Page 2" from the document. The backslashes escape the parentheses so they are matched literally: Form W-4 \(2015\) Page 2
  2. Click "OK" to apply the changes to the Exclusion Extractor.
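To see why the parentheses in the pattern above need backslashes, the snippet below tests it with Python's re module, which treats this simple pattern the same way as common regex engines. The page texts are hypothetical placeholders, and UNESCAPED is an illustrative counterexample, not part of the tutorial.

```python
import re

# Pattern from the tutorial. The backslashes make "(" and ")" literal.
EXCLUSION = re.compile(r"Form W-4 \(2015\) Page 2")
# Without the backslashes, the parentheses form a capture group, so the
# pattern looks for "Form W-4 2015 Page 2" and never sees the literal "(".
UNESCAPED = re.compile(r"Form W-4 (2015) Page 2")

# Hypothetical page text, modeled on the Text View described above.
page_one = "Form W-4 (2015) first-page text"
page_two = "Form W-4 (2015) Page 2 second-page text"

for name, text in [("page one", page_one), ("page two", page_two)]:
    hit = EXCLUSION.search(text)
    print(name, "->", hit.group(0) if hit else None)
# page one -> None
# page two -> Form W-4 (2015) Page 2
```

Because only the second page contains "Page 2" directly after the title, the Exclusion Extractor fires on that page alone.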


  1. Click "OK" to save changes.


  1. Now test separation again by going to the "Activity Tester" tab.
  2. Highlight the Batch Folder containing the pages you wish to separate.
  3. Click on the play button in the top right corner of the TEST BATCH window to test separation.


  1. The W-4 now separates correctly, with both pages in a single folder.