2023.1:Pattern-Based Separation (Separation Provider): Difference between revisions
Revision as of 15:58, 19 March 2024
WIP
This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.
Pattern-Based Separation is a Separation Provider that creates a new document folder every time a value returned by a defined pattern is encountered on a page.
About
The Pattern-Based Separation Provider separates documents based on whether or not a defined pattern returns a value from a page in your Batch.
A Data Extractor is used to find a value on a page. When the extractor returns a result on a page, the page is placed in a new folder, creating a new document. If the extractor does not return a result on the following page, that page is placed behind the previous page in the newly created folder. When the extractor next returns a result on a later page (even if it is the same result as on an earlier page), that page is placed in a new folder, creating another document.
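The foldering rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Grooper's implementation: `separate`, `title_extractor`, and the sample page strings are hypothetical, and placing leading pages with no hit into a first folder is an assumption the article does not address.

```python
from typing import Callable, List, Optional

def separate(pages: List[str],
             extractor: Callable[[str], Optional[str]]) -> List[List[str]]:
    """Group pages into folders; a page with an extractor hit starts a new folder."""
    folders: List[List[str]] = []
    for page in pages:
        hit = extractor(page)
        if hit is not None or not folders:
            # A hit starts a new document folder.
            # Assumption: leading pages with no hit still get a folder.
            folders.append([page])
        else:
            # No hit: the page goes behind the previous page.
            folders[-1].append(page)
    return folders

def title_extractor(page: str) -> Optional[str]:
    """Hypothetical extractor: returns a value when the page contains 'Invoice'."""
    return "Invoice" if "Invoice" in page else None

pages = ["Invoice 001", "line items", "Invoice 001", "totals"]
folders = separate(pages, title_extractor)
print(folders)
```

Note that the third page returns the same value as the first and still starts a new folder, which matches the behavior described above.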
How To
Simple Example
- Add a Separate Batch Process Step to your Batch Process.
- Set the Provider property to Pattern-Based Separation.
- When the "Provider" window pops up, click the hamburger icon next to the Value Extractor property to access the drop-down menu and select a value extractor.
- For this tutorial we are going to use a List Match, but you can use any value extractor you wish.
- Once your extractor is selected, click the ellipsis button to the right of the property.
- When the "Value Extractor" window pops up, configure your extractor. Here we have entered in the titles located on the first page of each document.
- Click "OK" in the top right of the window when you are finished configuring your extractor.
- Click "OK" on the "Provider" window to save your changes.
- Click over to the "Activity Tester" tab to test separation.
- Select the Batch Folder in the Batch Viewer.
- Click the play button in the top right corner of the Batch Viewer to test.
- In the screenshot below, you can see that Grooper has created folders and separated the documents appropriately.
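As a rough mental model of what the List Match does in this example, the sketch below returns the first list entry found in a page's text. The entries in `TITLES`, the function name `list_match`, and the case-insensitive comparison are all assumptions made for illustration; they are not taken from Grooper or from the screenshots.

```python
from typing import List, Optional

# Hypothetical list entries standing in for the document titles.
TITLES = ["Invoice", "Purchase Order", "Form W-4"]

def list_match(page_text: str, entries: List[str] = TITLES) -> Optional[str]:
    """Return the first list entry found in the page text, or None."""
    lowered = page_text.lower()  # assumption: matching ignores case
    for entry in entries:
        if entry.lower() in lowered:
            return entry
    return None

first_page = list_match("FORM W-4 (2015) Employee's Withholding")
other_page = list_match("continued from previous page")
print(first_page, other_page)
```

A page that returns a value here is treated as the first page of a document; a page that returns `None` stays with the preceding document.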
Practical Example
- For this tutorial we have started by following the same steps as in the previous simple example. For our List Match, we have entered titles that can be found on the first page of each document.
- With the Separation Provider set, click on the "Activity Tester" tab to test separation.
- Select the Batch Folder in the Batch Viewer containing the pages you want to separate.
- Then click the play button in the top right corner of the Batch Viewer to test separation.
- At first glance, it may look like Grooper did a good job of separating the Batch.
- Upon closer inspection, we see that the second page of the W-4 document was incorrectly separated out.
- This is because the title of the document also appears on the second page of the W-4.
- Undo the separation: select all of the folders in the Batch at the level you want to remove, then right-click.
- Hover over "Foldering".
- Click on "Remove Level".
- Click "EXECUTE" to apply changes.
In the next section we will discuss how to fix the issue with our W-4 document using an Exclusion Extractor.
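The false split seen in this example can be reproduced with a small sketch. The page strings are made-up stand-ins for the recognized text, and the `TITLE` regular expression is a hypothetical substitute for the List Match entry; only the fact that the title appears on both pages comes from the tutorial.

```python
import re

# Hypothetical stand-in for the List Match entry.
TITLE = re.compile(r"Form W-4")

# Made-up sample text; the title appears on both pages.
pages = [
    "Form W-4 (2015) Employee's Withholding Allowance Certificate",
    "Form W-4 (2015) Page 2 Deductions and Adjustments Worksheet",
]

folders = []
for page in pages:
    if TITLE.search(page) or not folders:
        folders.append([page])    # a hit starts a new folder
    else:
        folders[-1].append(page)  # no hit: stay in the current folder

print(len(folders))  # 2: the second page was split into its own folder
```

Because the value pattern matches on page 2 as well, the second page starts a new folder instead of staying with the first.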
Exclusion Extractor
- Back in the "Provider" window, we have access to an Exclusion Extractor property. We are going to set this to a Pattern Match and use it to tell Grooper not to separate on any page that returns a result for both the Value Extractor and the Exclusion Extractor.
- Click the ellipsis button for the Pattern Match. When the "Exclusion Extractor" window pops up, select the second page of the W-4 document from the TEST BATCH window.
- In the top right corner of the Document Viewer, click the drop down to change to the Text View. You should then be able to see what text was recognized by Grooper on the page.
- At the top of the document we can see the title "Form W-4 (2015)" followed immediately by "Page 2". We can use this to help improve our separation.
- In the screenshot below we have added a Value Pattern to return the title and "Page 2" from the document using the following pattern: Form W-4 \(2015\) Page 2
- Click "OK" to apply the changes to the Exclusion Extractor.
- Click "OK" to save changes.
- Now we can test our separation again by returning to the "Activity Tester" tab.
- Highlight the Batch Folder containing the pages you wish to separate.
- Click on the play button in the top right corner of the TEST BATCH window to test separation.
- Now the W-4 is separating appropriately, with both pages in a single folder.
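The effect of the exclusion can be sketched as follows. Only the exclusion pattern `Form W-4 \(2015\) Page 2` comes from the tutorial; the `VALUE` pattern, the `separate` function, and the sample page text are hypothetical, and real recognized text may contain different spacing or line breaks than the strings used here.

```python
import re
from typing import List

# Hypothetical value pattern standing in for the List Match.
VALUE = re.compile(r"Form W-4|Invoice")
# Exclusion pattern taken from the tutorial.
EXCLUSION = re.compile(r"Form W-4 \(2015\) Page 2")

def separate(pages: List[str]) -> List[List[str]]:
    """Start a new folder only when the value matches and the exclusion does not."""
    folders: List[List[str]] = []
    for page in pages:
        starts_doc = bool(VALUE.search(page)) and not EXCLUSION.search(page)
        if starts_doc or not folders:
            folders.append([page])
        else:
            folders[-1].append(page)
    return folders

# Made-up sample text for three pages.
pages = [
    "Form W-4 (2015) Employee's Withholding Allowance Certificate",
    "Form W-4 (2015) Page 2 Deductions and Adjustments Worksheet",
    "Invoice 001",
]
result = separate(pages)
print([len(folder) for folder in result])
```

Here the second W-4 page matches both patterns, so it is not treated as the start of a document and stays in the same folder as the first page.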



















