2023.1:Change in Value Separation (Separation Provider)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


The Change in Value Separation Separation Provider creates a new folder and separates every time an extracted value changes from one Batch Page to another.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.


About

A Data Extractor is written to find a value on a page (such as an invoice number on invoices or a report number on a report). This is set on the Value Extractor property. When the extractor returns a result on a page, the page is placed in a new folder, creating a new document. All subsequent pages returning the same value are included in the folder. Once a page is encountered returning a different value, a new Document Folder (and thus new document) is created.

If the extractor fails to produce a result, no folder will be created. The page will remain loose in the Batch and the provider will move on to the next page to check if its value is different from the last one produced. If this is not the desired result, the Miss Disposition property can be used to Append or Merge the pages to another folder.
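The behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Grooper's implementation: the function name `separate_on_change` and the list-of-values input are hypothetical, and we assume that a page returning the same value as the last one produced joins the most recent folder even when a missed page sits in between.

```python
from typing import Optional


def separate_on_change(page_values: list[Optional[str]]) -> tuple[list[list[int]], list[int]]:
    """Model of change-in-value separation (hypothetical, not Grooper code).

    page_values[i] is the value the extractor returned for page i,
    or None when the extractor produced no result on that page.
    Returns (folders, loose_pages), both holding page indexes.
    """
    folders: list[list[int]] = []
    loose_pages: list[int] = []
    last_value: Optional[str] = None

    for index, value in enumerate(page_values):
        if value is None:
            # Miss: no folder is created and the page stays loose.
            loose_pages.append(index)
            continue
        if value != last_value:
            # The value changed, so a new folder (document) starts here.
            folders.append([])
            last_value = value
        # Assumption: a repeated value joins the most recent folder.
        folders[-1].append(index)

    return folders, loose_pages


folders, loose_pages = separate_on_change(["100", "100", "200", None, "200", "300"])
print(folders)      # [[0, 1], [2, 4], [5]]
print(loose_pages)  # [3]
```

In this sketch the fourth page (index 3) returns no value, so it stays loose while the pages around it are still grouped by their report value.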

How To

Setting the Provider

  1. In this example we have added a Separate Step to the Batch Process.
  2. We have set the Provider to Change in Value Separation.
  3. Click the hamburger menu to the right of the Value Extractor property.
  4. For this example we are going to use a Pattern Match.
  5. We have entered a Value Pattern of Report #: (\d+|[A-Z]\d{2}-\d{3}) to return the report numbers from the documents in our Batch.
  6. When we run Separation, at first glance it looks like all of the documents were separated appropriately.
  7. If we look closer, we see several pages that were not separated into a folder and remain as loose pages.
  8. On page 2 and all subsequent pages of the fifth report, the report number is missing. Because the extractor returned no result on these pages, Grooper had no value to compare and left them as loose pages.
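To see why some pages return nothing, it can help to try the Value Pattern outside of Grooper. The sketch below uses Python's `re` module as a stand-in for the Pattern Match extractor. The sample page text is made up for illustration, and reading the capture group as the returned value is an assumption about how the result is used, not a statement about Grooper's output.

```python
import re
from typing import Optional

# The Value Pattern from the example, compiled with Python's regex engine.
REPORT_PATTERN = re.compile(r"Report #: (\d+|[A-Z]\d{2}-\d{3})")


def report_number(page_text: str) -> Optional[str]:
    """Return the report number found on a page, or None if there is none."""
    match = REPORT_PATTERN.search(page_text)
    return match.group(1) if match else None


# Hypothetical page text, not taken from the sample Batch.
pages = [
    "Inspection Summary\nReport #: 48213\nPage 1",
    "Report #: A12-345\nFindings",
    "Findings (continued)\nPage 2",
]

print([report_number(text) for text in pages])  # ['48213', 'A12-345', None]
```

The third page has no "Report #:" label, so nothing is returned for it. This is the same situation as the continuation pages of the fifth report.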


The Miss Disposition Property

In the previous section we ended up with several pages that were not separated into folders and remained loose. This happened because the extractor returned no result on those pages, so Grooper had no value to separate on. In this section, we are going to look at how the Miss Disposition property can solve this problem for us.

  1. We are going to go back into our Separate Step.
  2. Take a look at the Miss Disposition property located in the "ACTIVITY PROPERTIES" panel. Click on the hamburger icon to access the drop-down menu.
  3. For this example, we are going to set the Miss Disposition to Append.
  4. With the Miss Disposition property set to Append, any page that does not return a result will be appended to the previous folder. Now when we run separation, these pages will be separated appropriately.
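The effect of Append can be modeled the same way. This is again a hypothetical sketch rather than Grooper's implementation: the function name `separate_with_append` is made up, and leaving a missed page loose when no folder exists yet is an assumption the article does not cover.

```python
from typing import Optional


def separate_with_append(page_values: list[Optional[str]]) -> tuple[list[list[int]], list[int]]:
    """Model of change-in-value separation with Miss Disposition = Append.

    page_values[i] is the extracted value for page i, or None on a miss.
    Returns (folders, loose_pages), both holding page indexes.
    """
    folders: list[list[int]] = []
    loose_pages: list[int] = []
    last_value: Optional[str] = None

    for index, value in enumerate(page_values):
        if value is None:
            if folders:
                # Append: the missed page joins the previous folder.
                folders[-1].append(index)
            else:
                # Assumption: with no previous folder, the page stays loose.
                loose_pages.append(index)
            continue
        if value != last_value:
            # The value changed, so a new folder starts here.
            folders.append([])
            last_value = value
        folders[-1].append(index)

    return folders, loose_pages


# The second report's continuation pages (indexes 2 and 3) return no value.
print(separate_with_append(["101", "102", None, None, "103"]))
# ([[0], [1, 2, 3], [4]], [])
```

The two pages without a report number end up in the same folder as the page before them, which mirrors what happens to the fifth report in the example.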

FYI

If you set the Miss Disposition property to Merge, it works the same way as Append, but an additional setting called Maximum Gap becomes available. Maximum Gap sets the maximum number of pages that can be appended to the folder.

Glossary

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Page: Batch Page nodes represent individual pages within a Batch. Batch Pages are created in one of two ways: (1) when images are scanned into a Batch using the Scan Viewer, or (2) when pages are split from a PDF or TIFF file using the Split Pages activity.

  • Batch Pages are frequently referred to simply as "pages".

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. These Activities are either "Code Activities" or "Review" activities. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Change in Value Separation: The Change in Value Separation Separation Provider creates a new folder and separates every time an extracted value changes from one Batch Page to another.

Data Extractor: Data Extractor (or just "extractor") refers to all Value Extractors and Extractor Nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Pattern Match: Pattern Match is a Value Extractor that extracts values from a document that match a specified regular expression, providing data collection following a known format or pattern.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects, such as Content Models, Batch Processes, and profile objects, are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Separate: Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export and other Activities that need to run on the folder (i.e. document) level.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

Separation Provider: The Provider property of the Separate Activity defines the type of separation to be performed at the designated Scope.