Batch Process (Object)


This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Batch Process objects are crucial components in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each Batch of documents is managed systematically and efficiently.

  • Batch Processes by themselves do nothing. Instead, the workflows they execute are designed by adding child Batch Process Steps.
  • A Batch Process is often referred to as simply a "process".

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). There is a Batch with the example document(s) discussed in this tutorial, as well as a Project configured according to its instructions.

About

Batch Processes are highly configurable and can be reused any time a new Batch is created. They consist of child Batch Process Steps, which reference different Grooper Activities to perform document processing tasks.

A Batch Process defines a repeatable sequence of steps which achieve a specific information processing objective in Grooper.

Grooper’s goal is to automate the process of Acquiring, Conditioning, Organizing, Collecting and Delivering data from documents. There are many objects in Grooper that facilitate these "Five Phases" of document data acquisition, but it is the Batch Process that is responsible for determining the flow of documents and accomplishing automation within a Grooper system. A Batch Process acts as the assembly line that takes raw documents and converts them into deliverable data.

A Batch Process object has little meaningful configuration of its own and consequently does nothing by itself. Instead, it acts as a container for one or more Batch Process Steps, which are configured to execute specific Activities. Activities may represent automated system tasks, or human-attended tasks which require operator interaction. Collectively, these steps represent a workflow process through which Batches of a particular class will travel. While most of the configuration items on Batch Process Steps are specific to their function, all Batch Process Steps may have either a Processing Queue or a Review Queue assigned, so that Grooper knows by which cores or by which individuals that step will be processed or reviewed.

Upon completion of this configuration, you publish the Batch Process, which places a read-only copy of the Batch Process into the "Processes" branch of the Grooper "Node Tree". Doing so exposes it to be assigned to Batches created in "Production". This, in turn, allows the tasks of the Batch Process Steps to be submitted to their configured queues, either a Processing Queue or a Review Queue. Batch Processes automatically submit tasks for their active step, which get picked up by either an Activity Processing Service or a human reviewer, depending on task type. If a Batch Process does not have any attended activities, it generally processes each step sequentially in a completely unattended fashion.

Once a Batch Process has been published, all new Batches will use the current published version. When changes are made to a Batch Process and a new version is published, the changes will apply to new Batches created using that Batch Process but will not impact existing Batches already in progress. One can, however, manually apply the latest published version of a Batch Process to an existing Batch by pausing and updating said Batch.
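
The versioning behavior described above can be pictured as snapshotting. The sketch below is a minimal Python model, not Grooper code: `BatchProcess`, `Repository`, `publish`, `create_batch`, and `update_batch` are hypothetical names invented for illustration, and the step names are only examples.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class BatchProcess:
    name: str
    steps: list[str] = field(default_factory=list)


class Repository:
    """Hypothetical stand-in for the "Processes" branch of the Node Tree."""

    def __init__(self):
        self.published = {}

    def publish(self, process):
        # Publishing stores an independent snapshot, replacing any earlier one.
        self.published[process.name] = copy.deepcopy(process)

    def create_batch(self, process_name):
        # A new Batch receives its own copy of the currently published version.
        return {"process": copy.deepcopy(self.published[process_name])}

    def update_batch(self, batch):
        # Manually applying the latest published version to an existing Batch.
        batch["process"] = copy.deepcopy(self.published[batch["process"].name])


repo = Repository()
working = BatchProcess("Invoices", ["Split Pages", "Recognize"])
repo.publish(working)
batch_a = repo.create_batch("Invoices")      # created before the change

working.steps.append("Dispose Batch")        # edit the working copy
repo.publish(working)                        # publish a new version
batch_b = repo.create_batch("Invoices")      # created after the change

before_update = list(batch_a["process"].steps)
repo.update_batch(batch_a)
after_update = list(batch_a["process"].steps)

print(before_update)
print(batch_b["process"].steps)
print(after_update)
```

In this model the in-progress Batch keeps its old step list until it is explicitly updated, while the Batch created after re-publishing sees the new step immediately.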

It is also possible to un-publish a published Batch Process, thus making it unavailable to newly created production Batches.

Batch Process Properties

There are a few properties that can be configured on a Batch Process. They are rarely configured, given their limited use, but are worth understanding.

Content Type

This property is a drop-down list of all "Content Types" available within the parent Project, from which one can be chosen. The root Batch Folder of the Batch will then be classified as that type, and the "Data Elements" of that "Content Type" will be displayed in Review.

Review Queue

This property is a drop-down menu of all available Review Queues of which one can be chosen. This will associate any Review tasks of a Batch that uses this Batch Process to the respective Review Queue. If no Review Queue is selected, a Batch associated with this Batch Process will belong to no queue and be visible to anyone on the "Batch" page.

Priority

This property is an Int32 value that is inversely related to the priority given to tasks submitted by this Batch Process: the lower the number, the higher the priority.

For example, setting this property to 1 would give tasks submitted for a Batch using this Batch Process a higher priority than the default of 3. To understand what this means in practice, let's take the example further. Assume the following:

  • Server where Grooper is installed has a single Activity Processing Service
    • This Activity Processing Service is set to use 10 CPU threads
  • "Batch A" is currently in production using "Batch Process A" which is set to a Priority of 3
    • "Batch A" has 20 tasks related to the Recognize Activity
    • 10 of the 20 Recognize tasks are being worked by the 10 available CPU threads
  • As "Batch A" is being processed, "Batch B", which uses "Batch Process B" with a Priority of 1, submits 10 Split Pages tasks
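
The scenario above can be sketched as a small priority-queue simulation. This is a toy Python model under stated assumptions, not how Grooper schedules tasks internally: `submit`, `pick_up`, and the task labels are hypothetical, and each "round" simply hands out as many tasks as there are threads.

```python
import heapq
from itertools import count

THREADS = 10
queue = []
order = count()  # tie-breaker: keeps submission order within one priority


def submit(batch, activity, priority, n):
    """Queue n tasks; a lower priority number sorts first."""
    for i in range(n):
        heapq.heappush(queue, (priority, next(order), f"{batch}:{activity}:{i + 1}"))


def pick_up(n):
    """Pop up to n tasks, lowest priority number first."""
    picked = []
    while queue and len(picked) < n:
        picked.append(heapq.heappop(queue)[2])
    return picked


submit("Batch A", "Recognize", 3, 20)
round_1 = pick_up(THREADS)                 # 10 of Batch A's tasks start
submit("Batch B", "Split Pages", 1, 10)    # arrives while those are running
round_2 = pick_up(THREADS)
round_3 = pick_up(THREADS)

for label, tasks in (("round 1", round_1), ("round 2", round_2), ("round 3", round_3)):
    print(label, sorted({t.split(":")[0] for t in tasks}))
```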

"Batch A's" 10 Recognize tasks will complete. However, because "Batch B" submitted tasks at a higher priority than "Batch A", its 10 Split Pages tasks will get picked up and completed before the 10 remaining Recognize tasks associated with "Batch A" get processed.

Batch Process Steps

Batch Process Steps are objects within a Batch Process that are assigned an Activity, which is a type of processing to be executed on all or a portion of a Batch. Activities generally fall into one of two categories:

  • Attended
  • Unattended

Attended activities, such as Review, are completed by human reviewers and are assigned via Review Queues. Unattended activities, such as Image Processing, Recognize, and Classify, are completed by Activity Processing Services and are assigned via Processing Queues. Batch Process Steps also allow for unit testing of their configured Activity on the Design page via their respective “Activity Tester” tab.

As a Batch Process progresses through its individual steps, each step creates a series of tasks that are submitted to either the designated Processing Queue or Review Queue. Once the Activity Processing Services or human reviewers pick up and complete all tasks from their respective queues for an individual step, the Batch Process moves to the next step. Batch Processes will not submit tasks for a step before all tasks from the previous step have been completed.

Batch Process Steps are added to Batch Processes in a top-to-bottom linear fashion, although they may be re-ordered at any time. By default, the steps also execute linearly, from top to bottom. The exception is a Batch Process Step configured with a Should Submit Expression or a Next Step Expression. These are simple expressions written on Batch Process Steps: the Should Submit Expression determines whether the Activity of the step should be executed, and the Next Step Expression determines what the next step in the Batch Process should be upon completion. These expressions can send the contents of the Batch being processed to any Batch Process Step in any (published) Batch Process, including entirely different Batch Processes, and are powerful tools for configuring workflows for exception queues, requesting additional review, and other functions.

Batch Process Step Properties

Activity

This property is a drop-down menu of all available Activities in Grooper, and is the main property to consider on a Batch Process Step. Once the Activity property is set, the properties related to the set Activity are exposed on the Batch Process Step. Every Activity has its own properties to consider. Please refer to the individual articles on each Activity for more information on their setup.

Scope

This property is a drop-down menu of the processing scopes available for the Activity of a Batch Process Step. Scope is an important topic that requires more detail than can be covered here. Please visit the Scope article for a full explanation.

Queue Name

For attended activities, this property is a drop-down menu which can be set to point at a specific Review Queue. This property will override the designated Review Queue of its parent Batch Process.

For unattended activities, this property is a drop-down menu which can be set to point at a specific Processing Queue. Please visit the Processing Queue article for more information on why this property might get used.

Activate Mode

This property is a drop-down menu of several modes of which one can be chosen. The submission of activity tasks related to a Batch Process Step is done when a Batch Process "exits" one step and "enters" another. This property controls how the activity tasks of a Batch Process Step will be submitted as the Batch Process "enters" into a step. The different modes are:

  • Normal - Tasks will be submitted for items which have not already been processed.
  • Retry - Tasks will be submitted for items which previously failed, or have not yet been processed.
  • Always - Tasks will be submitted for all items, overwriting any previous task information.
  • Manual - The batch will be paused when this step is reached, requiring the user to manually start the process step.
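
As a rough illustration of how the four modes differ, the following Python sketch selects which items would receive tasks. It is a simplified model: `items_to_submit` and the status labels ("unprocessed", "completed", "failed") are hypothetical, and it assumes that Normal does not resubmit failed items, which the list above implies but does not state outright.

```python
from enum import Enum


class ActivateMode(Enum):
    NORMAL = "Normal"
    RETRY = "Retry"
    ALWAYS = "Always"
    MANUAL = "Manual"


def items_to_submit(mode, statuses):
    """statuses maps item name -> "unprocessed", "completed", or "failed"."""
    if mode is ActivateMode.MANUAL:
        # The batch pauses; nothing is submitted until a user starts the step.
        return []
    if mode is ActivateMode.ALWAYS:
        # Every item is submitted again, regardless of earlier results.
        return list(statuses)
    wanted = {"unprocessed"}
    if mode is ActivateMode.RETRY:
        wanted.add("failed")
    return [item for item, status in statuses.items() if status in wanted]


statuses = {"Page 1": "completed", "Page 2": "failed", "Page 3": "unprocessed"}
for mode in ActivateMode:
    print(mode.value, items_to_submit(mode, statuses))
```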

Should Submit Expression

The Should Submit Expression leverages a VB.Net expression to determine whether tasks will be submitted for individual folders or pages within the Batch. The expression is executed for each folder or page in scope, and should return a True or False value indicating whether the item should be processed. If False is returned for every item in the batch, this will effectively skip the entire step.
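
The per-item evaluation can be pictured with a short Python sketch. Real expressions are written in VB.Net against Grooper's own objects; here `tasks_for_step`, the `flagged` field, and the sample folder names are all hypothetical and serve only to show the True/False filtering.

```python
def tasks_for_step(items, should_submit):
    """Return the items for which the expression evaluates to True."""
    return [item for item in items if should_submit(item)]


folders = [
    {"name": "Invoice 001", "flagged": False},
    {"name": "Invoice 002", "flagged": True},
    {"name": "Invoice 003", "flagged": False},
]

# Hypothetical rule: only process folders that are not flagged.
submitted = tasks_for_step(folders, lambda folder: not folder["flagged"])
print([folder["name"] for folder in submitted])

# Returning False for every item submits nothing, which skips the step.
skipped = tasks_for_step(folders, lambda folder: False)
print(skipped)
```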

For more information and example expressions, visit the Code Expressions or Expressions Cookbook articles.

Next Step Expression

The Next Step Expression determines which step, if any, will occur next in the Batch Process. Normally, steps occur in a top-to-bottom linear fashion.
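
A minimal Python sketch of this routing idea follows. It is an illustration only: `run_steps`, the `confidence` field, and the convention that returning `None` means "continue with the step directly below" are assumptions made for the example, not Grooper's documented behavior. The example only jumps forward, so it cannot loop.

```python
def run_steps(steps, batch, next_step_expressions=None):
    """Visit steps top to bottom unless an expression names a different step.

    steps: ordered list of step names.
    next_step_expressions: hypothetical mapping of step name -> function(batch)
    returning the name of the next step, or None to continue linearly.
    """
    expressions = next_step_expressions or {}
    visited = []
    index = 0
    while index < len(steps):
        name = steps[index]
        visited.append(name)
        expression = expressions.get(name)
        target = expression(batch) if expression else None
        # No override: fall through to the step directly below.
        index = steps.index(target) if target is not None else index + 1
    return visited


steps = ["Classify", "Review", "Extract", "Export"]
# Skip Review when classification confidence (a made-up field) is high.
skip_review = {"Classify": lambda b: "Extract" if b["confidence"] >= 0.9 else None}

confident_path = run_steps(steps, {"confidence": 0.95}, skip_review)
uncertain_path = run_steps(steps, {"confidence": 0.40}, skip_review)
print(confident_path)
print(uncertain_path)
```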

For more information and example expressions, visit the Code Expressions or Expressions Cookbook articles.

Activity Processing Options

While these are properties of Activities, their function is specifically related to batch processing.

Error Disposition

This property consists of several issue dispositions. These options determine what happens when an error occurs during the processing of individual tasks of unattended activities. The property can be set to a combination of any of the following:

  • None - The issue will be ignored, and the task will complete successfully.
  • Flag - The object of the scope of the task, either Batch Folder or Batch Page, will be flagged.
  • Log - The issue will be logged to the Grooper log. The log can be viewed from the Grooper Root node under the "Batch Event Viewer" tab.
  • Stop - The Batch will stop processing, be set to an error state, and all pending tasks will be deleted.

The default, and most common, settings are to Flag and Log the error, but otherwise allow further tasks to be processed. However, the Stop option is very useful as it can prevent cascading processing issues for a Batch further in its process. The main objective is to prevent "bad data" from ending up in whatever backend system you use to store data. When a Batch in production is stopped it allows someone to review and resolve any issues. Once assessed, the Batch can be updated and resumed, thus re-submitting its tasks for processing and completion.
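
Because the dispositions can be combined, they behave like bit flags. The Python sketch below models that with `enum.Flag`; `ErrorDisposition`, `handle_error`, and the item and log structures are hypothetical names used only to show how a Flag and Log combination differs from one that also includes Stop.

```python
from enum import Flag, auto


class ErrorDisposition(Flag):
    NONE = 0
    FLAG = auto()
    LOG = auto()
    STOP = auto()


def handle_error(disposition, item, log):
    """Apply each selected disposition; return True if the batch keeps running."""
    if ErrorDisposition.FLAG in disposition:
        item["flagged"] = True
    if ErrorDisposition.LOG in disposition:
        log.append(f"error on {item['name']}")
    return ErrorDisposition.STOP not in disposition


# The default combination described above: flag and log, but keep going.
default = ErrorDisposition.FLAG | ErrorDisposition.LOG
item, log = {"name": "Page 3", "flagged": False}, []
keep_running = handle_error(default, item, log)
print(keep_running, item["flagged"], log)

# Adding Stop halts the batch after the same flagging and logging.
strict = default | ErrorDisposition.STOP
stops = not handle_error(strict, {"name": "Page 4", "flagged": False}, [])
print(stops)
```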

Maximum Consecutive Errors

This property is an int32 value defining the number of consecutive errors to be allowed, after which a "critical stop" will be raised. This critical stop will cause services to stop running.

The main drawback of the Stop option of Error Disposition is that if even one error on a task is encountered, the entire Batch is stopped. Maximum Consecutive Errors allows some errors to occur, but after a designated number in a row, instead of stopping the Batch in process and removing its remaining tasks, it stops the Activity Processing Service doing the processing.

An example of this being useful would be for the Export activity. You may know that one or two errors occasionally occur during export, but if, say, 10 happen in a row, something is wrong. To prevent wasting processing power and perhaps causing further issues, the Activity Processing Service would be shut down and all processing would cease until the issue is resolved.
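
The consecutive-error counter can be sketched in a few lines of Python. This is an illustrative model, not Grooper's implementation: `process` is a hypothetical function, and it assumes the stop is raised as soon as the streak of failures reaches the configured limit and that any success resets the streak.

```python
def process(results, max_consecutive_errors):
    """results: iterable of booleans, True meaning the task succeeded.

    Returns (tasks handled, outcome). A success resets the error streak.
    """
    streak = 0
    handled = 0
    for ok in results:
        handled += 1
        streak = 0 if ok else streak + 1
        if streak >= max_consecutive_errors:
            # Assumed behavior: the service stops; remaining tasks stay queued.
            return handled, "critical stop"
    return handled, "completed"


# Scattered errors are tolerated because the streak keeps resetting.
scattered = process([False, True, False, True], 2)
# Three failures in a row reach the limit on the sixth task.
streaky = process([True, False, True, False, False, False, True], 3)
print(scattered)
print(streaky)
```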

Concurrency Mode

This property is not configurable. It merely reports the type of "concurrency" an activity is capable of. As of now, the only activity that isn't Multiple is Render.

How To

The creation, testing, publishing, and updating of a Batch Process and its child Batch Process Steps is a straightforward process.

Add and Configure a Batch Process

Add a Batch Process

Batch Processes are created as child objects within a Project. To create a Batch Process...

  1. Right-click on a Project or a Folder within the Project
  2. In the pop-out menu choose "Add > Batch Process"
  3. In the "Add" dialog box name the Batch Process whatever you like and click the "Execute" button


Configure the Batch Process

Consider the following UI elements:

  1. Set the Content Type property to any "Content Type" if you wish to have the Batch set to that specific type and have its related "Data Elements" displayed in Review. This property is rarely configured as classifying a Batch isn't common.
  2. Set the Review Queue property to any Review Queue to have the entire Batch associated with a specific Review Queue. If no Review Queue is selected, a Batch associated with this Batch Process will belong to no queue and be visible to anyone on the "Batch" page.
  3. Use the "Save" and "Cancel Changes" buttons to perform their respective functions.
  4. Next to the "Save" and "Cancel Changes" buttons are the buttons that allow you to (in order from left to right) "Validate", "Publish", and "Un-Publish" the Batch Process.
    • The "Validate" button will scan the configuration of all child Batch Process Steps and inform if any configurable properties are in an error state.
    • The "Publish" button will create a read-only copy of the Batch Process (and all its child objects) in the "Processes" folder of the "Node Tree". If the Batch Process has previously been published, this button will perform the publish again by overwriting the copy.
    • The "Un-Publish" button will remove the read-only copy of the Batch Process from the "Processes" folder of the "Node Tree".
  5. Use the "Scripting" tab if you are an advanced user familiar with .NET scripting who wants to expand what a Batch Process can do.

Add, Configure, and Test Batch Process Steps

Add a Batch Process Step (Split Pages)

Batch Process Steps are created as child objects within a Batch Process. To create a Batch Process Step...

  1. Right-click on a Batch Process
  2. In the pop-out menu choose "Add Activity > Transform > Split Pages"
    • The "Add Activity" command is different from the "Add" command. "Add" will create a Batch Process Step, which you can name, and the Activity property will be unconfigured. "Add Activity", however, will create a Batch Process Step and, depending on the choice made within the sub-menu, fill in the Step Name property in the "Add Activity" dialog box. It will also set the Activity property on the newly created Batch Process Step to the chosen activity.
    • For this example the Split Pages Activity is being used, but this process can be used for any Activity.
  3. In the "Add Activity" dialog box name the Batch Process Step whatever you like, or leave the name given for the respective Activity, and click the "Execute" button.


Configure the Batch Process Step (Split Pages)

  1. All of the "General" properties' defaults in this example will suffice.
    • Because the "Add Activity" command was used, the Activity property will be set to the chosen Activity. For this example, Split Pages was chosen.
    • Because Activity is set to Split Pages, the Scope property will automatically be set to Folder (the only option for this particular Activity).
    • The default for the Folder Level property is 1. The document in the supplied example Batch is at "Level 1" of the Batch, so the default in this case is fine.
    • No queues will be used in the example provided.
    • The Activate Mode of Normal is used in most cases, and this example is no exception.
  2. No Should Submit Expression or Next Step Expression will be used for this example.
  3. "Activity Properties" determine how the configured Activity will operate. Please refer to articles about specific Activities for more information on their configuration.
  4. Click the "Activity Tester" tab to test how this Batch Process Step will operate against a test Batch.


Test the Batch Process Step (Split Pages)

  1. On the "Activity Tester" tab...
  2. ...click the "Browse" button and in the dialog box that opens, select the Batch supplied with this article.
  3. Click the "OK" button to close the dialog box.
  4. By default the Batch will be selected. Because this step is configured with a Scope of Folder and a Folder Level of 1, the "Test" button will be grayed out while the Batch itself is selected.
  5. The "Process All" button can be used to submit a job to be processed by a Grooper Activity Processing Service. In doing so a series of tasks will be created for each object in the Scope of the Activity of the Batch Process Step. As a result, an Activity Processing Service must be installed for this repository and be running. The Activity Processing Service will pick up and process a number of tasks equal to the amount of processing threads available to it. This is useful when you want to test in a multi-threaded fashion, like if you are running the Recognize Activity on multiple Batch Page objects.
    • For the purposes of this example, this will not be done. Instead, the "Test" button will be used.


  1. Selecting the document at the scope that matches the configuration of the Batch Process Step, in this case Folder Level 1...
  2. ...will un-gray the "Test" button and allow it to be clicked.


  1. The test produced the desired result for this Batch Process Step configured for Split Pages: a Batch Page has been split out from the document.

More Steps

The following sections add, configure, and test more Batch Process Steps to flesh out the Batch Process. While these additional steps give the Batch Process more meaning, adding them won't necessarily enhance your knowledge of the base process. Feel free to skip this portion.

Add a Batch Process Step (Recognize)

  1. Right-click on the Batch Process
  2. In the pop-out menu choose "Add Activity > Cleanup & Recognition > Recognize"
  3. In the "Add Activity" dialog box name the Batch Process Step whatever you like, or leave the name given for the respective Activity, and click the "Execute" button

Configure the Batch Process Step (Recognize)

  1. Using the "Add Activity" command has set the Activity property to the desired setting
  2. The default setting for the Scope property is Page when the Activity is Recognize, which is correct for our Batch
  3. The default settings of the "Activity Properties" will be kept for this step
  4. Click the "Activity Tester" tab to test the configuration of this step

Test the Batch Process Step (Recognize)

  1. Select the appropriate scope in the "Batch Viewer"
  2. Click the "Test" button to test the activity


  1. Using the "Renditions" button you can see a new text file for the recognized text
  2. In the "Document Viewer" you can see the recognized text

Add a Batch Process Step (Classify)

  1. Right-click on the Batch Process
  2. In the pop-out menu choose "Add Activity > Document Processing > Classify"
  3. In the "Add Activity" dialog box name the Batch Process Step whatever you like, or leave the name given for the respective Activity, and click the "Execute" button

Configure the Batch Process Step (Classify)

  1. Using the "Add Activity" command has set the Activity property to the desired setting
  2. The default setting for the Scope property is Folder when the Activity is Classify, which is correct for our Batch
  3. The default setting of 1 for the Folder Level property is correct for this test Batch
  4. Click the drop-down for the Content Model Scope and choose the Content Model provided by this article
  5. Click the "Activity Tester" tab to test the configuration of this step

Test the Batch Process Step (Classify)

  1. Select the appropriate scope in the "Batch Viewer"
  2. Click the "Test" button to test the activity


  1. The name of the document has changed to reflect the classification

Add a Batch Process Step (Extract)

  1. Right-click on the Batch Process
  2. In the pop-out menu choose "Add Activity > Document Processing > Extract"
  3. In the "Add Activity" dialog box name the Batch Process Step whatever you like, or leave the name given for the respective Activity, and click the "Execute" button

Configure the Batch Process Step (Extract)

  1. Using the "Add Activity" command has set the Activity property to the desired setting
  2. The default setting for the Scope property is Folder when the Activity is Extract, which is correct for our Batch
  3. The default setting of 1 for the Folder Level property is correct for this test Batch
  4. The default settings of the "Activity Properties" will be kept for this step
  5. Click the "Activity Tester" tab to test the configuration of this step

Test the Batch Process Step (Extract)

  1. Select the appropriate scope in the "Batch Viewer"
  2. Click the "Test" button to test the activity


  1. With successful extraction the "Diagnostics" button will highlight. You can click it and view the results of extraction.

Add a Batch Process Step (Export)

  1. Right-click on the Batch Process
  2. In the pop-out menu choose "Add Activity > Document Processing > Export"
  3. In the "Add Activity" dialog box name the Batch Process Step whatever you like, or leave the name given for the respective Activity, and click the "Execute" button

Configure the Batch Process Step (Export)

  1. Using the "Add Activity" command has set the Activity property to the desired setting
  2. The default setting for the Scope property is Folder when the Activity is Export, which is correct for our Batch
  3. The default setting of 1 for the Folder Level property is correct for this test Batch
  4. The default settings of the "Activity Properties" will be kept for this step
  5. Click the "Activity Tester" tab to test the configuration of this step

Test the Batch Process Step (Export)

  1. Select the appropriate scope in the "Batch Viewer"
  2. Click the "Test" button to test the activity


  1. There is an Export Behavior on the Content Model that sends a JSON file of the extracted data to C:\ on the Grooper server

Validate, Publish, and Un-Publish a Batch Process

Validate the Batch Process

Validating a Batch Process is a quick way to check whether any properties on its Batch Process Steps are in an error state.

  1. Select the Batch Process
  2. Click the "Validate" button
    • If all properties on all Batch Process Steps are configured properly and not in an error state, the "Validate Branch" dialog box will show and state that no errors were found.
    • If any properties on any child Batch Process Steps are in an error state, the "Validate Branch" dialog box will show and list all properties that are in an error state.

Publish the Batch Process

  1. Select the Batch Process
  2. Click the "Publish" button
  3. Click the "Execute" button in the "Publish" dialog box
  4. This will put a read-only copy of the Batch Process in the "Processes" folder.

Un-Publish the Batch Process

  1. Select the Batch Process
  2. Click the "Unpublish" button
  3. In the "Unpublish" dialog box click the "Execute" button


  1. The "Unpublish" button will gray out.
  2. The read-only copy of the Batch Process will be removed from the "Processes" folder.

Update the Batch Process on a Production Batch

Sometimes it may be necessary to change the Batch Process of a Batch that is currently in production. A Batch in production uses its own read-only copy of the published Batch Process, which is itself a read-only copy of the original. As a result, changes must be made to the original Batch Process, which must then be re-published. At that point the Batch in production can be paused and updated.


  1. In this example, the change that was made to the Batch Process was that a Dispose Batch step was added to the Batch Process.
  2. Making this change does not affect the published read-only copy in the "Processes" folder.
  3. It also does not affect the read-only copy of the published Batch Process that the Batch in production is using.


  1. The Batch Process has been re-published
  2. This has put a new read-only copy of the Batch Process in the "Processes" folder of the "Node Tree"
  3. However, the Batch in production is still using an old copy of the original Batch Process


  1. In the "Production" area of "Batches"...
  2. ...select the Batch in production
  3. Click the "Pause" button
  4. Click "Apply" in the "Pause" dialog box


  • Click the "Update" button
    1. Click the drop-down for the Target Step property in the "Update Process" dialog box
    2. Select the newly added step in the drop-down menu
    3. Click the "Apply" button

    1. The copy of the Batch Process for the Batch in production is now updated
    2. Click the "Resume" button to continue processing the Batch