2024:Activity Processing (Service)

From Grooper Wiki
Revision as of 14:46, 15 November 2024 by Randallkinard (talk | contribs) (// via Wikitext Extension for VSCode)

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

Activity Processing Services process unattended activities.

Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

When configuring an Activity Processing service, a Processing Queue may or may not be specified.

  • When unspecified, the service will assign work using the "default" Processing Queue. The Activity Processing service will pick up tasks for any Batch Process Steps that do not have an assigned Processing Queue.
  • When specified, the Activity Processing service will only process tasks for Batch Process Steps with that Processing Queue assigned.
    • Be aware, while an Activity Processing service can only be configured to point to a single Processing Queue, multiple Activity Processing services may be added to the Grooper service list (each referencing their own Processing Queues).

The Grooper Service user account must have the following permissions:

File store access

  • Type: NTFS\Share
  • Reason: Read and write access to the Grooper file store location

Database access

  • Type: SQL
  • Reason: Read and write access to the Grooper database

Logon As Service

  • Type: Local Security Policy
  • Reason: Run services installed via Grooper Command Console

About

Unattended Activities in a Batch Process can be automated using an Activity Processing Grooper service. The Activity Processing service will act like a Windows service and automatically start tasks in a Batch, as processing threads in your system's resources become available. This is one of the ways Grooper leverages your system resources for parallel processing.

Imagine you're running Grooper on a machine with eight (8) processing threads. If you have a Batch with five (5) Batch Folders, and each one is on the Recognize step of the Batch Process, there's no need for your system to process each Batch Folder sequentially (with each Batch Folder waiting to be processed until the one before it is finished).

  • You have 8 threads and 5 Batch Folders in this scenario.
  • Each one of those threads can process one Batch Folder as a single task.
  • With 8 available threads, all 5 Batch Folders could be processed concurrently by 5 individual threads.
  • This is multi-threaded Activity processing.

You could then set up an Activity Processing service to process all Code Activity steps in a Batch Process with the maximum allowable processing threads available.
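
The scenario above can be mimicked with a minimal Python sketch. This is only an illustration of the threading idea, not how Grooper itself is implemented: the `recognize` function, the folder names, and the `worker` thread prefix are all hypothetical stand-ins. A barrier is used so the tasks can only finish if all 5 are running at the same moment.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

THREAD_COUNT = 8  # threads available in the scenario above
batch_folders = [f"Batch Folder {i}" for i in range(1, 6)]  # 5 folders on the Recognize step

# The barrier only releases once all 5 tasks are running at the same time,
# which shows the folders were processed concurrently rather than one by one.
all_running = threading.Barrier(len(batch_folders))

def recognize(folder):
    """Stand-in for one Recognize task (hypothetical; no real OCR happens here)."""
    all_running.wait(timeout=5)
    return folder, threading.current_thread().name

with ThreadPoolExecutor(max_workers=THREAD_COUNT, thread_name_prefix="worker") as pool:
    results = list(pool.map(recognize, batch_folders))

for folder, worker in results:
    print(f"{folder} was processed by {worker}")
```

Each folder is reported against a different worker thread, and three of the eight threads are never needed.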

How To

Install an Activity Processing service without a Processing Queue

Activity Processing services allow you to automate Code Activity tasks. If you do not assign a Processing Queue to the Activity Processing service, it will pick up tasks in the "default queue", automatically processing any tasks that aren't in a specific Processing Queue. Creating an Activity Processing service without a Processing Queue is generally the easiest way to get started automating steps in a Batch Process.


In this scenario, we will install an Activity Processing service with no associated Processing Queue and give it a certain number of threads to use.

  • This service will pick up and process any Code Activity tasks for Batch Process Steps with no Processing Queue assigned.

Be aware of the "n minus one" rule!

Services are assigned a number of CPU threads when you install them. Some services, like Import Watcher, will always run using a single thread. Activity Processing services can run using multiple threads.

Keep in mind, your machine only has a certain number of processing threads available. You will run into errors if you over-allocate your available threads.

Remember too, the operating system itself must always have a single thread available to run. So, the absolute maximum number of threads you can assign to all your services should not go beyond the total number of threads available minus one reserved for the operating system. Hence, the "n minus one" rule.

The "n minus one" rule is as follows:

  • If "n" is the maximum number of threads available on your machine, the maximum number of threads you can distribute to Grooper services is "n" minus one.

Be aware of the "n minus x" rule!

Other programs running in the background will need threads to run as well.

  • If SQL is installed on the same machine as your Grooper services, you should follow an "n minus two" rule, reserving one for the OS and one for SQL.
  • If IIS and SQL are installed on the same machine as your Grooper services, you should follow an "n minus three" rule, reserving one for the OS, one for SQL, and one for IIS.
  • If other applications, such as anti-virus software, are running in the background you will need to reserve threads for those applications as well.
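
The arithmetic behind the "n minus one" and "n minus x" rules can be written out as a small Python sketch. The function name `max_service_threads` and its parameters are hypothetical, and the sketch assumes one reserved thread per background application, as in the examples above. `os.cpu_count()` reports the number of logical processors on the machine running the script.

```python
import os

def max_service_threads(total_threads, reserved_for_apps=0):
    """Return the thread budget for Grooper services: n minus one (OS) minus any app reservations."""
    budget = total_threads - 1 - reserved_for_apps
    if budget < 1:
        raise ValueError("No threads are left to assign to Grooper services")
    return budget

print(max_service_threads(8))                       # OS only ("n minus one"): 7
print(max_service_threads(8, reserved_for_apps=1))  # OS + SQL ("n minus two"): 6
print(max_service_threads(8, reserved_for_apps=2))  # OS + SQL + IIS ("n minus three"): 5

# Optional: apply the rule to the machine running this script.
detected = os.cpu_count()
if detected and detected > 1:
    print(f"This machine: {detected} threads, at most {max_service_threads(detected)} for services")
```

The total across all installed services, not each service individually, should stay within this budget.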

THE LONG STORY SHORT HERE IS DO NOT OVER-ALLOCATE YOUR AVAILABLE THREADS! GROOPER CAN BEHAVE ERRATICALLY IF YOU DO!

Open Grooper Command Console

Grooper Command Console must be run as an administrator to install, edit, start and stop services.

  1. In Grooper Command Console enter the following command:
services install <connectionNo> <typeName> <userName> <password> [threadCount] [queueName]
  • <connectionNo> (required):
    • Replace this with the integer representing the appropriate connection. Use the connections list command to get a list of your connections.
  • <typeName> (required):
    • Since we are installing an Activity Processing service, replace this with ActivityProcessing.
  • <userName> and <password> (required):
    • Replace these with the appropriate Active Directory credentials of your Grooper Service User.
  • [threadCount] and [queueName] (optional):
    • If you want to specify a specific thread count, replace [threadCount] with an appropriate integer. If you omit it, the service uses the default setting of "multiple" threads.
    • If you want to specify a queue name, replace [queueName] with the name of an appropriate Processing Queue object. If you leave this blank, the service uses the Default Processing Queue.
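
To make the argument order concrete, here is a minimal Python sketch that fills in the command template above. The helper `build_install_command` is hypothetical and is not part of Grooper. The connection number, account name, and queue name are made-up values. The sketch assumes the optional arguments are positional, so a queue name can only be given after a thread count. It also uses a queue name without spaces, since how Grooper Command Console expects names containing spaces to be entered is not covered here.

```python
def build_install_command(connection_no, user_name, password, thread_count=None, queue_name=None):
    """Fill in: services install <connectionNo> <typeName> <userName> <password> [threadCount] [queueName]"""
    if queue_name is not None and thread_count is None:
        # Assumption: [queueName] follows [threadCount] positionally, so it cannot be given alone.
        raise ValueError("thread_count is required when queue_name is given")
    if thread_count is not None and thread_count < 1:
        raise ValueError("thread_count must be a positive integer")

    parts = ["services", "install", str(connection_no), "ActivityProcessing", user_name, password]
    if thread_count is not None:
        parts.append(str(thread_count))
    if queue_name is not None:
        parts.append(queue_name)
    return " ".join(parts)

# No queue: the service works the default queue with 4 threads (hypothetical values).
print(build_install_command(1, r"CONTOSO\svc-grooper", "<password>", thread_count=4))

# Single-threaded service pointed at a hypothetical queue named ExportThrottle.
print(build_install_command(1, r"CONTOSO\svc-grooper", "<password>", thread_count=1, queue_name="ExportThrottle"))
```

The printed strings are what you would type into Grooper Command Console, with your own connection number and credentials substituted.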


Install an Activity Processing service with a Processing Queue

Processing Queues allow Activity Processing services to divide CPU threads amongst different steps in a Batch Process. This is the mechanism that allows you to control how many threads are utilized by specified steps.

Commonly, a Processing Queue is created and implemented to throttle certain Activities, such as Recognize or Export, restricting the maximum number of threads that can be used to process those steps (thus freeing up compute for other Activities). In the example below, we describe how to implement a single-threaded Activity Processing service that executes Export steps using a Processing Queue.


When automating Export steps in a Batch Process, you may need to execute the activity single threaded.

Depending on which external storage system you're exporting to, you may run into errors if you attempt to run the Export activity multi-threaded.

  • This may be due to a storage platform limiting the number of concurrent connections to the repository.
    • For example, licensing limitations for the platform may restrict how many connections can be made to the repository at a time (as is the case for ApplicationXtender).
  • This may be a self-imposed throttle to avoid network/latency related errors when uploading to cloud based platforms (such as Box.com or Microsoft SharePoint).
  • This may otherwise be required or preferable for platforms whose file transfer protocol expects users to upload files one at a time.
    • If you have 5 threads all attempting to upload 5 different Batch Folders from the same machine, 4 of those Batch Folders are going to kick back to Grooper in an error state in this scenario.


For scenarios like these, it is preferable to run the Export activity single-threaded, ensuring only one Batch Folder is processed at a time. As well as automating Batch Processing activities, Activity Processing services allow you to control thread resources by assigning activities a Processing Queue and limiting the number of maximum threads available for that Processing Queue.

Next, we will show you how to create a single threaded Processing Queue for an Export activity, and set up an Activity Processing service that utilizes it. This will effectively throttle your export, so Batch Folders are indeed only exported one at a time, avoiding any issues with external platforms that cannot handle multi-threaded exports.
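
The effect of the throttle can be illustrated with a short Python sketch. It is a simulation only, under the assumption that each export behaves like an independent task handed to a fixed pool of threads. The function names `peak_concurrency` and `export_one` are hypothetical, and the short sleep stands in for an upload. The sketch counts how many simulated exports are in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

def peak_concurrency(max_workers, task_count=5):
    """Run task_count simulated exports and return the most that were in flight at once."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def export_one(_):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for uploading one Batch Folder
        with lock:
            active -= 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(export_one, range(task_count)))
    return peak

print(peak_concurrency(1))  # single-threaded: never more than 1 export at a time
print(peak_concurrency(5))  # multi-threaded: up to 5 exports may overlap
```

With a single worker, the peak is always 1, which is the behavior the single-threaded Processing Queue and service are meant to guarantee for the Export step.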

1. Add a Processing Queue

The first thing you'll need to do is add a Processing Queue object. A Processing Queue defines the "bucket" of threads available to one step or another in a Batch Process. In our case, this will allow us to limit the number of threads the Export step uses to a single thread.

To add a Processing Queue:

  1. Right-click the Queues folder in the Node Tree.
  2. Select "Add".
  3. Select "Processing Queue"
  4. This will bring up a new window to name the Processing Queue. Enter a name.
    • We named ours "Export Throttle"
  5. Select "Execute."
  6. This will add a new Processing Queue object to the Node Tree.
  7. FYI: No further object configuration is technically required at this point.
    • However, if you want the safest implementation of a single-threaded Processing Queue, totally ensuring only a single Export task is processed per repository environment, you can change the Concurrency Mode property from Multiple to Single. With the Single mode, only a single task will run per Grooper repository.

2. Assign the Processing Queue

Next, we need to tell our Batch Process which step should use our new Processing Queue.

  1. By default, all Batch Process steps use the "Default" Processing Queue.
  2. In the Batch Step property grid, Processing Queues are assigned with the Processing Queue property.



We want to tell the Export step of this Batch Process to use a different Processing Queue, the new one we just created.

  1. Select the Export step in the Batch Process.
  2. Select the Queue Name property.
  3. Using the dropdown menu, select the Processing Queue you wish to use.
    • In our case, the "Export Throttle" Processing Queue.

3. Configure an Activity Processing Service

On to Grooper Command Console! Grooper services are installed and edited using Grooper Command Console. Open Grooper Command Console to install a new Activity Processing service.

Grooper Command Console must be run as an administrator to install and edit services.

  1. In Grooper Command Console enter the following command:
services install <connectionNo> <typeName> <userName> <password> [threadCount] [queueName]
  • <connectionNo> (required):
    • Replace this with the integer representing the appropriate connection. Use the connections list command to get a list of your connections.
  • <typeName> (required):
    • Since we are installing an Activity Processing service, replace this with ActivityProcessing.
  • <userName> and <password> (required):
    • Replace these with the appropriate Active Directory credentials of your Grooper Service User.
  • [threadCount] and [queueName] (optional):
    • If you want to specify a specific thread count, replace [threadCount] with an appropriate integer. Setting this to "1" restricts the service to a single processing thread.
    • If you want to specify a queue name, replace [queueName] with the name of an appropriate Processing Queue object. Enter the name of the Processing Queue you created in the Queues folder on the "Design" page.


Install an Activity Processing service for the Render activity

An Activity Processing service is required to run the Render activity. This Activity Processing service must use a specially configured Processing Queue.

  • The Processing Queue assigned to any Render step must have its Concurrency Mode set to PerMachine.

The following tutorial will instruct you how to set up a Processing Queue and Activity Processing service for a Render step in a Batch Process.


Addressing "ghost services" - Deleting services from Windows

Very rarely, a Grooper service will not uninstall properly when you uninstall it. Or, a user may delete a Grooper Repository connection or purge a Grooper Repository without uninstalling services first.

This can make it appear as though a duplicate or "ghost" Windows service is installed without being listed in GCC (or Grooper Config before version 2024).


If this does occur, you will need to manually delete the service. If you know the name of the service instance (something like Grooper.ServiceTypeName.##) you need to delete, you can use the following command lines to stop the service (if necessary) and manually delete it.

SC STOP Grooper.ServiceTypeName.##
SC DELETE Grooper.ServiceTypeName.##

OR

You can delete the service from the Windows Registry Editor, using the following steps:

  1. Open the Registry Editor (regedit.exe)
  2. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services.
  3. Select the key of the service you want to delete.
    • Grooper services will always be named something like Grooper.ServiceTypeName.##
  4. From the "Edit" menu select "Delete".
  5. You will be prompted "Are you sure you want to delete this Key?". Click Yes.
  6. Exit the Registry Editor.

Glossary

Activity Processing: Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.

Activity Processing: Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

Activity: Grooper Activities define specific document processing operations performed on a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a "Review" activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Box: Box is a connection option for CMIS Connections. It connects Grooper to the Box content management system for import and export operations.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Grooper Repository: A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Grooper Service:

Machine: Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

Node Tree: The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Processing Queue: Processing Queues help automate "machine performed tasks" (those are Code Activity tasks performed by Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, that can only have one activity instance running per machine (this is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.

Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Render: Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

Repository: A "repository" is a general term in computer science referring to where files and/or data is stored and managed. In Grooper, the term "repository" may refer to:

Service: Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

SharePoint: SharePoint is a connection option for CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

Thread: A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.