Processing Queue (Node Type)
STUB: This article is a stub. It contains minimal information on the topic and should be expanded.
Processing Queues help automate "machine performed tasks" (that is, Code Activity tasks performed by Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).
- Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource-intensive activities, such as Recognize. That way, these compute-hungry tasks won't consume all available system resources.
- Processing Queues are also used to manage activities, such as Render, which can only have one activity instance running per machine. This is done by changing the queue's Concurrency Mode from "Multiple" to "PerMachine".
- Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.
This is accomplished by following these general steps:
- Create a new Processing Queue.
- Assign it to a step in a Batch Process using its Queue Name property.
- Create an Activity Processing Grooper Service from Grooper Command Console (or, in older Grooper versions, Grooper Config).
- When configuring the Activity Processing service, select the Processing Queue using the Queue Name property.
- Specify how many threads you want to run using the Number of Threads property.
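The relationships set up by the steps above can be sketched as a small data model. This is only an illustration in Python, not Grooper's API: the class names, field names, the queue name "OCR", and the thread counts are all hypothetical. Only the Queue Name and Number of Threads properties and the "Default" queue come from this article.

```python
from dataclasses import dataclass

# Hypothetical model of the objects configured in the steps above.
# None of these classes exist in Grooper; they only mirror the properties named in the article.

@dataclass
class ProcessingQueue:
    name: str
    concurrency_mode: str = "Multiple"

@dataclass
class BatchProcessStep:
    activity: str
    queue_name: str = "Default"  # the step's Queue Name property

@dataclass
class ActivityProcessingService:
    queue_name: str              # the service's Queue Name property
    number_of_threads: int       # the service's Number of Threads property

def services_for(step, services):
    """Return the services whose Queue Name matches the step's Queue Name."""
    return [s for s in services if s.queue_name == step.queue_name]

# Steps 1-2: create a queue and assign it to a step.
queue = ProcessingQueue(name="OCR")
step = BatchProcessStep(activity="Recognize", queue_name=queue.name)

# Steps 3-5: create services, pick a queue, and cap the thread count.
services = [
    ActivityProcessingService(queue_name="Default", number_of_threads=8),
    ActivityProcessingService(queue_name=queue.name, number_of_threads=2),
]

matched = services_for(step, services)
```

In this sketch the Recognize step is picked up only by the service listening on the "OCR" queue, so it never uses more than that service's two threads.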
FYI: Prior to version 2022, Processing Queues were known as "Thread Pools". While the name is different, their purpose, configuration, and functionality are largely identical.
About
Processing Queues meter how many tasks are able to be processed at a single time by restricting how many CPU threads are available for processing. There are a number of reasons why you would want to restrict the number of available threads:
- You may want to throttle certain processor intensive Grooper Activities in order to free up compute for other Activities. Processor intensive Activities (such as the Recognize activity) can "bottleneck" computing resources. By limiting the number of threads that process such tasks, you can free up threads for less processor intensive activities. This can allow for more Batches to be processed at the same time, reducing the number of Batches that otherwise would be waiting for available threads while all system resources were devoted to a series of tasks for a single step.
- Processing Queues can control the concurrency of operations to an external system. Imagine you are exporting to a content management system using a CMIS Connection, and your system has 64 threads available for processing. If the content management system allows 64 concurrent connections (so 64 documents can export at a time), there is no problem: the system has one thread for each document exporting, and the Export activity runs without errors. However, what if the content management system only allows 16 connections at a time? Grooper will try to use all 64 threads to export unless told otherwise. The first 16 documents might export with no problems, but the next 48 (64 minus 16) will error out. By creating a Processing Queue that uses only 16 threads for the Export activity, Grooper will hold off from using all 64 threads for the activity. Only 16 threads (one for each allowable concurrent connection to the content management system) will be used at a time.
- Some activities only allow one instance of the activity to execute at a time per machine (namely, the Render activity). In such cases, you would first create a new Processing Queue, specifying PerMachine for the Concurrency Mode property. Then, you would create an Activity Processing service, assign it the newly created Processing Queue, and drop the number of available threads to one. Last, you would configure the activity's step in the Batch Process to run using the newly created Processing Queue instead of the "Default". By doing this, the activity will be forced to run on a single thread, one instance at a time on each server or workstation. If another instance of that activity tries to run on the same machine while the first is running (such as a second Batch whose Batch Process runs the same activity), the second will be forced to wait its turn.
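The export throttling scenario above can be imitated with a bounded thread pool. The following is a minimal Python sketch of the general technique, not how Grooper implements Processing Queues: `export_document` and `run_exports` are hypothetical names, and the sleep merely simulates time spent talking to the external system. The numbers 64 and 16 come from the example above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 16  # concurrent connections the external system accepts
DOCUMENTS = 64        # documents waiting to export

_lock = threading.Lock()
_active = 0  # exports currently in flight
_peak = 0    # highest number of simultaneous exports observed

def export_document(doc_id: int) -> int:
    """Hypothetical stand-in for one export task; records how many run at once."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.01)  # simulate time spent on the external connection
    with _lock:
        _active -= 1
    return doc_id

def run_exports(thread_count: int) -> int:
    """Export every document on `thread_count` threads; return the peak concurrency."""
    global _active, _peak
    _active = _peak = 0
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        exported = list(pool.map(export_document, range(DOCUMENTS)))
    assert len(exported) == DOCUMENTS  # every document was exported
    return _peak

peak_with_cap = run_exports(MAX_CONNECTIONS)
```

Because the pool never has more than 16 workers, `peak_with_cap` cannot exceed the connection limit, while all 64 documents still get exported, just in waves rather than all at once.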
FYI: More detailed information about how to add a new Processing Queue and assign it to an Activity Processing service can be found in the Activity Processing article.
Concurrency Mode
This property specifies the parallel processing mode for a Processing Queue. It controls how multiple Activities (or the machines processing Activities) pool system resources.
This can be set to one of three modes:
- Multiple - Multiple instances can run concurrently. Multiple occurrences of the Activities using the Processing Queue can run at the same time.
- PerMachine - Only a single instance can run on a single machine. Only one occurrence of an activity can run on a machine at a time.
- Single - Only a single instance can run per Grooper Repository. Regardless of how many machines are connected to the repository attempting to run an activity, only one occurrence of an activity using the Processing Queue can run at a time.
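The three modes can be read as a rule for whether a new task may start, given where tasks from the same queue are already running. The Python sketch below is an interpretation of the descriptions above, not Grooper code: `ConcurrencyMode`, `can_start`, and the machine names are hypothetical, and thread-count limits are deliberately left out.

```python
from enum import Enum

class ConcurrencyMode(Enum):
    MULTIPLE = "Multiple"
    PER_MACHINE = "PerMachine"
    SINGLE = "Single"

def can_start(mode: ConcurrencyMode, machine: str, running_on: list) -> bool:
    """Decide whether `machine` may start another task from the queue.

    `running_on` lists the machine name of every task from this queue that is
    currently running anywhere in the repository (hypothetical bookkeeping).
    """
    if mode is ConcurrencyMode.MULTIPLE:
        return True                      # no concurrency restriction from the mode itself
    if mode is ConcurrencyMode.PER_MACHINE:
        return machine not in running_on  # at most one running task per machine
    return len(running_on) == 0          # SINGLE: at most one per repository

# One task is already running on SERVER-A.
running = ["SERVER-A"]
per_machine_same = can_start(ConcurrencyMode.PER_MACHINE, "SERVER-A", running)
per_machine_other = can_start(ConcurrencyMode.PER_MACHINE, "SERVER-B", running)
single_other = can_start(ConcurrencyMode.SINGLE, "SERVER-B", running)
```

With one task running on SERVER-A, PerMachine blocks a second task on SERVER-A but allows one on SERVER-B, while Single blocks both until the running task finishes.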
Glossary
Activity Processing: Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.
Activity Processing: Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Steps that do not require a human operator.
Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).
- Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".
Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.
- Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
- A Batch Process is often referred to as simply a "process".
Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.
CMIS Connection: CMIS Connections provide a standardized way of connecting to various content management systems (CMS). CMIS Connections allow Grooper to communicate with multiple external storage platforms, enabling access to documents and document metadata that reside outside of Grooper's immediate environment.
- For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
- For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.
CMIS: CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders, and their metadata, as well as programmatic control of the platform over the internet.
Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.
Grooper Repository: A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.
Machine: Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.
Processing Queue: Processing Queues help automate "machine performed tasks" (that is, Code Activity tasks performed by Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).
- Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource-intensive activities, such as Recognize. That way, these compute-hungry tasks won't consume all available system resources.
- Processing Queues are also used to manage activities, such as Render, which can only have one activity instance running per machine. This is done by changing the queue's Concurrency Mode from "Multiple" to "PerMachine".
- Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.
Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.
Render: Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.
Service: Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).
Thread: A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.