Thread Processing Guidance: Difference between revisions

From Grooper Wiki
 
== About Thread Allocation for Grooper Services ==
 
=== Understand the "n minus one" rule ===
 
When Grooper services are installed, each service is assigned a number of CPU threads.
 
* Some services (for example, '''Import Watcher''') always run on a '''single thread'''.
* '''Activity Processing''' services can use '''multiple threads'''.
 
Your machine has a finite number of processing threads, and '''over-allocating them will cause errors'''.
 
The operating system must always have at least '''one thread available'''. For that reason, the total number of threads assigned to Grooper services must never exceed the number of available threads minus one.
 
This is known as the '''"n minus one" rule''':
 
* If '''n''' is the total number of threads available on the machine, the maximum number of threads you may assign to Grooper services is '''n − 1'''.
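The arithmetic behind the rule can be sketched in a few lines of Python. The helper name `max_grooper_threads` is hypothetical and is not part of Grooper; it only illustrates the subtraction described above.

```python
def max_grooper_threads(total_threads: int) -> int:
    """Apply the "n minus one" rule: keep one thread free for the operating system."""
    if total_threads < 1:
        raise ValueError("total_threads must be at least 1")
    return total_threads - 1


# Example: a machine that reports 8 logical processors.
print(max_grooper_threads(8))  # 7
```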
 
=== Consider the "n minus x" rule ===
 
In real-world environments, Grooper is rarely the only software running.
 
Other applications also require CPU threads, so additional reservations may be necessary:
 
* '''SQL installed on the same machine''' 
** Follow an '''n − 2''' rule 
** (1 thread for the OS, 1 for SQL)
 
* '''SQL and IIS installed on the same machine''' 
** Follow an '''n − 3''' rule 
** (1 thread for the OS, 1 for SQL, 1 for IIS)

* '''Other background applications''' (for example, antivirus software)
** Reserve additional threads as needed

This is one of the key reasons we recommend '''distributing SQL, IIS, and Grooper processing services across multiple machines''' whenever possible.

=== Bottom line ===

'''Do not over-allocate your available threads.'''
* If you do, '''Grooper can behave erratically or fail unexpectedly'''.
* Always leave sufficient CPU resources for the operating system and any supporting services.
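To make the "n minus x" reservations concrete, the following is a rough Python sketch that checks a planned set of services against the thread budget. The function names, the service names, and the 16-thread machine are illustrative assumptions, not Grooper settings or APIs.

```python
def grooper_thread_budget(total_threads: int, reserved: dict) -> int:
    """Threads left for Grooper services after the reservations are subtracted."""
    return max(total_threads - sum(reserved.values()), 0)


def over_allocated(total_threads: int, reserved: dict, service_threads: dict) -> bool:
    """Return True when the services request more threads than the budget allows."""
    budget = grooper_thread_budget(total_threads, reserved)
    return sum(service_threads.values()) > budget


# "n - 3" case: one thread each for the OS, SQL, and IIS (hypothetical 16-thread machine).
reserved = {"OS": 1, "SQL": 1, "IIS": 1}

# Hypothetical service layout: Import Watcher always uses a single thread.
services = {
    "Import Watcher": 1,
    "Activity Processing 1": 4,
    "Activity Processing 2": 4,
    "Activity Processing 3": 4,
}

print(grooper_thread_budget(16, reserved))     # 13
print(over_allocated(16, reserved, services))  # False: 13 requested, 13 allowed
```

Adding one more 4-thread service to this layout would request 17 threads against a budget of 13, which is the over-allocation the rule warns against.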

Revision as of 10:37, 6 February 2026

Grooper Activity Processing services distribute processing resources to automated tasks in a Batch Process. This article explains how thread processing works and outlines best practices for using threads effectively.

What is a thread?

A thread is the smallest unit of execution within an operating system.

  • Threads are the "workers" that carry out tasks in a software application.
  • Performance cores typically support 2 threads per core.
  • Efficiency cores typically support 1 thread per core.

How can I find out how many threads my machine has?

Physical machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Logical processors.

Virtual machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Virtual processors.
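If a script is more convenient than Task Manager, the same count can usually be read programmatically. The sketch below uses Python's standard `os.cpu_count()`, which reports the logical processors visible to the operating system (on a virtual machine, this should be the processors assigned to the guest). The wrapper name `logical_processor_count` is only illustrative.

```python
import os
from typing import Optional


def logical_processor_count() -> Optional[int]:
    """Return the number of logical processors, or None if it cannot be determined."""
    return os.cpu_count()


count = logical_processor_count()
if count is None:
    print("Could not determine the number of logical processors")
else:
    print(f"Logical processors (threads): {count}")
```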

How does Grooper utilize threads?

Primary

Threads are used by Activity Processing services to automate Activity tasks in a Batch Process.

  • The number of threads an Activity Processing service can use is defined by the Number of Threads setting.
  • Multiple Activity Processing services may be installed on a single machine.
  • Activity Processing services may also be installed on additional machines connected to the same Grooper Repository to distribute processing across multiple servers.

Secondary

The Grooper Application Pool uses threads on the Grooper Web Server to run the Grooper Web Application as users interact with it.

Is it better to have one Activity Processing service with many threads, or several with fewer threads?

It is generally more efficient to distribute available threads across multiple Activity Processing services on a machine.

  • The typical sweet spot is 3–4 threads per service.
  • Example: On a processing server with 50 available threads, overall throughput is usually better with 16 Activity Processing services using 3 threads each than with a single service using all 50 threads.
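The example above is simple integer division. As a small sketch, assuming the 50 threads are what remains after any reservations for the operating system and other software, and using a hypothetical helper name `plan_services`:

```python
def plan_services(available_threads: int, threads_per_service: int = 3):
    """Return (number of services, threads left unassigned) for an even split."""
    if threads_per_service < 1:
        raise ValueError("threads_per_service must be at least 1")
    return divmod(available_threads, threads_per_service)


services, unused = plan_services(50, 3)
print(f"{services} services x 3 threads, {unused} threads left unassigned")
# 16 services x 3 threads, 2 threads left unassigned
```

With 4 threads per service, the same 50 threads would yield 12 services with 2 threads left over.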

How should I be using Processing Queues?

Processing Queues help control how threads are distributed across specific steps in a Batch Process.

  • Processing Queues are assigned to Batch Process Steps and Activity Processing services.
    • Batch Process Steps use the Queue Name property to specify a Processing Queue.
    • Activity Processing services also use the Queue Name property to bind to a Processing Queue.
  • If threads are considered "workers", Processing Queues act as "managers" that route workers to specific activities.
  • Processing Queues should be used for resource-intensive or long-running Activity tasks to prevent bottlenecks.
    • While such bottlenecks may not reduce total throughput over time, they can result in uneven flow to Review or Export steps.
    • Example: Recognize is often a long-running step due to OCR processing. If all threads are occupied with Recognize tasks, other steps will remain idle until those tasks complete.
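The Recognize bottleneck can be illustrated with a toy simulation. This is not Grooper's scheduler: it assumes every task is ready at time zero, that each queue is first-in, first-out, and that each worker thread serves exactly one queue. The queue names ("shared", "ocr", "general"), the task durations, and the function `simulate` are all made up for illustration.

```python
import heapq
from collections import deque


def simulate(tasks, step_queue, workers_per_queue):
    """Return {step: [completion times]} for FIFO queues with dedicated workers."""
    # Group tasks by the queue their step is assigned to, preserving arrival order.
    pending = {queue: deque() for queue in workers_per_queue}
    for step, duration in tasks:
        pending[step_queue[step]].append((step, duration))

    finished = {}
    for queue, worker_count in workers_per_queue.items():
        # Each heap entry is the time at which one worker becomes free.
        free_at = [0] * worker_count
        heapq.heapify(free_at)
        while pending[queue]:
            step, duration = pending[queue].popleft()
            start = heapq.heappop(free_at)  # earliest available worker
            end = start + duration
            heapq.heappush(free_at, end)
            finished.setdefault(step, []).append(end)
    return finished


# Eight long Recognize tasks arrive ahead of four short Export tasks.
tasks = [("Recognize", 10)] * 8 + [("Export", 1)] * 4

# One shared queue served by 4 threads.
shared = simulate(tasks, {"Recognize": "shared", "Export": "shared"}, {"shared": 4})

# Recognize on its own queue with 3 threads, everything else on 1 thread.
split = simulate(tasks, {"Recognize": "ocr", "Export": "general"}, {"ocr": 3, "general": 1})

print("First Export done (shared queue):", min(shared["Export"]))  # 21
print("First Export done (split queues):", min(split["Export"]))   # 1
```

In this toy setup the dedicated queue lets Export work flow immediately instead of waiting behind every Recognize task, at the cost of Recognize itself finishing later (time 30 instead of 20) because it has one fewer thread. Real workloads will differ, so treat the numbers as an illustration of the trade-off only.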