Thread Processing Guidance

From Grooper Wiki
Latest revision as of 13:14, 6 February 2026

Grooper Activity Processing services distribute processing resources to automated tasks in a Batch Process using your system's processing threads. This article explains how thread processing works and outlines best practices for using threads effectively.

What is a thread?

A thread is the smallest unit of execution that an operating system can schedule.

  • Threads are the "workers" that carry out tasks in a software application.
  • Performance cores typically support 2 threads per core when simultaneous multithreading (for example, Intel Hyper-Threading) is enabled.
  • Efficiency cores typically support 1 thread per core.

How can I find out how many threads my machine has?

Physical machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Logical processors.

Virtual machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Virtual processors.
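
The same count can also be read programmatically. The following is a minimal Python sketch and is not part of Grooper. `logical_processor_count` is a hypothetical helper name; it relies on the standard library's `os.cpu_count()`, which reports the number of logical CPUs visible to the operating system. On a virtual machine this should correspond to the processors assigned to the guest.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors the OS reports (hypothetical helper)."""
    count = os.cpu_count()  # may be None if the count cannot be determined
    if count is None:
        raise RuntimeError("Could not determine the logical processor count")
    return count


print(f"Logical processors: {logical_processor_count()}")
```

The value printed here is the "n" used in the thread allocation rules later in this article.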

How does Grooper utilize threads?

Primary

Threads are primarily used by Activity Processing services to automate tasks within a Batch Process:

  • The Number of Threads setting determines how many threads an Activity Processing service can use.
  • Multiple Activity Processing service instances can be installed on a single machine.
  • Activity Processing services can also be installed across multiple machines connected to the same Grooper Repository, allowing processing to be distributed across servers.
  • Key benefit:
    • Allocating more threads across one or more Activity Processing services allows more tasks to be processed concurrently.
    • Increased concurrency improves throughput, making Batch processing faster.

Secondary

  • Other Grooper services (e.g., Import Watcher) also utilize threads, but each service instance consumes only one thread.
  • The Grooper Application Pool uses threads on the Grooper Web Server to run the Grooper Web Application and handle user interactions.

Is it better to have one Activity Processing service with many threads, or several with fewer threads?

It is generally more efficient to distribute available threads across multiple Activity Processing services on a machine.

  • The typical sweet spot is 3–4 threads per service.
  • Example: On a processing server with 50 available threads, overall throughput is usually better with 16 Activity Processing services using 3 threads each (48 threads in total, leaving 2 unassigned) than with a single service using all 50 threads.
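
The arithmetic behind the example can be sketched as follows. This is an illustrative Python calculation only, not a Grooper tool. `plan_services` and its parameters are hypothetical names, and the default of one reserved thread is an assumption borrowed from the "n minus one" rule described later in this article.

```python
from typing import Tuple


def plan_services(
    available_threads: int,
    threads_per_service: int = 3,
    reserved: int = 1,  # assumption: keep one thread back for the operating system
) -> Tuple[int, int, int]:
    """Return (service_count, threads_allocated, threads_left_unassigned)."""
    usable = available_threads - reserved
    if usable < threads_per_service:
        raise ValueError("Not enough threads for even one service of this size")
    services = usable // threads_per_service
    allocated = services * threads_per_service
    return services, allocated, available_threads - allocated


# The 50-thread example from the text: 16 services x 3 threads = 48, 2 left over.
print(plan_services(50))
# The same server at 4 threads per service: 12 services x 4 threads = 48.
print(plan_services(50, threads_per_service=4))
```

Both ends of the 3–4 thread range allocate 48 of the 50 threads here; which one performs better in practice depends on the workload and should be measured.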

How should I be using Processing Queues?

Processing Queues help control how threads are distributed across specific steps in a Batch Process.

  • Processing Queues are assigned to Batch Process Steps and Activity Processing services.
    • Batch Process Steps use the Queue Name property to specify a Processing Queue.
    • Activity Processing services also use the Queue Name property to bind to a Processing Queue.
  • If threads are considered "workers", Processing Queues act as "managers" that route workers to specific activities.
  • Assign Processing Queues to resource-intensive or long-running Activity tasks so that those tasks cannot occupy every available thread and create a bottleneck.
    • While such bottlenecks may not reduce total throughput over time, they can result in uneven flow to Review or Export steps.
    • Example: Recognize is often a long-running step due to OCR processing. If all threads are occupied with Recognize tasks, other steps will remain idle until those tasks complete.
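
The Recognize example can be illustrated with a toy simulation. This is a simplified Python model written for this article, not a description of how Grooper actually schedules tasks: it assumes tasks are handed out first-in, first-out to identical threads, and the durations, thread counts, and the name `finish_times` are all made up for illustration.

```python
import heapq
from typing import List, Tuple


def finish_times(tasks: List[Tuple[str, int]], workers: int) -> List[Tuple[str, int]]:
    """Dispatch (step, duration) tasks FIFO onto `workers` threads; return (step, end_time)."""
    free_at = [0] * workers  # time at which each worker next becomes free
    heapq.heapify(free_at)
    finished = []
    for step, duration in tasks:
        start = heapq.heappop(free_at)  # earliest available worker takes the task
        end = start + duration
        heapq.heappush(free_at, end)
        finished.append((step, end))
    return finished


recognize = [("Recognize", 10)] * 8  # eight long-running tasks (hypothetical durations)
export = [("Export", 1)]             # one short task queued behind them

# One shared pool of 4 threads: the Export task waits behind the Recognize tasks.
shared = finish_times(recognize + export, workers=4)
shared_export_done = shared[-1][1]
shared_recognize_done = max(end for step, end in shared if step == "Recognize")

# Two queues: 3 threads bound to a Recognize queue, 1 thread for everything else.
queued_recognize_done = max(end for _, end in finish_times(recognize, workers=3))
queued_export_done = finish_times(export, workers=1)[-1][1]

print("Shared pool: Export done at", shared_export_done)
print("Separate queues: Export done at", queued_export_done)
print("Recognize done at", shared_recognize_done, "vs", queued_recognize_done)
```

In this model the short task finishes far sooner when Recognize has its own queue, while the Recognize work itself takes longer because it has fewer threads. That trade-off is the reason queue sizes should be tuned against real Batch volumes rather than taken from a sketch like this.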

Thread allocation best practices for Grooper services

Understand the "n minus one" rule

When Grooper services are installed, each service is assigned a number of CPU threads.

  • Certain services (for example, Import Watcher) always run on a single thread.
  • Activity Processing services can use multiple threads.

The machine provides a finite number of processing threads. Over-allocating threads may cause errors. The operating system must always have at least one thread available. Therefore, the total number of threads assigned to Grooper services must not exceed the number of available threads minus one.

This principle is known as the "n minus one" rule:

  • If n is the total number of threads available on the machine, the maximum number of threads that may be assigned to Grooper services is n − 1.
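
The rule can be checked against a planned set of services with a few lines of Python. This is a hedged sketch, not a Grooper feature: `check_allocation` and the service names in the example plan are hypothetical, and the thread counts would come from each service's configured settings.

```python
from typing import Dict


def check_allocation(total_threads: int, service_threads: Dict[str, int]) -> int:
    """Return unassigned threads beyond the one kept for the OS; raise if over-allocated."""
    limit = total_threads - 1  # the "n minus one" ceiling
    assigned = sum(service_threads.values())
    if assigned > limit:
        raise ValueError(
            f"{assigned} threads assigned, but only {limit} may be used "
            f"on a machine with {total_threads} threads"
        )
    return limit - assigned


# Hypothetical plan for a machine with 8 threads (n = 8, so at most 7 may be assigned).
plan = {
    "Import Watcher": 1,
    "Activity Processing A": 3,
    "Activity Processing B": 3,
}
print(check_allocation(8, plan))  # 0: the plan uses exactly n - 1 threads
```

Adding one more single-thread service to this plan would raise a `ValueError`, which mirrors the over-allocation the rule is meant to prevent.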

Consider the "n minus x" rule

In production environments, Grooper is rarely the only application running. Other software requires CPU threads, so additional reservations may be necessary:

  • SQL installed on the same machine
    • Follow an n − 2 rule: 1 thread for the OS, 1 for SQL.
  • SQL and IIS installed on the same machine
    • Follow an n − 3 rule: 1 thread for the OS, 1 for SQL, 1 for IIS.
  • Other background applications (for example, antivirus software)
    • Reserve additional threads as needed.

Because every co-located application reduces the threads left for processing, distribute SQL, IIS, and Grooper processing services across multiple machines whenever possible.
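
The "n minus x" reservations above amount to a simple subtraction, sketched here in Python. The function name `grooper_thread_budget` and the 16-thread machine are hypothetical, and the one-thread-per-application figures are the minimums named in this article rather than measured requirements.

```python
from typing import Dict


def grooper_thread_budget(total_threads: int, reserved: Dict[str, int]) -> int:
    """Return n - x: threads left for Grooper services after reservations."""
    x = sum(reserved.values())
    budget = total_threads - x
    if budget < 1:
        raise ValueError("Reservations leave no threads for Grooper services")
    return budget


# Hypothetical 16-thread server with SQL installed: n - 2.
print(grooper_thread_budget(16, {"OS": 1, "SQL": 1}))            # 14
# The same server with SQL and IIS installed: n - 3.
print(grooper_thread_budget(16, {"OS": 1, "SQL": 1, "IIS": 1}))  # 13
```

Other background applications, such as antivirus software, can be added to the `reserved` mapping in the same way.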

Bottom line

  • Do not over-allocate available threads.
  • If threads are over-allocated, Grooper may behave erratically or fail unexpectedly.
  • Always leave sufficient CPU resources for the operating system and any supporting services.