Thread Processing Guidance

From Grooper Wiki

Latest revision as of 13:14, 6 February 2026

Grooper Activity Processing services distribute processing resources to automated tasks in a Batch Process using your system's processing threads. This article explains how thread processing works and outlines best practices for using threads effectively.

What is a thread?

A thread is the smallest unit of execution within an operating system.

  • Threads are the "workers" that carry out tasks in a software application.
  • Performance cores typically support 2 threads per core.
  • Efficiency cores typically support 1 thread per core.

How can I find out how many threads my machine has?

Physical machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Logical processors.

Virtual machines

  • Open Windows Task Manager.
  • Go to Performance.
  • Threads are listed as Virtual processors.
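
The same count can also be read from a script. The following is a minimal sketch in Python, not a Grooper tool; it assumes Python is installed on the machine, and the function name logical_processor_count is only illustrative. The value it reports should generally correspond to the logical (or virtual) processor count shown in Task Manager.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors (threads) reported by the OS."""
    count = os.cpu_count()  # returns None if the count cannot be determined
    if count is None:
        raise RuntimeError("Could not determine the logical processor count")
    return count


if __name__ == "__main__":
    print(f"Logical processors: {logical_processor_count()}")
```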

How does Grooper utilize threads?

Primary

Threads are primarily used by Activity Processing services to automate tasks within a Batch Process:

  • The Number of Threads setting determines how many threads an Activity Processing service can use.
  • Multiple Activity Processing service instances can be installed on a single machine.
  • Activity Processing services can also be installed across multiple machines connected to the same Grooper Repository, allowing processing to be distributed across servers.
  • Key benefit:
    • Allocating more threads across one or more Activity Processing services allows more tasks to be processed concurrently.
    • Increased concurrency improves throughput, making Batch processing faster.

Secondary

  • Other Grooper services (e.g., Import Watcher) also utilize threads, but each service instance consumes only one thread.
  • The Grooper Application Pool uses threads on the Grooper Web Server to run the Grooper Web Application and handle user interactions.

Is it better to have one Activity Processing service with many threads, or several with fewer threads?

It is generally more efficient to distribute available threads across multiple Activity Processing services on a machine.

  • The typical sweet spot is 3–4 threads per service.
  • Example: On a processing server with 50 available threads, overall throughput is usually better with 16 Activity Processing services using 3 threads each (48 threads in total) than with a single service using all 50 threads.
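
The arithmetic behind this example can be sketched in a few lines of Python. This is an illustrative calculation only: the function name plan_services and its parameters are hypothetical, and "available threads" is assumed to mean the threads left for Grooper on the machine.

```python
from typing import Tuple


def plan_services(available_threads: int, threads_per_service: int = 3) -> Tuple[int, int]:
    """Return (service_count, unused_threads) for a chosen per-service thread count."""
    if available_threads < 0:
        raise ValueError("available_threads must not be negative")
    if threads_per_service < 1:
        raise ValueError("threads_per_service must be at least 1")
    # Integer division gives the number of full services; the remainder stays unused.
    return divmod(available_threads, threads_per_service)


if __name__ == "__main__":
    services, unused = plan_services(50, 3)
    print(f"{services} services x 3 threads, {unused} threads unused")
```

With 50 available threads, 3 threads per service yields 16 services with 2 threads left over, and 4 threads per service yields 12 services with 2 threads left over.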

How should I be using Processing Queues?

Processing Queues help control how threads are distributed across specific steps in a Batch Process.

  • Processing Queues are assigned to Batch Process Steps and Activity Processing services.
    • Batch Process Steps use the Queue Name property to specify a Processing Queue.
    • Activity Processing services also use the Queue Name property to bind to a Processing Queue.
  • If threads are considered "workers", Processing Queues act as "managers" that route workers to specific activities.
  • Processing Queues should be used for resource-intensive or long-running Activity tasks to prevent bottlenecks.
    • While such bottlenecks may not reduce total throughput over time, they can result in uneven flow to Review or Export steps.
    • Example: Recognize is often a long-running step due to OCR processing. If all threads are occupied with Recognize tasks, other steps will remain idle until those tasks complete.
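
The relationship between steps, queues, and services can be pictured with a small conceptual model in Python. This is a sketch and not Grooper configuration: the queue names "OCR" and "Default", the thread counts, and the function names are all hypothetical, and the model assumes that a service bound to a queue works only on tasks from steps assigned to that same queue.

```python
from collections import defaultdict

# Hypothetical Queue Name assigned to each Batch Process Step.
STEP_QUEUES = {
    "Recognize": "OCR",
    "Classify": "Default",
    "Extract": "Default",
}

# Hypothetical services as (Queue Name the service is bound to, Number of Threads).
SERVICES = [
    ("OCR", 4),
    ("OCR", 4),
    ("Default", 3),
]


def threads_per_queue(services):
    """Sum the threads of all services bound to each queue."""
    totals = defaultdict(int)
    for queue_name, thread_count in services:
        totals[queue_name] += thread_count
    return dict(totals)


def threads_for_step(step, step_queues, services):
    """Return how many threads can work on tasks from the given step."""
    return threads_per_queue(services).get(step_queues[step], 0)


if __name__ == "__main__":
    for step in STEP_QUEUES:
        print(step, threads_for_step(step, STEP_QUEUES, SERVICES))
```

In this model, long-running Recognize tasks occupy only the 8 threads bound to the "OCR" queue, so the 3 threads on the "Default" queue remain free for the other steps.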

Thread allocation best practices for Grooper services

Understand the "n minus one" rule

When Grooper services are installed, each service is assigned a number of CPU threads.

  • Certain services (for example, Import Watcher) always run on a single thread.
  • Activity Processing services can use multiple threads.

The machine provides a finite number of processing threads. Over-allocating threads may cause errors. The operating system must always have at least one thread available. Therefore, the total number of threads assigned to Grooper services must not exceed the number of available threads minus one.

This principle is known as the "n minus one" rule:

  • If n is the total number of threads available on the machine, the maximum number of threads that may be assigned to Grooper services is n − 1.
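
As a quick sanity check, the rule can be expressed as a short Python sketch. The function names and the sample thread counts below are illustrative assumptions, not Grooper settings or APIs.

```python
from typing import Iterable


def max_grooper_threads(n: int) -> int:
    """Apply the "n minus one" rule: reserve one thread for the operating system."""
    return max(n - 1, 0)


def is_over_allocated(n: int, service_threads: Iterable[int]) -> bool:
    """Return True if the threads assigned to services exceed n - 1."""
    return sum(service_threads) > max_grooper_threads(n)


if __name__ == "__main__":
    # Hypothetical 8-thread machine: one Import Watcher (1 thread)
    # plus two Activity Processing services with 3 threads each.
    print(is_over_allocated(8, [1, 3, 3]))
```

On the hypothetical 8-thread machine, 1 + 3 + 3 = 7 assigned threads stays within the limit of 7, while adding one more single-threaded service would exceed it.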

Consider the "n minus x" rule

In production environments, Grooper is rarely the only application running. Other software requires CPU threads, so additional reservations may be necessary:

  • SQL installed on the same machine
    • Follow an n − 2 rule: 1 thread for the OS, 1 for SQL.
  • SQL and IIS installed on the same machine
    • Follow an n − 3 rule: 1 thread for the OS, 1 for SQL, 1 for IIS.
  • Other background applications (for example, antivirus software)
    • Reserve additional threads as needed.

Because every application hosted on the same machine reduces the threads left for Grooper, it is recommended to distribute SQL, IIS, and Grooper processing services across multiple machines whenever possible.
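
The reservations listed above amount to a simple thread budget, sketched here in Python. The function name grooper_thread_budget, the reservation labels, and the 16-thread machine are hypothetical; the one-thread-per-application figures are the minimums given above, and real workloads may need more.

```python
from typing import Mapping


def grooper_thread_budget(n: int, reserved: Mapping[str, int]) -> int:
    """Return the threads left for Grooper services after reservations (n - x)."""
    x = sum(reserved.values())
    if x > n:
        raise ValueError("Reservations exceed the threads available on the machine")
    return n - x


if __name__ == "__main__":
    # Hypothetical 16-thread machine that also hosts SQL and IIS (the n - 3 case).
    reservations = {"OS": 1, "SQL": 1, "IIS": 1}
    print(grooper_thread_budget(16, reservations))
```

For the hypothetical 16-thread machine, reserving one thread each for the OS, SQL, and IIS leaves 13 threads for Grooper services.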

Bottom line

  • Do not over-allocate available threads.
  • If threads are over-allocated, Grooper may behave erratically or fail unexpectedly.
  • Always leave sufficient CPU resources for the operating system and any supporting services.