Batch Processing Statistics

From Grooper Wiki

Grooper logs a variety of processing statistics for production Batches. These stats can be seen by selecting a Batch's "Statistics" tab on the Batches Page or by returning query results from the Stats Page. Stats are logged for each step in the Batch's Batch Process, according to its Activity. Some of these stats are obvious from their names. Others are less so.

This article documents the Batch processing statistics recorded for each Activity in Grooper. It will help you better understand the stats Grooper collects and build the queries you want on the Stats Page.

Review stats

Stats collected independent of Review Viewers used

Scan Viewer stats

Data Viewer stats

Classification Viewer stats

Thumbnail Viewer stats

Folder Viewer stats

Code Activity stats

Stats collected independent of Activity type

Regardless of the Activity the Batch Process Step is running, every step will record the following stats:

Tasks Processed - This is the total number of tasks in the Batch Process Step that are successfully processed.

  • Note that if a task errors out, it is not counted.
    • If there are 100 available tasks and they all complete successfully, 100 tasks are recorded.
    • If there are 100 available tasks and 10 of them error out, 90 tasks are recorded.
  • The number of available tasks is determined by the step's scope.
    • If the step is scoped to the Batch level (and the step completes successfully), there will be a single task recorded.
    • If the step is scoped to the Folder level and there are 10 Batch Folders (and they all complete successfully in the step), there will be 10 tasks recorded.
    • If the step is scoped to the Page level and there are 100 Batch Pages (and they all complete successfully in the step), there will be 100 tasks recorded.
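The counting rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Grooper code: the function names `available_tasks` and `tasks_processed` and the scope strings are hypothetical and do not correspond to anything in Grooper's API.

```python
def available_tasks(scope: str, folder_count: int, page_count: int) -> int:
    """Number of tasks a step creates for a given scope (hypothetical model)."""
    if scope == "batch":
        return 1              # one task for the whole Batch
    if scope == "folder":
        return folder_count   # one task per Batch Folder
    if scope == "page":
        return page_count     # one task per Batch Page
    raise ValueError(f"unknown scope: {scope!r}")


def tasks_processed(available: int, errored: int) -> int:
    """Tasks Processed counts only the tasks that completed successfully."""
    if not 0 <= errored <= available:
        raise ValueError("errored must be between 0 and available")
    return available - errored


# 100 Page-level tasks, 10 of which error out: 90 are recorded.
recorded = tasks_processed(available_tasks("page", 10, 100), errored=10)
```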

Execution Time - This is the total time it takes the Batch Process Step to run from start to finish, i.e. the step's "total elapsed time".

Processing Time - This is the time each machine thread spends processing tasks, added together across all threads.

  • This is not to be confused with "Execution Time".
  • Example: An Activity Processing service uses 10 threads to process tasks.
    • There are 10 Batch Folders in a Batch at Folder level 1.
    • The Extract step is scoped to Folder level 1. So, there are 10 available tasks to process.
    • Each thread can process one task in about 5 seconds.
    • All threads pick up the 10 Extract tasks at roughly the same time.
    • The Extract step's "Execution Time" would be 5 seconds.
      • It would take roughly 5 seconds to run the Extract step from start to finish.
      • 10 threads running concurrently process 10 tasks @ 5 seconds per thread = A total elapsed time of 5 seconds.
    • The Extract step's "Processing Time" would be 50 seconds.
      • Each thread in the Batch Process takes 5 seconds to process a single Extract task. With 10 total tasks, that's a total of 50 seconds.
      • 10 threads @ 5 seconds per thread = 50 total seconds (each thread's processing time added together)
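The difference between the two stats can be reproduced with a small simulation. The sketch below assumes a simplified scheduling model in which each task goes to whichever thread becomes free first. That model and the function name `simulate_step` are assumptions made for illustration, not a description of how Grooper actually schedules work.

```python
import heapq


def simulate_step(task_seconds, thread_count):
    """Return (execution_time, processing_time) for one step.

    Simplified model (an assumption): each task is handed to the
    thread that becomes free earliest.
    """
    if thread_count < 1:
        raise ValueError("thread_count must be at least 1")
    # Min-heap holding the time at which each thread next becomes free.
    free_at = [0.0] * thread_count
    heapq.heapify(free_at)
    for seconds in task_seconds:
        start = heapq.heappop(free_at)           # earliest available thread
        heapq.heappush(free_at, start + seconds)
    execution_time = max(free_at)                # elapsed time, start to finish
    processing_time = float(sum(task_seconds))   # every thread's work added together
    return execution_time, processing_time


# 10 Extract tasks at about 5 seconds each, processed by 10 threads.
execution, processing = simulate_step([5.0] * 10, thread_count=10)
```

With the numbers from the example, `execution` is 5.0 and `processing` is 50.0. Running the same ten tasks on two threads leaves the Processing Time at 50.0 but raises the Execution Time to 25.0, because each thread has to work through five tasks in sequence.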

Stats collected per Activity

Batch Transfer

"Batches Transferred"

Classify

"Folders Classified"
"Secondary Types Assigned"
"Default Types Assigned"

Clip Frames

"Frames Extracted"

Correct

"Split Count"
"Correction Count"
"Removal Count"

Execute

One stat is recorded for each command executed.

Extract

"Valid Documents"
"Invalid Documents"
"Fields Extracted"
"Fields Confident"
"Fields in Error"
"Fields With Value"
"Sections Extracted"
"Table Rows Extracted"
"Characters Extracted"

Image Processing

"Pages Processed"
"Pages Modified"
"Pages Flagged"
"IP Commands Processed"
"PDF Pages Updated"

Merge

"Pages Merged"
"Source Bytes"
"Destination Bytes"
"Files Created"

Recognize

"Characters - OCR"
"Characters - Native"
"Characters"
"Layout - Barcodes"
"Layout - Lines"
"Layout - Checkboxes"
"Pages Processed"
"Pages Normalized"
"Masked Font Pages"
"Masked Font Characters"

Redact

"Redactions Performed"

Remove Level

"Folders Removed"

Render

"Documents Rendered"
"Pages Rendered"
"Unknown File Types"
"Render Errors"
"Excluded Files"
"Bytes Rendered"

Send Mail

"Messages Sent"

Separate

"Folders Created"
"Pages Processed"
"Pages Classified"
"Pages Separated"
"Pages Deleted"
"Documents Classified"

Spawn Batch

"Folders Moved"
"Folders Copied"

Split Pages

"Pages Created"
"Grade A - High Quality"
"Grade B - Medium Quality"
"Grade C - Low Quality"
"Burst: PDF Page"
"Burst: Single Page"
"Burst: Rendered Image"
"Bytes In"
"Bytes Out"
"Image Transforms"

Split Text

"Documents Created"

Train Lexicon

"Features Trained"

Translate

"Input Characters"
"Output Characters"