Labeled OMR (Value Extractor)

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Labeled OMR is a Value Extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. Depending on the configured mode, it outputs either the label(s) of the checked boxes or a Boolean true/false value.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

Introduction

OMR boxes (Optical Mark Recognition) are small shapes printed on documents (typically squares or circles) that users fill or check to indicate a choice (for example, ☑ Yes ☐ No). Grooper detects whether each box is checked and converts these marks into data values for a Data Field.

Labeled OMR detects checkboxes based on nearby labels. It can:

Use a configured Label Extractor to find labels.
Automatically use labels from a Label Set defined on the Data Field (no Label Extractor required).
Optionally use a header label to disambiguate one group of checkboxes from other similar groups on the same page.

How it differs from other Value Extractors:

Unlike text-based extractors (e.g., Pattern, List, or Data Type), Labeled OMR reads visual checkboxes and links them to nearby text labels.
Unlike Ordered OMR (region + ordered positions) and Zonal OMR (manually defined zones), Labeled OMR anchors to labels found on the page and then locates checkboxes near those labels.

Differences vs. Ordered OMR and Zonal OMR:

Ordered OMR assigns values based on the fixed order of checkboxes within a rectangular region. Use it when label text is unreliable or absent, but the positions and order of boxes are consistent.
Zonal OMR uses manually configured zones for each checkbox. Use it when checkbox locations are fixed and known per page layout.
Labeled OMR uses labels detected at runtime and finds nearby checkboxes. Use it when forms have variable placement or multiple repeated groups, and labels are the reliable anchor.

When to use

Use Labeled OMR when:

Checkbox choices are printed with identifiable labels near each box (e.g., Yes, No, Undecided).
The form may contain repeated groups or variable layouts that make fixed zones or strict ordering unsuitable.
You want automatic label awareness via the Data Field's Label Set or Choice List.

Real-world example (preferred over Ordered OMR or Zonal OMR):

A multi-page survey where “Yes ☐ No ☐ Undecided ☐” appears under several different questions with varying positions. Because labels are reliable, Labeled OMR can find the correct group under a specific header (e.g., Attending next semester?) and read the checkboxes near those labels—even if the exact location shifts among pages and documents.

Prerequisites:

For rectangular checkboxes, ensure the page has layout data that includes detected boxes (for example, from Box Removal or Box Detection), obtained during Recognize or Image Processing.
Provide labels either by:
- Configuring a Label Extractor, or
- Defining a Label Set on the Data Field.

How to configure Labeled OMR

There are a few different ways to configure a Labeled OMR extractor. You can use Label Extractors, List Values, or Label Sets. We will first discuss obtaining Layout Data for your documents, then go through each method for obtaining labels for Labeled OMR.

Prerequisite: layout data for documents

Before configuring a Labeled OMR extractor, you must ensure you have Layout Data on your documents that includes the detection of boxes. To do this, you will need to configure an IP Profile with (at minimum) a Box Detection IP Step. You will then need to reference the IP Profile in either an Image Processing or Recognize Step in a Batch Process.

In your node tree, create a new IP Profile if you do not already have one available and add a Box Detection IP Step.
Reference your IP Profile in one of the following ways:
- If your documents require OCR, reference the IP Profile on the OCR Profile you will be using on your Recognize Step.
- If your documents do not require OCR, reference the IP Profile on the Alternate IP property on your Recognize Step.
- If you find that your documents need more comprehensive Image Processing, you can run the IP Profile on an Image Processing Batch Process Step.
Save your changes to your Recognize Step.
Navigate to the "Activity Tester" tab of the Recognize Step and test on the Batch.
- If you have an Activity Processor running, you can submit a job to run the Recognize Step. Otherwise, select the objects you want to run Recognize on and click the test icon.
Now select the object you ran Recognize on and click the Renditions icon located at the top right of the Document Viewer.
Select the "Layout" view from the drop down.
Now you should be able to see the layout data that was collected.
- Empty boxes will be highlighted in pink.
- Checked boxes will be highlighted in green.

FYI

If you would like to follow along with the demo below, download the Project and Batch at the beginning of this Wiki Article.

Configuring Labeled OMR: using Label Extractors

There are three methods for setting up a Labeled OMR extractor. The first is to set a Label Extractor on the Labeled OMR.

On the Data Field, set the Value Extractor property to Labeled OMR.
Expand the Labeled OMR sub properties.
Set the Label Extractor to a List Match and open the List Match editor by clicking the "..." icon to the right of the property.
Type in the name of each of the OMR options. Hit Enter on your keyboard after each entry.
When finished, click "OK" in the top right of the pop up window.
(Optional) Set an extractor on the Header Extractor property to return the label or header for the OMR information. In our example below, we used a List Match.
- Setting a Header Extractor is not always necessary, but can help Grooper better understand where the OMR information is located on the document.
- A Header Extractor can be useful when the text for the OMR choices shows up in other areas on the document.
Click over to the "Tester" tab and test your extraction to ensure the desired data is extracted properly.

Configuring Labeled OMR: using List Values

Instead of configuring a Label Extractor for the OMR options, you can type the text of the OMR options into the List Values property. Listing the options in List Values also gives the Data Field a drop-down menu during review, so a Reviewer can select the correct option when the extraction is missing or incorrect rather than typing in the full text.

To set up your Labeled OMR extractor using List Values:

On the Data Field, set the Value Extractor property to Labeled OMR.
Scroll to the bottom of the property grid and locate the List Values property.
Click the "⮞" to expand out the List Values sub properties.
Click the "..." to the right of the Local Entries property to open the editor.
Type in the text for the OMR options on the document, hitting enter after each one.
Click "OK" in the top right of the editor.
(Optional) Expand the Value Extractor sub properties and set an extractor on the Header Extractor property to return the label or header for the OMR information.
- Setting a Header Extractor is not always necessary, but can help Grooper better understand where the OMR information is located on the document.
- A Header Extractor can be useful when the text for the OMR choices shows up in other areas on the document.
Click over to the "Tester" tab and test your extraction to ensure the desired data is extracted properly.
- You should see the option to access a drop down for the Data Field which will have all the OMR options you typed into the List Values.

Configuring Labeled OMR: using Label Sets

Labeled OMR is a Label Set-aware Value Extractor. That means Labeled OMR can work with a Labeling Behavior set on your Content Model. Label Sets work in combination with List Values for Labeled OMR: List Values provide the text that appears in the Data Field upon extraction, while the Label Set gives Grooper the information it needs to determine which OMR option is checked on the document.

Prerequisite: You will need to enable a Labeling Behavior on your Content Model before you will have access to the "Labels" tab. For more information on how to set up a Labeling Behavior, visit the Labeling Behavior wiki page.

On the Data Field, set the Value Extractor property to Labeled OMR.
Scroll to the bottom of the property grid and locate the List Values property.
Click the "⮞" to expand out the List Values sub properties.
Click the "..." to the right of the Local Entries property to open the editor.
Type in text for the OMR options on the document, hitting enter after each one.
- When using Label Sets with Labeled OMR, the List Values do not need to match the text on the document.
Click "OK" in the top right of the editor.
Navigate to the Content Model and click on the "Labels" tab.
Make sure your documents are Classified.
Collect text on the document for each of the OMR option labels.
- The names of the labels will be the same as the text you typed in for the List Values.
(Optional) Collect a header label in the Data Field's text box label to give Grooper more context about where the OMR information is located on the document.
Save your changes to your Labels.
Navigate to your Data Model and test your extraction.
- You should see the text from the List Values returned in the Data Field with the ability to select another OMR option using the drop down on the Data Field text box.
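
The relationship between List Values and the labels collected on the "Labels" tab can be pictured as a simple lookup. The sketch below is illustrative Python, not Grooper code: the `label_set` dictionary, its sample strings, and the `value_for_checked_label` helper are all hypothetical, and the case-insensitive comparison is an assumption made for the example.

```python
# Hypothetical mapping: List Value (text returned in the Data Field)
# -> label text collected from the document on the "Labels" tab.
label_set = {
    "Yes": "Y",
    "No": "N",
    "Undecided": "Not sure yet",
}


def value_for_checked_label(label_set, checked_label_text):
    """Return the List Value whose collected label matches the checked label."""
    wanted = checked_label_text.strip().casefold()
    for list_value, document_label in label_set.items():
        if document_label.strip().casefold() == wanted:
            return list_value
    return None  # no collected label matches the checked text


print(value_for_checked_label(label_set, "not sure yet"))  # Undecided
```

This is why the List Values do not have to match the printed text: the collected label locates the checkbox, and the List Value is what ends up in the field.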

Testing tips:

Use the Review UI to verify that each Field Instance shows the linked labels and checkboxes. Each instance will display associated label rectangles (context) and checkbox annotations.
If results are empty, confirm labels are found. If no rectangular boxes are detected, try enabling Box Removal in Recognize or Image Processing, or rely on circular detection.
If the wrong group is selected, add or refine a Header Extractor.

Additional Considerations

Now that we know the basics of how Labeled OMR works, there are a few more things to consider when setting up your OMR extraction.

Maximum Noise

The concept of character noise is important to how Grooper isolates and filters out OMR label groups. A noise character is any alphanumeric character (not punctuation characters) that falls between OMR labels. Typically, OMR labels are grouped close together on a document with little to no other text between the labels. Grooper will filter out label matches with large numbers of characters between them.

For example, take these checkboxes using the labels "True" and "False". Yes, the labels are nearby checkboxes, but those same labels exist in the sentences to the right. How does Grooper distinguish between the OMR labels and those same words otherwise popping up on the document? Noise.
First, Grooper will draw a box around the boundaries of each OMR label.
Then, Grooper will drop out the labels and count the characters remaining within each boundary. These remaining characters are "noise". Grooper will count each character (alphanumeric only) to establish a noise count between potential OMR labels in a group.
In cases where there are multiple label hits on the document, Grooper will use whichever OMR label group has the least noise.

The "Maximum Noise" property dictates how much noise is allowable between OMR labels in a group. By default, only 5 noise characters are allowed. This means that if there are more than 5 noise characters between your OMR labels, Grooper discards that label group. However, this limit can be adjusted.
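
The noise idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Grooper's implementation: the `Word` class, the coordinates, and the `bounding_region`, `noise_count`, and `pick_group` helpers are hypothetical, and a word is treated as inside the region when its center falls within it.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def bounding_region(labels):
    """Smallest rectangle (left, top, right, bottom) enclosing all label words."""
    left = min(l.x for l in labels)
    top = min(l.y for l in labels)
    right = max(l.x + l.w for l in labels)
    bottom = max(l.y + l.h for l in labels)
    return left, top, right, bottom


def noise_count(labels, page_words):
    """Count alphanumeric characters inside the region that are not labels."""
    left, top, right, bottom = bounding_region(labels)
    count = 0
    for word in page_words:
        if any(word is l for l in labels):
            continue  # the labels themselves are dropped out before counting
        cx, cy = word.x + word.w / 2, word.y + word.h / 2
        if left <= cx <= right and top <= cy <= bottom:
            count += sum(ch.isalnum() for ch in word.text)  # punctuation ignored
    return count


def pick_group(candidate_groups, page_words, maximum_noise=5):
    """Return the candidate group with the least noise, or None if all exceed the limit."""
    scored = [(noise_count(g, page_words), g) for g in candidate_groups]
    allowed = [(n, g) for n, g in scored if n <= maximum_noise]
    if not allowed:
        return None
    return min(allowed, key=lambda pair: pair[0])[1]


# Checkbox labels sitting side by side, with nothing between them.
true_a, false_a = Word("True", 10, 10, 30, 10), Word("False", 60, 10, 35, 10)
# The same words inside a sentence, with other text between them.
true_b, false_b = Word("True", 200, 50, 30, 10), Word("False", 330, 50, 35, 10)
filler = Word("statements,", 240, 50, 80, 10)
page = [true_a, false_a, true_b, filler, false_b]

group_a, group_b = [true_a, false_a], [true_b, false_b]
print(noise_count(group_a, page))  # 0
print(noise_count(group_b, page))  # 10
print(pick_group([group_b, group_a], page) is group_a)  # True
```

In this example the sentence hit carries 10 noise characters, so it is rejected at the default limit of 5, while the checkbox labels carry none.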

Configure your Labeled OMR following the instructions in the "How to configure Labeled OMR" section.
In the extraction object's property grid, expand out the sub properties of the Labeled OMR property.
Locate the Maximum Noise property and set the maximum number of noise characters allowed.
Save your changes and test extraction.

OMR Modes

Checkboxes detail information in one of three ways:

A single checkbox is selected out of multiple options.
Multiple checkboxes can be selected out of multiple options.
A single checkbox can be checked or left unchecked.

Labeled OMR has three corresponding "Modes" to account for this:

CheckOne
CheckMulti
Boolean

To set the OMR Mode on your Labeled OMR:

Navigate to the object where your Labeled OMR is set up.
Expand out the Labeled OMR sub properties.
Locate the Mode property.
Set the Mode property to one of the following:
- CheckOne - only one of multiple check boxes will be checked on the document.
- CheckMulti - one or more of multiple check boxes will be checked on the document.
- Boolean - a single check box will either be checked or unchecked on the document.

By default, the Mode is set to CheckOne. When you change the Mode to one of the other two, new properties will appear that you may configure:

CheckMulti: Separator String lets you enter the delimiter (alphanumeric characters, punctuation, symbols, or whitespace) placed between the checked options in the extracted value. This can make the responses easier to read.
Boolean: Value if Checked and Value if Unchecked. By default these are set to "True" and "False" respectively, but you can change the text Grooper populates the field with when the OMR box is checked or unchecked.
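
The three modes differ only in how the checked labels become an output value. The Python sketch below is a hypothetical illustration, not Grooper code: `format_omr_output` and its parameter names are invented for the example, the default separator is an assumption, and returning `None` when CheckOne sees zero or several checked boxes is a simplification (this article does not describe what Grooper does in that case).

```python
def format_omr_output(mode, checked_labels, separator_string=", ",
                      value_if_checked="True", value_if_unchecked="False"):
    """Form an output value from the labels whose boxes are checked."""
    if mode == "Boolean":
        # A single box: report one of two fixed strings.
        return value_if_checked if checked_labels else value_if_unchecked
    if mode == "CheckOne":
        # Exactly one option is expected to be checked (assumption: else None).
        return checked_labels[0] if len(checked_labels) == 1 else None
    if mode == "CheckMulti":
        # Join every checked option with the separator.
        return separator_string.join(checked_labels) if checked_labels else None
    raise ValueError(f"unknown mode: {mode}")


print(format_omr_output("CheckOne", ["No"]))  # No
print(format_omr_output("CheckMulti", ["Yes", "Undecided"], separator_string=" | "))  # Yes | Undecided
print(format_omr_output("Boolean", [], value_if_unchecked="Not attending"))  # Not attending
```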

Troubleshooting

Too many or too few labels grouped:

For a custom Label Extractor with "Group Match" = False, adjust "Minimum Label Count".
Enable "Consider Lines" to avoid accidentally grouping across lines.

No OMR boxes found:

Confirm Box Removal was applied for rectangular checkboxes.
Labeled OMR will fall back to circular detection if rectangular detection fails.

Ambiguous repeated groups:

Configure "Header Extractor" to select the correct group.

Multi-select output formatting:

In CheckMulti mode, verify whether the Data Field is an array (uses the field's delimiter) or a single-value field (uses "Separator String").

Properties overview

Below are the Labeled OMR-specific properties and related OMR output behaviors. Property visibility may depend on OMR mode and whether a Label Extractor is set.

"Label Extractor"

Definition: An extractor that matches the labels for each checkbox.
Remarks: Leave empty when using Label Sets or when the Data Field's list values are configured; Labeled OMR will auto-create a list-based label matcher. If "Group Match" is False, configure it to match individual labels (e.g., Pattern or List). If "Group Match" is True, the extractor should match an entire label group and return the individual box labels as children.
Purpose: Supplies label hits that anchor checkbox detection near those labels.

"Header Extractor"

Definition: An extractor that matches the header label for the OMR group.
Remarks: Optional; helps disambiguate between multiple similar checkbox groups. Leave empty when using Label Sets and a header label is defined for the field.
Purpose: Selects the correct group under a specific heading on pages that contain repeated groups.
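
One way to picture header-based disambiguation is as a nearest-neighbor choice between candidate groups. The Python sketch below assumes that the group closest to the header is the right one; that rule, the rectangle format `(left, top, right, bottom)`, and the `center` and `group_for_header` helpers are hypothetical and are not taken from Grooper.

```python
from math import hypot


def center(rect):
    """Center point of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return (left + right) / 2, (top + bottom) / 2


def group_for_header(header_rect, group_rects):
    """Return the index of the group rectangle nearest the header, or None."""
    if not group_rects:
        return None
    hx, hy = center(header_rect)

    def distance(index):
        gx, gy = center(group_rects[index])
        return hypot(gx - hx, gy - hy)

    return min(range(len(group_rects)), key=distance)


# Three identical "Yes / No / Undecided" rows on one page (made-up coordinates).
groups = [(50, 120, 300, 135), (50, 320, 300, 335), (50, 520, 300, 535)]
header = (50, 300, 250, 315)  # e.g. "Attending next semester?"
print(group_for_header(header, groups))  # 1
```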

"Group Match"

Definition: Indicates that the Label Extractor matches an entire label group rather than individual labels.
Remarks: When enabled, configure the Label Extractor to return a single result per group, with children representing the individual labels. When disabled, use separate label hits and specify "Minimum Label Count".
Purpose: Controls grouping strategy based on how your Label Extractor returns results.
Visibility: Only shown when a Label Extractor is set and OMR mode is not Boolean.

"Minimum Label Count"

Definition: The minimum number of labels required for a match to occur.
Remarks: Applies only when using a local Label Extractor and "Group Match" is False. Set to the expected number of labels per group to ensure complete grouping.
Purpose: Prevents incomplete or partial groups from being considered valid.
Visibility: Only shown when a Label Extractor is set, OMR mode is not Boolean, and "Group Match" is False.
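
The effect of "Minimum Label Count" can be sketched as a filter applied after individual label hits are grouped. The Python below is a simplified illustration, not Grooper's grouping logic: hits are `(text, left, right)` tuples on a single text line, and `max_gap` is a hypothetical proximity threshold that does not correspond to a Grooper property.

```python
def group_label_hits(hits, max_gap, minimum_label_count):
    """Group neighboring label hits, then drop groups with too few labels."""
    groups, current = [], []
    for hit in sorted(hits, key=lambda h: h[1]):  # left to right
        # Start a new group when the gap to the previous hit is too wide.
        if current and hit[1] - current[-1][2] > max_gap:
            groups.append(current)
            current = []
        current.append(hit)
    if current:
        groups.append(current)
    # Incomplete groups are not considered valid matches.
    return [g for g in groups if len(g) >= minimum_label_count]


hits = [("Yes", 10, 40), ("No", 60, 80), ("Undecided", 100, 170), ("Yes", 400, 430)]
complete = group_label_hits(hits, max_gap=50, minimum_label_count=3)
print([[text for text, _, _ in g] for g in complete])  # [['Yes', 'No', 'Undecided']]
```

The stray "Yes" on the right forms a one-label group and is filtered out because the group is expected to contain three labels.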

"Maximum Noise"

Definition: The maximum number of noise characters allowed in the label group's bounding region.
Remarks: The bounding region is the smallest rectangle enclosing the group's labels (and header, if present). Noise characters are letters or digits in that region that are not part of the labels or header; punctuation and whitespace are ignored. Set to zero to reject groups with any noise; leave unset to skip this check.
Purpose: Filters out groups with unwanted text artifacts inside the label area, improving accuracy on noisy forms.

"Consider Lines"

Definition: When enabled, grouping of labels considers intersecting lines to improve accuracy.
Remarks: Useful when multiple identical labels appear on a page and cannot be grouped accurately by proximity alone. Lines are used as additional grouping cues.
Purpose: Reduces incorrect label grouping in complex layouts, such as forms with grid lines or separators.
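
As a rough picture of how lines can act as grouping cues, the Python sketch below checks whether a vertical line falls between two labels on the same row. It is an assumption-laden simplification, not Grooper's logic: only vertical lines are considered, rectangles are `(left, top, right, bottom)`, lines are `(x, top, bottom)`, and `separated_by_line` is a hypothetical helper.

```python
def separated_by_line(label_a, label_b, vertical_lines):
    """Return True if a vertical line crosses the gap between two labels."""
    first, second = sorted([label_a, label_b], key=lambda r: r[0])
    gap_left, gap_right = first[2], second[0]  # horizontal gap between labels
    row_top = min(first[1], second[1])
    row_bottom = max(first[3], second[3])
    for x, top, bottom in vertical_lines:
        crosses_gap = gap_left <= x <= gap_right
        spans_row = top <= row_bottom and bottom >= row_top
        if crosses_gap and spans_row:
            return True  # the labels sit in different cells
    return False


yes_label = (10, 10, 40, 20)
no_label = (60, 10, 80, 20)
print(separated_by_line(yes_label, no_label, [(50, 0, 100)]))   # True
print(separated_by_line(yes_label, no_label, [(200, 0, 100)]))  # False
```

Labels separated this way would not be placed in the same group even when they are close together.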

Related OMR output behaviors (inherited): "Mode" (OMR mode)

Definition: Controls how checkbox selections produce values (Boolean, CheckOne, CheckMulti).
Purpose: Determines whether single or multiple selections are allowed and how the output value is formed.

"Value If Checked"

Definition: Output value when a checkbox is checked (primarily used in Boolean mode).
Visibility: Hidden in contexts where selection comes from labels (e.g., Ordered OMR), but applicable to Labeled OMR's Boolean mode.

"Value If Unchecked"

Definition: Output value when a checkbox is unchecked (primarily used in Boolean mode).
Visibility: Similar to "Value If Checked".

"Separator String"

Definition: The delimiter used to join multiple values in CheckMulti mode.
Visibility: Shown when Mode is CheckMulti and the Data Field is not an array. If the Data Field is an array, the field's delimiter is used instead.

Execution and results

During Extract, Labeled OMR:

Finds labels either via "Label Extractor", via Label Sets on the Data Field, or auto-generates a list-based label matcher using the field's choices.
Groups labels (and header, if provided) into candidate sets, optionally considering intersecting lines.
Detects nearby checkboxes within each group:
- Attempts rectangular detection first (requires Box Removal in preprocessing).
- Falls back to circular detection if rectangular boxes are not found.
Produces one or more Field Instances with:
- The selected value(s) based on checked boxes and OMR mode.
- Linked "Labels" (context) and "Checkboxes" collections for review.
- Confidence and annotations to visualize label groups and boxes in the Review UI.
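
The label-to-checkbox linking step can be sketched as pairing each label with its nearest detected box. The Python below is a minimal illustration under stated assumptions, not Grooper's implementation: rectangles are `(left, top, right, bottom)`, each box carries a precomputed checked flag, "nearest" means center-to-center distance, and `center` and `checked_labels` are hypothetical names.

```python
from math import hypot


def center(rect):
    """Center point of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return (left + right) / 2, (top + bottom) / 2


def checked_labels(labels, boxes):
    """Return the label texts whose nearest checkbox is checked.

    labels: dict mapping label text -> rectangle
    boxes:  list of (rectangle, is_checked) pairs from layout data
    """
    if not boxes:
        return []  # nothing detected, so nothing can be reported as checked
    result = []
    for text, label_rect in labels.items():
        lx, ly = center(label_rect)
        _, is_checked = min(
            boxes,
            key=lambda box: hypot(center(box[0])[0] - lx, center(box[0])[1] - ly),
        )
        if is_checked:
            result.append(text)
    return result


labels = {"Yes": (30, 10, 60, 20), "No": (110, 10, 130, 20)}
boxes = [((10, 10, 20, 20), False), ((90, 10, 100, 20), True)]
print(checked_labels(labels, boxes))  # ['No']
```

The returned list is the raw selection; the OMR mode then determines how it is turned into the field's value.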

Tips

Keep label text consistent and distinct across choices (e.g., avoid similar strings like Yes and Yes!).
For CheckMulti outputs, confirm whether the Data Field should be configured as an array. Arrays use the field's delimiter; single-value fields use "Separator String".
Use a header when similar checkbox groups repeat on the same page to avoid ambiguity.
Apply Box Removal to improve rectangular checkbox detection in the page's layout data; otherwise Labeled OMR will rely on circular detection.