Labeled OMR (Value Extractor)

From Grooper Wiki
Revision as of 11:08, 12 January 2026 by Rpatton

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.


This article is about the current version of Grooper.

Note that some content may still need to be updated.

An example of checkboxes.

Labeled OMR is a Value Extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) or a Boolean true/false value as the result.

Introduction

OMR (Optical Mark Recognition) boxes are small shapes printed on documents, typically squares or circles, that users fill in or check to indicate a choice (for example, ☑ Yes ☐ No). Grooper detects whether each box is checked and converts these marks into data values for a Data Field.

Labeled OMR detects checkboxes based on nearby labels. It can:

  • Use a configured Label Extractor to find labels.
  • Automatically use labels from a Label Set defined on the Data Field (no Label Extractor required).
  • Optionally use a header label to disambiguate one group of checkboxes from other similar groups on the same page.
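The idea of anchoring to labels and then looking for a nearby checkbox can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the `Box` and `Checkbox` structures, the `checked_labels` function, the center-to-center distance measure, and the `max_distance` cutoff are all hypothetical, chosen only to show the concept.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical structure)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def center(self) -> tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)


@dataclass(frozen=True)
class Checkbox:
    bounds: Box
    checked: bool


def distance(a: Box, b: Box) -> float:
    """Euclidean distance between the centers of two rectangles."""
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def checked_labels(labels: dict[str, Box], checkboxes: list[Checkbox],
                   max_distance: float = 100.0) -> list[str]:
    """Return the labels whose nearest checkbox is close enough and checked."""
    result = []
    for text, label_box in labels.items():
        if not checkboxes:
            break  # nothing to pair with
        nearest = min(checkboxes, key=lambda cb: distance(label_box, cb.bounds))
        if distance(label_box, nearest.bounds) <= max_distance and nearest.checked:
            result.append(text)
    return result


# Illustrative data: one row of three choices, each checkbox left of its label.
labels = {
    "Yes": Box(40, 100, 30, 12),
    "No": Box(120, 100, 24, 12),
    "Undecided": Box(190, 100, 70, 12),
}
checkboxes = [
    Checkbox(Box(20, 100, 12, 12), checked=True),
    Checkbox(Box(100, 100, 12, 12), checked=False),
    Checkbox(Box(170, 100, 12, 12), checked=False),
]

print(checked_labels(labels, checkboxes))  # ['Yes']
```

Because the pairing depends only on where the labels are found, the same logic works if the whole row moves elsewhere on the page.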

How it differs from other Value Extractors:

  • Unlike text-based extractors (e.g., Pattern, List, or Data Type), Labeled OMR reads visual checkboxes and links them to nearby text labels.
  • Unlike Ordered OMR (region + ordered positions) and Zonal OMR (manually defined zones), Labeled OMR anchors to labels found on the page and then locates checkboxes near those labels.

Differences vs. Ordered OMR and Zonal OMR:

  • Ordered OMR assigns values based on the fixed order of checkboxes within a rectangular region. Use it when label text is unreliable or absent, but the positions and order of boxes are consistent.
  • Zonal OMR uses manually configured zones for each checkbox. Use it when checkbox locations are fixed and known per page layout.
  • Labeled OMR uses labels detected at runtime and finds nearby checkboxes. Use it when forms have variable placement or multiple repeated groups, and labels are the reliable anchor.
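To make the trade-off concrete, here is a small Python sketch of order-based value assignment in the spirit of Ordered OMR. It is an illustration only, not Grooper's logic: the `ordered_values` function, the tuple layout, and the naive "top, then left" reading order are assumptions made for the example.

```python
def ordered_values(boxes, values):
    """Assign values to checkboxes purely by position order.

    boxes:  list of (left, top, checked) tuples (hypothetical layout data)
    values: the values expected in reading order
    """
    # Naive reading order: top-to-bottom, then left-to-right.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return [value for (_, _, checked), value in zip(ordered, values) if checked]


values = ["Yes", "No", "Undecided"]

# Page A is printed as "Yes / No / Undecided" and the first box is checked.
page_a = [(170, 100, False), (20, 100, True), (100, 100, False)]
print(ordered_values(page_a, values))  # ['Yes'], which is correct

# Page B is printed as "No / Yes / Undecided" and the "No" box is checked.
# Position order knows nothing about the printed labels, so it still
# reports the first configured value.
page_b = [(20, 100, True), (100, 100, False), (170, 100, False)]
print(ordered_values(page_b, values))  # ['Yes'], which is wrong for this page
```

The second page shows why a label-anchored approach is the better fit when the printed order or placement of choices can vary between documents.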

When to use

Use Labeled OMR when:

  • Checkbox choices are printed with identifiable labels near each box (e.g., Yes, No, Undecided).
  • The form may contain repeated groups or variable layouts that make fixed zones or strict ordering unsuitable.
  • You want automatic label awareness via the Data Field's Label Set or Choice List.

Real-world example (preferred over Ordered OMR or Zonal OMR):

  1. A multi-page survey where “Yes ☐ No ☐ Undecided ☐” appears under several different questions in varying positions. Because the labels are reliable, Labeled OMR can find the correct group under a specific header (e.g., Attending next semester?) and read the checkboxes near those labels, even if the exact location shifts between pages and documents.
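The role of the header in this example can be sketched in Python. The sketch assumes the checkbox group is printed below its header and simply keeps, for each label, the occurrence closest below it. The `Hit` structure and `nearest_below` function are hypothetical, and Grooper's actual header handling may work differently.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A piece of text found on the page (hypothetical structure)."""
    text: str
    left: float
    top: float


def nearest_below(header: Hit, label_hits: list[Hit]) -> dict[str, Hit]:
    """For each label text, keep the occurrence closest below the header."""
    chosen: dict[str, Hit] = {}
    for hit in label_hits:
        if hit.top < header.top:
            continue  # ignore occurrences printed above the header
        best = chosen.get(hit.text)
        if best is None or hit.top < best.top:
            chosen[hit.text] = hit
    return chosen


header = Hit("Attending next semester?", 40, 300)

# The same three labels appear under three different questions.
hits = [
    Hit("Yes", 40, 120), Hit("No", 120, 120), Hit("Undecided", 190, 120),
    Hit("Yes", 40, 320), Hit("No", 120, 320), Hit("Undecided", 190, 320),
    Hit("Yes", 40, 520), Hit("No", 120, 520), Hit("Undecided", 190, 520),
]

group = nearest_below(header, hits)
print({text: hit.top for text, hit in group.items()})
# {'Yes': 320, 'No': 320, 'Undecided': 320}
```

Only the group directly under the chosen header survives, so a nearby-checkbox search would then run against those three label positions rather than all nine.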

Prerequisites:

  • For rectangular checkboxes, ensure the page has layout data including Box Removal obtained during Recognize or Image Processing.
  • Provide labels either by:
    • Configuring a Label Extractor, or
    • Defining a Label Set on the Data Field.

How to configure Labeled OMR

There are a few different ways to configure the Labeled OMR extractor. You can use Label Extractors, List Values, or Label Sets. We will first discuss obtaining Layout Data for your documents, then go through each method for obtaining labels for Labeled OMR.

Prerequisite: layout data for documents

Before configuring a Labeled OMR extractor, you must ensure that your documents have Layout Data that includes detected boxes. To do this, configure an IP Profile with (at minimum) a Box Detection IP Step. Then reference the IP Profile in either an Image Processing or Recognize Step in a Batch Process.

  1. In your node tree, create a new IP Profile if you do not already have one available and add a Box Detection IP Step.
  2. Reference your IP Profile in one of the following ways:
    • If your documents require OCR, reference the IP Profile on the OCR Profile you will be using on your Recognize Step.
    • If your documents do not require OCR, reference the IP Profile on the Alternate IP property on your Recognize Step.
    • If you find that your documents need more comprehensive Image Processing, you can run the IP Profile on an Image Processing Batch Process Step.
  3. Save your changes to your Recognize Step.
  4. Navigate to the "Activity Tester" tab of the Recognize Step and test on the Batch.
    • If you have an Activity Processor running, you can submit a job to run the Recognize Step. Otherwise, select the objects you want to run Recognize on and click the test icon.
  5. Now select the object you ran Recognize on and click the Renditions icon located at the top right of the Document Viewer.
  6. Select the "Layout" view from the drop down.
  7. Now you should be able to see the layout data that was collected.
    • Empty boxes will be highlighted in pink.
    • Checked boxes will be highlighted in green.
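This article does not describe how Grooper decides that a detected box is checked. As a generic illustration of one common OMR technique, the Python sketch below measures how much of a box's interior is dark and compares that to a threshold. The function names, the 0/1 pixel grid, the one-pixel border margin, and the 0.15 threshold are all assumptions made for the example.

```python
def fill_ratio(pixels: list[list[int]], margin: int = 1) -> float:
    """Fraction of dark pixels inside the box, ignoring a border margin.

    pixels: rows of 0 (white) or 1 (black) covering the detected box.
    """
    interior = [row[margin:len(row) - margin]
                for row in pixels[margin:len(pixels) - margin]]
    total = sum(len(row) for row in interior)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in interior) / total


def is_checked(pixels: list[list[int]], threshold: float = 0.15,
               margin: int = 1) -> bool:
    """Treat the box as checked when enough of its interior is dark."""
    return fill_ratio(pixels, margin) >= threshold


def make_box(size: int, marked: bool) -> list[list[int]]:
    """Build a synthetic square box with a 1-pixel border and an optional X."""
    edge = (0, size - 1)
    box = [[1 if r in edge or c in edge else 0 for c in range(size)]
           for r in range(size)]
    if marked:
        for i in range(1, size - 1):
            box[i][i] = 1             # main diagonal of the X
            box[i][size - 1 - i] = 1  # anti-diagonal of the X
    return box


print(is_checked(make_box(6, marked=False)))  # False
print(is_checked(make_box(6, marked=True)))   # True
```

Excluding the border from the count matters: without the margin, the printed outline of an empty box would itself register as dark pixels.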

FYI

If you would like to follow along with the demo below, download the Project and Batch at the beginning of this Wiki Article.

Configuring Labeled OMR: using Label Extractors

There are three methods for setting up a Labeled OMR extractor. The first is to configure a Label Extractor on the Labeled OMR extractor.

  1. On the Data Field, set the Value Extractor property to Labeled OMR.
  2. Expand the Labeled OMR sub properties.
  3. Set the Label Extractor to a List Match and open the List Match editor by clicking the "..." icon to the right of the property.
  4. Type in the name of each of the OMR options. Hit Enter on your keyboard after each entry.
  5. When finished, click "OK" in the top right of the pop up window.
  6. (Optional) Set an extractor on the Header Extractor property to return the label or header for the OMR information. In our example below, we used a List Match.
    • Setting a Header Extractor is not always necessary, but can help Grooper better understand where the OMR information is located on the document.
    • A Header Extractor can be useful when the text for the OMR choices shows up in other areas on the document.
  7. Click over to the "Tester" tab and test your extraction to ensure the desired data is extracted properly.