2021:Labeled OMR (Extractor Type)

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.
About
Documents use checkboxes to make our lives easier. They are particularly prevalent on structured forms. They give the person filling out the form the ability to check a box next to one of a series of options rather than typing in the information.
However, most of Grooper's extraction centers around regular expressions: matching text patterns and returning the result. There isn't necessarily a character to match for a checked checkbox, so regular expressions alone can't determine whether a box is checked or not.
This is where OMR comes into play. OMR stands for "Optical Mark Recognition," and it determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it is unchecked. Checked (or marked) boxes, whether a checked "x" (☒), a checkmark (☑), or a check block (▣), will have more black pixels inside the box than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside the detected box is above a certain threshold, the box is checked (or marked). If not, it is unchecked (or unmarked).
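To make the pixel-counting idea concrete, here is a minimal Python sketch of a fill-ratio test. It is not Grooper's implementation. It assumes the page has already been binarized into a grid of 0 (white) and 1 (black) pixels and that the box's bounding coordinates are already known. The names `fill_ratio`, `is_checked`, and `draw_box`, the one-pixel border inset, and the 0.15 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical box coordinates: (top, left, bottom, right), inclusive pixel indices.
Box = Tuple[int, int, int, int]

def fill_ratio(image: List[List[int]], box: Box, inset: int = 1) -> float:
    """Fraction of black pixels (1s) inside the box, skipping `inset` pixels
    of border so the box's own four lines are not counted as a mark."""
    top, left, bottom, right = box
    black = total = 0
    for r in range(top + inset, bottom - inset + 1):
        for c in range(left + inset, right - inset + 1):
            total += 1
            black += image[r][c]
    return black / total if total else 0.0

def is_checked(image: List[List[int]], box: Box, threshold: float = 0.15) -> bool:
    """A box counts as checked when its interior fill ratio meets the threshold."""
    return fill_ratio(image, box) >= threshold

def draw_box(size: int, mark: bool) -> List[List[int]]:
    """Build a toy binarized image of an empty box, optionally with an "x" inside."""
    img = [[0] * size for _ in range(size)]
    for i in range(size):
        img[0][i] = img[size - 1][i] = 1  # top and bottom lines
        img[i][0] = img[i][size - 1] = 1  # left and right lines
    if mark:
        for i in range(1, size - 1):
            img[i][i] = img[i][size - 1 - i] = 1  # two diagonals forming an "x"
    return img

BOX = (0, 0, 6, 6)
print(is_checked(draw_box(7, mark=False), BOX))  # False
print(is_checked(draw_box(7, mark=True), BOX))   # True
```

Skipping the border pixels matters here: the box's own four lines are black, so counting them would push even an empty box toward the threshold.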
A simple example would be a document that asks a question and gives two boxes to check, "Yes" or "No." For example, see the portion of the document below asking whether the applicant is a U.S. Citizen. "Yes" and "No" are the labels, and whichever box is checked determines the field's final result. In this case, the result is "Yes".
In general, what you want to extract is the text of the checked label. The Labeled OMR extractor allows you to do just that.
First, you will set up an extractor to locate the text labels.
Then, Grooper's OMR detection will determine whether there is a box next to each label, and whether or not that box is checked.
Last, if a label's box is checked, the label is returned as the extractor's result.
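The three steps above can be sketched in Python as follows. This is a hypothetical illustration, not how Grooper works internally: it assumes labels and boxes have already been located with page coordinates, that each box sits to the left of its label on the same line, and that each box's checked state has already been decided (for example, by a fill-ratio test). The class names, `box_for_label`, `checked_labels`, and the `max_gap`/`max_dy` tolerances are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Label:
    text: str
    x: float  # left edge of the label text
    y: float  # vertical center of the label text

@dataclass
class Checkbox:
    x: float  # right edge of the box
    y: float  # vertical center of the box
    checked: bool  # state already decided by OMR detection

def box_for_label(label: Label, boxes: List[Checkbox],
                  max_gap: float = 50, max_dy: float = 5) -> Optional[Checkbox]:
    """Step 2: find the nearest box just left of the label on the same line."""
    candidates = [
        b for b in boxes
        if 0 <= label.x - b.x <= max_gap and abs(label.y - b.y) <= max_dy
    ]
    return min(candidates, key=lambda b: label.x - b.x, default=None)

def checked_labels(labels: List[Label], boxes: List[Checkbox]) -> List[str]:
    """Step 3: return the text of every label whose box is checked."""
    results = []
    for label in labels:
        box = box_for_label(label, boxes)
        if box is not None and box.checked:
            results.append(label.text)
    return results

# Step 1 output (located labels) and detected boxes for the U.S. Citizen question.
# Coordinates are illustrative only.
labels = [Label("Yes", x=120, y=300), Label("No", x=220, y=300)]
boxes = [Checkbox(x=105, y=300, checked=True), Checkbox(x=205, y=300, checked=False)]
print(checked_labels(labels, boxes))  # ['Yes']
```

Pairing each label with the nearest box on its own line is one simple way to decide which box belongs to which label; real documents may place boxes to the right of, above, or below their labels.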
FYI
Labeled OMR has multiple extraction modes, depending on how checkboxes behave on the document. There is also a Boolean mode that simply outputs "True" or "False" depending on whether a single checkbox is checked. We will discuss the different extraction modes further in the How To section of this article.
How To
Assign the Extractor
The Labeled OMR extractor can be utilized in two ways:
- As a Value Reader's extractor type.
- As an object's extractor property configuration. For example:
  - As a Data Field's Value Extractor property's extractor configuration.
  - As a Data Type's Local Extractor property's extractor configuration.
  - As a Document Type's Positive Extractor property's extractor configuration.
  - And more!
Value Reader
The Labeled OMR extractor is one of the extractor types available to the Value Reader extractor object.
Extractor Property
You may also configure a Labeled OMR extractor when configuring an extractor property. Many Grooper objects have some kind of extractor property in their property grids. Labeled OMR is one of the options that can be selected as the extractor type.
For example, Data Field objects have a Value Extractor property, which collects a result when the Data Model is extracted during the Extract activity.
Configure the Extractor Part 1: OMR Labels
The first part of the Labeled OMR extractor's configuration is label extraction. Labels can be collected in one of three ways:
- Using the Label Extractor property.
- Using the List Values settings of a Data Field.
- Using Label Sets to collect the OMR labels.
  - This article presumes you have some familiarity with Label Sets and the Labeling Behavior functionality. For more information on Label Sets, please visit the Label Sets article.
At this point, the Labeled OMR extractor is totally unconfigured. Next, we will detail each of the three different ways to extract OMR labels. While the configuration is slightly different, the goal is the same: Locate text labels next to checkboxes. Each method has its own strengths and weaknesses, giving you flexibility in how you locate the OMR labels based on your documents' circumstances.
Using the Label Extractor
Moderate to high level of work up front. High flexibility in configuration options.
One way to locate OMR labels is by configuring the Labeled OMR extractor's Label Extractor property. In some ways, this is the most "effort intensive" of the three options. It will require you to configure an extractor to return each of the labels for the set of OMR checkboxes. This means a lot of manual configuration of property grids and/or external extractor objects, depending on the complexity of your documents.
However, it is also extremely reliable and highly flexible. Since you configure an extractor to return the labels, you have Grooper's full suite of extraction types and extraction logic at your disposal.
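As a rough illustration of what a label extractor has to do, here is a hypothetical Python sketch that uses a regular expression to return each label of a Yes/No question, one result per label. It only mimics the idea of a pattern-based label extractor; it is not Grooper's configuration syntax, and the pattern, the `find_labels` name, and the sample text are assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical label pattern: whole-word "Yes" or "No", in any letter case.
LABEL_PATTERN = re.compile(r"\b(yes|no)\b", re.IGNORECASE)

def find_labels(text: str) -> List[Tuple[str, int]]:
    """Return each label found in the text with its character offset."""
    return [(m.group(1), m.start()) for m in LABEL_PATTERN.finditer(text)]

# Sample OCR text for the U.S. Citizen question (illustrative only).
line = "Are you a U.S. Citizen?  Yes  No"
print(find_labels(line))  # [('Yes', 25), ('No', 30)]
```

Because the pattern matches whole words only, words such as "Note" or "Nothing" elsewhere on the line would not be returned as a "No" label.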
When other methods can't get the job done, configuring the Label Extractor property will be your go-to method to locate OMR labels.
Using a Data Field's List Values
A simple solution for the simplest cases.
Using Label Sets
Harness the power of Label Sets. Simple set up. Easy output translation.
Maximum Noise
Header Labels
Configure the Extractor Part 2: OMR Modes
After you locate the OMR labels, you must determine how checkboxes behave on your document and choose an OMR mode.
Checkboxes convey information in one of three ways:
- You will have several checkboxes next to several label options, giving you a list of choices. Of these choices, you may choose only one.
- You will have several checkboxes next to several label options, and you may choose more than one.
- You will have a single checkbox, and all that matters is whether it is checked or not.
Labeled OMR has three corresponding Modes to account for this:
- CheckOne
- CheckMulti
- Boolean
Your document's format determines how its checkboxes behave, which in turn determines which mode you should choose.
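The difference between the three modes can be summarized with a small hypothetical Python sketch that turns a set of already-detected checkbox states into an output. The `OmrMode` enum and `omr_result` function are invented for illustration, and what CheckOne does when zero or several boxes are checked (here it returns no result) is an assumption, not documented Grooper behavior.

```python
from enum import Enum
from typing import Dict, List, Optional, Union

class OmrMode(Enum):
    CHECK_ONE = "CheckOne"
    CHECK_MULTI = "CheckMulti"
    BOOLEAN = "Boolean"

def omr_result(states: Dict[str, bool], mode: OmrMode) -> Optional[Union[str, List[str]]]:
    """Map {label: is_checked} to an output for the given mode (hypothetical)."""
    checked = [label for label, is_checked in states.items() if is_checked]
    if mode is OmrMode.BOOLEAN:
        # Boolean expects exactly one checkbox; unpacking raises ValueError otherwise.
        (state,) = states.values()
        return "True" if state else "False"
    if mode is OmrMode.CHECK_ONE:
        # Assumption: only an unambiguous single check produces a result.
        return checked[0] if len(checked) == 1 else None
    # CHECK_MULTI: every checked label, in the order the labels were given.
    return checked

print(omr_result({"Yes": True, "No": False}, OmrMode.CHECK_ONE))                      # Yes
print(omr_result({"Red": True, "Green": False, "Blue": True}, OmrMode.CHECK_MULTI))  # ['Red', 'Blue']
print(omr_result({"I agree": True}, OmrMode.BOOLEAN))                                 # True
```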
CheckOne
CheckMulti
Boolean
Version Differences
Radio Button Detection (2021)
In version 2021, the Labeled OMR extractor's functionality was expanded to allow for radio button extraction. Prior to 2021, Grooper could only perform OMR using square or rectangular checkboxes. In 2021, Labeled OMR is able to analyze checkboxes (whether circular or rectangular) at the time of extraction, granting it the ability to detect radio buttons.
- This also increased the extractor's functionality in that Layout Data is no longer strictly required for the extractor to function. It will use that Layout Data if present (which is preferred when possible), but it can now analyze the image at time of extraction if Layout Data is not present.
Extractor Expansion (2021)
Prior to version 2021, the Labeled OMR extractor type was only accessible using the Value Extractor property of certain objects, such as Data Fields.
In version 2021, all extractor types were made available to the extractor properties of all objects. This allows the Labeled OMR extractor to be utilized in ways never before possible.
- For example, a Data Type extractor can now use Labeled OMR as its Local Extractor's extractor type. Prior to 2021, you could not use Labeled OMR to extract OMR labels using an extractor object.
Labeled OMR Introduction (2.90)
In version 2.80, Labeled OMR was referred to as Anchored OMR. The two features are configured and function in nearly the same way.
Prior to version 2.80, this functionality would have been performed using the "Data Element Profiles" tab of a Document Type, drawing "OMR Zones" around the checkboxes to read their check states. Grooper has since moved away from "Data Element Profiles" in favor of configuring this functionality directly on Data Elements in a Data Model, using extractor types such as Labeled OMR.