2021:Labeled OMR (Extractor Type)

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.

About

Documents use checkboxes to make our lives easier. They are particularly prevalent on structured forms, where they let the person filling out the form simply check a box next to one of a series of options rather than typing in the information.

However, most of Grooper's extraction centers on regular expressions: matching text patterns and returning the result. There is no reliable character to match for a checked checkbox, so regular expressions alone cannot determine whether a box is checked.

This is where OMR comes into play. OMR stands for "Optical Mark Recognition", and it determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it is not. Checked (or marked) boxes, whether an "x" (☒), a checkmark (☑), or a filled block (▣), will have more black pixels inside the box than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside the detected box exceeds a threshold, the box is checked (or marked). If not, it is unchecked (or unmarked).
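To make the fill-ratio idea concrete, here is a minimal Python sketch. It assumes the page has already been binarized into a grid of 1s (black) and 0s (white) and that the box's coordinates are already known. The function names and the 0.15 threshold are hypothetical, and this is not Grooper's implementation. It counts only the pixels inside the box's border, so the four lines that form the box are not mistaken for a mark.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixel coordinates; right/bottom are exclusive.
Box = Tuple[int, int, int, int]

# Hypothetical threshold chosen only for this illustration.
FILL_THRESHOLD = 0.15


def fill_ratio(image: List[List[int]], box: Box, margin: int = 1) -> float:
    """Fraction of black pixels (1s) inside the box, excluding its border lines."""
    left, top, right, bottom = box
    # Shrink the region by `margin` pixels so the four lines that form the
    # box itself are not counted as a mark.
    interior = [
        image[y][x]
        for y in range(top + margin, bottom - margin)
        for x in range(left + margin, right - margin)
    ]
    if not interior:
        return 0.0
    return sum(interior) / len(interior)


def is_checked(image: List[List[int]], box: Box, threshold: float = FILL_THRESHOLD) -> bool:
    """A box counts as checked when its interior fill ratio reaches the threshold."""
    return fill_ratio(image, box) >= threshold


# Two 6x6 binarized boxes: an empty one (☐) and one marked with an "x" (☒).
EMPTY = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
MARKED = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
WHOLE = (0, 0, 6, 6)
```

The empty box has a fill ratio of 0.0 and the marked box 0.5 (8 of its 16 interior pixels are black), so only the marked one passes the threshold.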

A simple example would be a document asking a question and giving two boxes to check “Yes” or “No.” For example, see the portion of the document below asking if the applicant is a U.S. Citizen. “Yes” or “No” would be the labels. Either “Yes” or “No” would be the field's final result, depending on which box is checked. In this case, "Yes".

In general, what you want to extract is the text of the checked label. The Labeled OMR extractor allows you to do just that.

First, you will set up an extractor to locate the text labels.

Alternatively, use Label Sets to locate the text labels.

Then, Grooper's OMR detection will determine if there is a box next to the label, and whether or not that box is checked.

Finally, if the box next to a label is checked, that label is returned as the extractor's result.
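The three steps above can be sketched in Python as follows. This is a hypothetical illustration, not Grooper's code: it assumes labels and checkboxes have already been located with bounding rectangles and check states, and it pairs each label with the closest checkbox within an assumed maximum gap. The real extractor's rules for deciding which box belongs to which label may differ.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

# (left, top, right, bottom) in page coordinates.
Rect = Tuple[float, float, float, float]


@dataclass
class Label:
    text: str  # text returned by the label extraction step
    rect: Rect


@dataclass
class Checkbox:
    rect: Rect
    checked: bool  # result of the OMR check-state test


def gap(a: Rect, b: Rect) -> float:
    """Shortest distance between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return hypot(dx, dy)


def nearest_box(label: Label, boxes: List[Checkbox], max_gap: float) -> Optional[Checkbox]:
    """Closest checkbox within max_gap of the label, or None if there is none."""
    best: Optional[Checkbox] = None
    best_gap = max_gap
    for box in boxes:
        g = gap(label.rect, box.rect)
        if g <= best_gap:
            best, best_gap = box, g
    return best


def checked_labels(labels: List[Label], boxes: List[Checkbox], max_gap: float = 20.0) -> List[str]:
    """Return the text of every label whose nearest checkbox is checked."""
    results = []
    for label in labels:
        box = nearest_box(label, boxes, max_gap)
        if box is not None and box.checked:
            results.append(label.text)
    return results


# Example layout: one row of three boxes, each followed by its label.
BOXES = [
    Checkbox((0, 0, 10, 10), checked=False),
    Checkbox((100, 0, 110, 10), checked=True),
    Checkbox((200, 0, 210, 10), checked=False),
]
LABELS = [
    Label("DOMESTIC (Home)", (14, 0, 90, 10)),
    Label("FARM (Agriculture)", (114, 0, 190, 10)),
    Label("SHOW (Beauty)", (214, 0, 290, 10)),
]
```

With this sample layout, only the box next to "FARM (Agriculture)" is checked, so `checked_labels(LABELS, BOXES)` returns that single label.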

FYI

Labeled OMR has multiple extraction modes, depending on how checkboxes behave on the document. There is also a Boolean mode that simply outputs "True" or "False" depending on whether a single checkbox is checked. We will discuss the different extraction modes further in the How To section of this article.

How To

Assign the Extractor

The Labeled OMR extractor can be utilized in two ways:

As a Value Reader's extractor type.
As an object's extractor property configuration. For example:
- As a Data Field's Value Extractor property's extractor configuration.
- As a Data Type's Local Extractor property's extractor configuration.
- As a Document Type's Positive Extractor property's extractor configuration.
- And more!


Value Reader

The Labeled OMR extractor is one of the extractor types available to the Value Reader extractor object.

This is a Value Reader object.
For its Extractor Type property, Labeled OMR is selected.
The Labeled OMR extractor's configuration is set up in this property grid.
This Labeled OMR extractor is configured to determine which of three options on the document is checked (DOMESTIC, FARM, or SHOW).
The Value Reader returns the label with a checked box next to it (FARM on this document).

Extractor Property

You may also configure a Labeled OMR extractor when configuring an extractor property. Many Grooper objects have some kind of extractor property in their property grids. Labeled OMR is one of the options that can be selected as the extractor type.

For example, Data Field objects have a Value Extractor property, which collects a result when the Data Model is extracted during the Extract activity.

This is a Data Field object.
For its Value Extractor property, Labeled OMR is selected.
The Labeled OMR extractor's configuration is set up in the Value Extractor property's sub-property grid, or in the Extractor Editor window, which opens when you press this ellipsis button.
This Labeled OMR extractor is configured to determine which of three options on the document is checked (DOMESTIC, FARM, or SHOW).
When the Data Field is collected during the Extract activity, the label with a checked box next to it (FARM on this document) is returned.

Configure the Extractor Part 1: OMR Labels

The first part of the Labeled OMR extractor's configuration is label extraction. Labels can be collected in one of three ways:

Using the Label Extractor property.
Using the List Values settings of a Data Field.
Using Label Sets to collect the OMR labels.
- For this method, this article presumes you have some familiarity with Label Sets and the Labeling Behavior functionality. For more information on Label Sets, please visit the Label Sets article.

To illustrate this, we will configure extraction for a single Data Field, detailing how each of these three methods gets a result.

Our example document is an "Application For Cow Ownership" form.
This form lists the "Type of Cow Applied for" using checkboxes:
- Either "DOMESTIC", "FARM", or "SHOW".
We will use this Data Field named "13. Type of Cow" to collect the choice indicated on the document.
We have assigned the Data Field's Value Extractor to Labeled OMR.

FYI

Be aware the Mode property is very important to Labeled OMR's configuration.

For the time being, we will use the default CheckOne mode. This mode presumes that, among multiple labeled boxes, only one may be checked. We will discuss the different OMR modes in the Configure the Extractor Part 2: OMR Modes section of this article.

At this point, the Labeled OMR extractor is totally unconfigured. Next, we will detail each of the three different ways to extract OMR labels. While the configuration is slightly different, the goal is the same: Locate text labels next to checkboxes. Each method has its own strengths and weaknesses, giving you flexibility in how you locate the OMR labels based on your documents' circumstances.


Using the Label Extractor

Moderate to high level of work up front. High flexibility in configuration options.

One way to locate OMR labels is by configuring the Labeled OMR extractor's Label Extractor property. In some ways, this is the most "effort intensive" of the three options. It will require you to configure an extractor to return each of the labels for the set of OMR checkboxes. This means a lot of manual configuration of property grids and/or external extractor objects, depending on the complexity of your documents.

However, it is also extremely reliable and highly flexible. Since you configure an extractor to return the labels, you have Grooper's full suite of extractor types and extraction logic at your disposal.

When other methods can't get the job done, configuring the Label Extractor property will be your go-to method to locate OMR labels.

For this method OMR labels are located using an extractor's results.

This extractor is configured using the Label Extractor property.
Select the extractor type you wish to configure using the dropdown list.
- Most often, you will use the List Match extractor to return labels next to OMR checkboxes. We will select List Match for this exercise.
- However, you can use whatever extractor types and techniques you choose, as long as the extractor's end results are your OMR labels.

We will configure the extractor in the Extractor Editor by pressing the ellipsis button at the end of the Label Extractor property.
Regardless of the specific extractor you choose to configure, your goal will be the same. Return one result for each individual label in the group of checkboxes.
- These results supply Labeled OMR with data instances that should have checkboxes nearby. Grooper then looks for checkboxes around the data instances, determines which ones are checked, and returns whichever data instance has a checked box next to it.
- In our case, we want to return the labels "DOMESTIC (Home)", "FARM (Agriculture)", and "SHOW (Beauty)".

Essentially, we want to return a list of OMR labels. The List Match extractor is well-suited for this task.

Using the Local Entries list, type in the OMR labels:
- DOMESTIC (Home)
- FARM (Agriculture)
- SHOW (Beauty)
Ensure the extractor returns data instances next to the OMR checkboxes on the document.
The extractor should return one result for each individual label.
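As a rough illustration of what the List Match step contributes, the sketch below finds every literal occurrence of each local entry in a line of page text and returns one result per label, in page order. It works on a plain string, whereas the real extractor works on OCR text with positions on the page; the function and variable names are hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical "Local Entries" list, mirroring the one typed into List Match above.
LOCAL_ENTRIES = ["DOMESTIC (Home)", "FARM (Agriculture)", "SHOW (Beauty)"]


def list_match(text: str, entries: List[str]) -> List[Tuple[str, int]]:
    """Return (matched text, start offset) for every occurrence of any entry, in page order."""
    results = []
    for entry in entries:
        # re.escape makes the parentheses in "(Home)" match literally.
        pattern = re.compile(re.escape(entry), re.IGNORECASE)
        for match in pattern.finditer(text):
            results.append((match.group(0), match.start()))
    return sorted(results, key=lambda result: result[1])


PAGE_TEXT = "13. Type of Cow Applied for: DOMESTIC (Home) FARM (Agriculture) SHOW (Beauty)"
MATCHES = list_match(PAGE_TEXT, LOCAL_ENTRIES)
```

Each label yields exactly one result here, which is what Labeled OMR needs before it starts looking for nearby checkboxes.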

FYI

It is generally preferable to capture the full label when possible.

For example, we would want to collect "DOMESTIC (Home)" rather than "DOMESTIC". We can always translate our result's output later (which we will do shortly).

Why is this important? It has to do with how Grooper isolates label groups using "noise characters". We will discuss this further in the Maximum Noise section of this article.
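The output translation mentioned above can be pictured as a simple lookup from the full label to the value you want stored. The following Python sketch is hypothetical and does not represent how Grooper itself performs the translation. As an assumed fallback, it strips a trailing parenthetical such as " (Home)" from labels that are not in the table.

```python
import re

# Hypothetical translation table from full OMR labels to stored values.
OUTPUT_MAP = {
    "DOMESTIC (Home)": "DOMESTIC",
    "FARM (Agriculture)": "FARM",
    "SHOW (Beauty)": "SHOW",
}


def translate(label: str) -> str:
    """Map a full OMR label to its output value, or strip a trailing '(...)'."""
    if label in OUTPUT_MAP:
        return OUTPUT_MAP[label]
    # Assumed fallback: drop a trailing parenthetical and surrounding spaces.
    return re.sub(r"\s*\([^)]*\)\s*$", "", label)
```

Capturing "FARM (Agriculture)" and translating it to "FARM" keeps the extraction step anchored on the full label while still storing the short value.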

Using a Data Field's List Values

A simple solution for the simplest cases.

Using Label Sets

Harness the power of Label Sets. Simple set up. Easy output translation.

Maximum Noise

Header Labels

Configure the Extractor Part 2: OMR Modes

After you locate the OMR labels, you must determine how the checkboxes behave on your document and choose an OMR mode.

Checkboxes convey information in one of three ways:

You will have several checkboxes next to several label options, giving you a list of choices, of which you may choose only one.
You will have several checkboxes next to several label options, and you may choose more than one.
You will have a single checkbox, and all that matters is whether it is checked or not.

Labeled OMR has three corresponding Modes to account for this:

CheckOne
CheckMulti
Boolean

How your document is formatted informs how the checkboxes behave, which will inform which mode you choose.
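The difference between the three modes lies mainly in how the checked labels are turned into a result. The Python sketch below is a hypothetical model of that behavior, not Grooper's code. In particular, returning nothing when CheckOne finds zero or several checked boxes, and returning CheckMulti results as a list, are assumptions made for illustration.

```python
from enum import Enum
from typing import List, Sequence, Tuple, Union


class Mode(Enum):
    CHECK_ONE = "CheckOne"
    CHECK_MULTI = "CheckMulti"
    BOOLEAN = "Boolean"


# (label text, is the box next to it checked?)
BoxState = Tuple[str, bool]


def omr_result(mode: Mode, states: Sequence[BoxState]) -> Union[str, List[str], None]:
    checked = [label for label, is_checked in states if is_checked]
    if mode is Mode.CHECK_ONE:
        # Only one choice is allowed; this sketch gives no result when zero
        # or several boxes are checked (an assumption, not documented behavior).
        return checked[0] if len(checked) == 1 else None
    if mode is Mode.CHECK_MULTI:
        # Any number of choices; return every checked label in order.
        return checked
    if mode is Mode.BOOLEAN:
        # A single checkbox; only its state matters.
        if len(states) != 1:
            raise ValueError("Boolean mode expects exactly one checkbox")
        return "True" if states[0][1] else "False"
    raise ValueError(f"unknown mode: {mode}")


COW_STATES = [
    ("DOMESTIC (Home)", False),
    ("FARM (Agriculture)", True),
    ("SHOW (Beauty)", False),
]
```

For the cow form, CheckOne yields "FARM (Agriculture)", while a single U.S. Citizen box in Boolean mode would yield "True" or "False".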


CheckOne

CheckMulti

Boolean

Version Differences

Radio Button Detection (2021)

In version 2021, the Labeled OMR extractor's functionality was expanded to allow for radio button extraction. Prior to 2021, Grooper could only perform OMR using square or rectangular checkboxes. In 2021, Labeled OMR is able to analyze checkboxes (whether circular or rectangular) at the time of extraction, granting it the ability to detect radio buttons.

This also increased the extractor's functionality in that Layout Data is no longer strictly required for the extractor to function. It will use that Layout Data if present (which is preferred when possible), but it can now analyze the image at time of extraction if Layout Data is not present.

Extractor Expansion (2021)

Prior to version 2021, the Labeled OMR extractor type was only accessible using the Value Extractor property of certain objects, such as Data Fields.

In version 2021, all extractor types were made available to the various extractor properties of all objects. This allows the Labeled OMR extractor to be used in ways that were not previously possible.

For example, a Data Type extractor can now use Labeled OMR as its Local Extractor's extractor type. Prior to 2021, you could not use Labeled OMR to extract OMR labels using an extractor object.

Labeled OMR Introduction (2.90)

In version 2.80, this functionality was called Anchored OMR. The two features are configured and function nearly the same way.

Prior to version 2.80, this functionality would have been performed using the "Data Element Profiles" tab of a Document Type, drawing "OMR Zones" around the checkboxes to read their check states. Grooper has moved away from "Data Element Profiles" in favor of configuring this functionality directly on Data Elements in a Data Model, using extractor types such as Labeled OMR.