2023:Labeled OMR (Value Extractor): Difference between revisions
No edit summary |
No edit summary |
||
| Line 188: | Line 188: | ||
#* However, you can use whatever extractor types and techniques you choose, as long as the extractors end results are your OMR labels. | #* However, you can use whatever extractor types and techniques you choose, as long as the extractors end results are your OMR labels. | ||
|valgin=top| | |valgin=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 01.png]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
<br> | <br> | ||
# We will configure the extractor in the '''''Extractor Editor''''' by pressing the ellipsis button at the end of the '''''Label Extractor''''' property. | # We will configure the extractor in the '''''Extractor Editor''''' by pressing the ellipsis button at the end of the '''''Label Extractor''''' property. | ||
|valgin=top| | |||
[[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 02.png]] | |||
|- | |||
|valign=top| | |||
# Regardless of the specific extractor you choose to configure, your goal will be the same. Return one result for each individual label in the group of checkboxes. | # Regardless of the specific extractor you choose to configure, your goal will be the same. Return one result for each individual label in the group of checkboxes. | ||
#* These will supply '''''Labeled OMR''''' with data instances that should be have checkboxes nearby. Then, Grooper will look for checkboxes around the data instances, determine which ones are checked, and return whichever data instance has a checked box next to it. | #* These will supply '''''Labeled OMR''''' with data instances that should be have checkboxes nearby. Then, Grooper will look for checkboxes around the data instances, determine which ones are checked, and return whichever data instance has a checked box next to it. | ||
#* In our case, we're wanting to return the labels "DOMESTIC (Home)" "FARM (Agriculture)" and "SHOW (Beauty)". | #* In our case, we're wanting to return the labels "DOMESTIC (Home)" "FARM (Agriculture)" and "SHOW (Beauty)". | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 03.png]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 221: | Line 225: | ||
|} | |} | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 04.png]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 227: | Line 231: | ||
That's it! Grooper will now analyze the pixels around the labels' data instances, determine if anything around it is a checkbox, determine their checkbox states (checked or not checked), and return the data instance (in other words, the OMR label) next to a checked box. | That's it! Grooper will now analyze the pixels around the labels' data instances, determine if anything around it is a checkbox, determine their checkbox states (checked or not checked), and return the data instance (in other words, the OMR label) next to a checked box. | ||
# Now that the '''''Label Extractor''''' is configured, the '''''Labeled OMR''''' extractor can find OMR labels. | # Now that the '''''Label Extractor''''' is configured, the '''''Labeled OMR''''' extractor can find OMR labels. Grooper determined there was a checked box next to the label <code>DOMESTIC (HOME)</code> | ||
# That label is returned, populating the '''Data Field'''. | # That label is returned, populating the '''Data Field'''. | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 05.png]] | ||
|} | |} | ||
=== Tips and Tricks: Translating Output === | === Tips and Tricks: Translating Output === | ||
| Line 246: | Line 249: | ||
# Under the '''''Output''''' properties, change the '''''Translate''''' property to ''True''. | # Under the '''''Output''''' properties, change the '''''Translate''''' property to ''True''. | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 06.png]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 262: | Line 265: | ||
# However, the output is translated to the format we want. | # However, the output is translated to the format we want. | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 07.png]] | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 268: | Line 271: | ||
This gives us the best of both worlds! | This gives us the best of both worlds! | ||
# | # Grooper finds the label next to the checked box, and returns the formatted value we want. | ||
|valign=top| | |valign=top| | ||
[[File: | [[File:2023 Labeled OMR - 2023 02 How To 02 Using the Label Extractor 08.png]] | ||
|} | |} | ||
</tab> | </tab> | ||
Revision as of 09:18, 20 October 2023
WIP
This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.
About
You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.
Checkboxes make forms easier to fill out. They are particularly prevalent on structured forms, where they let the person filling out the form check a box next to one of a series of options rather than typing in the information.
However, most of Grooper's extraction centers around regular expressions: matching text patterns and returning the result. There isn't necessarily a character that represents a checked checkbox, so regular expressions alone aren't going to cut it when you need to determine if a box is checked or not.
This is where OMR comes into play. OMR stands for "Optical Mark Recognition". OMR determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it's not. Checked (or marked) boxes, whether marked with an "x" (☒), a checkmark (☑), or a check block (▣), will have more black pixels inside the box than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside the detected box exceeds a threshold, it's checked (or marked). If not, it's unchecked (or unmarked).
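The black-pixel idea can be illustrated with a minimal Python sketch. This is not Grooper's implementation. The function names, the 0/1 pixel grid, and the 0.2 threshold are all assumptions chosen for illustration.

```python
def fill_ratio(box_interior):
    """Return the fraction of black pixels inside a box.

    box_interior is a list of rows; each pixel is 1 (black) or 0 (white).
    This pixel representation is an assumption made for the sketch.
    """
    pixels = [p for row in box_interior for p in row]
    if not pixels:
        return 0.0
    return sum(pixels) / len(pixels)


def is_checked(box_interior, threshold=0.2):
    """Treat the box as checked when enough of its interior is black.

    The 0.2 threshold is a hypothetical value, not a Grooper default.
    """
    return fill_ratio(box_interior) >= threshold


# An empty box interior and one marked with an "x".
unchecked = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
checked_x = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

print(is_checked(unchecked))
print(is_checked(checked_x))
```

In practice the threshold would need tuning, since scanner noise or a stray pen stroke also adds black pixels to an otherwise empty box.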
A simple example would be a document asking a question and giving two boxes to check “Yes” or “No.” For example, see the portion of the document below asking if the applicant is a U.S. Citizen. “Yes” or “No” would be the labels. Either “Yes” or “No” would be the field's final result, depending on which box is checked. In this case, "Yes".
In general, what you want to extract is the text of the checked label. The Labeled OMR extractor allows you to do just that.
First, you will set up an extractor to locate the text labels.
Then, Grooper's OMR detection will determine if there is a box next to the label, and whether or not that box is checked.
Last, if the box next to the label is checked, the label is returned as the extractor's result.
FYI
Labeled OMR has multiple extraction modes depending on how checkboxes behave on the document. There is also a Boolean mode to simply output "True" or "False" depending on whether a single checkbox is checked or not. We will discuss the different extraction modes further in the How To section of this article.
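The three steps described above (locate labels, find a box near each label, return the labels whose box is checked) can be sketched in Python. This is a simplified illustration under stated assumptions, not Grooper's actual logic: the `Label` and `Box` classes, the coordinates, and the `max_distance` cutoff are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Label:
    text: str
    x: float
    y: float


@dataclass
class Box:
    x: float
    y: float
    checked: bool


def nearest_box(label, boxes, max_distance=50.0):
    """Return the closest box within max_distance of the label, or None."""
    def distance(box):
        return hypot(box.x - label.x, box.y - label.y)

    candidates = [b for b in boxes if distance(b) <= max_distance]
    return min(candidates, key=distance, default=None)


def checked_labels(labels, boxes, max_distance=50.0):
    """Return the text of every label whose nearest box is checked."""
    results = []
    for label in labels:
        box = nearest_box(label, boxes, max_distance)
        if box is not None and box.checked:
            results.append(label.text)
    return results


# The "U.S. Citizen" example: two labels, the box next to "Yes" is checked.
labels = [Label("Yes", 120, 40), Label("No", 220, 40)]
boxes = [Box(100, 40, True), Box(200, 40, False)]
print(checked_labels(labels, boxes))  # ['Yes']
```

A label with no box within range simply produces no result, which mirrors the idea that only labels with a checked box next to them are returned.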
How To
Assign the Extractor
The Labeled OMR extractor can be utilized in two ways:
- As a Value Reader's extractor type.
- As an object's extractor property configuration. For example:
  - As a Data Field's Value Extractor property's extractor configuration.
  - As a Data Type's Local Extractor property's extractor configuration.
  - As a Document Type's Positive Extractor property's extractor configuration.
  - And more!
Value Reader
The Labeled OMR extractor is one of the extractor types available to the Value Reader extractor object.
Extractor Property
You may also configure a Labeled OMR extractor when configuring an extractor property. Many Grooper objects have some kind of extractor property in their property grids. Labeled OMR is one of the options that can be selected as the extractor type.
For example, Data Field objects have a Value Extractor property, which collects a result when the Data Model is extracted during the Extract activity.
Configure the Extractor Part 1: OMR Labels
The first part of the Labeled OMR extractor's configuration is label extraction. Labels can be collected in one of three ways:
- Using the Label Extractor property.
- Using the List Values settings of a Data Field.
- Collecting OMR labels with Label Sets.
  - When we get to this point, this article will presume you have some familiarity with Label Sets and the Labeling Behavior functionality. For more information on Label Sets, please visit the Label Sets article.
At this point, the Labeled OMR extractor is totally unconfigured. Next, we will detail each of the three different ways to extract OMR labels. While the configuration is slightly different, the goal is the same: Locate text labels next to checkboxes. Each method has its own strengths and weaknesses, giving you flexibility in how you locate the OMR labels based on your documents' circumstances.
Using the Label Extractor
Moderate to high level of work up front. High flexibility in configuration options.
One way to locate OMR labels is by configuring the Labeled OMR extractor's Label Extractor property. In some ways, this is the most "effort intensive" of the three options. It will require you to configure an extractor to return each of the labels for the set of OMR checkboxes. This means a lot of manual configuration of property grids and/or external extractor objects, depending on the complexity of your documents.
However, it is also extremely reliable with a huge amount of flexibility. Since you configure an extractor to return the labels, you have access to Grooper's full suite of extraction types and extraction logic.
When other methods can't get the job done, configuring the Label Extractor property will be your go-to method to locate OMR labels.
Tips and Tricks: Translating Output
Often, the label on the document is not exactly what you want to collect for your data set. You may want to adjust the output value in one way or another. For example, you may want to collect the value "FARM" instead of the full text "FARM (Agriculture)".
This is easily done when using the List Match extractor for your Label Extractor.
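Conceptually, translating output is a lookup from the label text found on the document to the value you want to store. The Python sketch below illustrates that idea only. It does not show how the List Match extractor is configured in Grooper. The `TRANSLATIONS` table and `translate` function are hypothetical, and the "DOMESTIC" and "SHOW" outputs are assumed by analogy with the "FARM" example.

```python
# Hypothetical translation table: document label -> value to collect.
# Only the FARM mapping comes from the example above; the others are assumed.
TRANSLATIONS = {
    "DOMESTIC (Home)": "DOMESTIC",
    "FARM (Agriculture)": "FARM",
    "SHOW (Beauty)": "SHOW",
}


def translate(label, table=TRANSLATIONS):
    """Return the translated value, or the label itself if no entry exists."""
    return table.get(label, label)


print(translate("FARM (Agriculture)"))  # FARM
```

The label on the document is still what gets located next to the checkbox. Only the value that is output changes.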
Using a Data Field's List Values
A simple solution for the simplest cases.
This next method is fantastic... if it works for you. It is extremely simple to set up, but has the most limitations. However, for straightforward OMR extraction, it is highly effective with little setup involved.
This method uses a Data Field's List Values settings to function. Typically, the List Values property is configured to aid in human review during a Review step in a Batch Process. It allows you to enter a list of values the user can pick from during review. Labeled OMR has special interactivity with the List Values property. If you do not configure the Label Extractor, Grooper will check to see if any List Values have been entered. If so, it will attempt to match the items in the List Values list as the OMR labels.
- This can be a knock-on benefit: you might want to configure a List Values list for OMR fields regardless, to make your document reviewer's work easier (and potentially more accurate).
- What is a group of checkboxes but a list of values to select from on a document? If human review is part of your process, you might already be using List Values to give your document reviewers a selection list of checkbox options. If those List Values turn out to match the OMR labels, great! There's no need to configure a Label Extractor for Labeled OMR in that case. Two birds. One property configuration.
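The fallback described above (use the Label Extractor if one is configured, otherwise try the List Values) can be sketched in Python. This is an illustration under assumptions, not Grooper's code: `find_label_candidates` is a hypothetical name, the Label Extractor is modeled as an optional function, and the case-insensitive substring match is a simplification of however Grooper actually matches list items.

```python
from typing import Callable, List, Optional, Sequence


def find_label_candidates(
    page_text: str,
    label_extractor: Optional[Callable[[str], List[str]]],
    list_values: Sequence[str],
) -> List[str]:
    """Pick the source of OMR labels for a field.

    If a label extractor is configured, use its results. Otherwise fall
    back to whichever List Values appear in the page text.
    """
    if label_extractor is not None:
        return label_extractor(page_text)
    lowered = page_text.lower()
    return [value for value in list_values if value.lower() in lowered]


page = "[x] DOMESTIC (Home)   [ ] FARM (Agriculture)"
values = ["DOMESTIC (Home)", "FARM (Agriculture)", "SHOW (Beauty)"]

# No label extractor configured, so the List Values are matched instead.
print(find_label_candidates(page, None, values))
```

Here "SHOW (Beauty)" is skipped because it does not appear on the sample page text, so only the two labels present become candidates for checkbox detection.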
⚠
This method's strength and limitation both lie in its simplicity. It will not work for every situation.
Using Label Sets
Harness the power of Label Sets. Simple setup. Easy output translation.
Grooper's Label Set functionality provides powerful document extraction and classification capabilities by leveraging the prevalence and utility of field labels. Labeled OMR is a "Label Set aware" extractor. OMR labels can be collected for a Data Field's set of labels and used at the time of extraction in place of a Label Extractor.
If you are using Label Sets in your solution, this approach will most likely be the one for you. The setup is fairly simple, and translating/formatting your output is a breeze.
- This article presumes you have some awareness of Label Sets and the Labeling Behavior. For more information, please visit the Label Sets article.