Labeled OMR (Value Extractor): Difference between revisions

From Grooper Wiki
[[File:Labeled-omr-about-01.png|thumb|200px|An example of checkboxes.]]
 
<blockquote style="font-size:14pt">
''Labeled OMR'' is an extractor used to output OMR checkbox labels.  It determines whether labeled checkboxes are checked or not and, if checked, outputs the label as its result.
</blockquote>
 
== About ==
 
Documents use checkboxes to make our lives easier.  They are particularly prevalent on structured forms.  Checkboxes let the person filling out the form check a box next to one of a series of options rather than typing in the information.
 
However, most of Grooper's extraction centers around regular expressions: matching text patterns and returning the result.  There isn't necessarily a character to match for a checked checkbox, so a regular expression alone can't determine whether a box is checked.
 
This is where OMR comes into play.  OMR stands for "Optical Mark Recognition".  OMR determines checkbox states.  The basic idea behind it is very simple.  First, find a box.  A box is just four lines connected to each other in a square-like fashion.  If that box has a mark of some kind inside it, it is checked.  If not, it is unchecked.  Checked (or marked) boxes, whether marked with an "x" (<span style="font-size:120%">&#9746;</span>), a checkmark (<span style="font-size:120%">&#9745;</span>), or a check block (<span style="font-size:120%">&#9635;</span>), will have more black pixels inside the box than an unchecked (or unmarked) one (<span style="font-size:120%">&#9744;</span>).  If the proportion of black pixels inside the detected box exceeds a threshold, it's checked (or marked).  If not, it's unchecked (or unmarked).
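
The pixel-counting idea can be sketched in a few lines of Python.  This is only an illustration of the principle described above, not Grooper's actual implementation: the function names, the 0/1 pixel grid, and the 15% threshold are all assumptions made for the example.

```python
from typing import List

def fill_ratio(box_pixels: List[List[int]]) -> float:
    """Return the fraction of black pixels (1 = black, 0 = white) inside a box."""
    total = sum(len(row) for row in box_pixels)
    if total == 0:
        return 0.0
    black = sum(sum(row) for row in box_pixels)
    return black / total

def is_checked(box_pixels: List[List[int]], threshold: float = 0.15) -> bool:
    """Treat the box as checked when enough of its interior is black.

    The 0.15 default is a hypothetical value chosen for this sketch.
    """
    return fill_ratio(box_pixels) >= threshold

# Interior of an unmarked box: all white.
empty_box = [[0] * 5 for _ in range(5)]

# Interior of a box marked with an "x": black pixels on both diagonals.
x_marked_box = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

print(is_checked(empty_box))     # unmarked interior
print(is_checked(x_marked_box))  # marked interior
```

In practice the threshold has to tolerate scanner noise and stray marks, which is why the sketch compares a ratio rather than testing for any black pixel at all.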
 
A simple example would be a document asking a question and giving two boxes to check, "Yes" or "No".  For example, see the portion of the document below asking if the applicant is a U.S. Citizen.  "Yes" and "No" would be the labels.  Either "Yes" or "No" would be the field's final result, depending on which box is checked.  In this case, the result is "Yes".
 
{|style="margin:auto; text-align:center" cellspacing="10" cellpadding="5"
|-
|[[file:1573055869908-200.png|center]]
|-
|[[File:Labeled-omr-about-02.png|center|594px]]
|}
 
The ''Labeled OMR'' extractor is a '''''Value Extractor''''' option for '''Data Fields''' in a '''Data Model'''.
 
In general, what you want to extract is the text of the checked ''label''.  The ''Labeled OMR'' extractor allows you to do just that.  You will set up an extractor to locate the text label.  Grooper's OMR detection will determine if the box next to the label is checked.  If it is, the label is returned as the '''Data Field's''' result.
 
== Use Cases ==
 
Any document using checkboxes can take advantage of this functionality.  There is a wide variety of use cases, including application forms, surveys, and questionnaires.
 
== How To ==
 
=== Configure the Extractor ===
 
Checkboxes detail information in one of three ways, either:
 
* You will have several checkboxes next to several label options, giving you a list of choices.  Of these choices, you may choose ''only one''.
* You will have several checkboxes next to several label options and you may choose ''multiple''.
* You will have a single checkbox and it really just matters whether the checkbox is checked or not.
 
''Labeled OMR'' has three corresponding '''''Modes''''' to account for this:
* ''CheckOne''
* ''CheckMulti''
* ''Boolean''
 
How your document is structured will inform which mode you choose.  However, most of the extractor's configuration is the same regardless of which one you choose.
 
<tabs margin:20px>
<tab name="Prereqs - Box Detection" style="margin:20px">
=== Prereqs - Box Detection ===
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
In order for ''Labeled OMR'' to return a result, it needs to be able to find a checkbox and it needs to be able to tell if that box is checked or unchecked.  The checkbox locations and "check states" (checked or unchecked) ''must'' be saved to a page ''before'' extracting the value.
 
This information is saved to a '''Batch Page''' object's "LayoutData.json" file during permanent or temporary [[Image Processing|image processing]], using a '''Box Detection''' or '''Box Removal''' command.
 
This means you must execute an '''IP Profile''' with a '''Box Detection''' or '''Box Removal''' command as one of its '''IP Steps'''. 
 
# Here, we have an '''IP Profile''' titled "Layout Data". 
# It has a '''Box Detection''' command as its first step.
# You can verify boxes are detected using the "Boxes" diagnostic image.
# Detected checked boxes are green.
# Detected unchecked boxes are red.
 
Then the '''IP Profile''' must be run on the pages in the '''Batch''' using the '''Image Processing''' activity (for permanent image processing) or the '''Recognize''' activity (for temporary image processing).
|
[[File:Labeled-omr-about-03.png]]
|}
</tab>
<tab name="Assign the Value Extractor" style="margin:20px">
=== Assign the Value Extractor ===
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
''Labeled OMR'' is an option for a '''Data Field's''' '''''Value Extractor''''' property.
 
# Select a '''Data Field''' in a '''Data Model'''
#* The one we have selected is named "(CheckMulti) This estimate includes".  We'll talk more about what "CheckMulti" means later.
# Select the '''''Value Extractor''''' property.
# Choose ''Labeled OMR'' from the dropdown list.
|
[[File:Labeled-omr-how-to-02.png]]
|}
</tab>
<tab name="Configure the Label Extractor" style="margin:20px">
=== Configure the Label Extractor ===
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
# Expand the ''Labeled OMR'' sub-properties.  The first one you'll see is '''''Label Extractor'''''.  Here, you will use a ''Text Pattern'' or ''Reference'' an extractor in the Node Tree to locate the text labels next to a checkbox.
# For our first example, we want to know the information under "This estimate includes"
#* Our goal is to determine which of the following three options are checked:  Property Taxes, Homeowner's Insurance, or Other.
# In this case, this will be a simple extractor.  We have chosen ''Text Pattern''.
|
[[File:Labeled-omr-how-to-03.png]]
|-
|valign=top|
The goal of the extractor is to produce a result for each checkbox option.  We have created a regular expression pattern that is an "or separated list": each label is separated from the next by the vertical pipe character (<code>|</code>).
 
# We have entered the following pattern in our '''''Value Pattern''''':
<pre>
Property Taxes|
Homeowner's Insurance|
Other:
</pre>
#<li value=2>All three labels next to the checkbox are returned as ''individual'' values.
# You may notice we're getting a lot more than just these three values in our results list.
#* That's actually ok ''in this case''.  None of these other results have boxes next to them.  Because ''Labeled OMR'' is looking for results that have checkboxes next to them, these results will ultimately be thrown out.
|
[[File:Labeled-omr-how-to-04.png]]
|}
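
For readers who want to try the "or separated list" outside of Grooper, here is a rough equivalent using Python's <code>re</code> module.  The sample page text is invented for illustration, and the pattern is written on a single line because Python treats a line break inside a pattern as a literal character.

```python
import re

# The three labels joined with the pipe ("or") operator.
LABEL_PATTERN = re.compile(r"Property Taxes|Homeowner's Insurance|Other:")

# Hypothetical page text; "[ ]" and "[x]" only stand in for checkbox images.
page_text = (
    "This estimate includes\n"
    "[ ] Property Taxes\n"
    "[x] Homeowner's Insurance\n"
    "[x] Other: Flood Insurance\n"
    "See Section G for Other: costs\n"
)

# Each label is returned as an individual match, including the extra
# "Other:" on the last line that has no checkbox next to it.
labels = LABEL_PATTERN.findall(page_text)
print(labels)
```

The extra match on the last line mirrors the situation described above: a pattern this simple can return more results than there are checkboxes, and it is the box lookup, not the regex, that discards them.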
</tab>
<tab name="Verify Extraction" style="margin:20px">
=== Verify Extraction ===
 
We will go ahead and test our extraction and see what results we get.  However, before continuing, we will point out the '''''Box Location''''' and '''''Max Distance''''' properties.  '''''Box Location''''' determines where the box is in relation to the label.  The default of ''West'' assumes the box is to the left of the label.  Changing it to ''East'' assumes the box is to the right, and so on.  The '''''Max Distance''''' property sets the maximum space allowed between a box and a label.  You expect checkboxes to be fairly close to their labels.  If a box is on the other side of the page from a label, it typically does not pertain to that label at all.  The default of ''0.25in'' works well in most cases, but it can be changed.
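
One way to picture how these two properties interact is the following Python sketch.  It is an illustration under stated assumptions, not Grooper's code: the <code>Rect</code> type, the function names, and the coordinates (in inches) are hypothetical, and only the ''West'' and ''East'' settings are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Rect:
    """Axis-aligned rectangle in inches (hypothetical type for this sketch)."""
    left: float
    top: float
    right: float
    bottom: float

def horizontal_gap(label: Rect, box: Rect, box_location: str) -> Optional[float]:
    """Gap between label and box on the chosen side, or None if the box
    is not on that side or does not share any vertical extent with the label."""
    if box.bottom < label.top or box.top > label.bottom:
        return None  # box sits entirely above or below the label's line
    if box_location == "West":
        gap = label.left - box.right
    elif box_location == "East":
        gap = box.left - label.right
    else:
        raise ValueError("only West and East are modeled in this sketch")
    return gap if gap >= 0 else None

def find_box(label: Rect, boxes: List[Rect],
             box_location: str = "West", max_distance: float = 0.25) -> Optional[Rect]:
    """Return the closest box on the chosen side within max_distance, if any."""
    in_range = []
    for box in boxes:
        gap = horizontal_gap(label, box, box_location)
        if gap is not None and gap <= max_distance:
            in_range.append((gap, box))
    if not in_range:
        return None
    return min(in_range, key=lambda pair: pair[0])[1]

label = Rect(left=1.0, top=2.0, right=2.5, bottom=2.2)
near_box = Rect(left=0.75, top=2.0, right=0.9, bottom=2.15)  # just left of the label
far_box = Rect(left=6.0, top=2.0, right=6.15, bottom=2.15)   # far to the right

print(find_box(label, [far_box, near_box]))                       # the nearby box
print(find_box(label, [far_box, near_box], box_location="East"))  # nothing in range
```

A label with no box in range produces no result at all, which is how stray label matches elsewhere on the page end up being discarded.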
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
# Press the "Test Extraction" button.
# It looks like we are successful!  The box next to "Homeowner's Insurance" is checked.
# The label value "Homeowner's Insurance" is returned to the '''Data Field'''.
|
[[File:Labeled-omr-how-to-05.png]]
|-
|valign=top|
Not so fast.  Let's check another document.
 
# Any combination of these boxes may be checked.  One might be checked.  They all might be checked.  Even none might be checked!
# However, only the first label, "Homeowner's Insurance", is returned.
# That is because the default '''''Mode''''' property of ''CheckOne'' assumes ''only'' one box may be checked.  We need to use a different '''''Mode'''''.
|
[[File:Labeled-omr-how-to-06.png]]
|}
</tab>
<tab name="CheckMulti Mode" style="margin:20px">
=== CheckMulti Mode ===
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
Since multiple boxes may be checked, we need to change the '''''Mode''''' property to ''CheckMulti''.  This will return all results as a concatenated string.
 
# Select the '''''Mode''''' property.
# Choose ''CheckMulti''
|
[[File:Labeled-omr-how-to-07.png]]
|-
|valign=top|
Upon extraction now, the two results return as a single string value, "Homeowner's InsuranceOther:"
|
[[File:Labeled-omr-how-to-08.png]]
|-
|valign=top|
# If you wish, you may use the '''''Separator String''''' property to insert a character (or several characters) between each result.
# For example, here we set the '''''Separator String's''''' value to the pipe character (<code>|</code>), returning a pipe delimited list.
|
[[File:Labeled-omr-how-to-09.png]]
|}
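
The difference between the two modes' output can be sketched in Python as follows.  The function names are hypothetical, and the logic is only a model of the behavior shown above (the first checked label for ''CheckOne'', the joined labels for ''CheckMulti''), not Grooper's implementation.

```python
from typing import List, Optional

def check_one(checked_labels: List[str]) -> Optional[str]:
    """Model of CheckOne: at most one result, the first checked label."""
    return checked_labels[0] if checked_labels else None

def check_multi(checked_labels: List[str], separator: str = "") -> str:
    """Model of CheckMulti: all checked labels joined by the separator string."""
    return separator.join(checked_labels)

# Labels whose boxes were detected as checked on the second document.
checked = ["Homeowner's Insurance", "Other:"]

print(check_one(checked))         # Homeowner's Insurance
print(check_multi(checked))       # Homeowner's InsuranceOther:
print(check_multi(checked, "|"))  # Homeowner's Insurance|Other:
```

With an empty separator the labels run together, which is why setting a separator is usually worthwhile when the value will be split again downstream.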
</tab>
<tab name = "CheckOne Mode" style="margin:20px">
=== CheckOne Mode ===
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
As the name implies, the ''CheckOne'' '''''Mode''''' assumes only one check box is checked.  It will only return a maximum of one result.
 
However, its '''''Label Extractor''''' is configured in much the same way.  Here, we're looking for two options:
# the sentence indicating the lender "will allow" assumption of the loan or
# the sentence indicating the lender "will not allow" assumption of the loan.
 
Our regular expression is very similar to the one in our ''CheckMulti'' example.  We've created an "or separated list" using the pipe character (<code>|</code>), matching the text next to the two checkboxes:
 
<pre>
will allow|
will not allow
</pre>
|
[[File:Labeled-omr-how-to-10.png]]
|-
|valign=top|
# ''CheckOne'' is the default option for the '''''Mode''''' property.
# Press the "Test Extraction" button to verify the results.
# Notice the label will be outlined in blue and the checked box will be shaded green.
#* Unchecked boxes will be shaded grey.
# The label next to the checked box populates the '''Data Field'''.
|
[[File:Labeled-omr-how-to-11.png]]
|}
 
</tab>
<tab name="Boolean Mode" style="margin:20px">
=== Boolean Mode ===
 
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
Sometimes checkboxes aren't used to indicate your choice from a list of options, but instead a binary "yes/no" or "true/false" type response.  The ''Boolean'' option will return a value of "True" or "False" depending on whether the box is checked.
 
In this case, the checkbox indicates whether or not an escrow account is used for the loan.  Our regex only needs to capture a single label next to the box.  The first part of the sentence will put us right next to that check box:
 
<pre>
will have an escrow account
</pre>
|
[[File:Labeled-omr-how-to-12.png]]
|-
|valign=top|
# Change the '''''Mode''''' property to ''Boolean''.
# Press the "Test Extraction" button to verify the results.
# Notice for the ''Boolean'' mode, the checkbox (whether checked or not) will be shaded green and the label will be outlined in blue.
# Since we're looking for a boolean value rather than the text label, a value of "True" is returned if the box is checked and "False" if it is not.
# The output value can also be changed using the '''''Value If Checked''''' and '''''Value If Unchecked''''' properties.
|
[[File:Labeled-omr-how-to-13.png]]
|}
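
As a minimal sketch of this mode's output, the Python function below maps a check state to a value.  The function and parameter names are hypothetical stand-ins for the '''''Value If Checked''''' and '''''Value If Unchecked''''' properties, and the "Escrow" values in the last call are invented for the example.

```python
def boolean_result(is_checked: bool,
                   value_if_checked: str = "True",
                   value_if_unchecked: str = "False") -> str:
    """Model of Boolean mode: return one of two values based on the check state."""
    return value_if_checked if is_checked else value_if_unchecked

print(boolean_result(True))   # True
print(boolean_result(False))  # False

# Overriding the output values, as the two properties allow.
print(boolean_result(False, "Escrow", "No Escrow"))  # No Escrow
```

Unlike the other two modes, an unchecked box still produces a result here, so the field is populated either way.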
</tab>
</tabs>
 
== Version Differences ==
 
In version 2.80, ''Labeled OMR'' is referred to as ''[[Anchored OMR]]''.  The two features are configured and function nearly the same.
 
Prior to version 2.80, this functionality would have been performed using the "Data Element Profiles" tab of a '''Document Type''' and drawing "OMR Zones" around the checkboxes to read their check states.  Grooper has moved away from "Data Element Profiles" in favor of configuring the functionality directly on '''Data Elements''' in a '''Data Model''', using '''''Value Extractors''''' such as ''Labeled OMR''.
 
== See Also ==
 
* [[Anchored OMR]]
* [[OMR Reader (Result Post Processor)]]

Revision as of 09:56, 3 June 2022
