Labeled OMR (Value Extractor)

Revision as of 15:04, 16 October 2020

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not and, if checked, outputs the label as its result.

About

Documents use checkboxes to make our lives easier, and they are particularly prevalent on structured forms. Checkboxes let the person filling out a form simply check the box next to one of a series of options rather than typing in the information.

However, most of Grooper's extraction centers on regular expressions: matching text patterns and returning the result. A checked checkbox doesn't necessarily correspond to any character a pattern could match, so regular expressions alone can't determine whether a box is checked or not.

This is where OMR comes into play. OMR stands for "Optical Mark Recognition", and it determines checkbox states. The basic idea behind it is simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it is unchecked. Checked (or marked) boxes, whether marked with an "x" (☒), a checkmark (☑), or a check block (▣), will have more black pixels inside the box than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside a detected box exceeds a threshold, the box is checked (or marked). If not, it is unchecked (or unmarked).
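The box-and-threshold idea above can be sketched in Python on a tiny black-and-white image stored as a grid of 0s (white) and 1s (black). This is a minimal illustration, not Grooper's Box Detection implementation. The grid representation, the function names, and the 0.2 fill threshold are all assumptions chosen for the example.

```python
# Minimal sketch of the OMR idea: a region bounded by four black lines is a box,
# and the box is "checked" when enough of its interior is black.
# Grid format, function names, and the 0.2 threshold are illustrative assumptions.

def has_border(grid, top, left, bottom, right):
    """Return True if all four edges of the rectangle are black (1)."""
    top_and_bottom = all(grid[top][c] and grid[bottom][c] for c in range(left, right + 1))
    left_and_right = all(grid[r][left] and grid[r][right] for r in range(top, bottom + 1))
    return top_and_bottom and left_and_right


def fill_ratio(grid, top, left, bottom, right):
    """Fraction of black pixels strictly inside the box border."""
    inner = [grid[r][c]
             for r in range(top + 1, bottom)
             for c in range(left + 1, right)]
    return sum(inner) / len(inner) if inner else 0.0


def is_checked(grid, top, left, bottom, right, threshold=0.2):
    """Classify a box as checked (True) or unchecked (False).

    Returns None if the region is not bounded by four lines, i.e. not a box.
    The threshold is an illustrative assumption, not a Grooper default.
    """
    if not has_border(grid, top, left, bottom, right):
        return None
    return fill_ratio(grid, top, left, bottom, right) >= threshold


# 1 = black pixel, 0 = white pixel
empty_box = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
x_box = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
print(is_checked(empty_box, 0, 0, 4, 4))  # False
print(is_checked(x_box, 0, 0, 4, 4))      # True (5 of 9 interior pixels are black)
```

On a real page, the box coordinates would first have to be located (the "find a box" step). Here they are passed in directly to keep the sketch short.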

A simple example would be a document asking a question and giving two boxes to check “Yes” or “No.” For example, see the portion of the document below asking if the applicant is a U.S. Citizen. “Yes” or “No” would be the labels. Either “Yes” or “No” would be the field's final result, depending on which box is checked. In this case, "Yes".

The Labeled OMR extractor is a Value Extractor option for Data Fields in a Data Model.

In general, what you want to extract is the text of the checked label, and the Labeled OMR extractor allows you to do just that. You set up an extractor to locate the text label. Grooper's OMR detection then determines whether the box next to that label is checked. If it is, the label is returned as the Data Field's result.
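The flow described above (locate the label, find the box beside it, return the label if that box is checked) can be sketched in Python using the U.S. Citizen "Yes"/"No" example. The `Box` and `Label` classes, the "box sits just left of its label on the same line" pairing rule, and the `max_gap` limit are hypothetical illustrations, not Grooper's API or its actual pairing logic.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical detected checkbox: bounding rectangle plus check state."""
    left: float
    top: float
    right: float
    bottom: float
    checked: bool


@dataclass
class Label:
    """Hypothetical text label located by the label extractor."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def box_for_label(label, boxes, max_gap=50.0):
    """Return the nearest box just left of a label on the same line, or None.

    Assumes each box sits to the left of its label; max_gap (in pixels)
    is an illustrative assumption.
    """
    candidates = [
        b for b in boxes
        if b.right <= label.left                            # box is left of the label
        and label.left - b.right <= max_gap                 # ...and close to it
        and b.top < label.bottom and b.bottom > label.top   # vertical overlap: same line
    ]
    # Pick the candidate with the smallest horizontal gap to the label.
    return min(candidates, key=lambda b: label.left - b.right, default=None)


def labeled_omr(labels, boxes):
    """Return the text of every label whose paired box is checked."""
    results = []
    for label in labels:
        box = box_for_label(label, boxes)
        if box is not None and box.checked:
            results.append(label.text)
    return results


# "U.S. Citizen?  [x] Yes   [ ] No"
boxes = [Box(100, 200, 112, 212, checked=True), Box(180, 200, 192, 212, checked=False)]
labels = [Label("Yes", 118, 199, 145, 213), Label("No", 198, 199, 215, 213)]
print(labeled_omr(labels, boxes))  # ['Yes']
```

The sketch returns a list so that it also covers forms where several boxes may be checked. For a single-answer question like this one, the field's result would be the one label in that list.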

Use Cases

Any document using checkboxes can take advantage of this functionality. There is a wide variety of use cases, including application forms, surveys, and questionnaires.

How To

Configure the Extractor

Prereqs - Box Detection

In order for Labeled OMR to return a result, it needs to be able to find a checkbox and tell whether that box is checked or unchecked. The checkbox locations and "check states" (checked or unchecked) must be saved to a page before extracting the value.

This information is saved to a Batch Page object's "LayoutData.json" file during permanent or temporary image processing, using a Box Detection or Box Removal command.

This means you must execute an IP Profile with a Box Detection or Box Removal command as one of its IP Steps.
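The ordering requirement (box detection writes box locations and check states, and extraction later reads them) can be sketched in Python. The JSON schema here (a `"boxes"` list with `"rect"` and `"checked"` keys) is entirely hypothetical and is not Grooper's real "LayoutData.json" format. It only illustrates why extraction has nothing to work with when no layout data was saved.

```python
import json

# Hypothetical layout-data schema for illustration only;
# it is NOT the actual structure of Grooper's "LayoutData.json".


def save_layout_data(boxes):
    """Serialize detected boxes and their check states (the detection side)."""
    return json.dumps({"boxes": boxes})


def checked_boxes(layout_json):
    """Return the checked boxes from saved layout data (the extraction side).

    Returns None when no layout data exists, mirroring the rule that box
    detection must run before Labeled OMR can produce a result.
    """
    if layout_json is None:
        return None
    data = json.loads(layout_json)
    return [b for b in data.get("boxes", []) if b["checked"]]


# Box detection ran, so the page has saved layout data.
layout = save_layout_data([
    {"rect": [100, 200, 112, 212], "checked": True},
    {"rect": [180, 200, 192, 212], "checked": False},
])
print(checked_boxes(layout))  # [{'rect': [100, 200, 112, 212], 'checked': True}]

# Box detection never ran, so there is nothing to extract from.
print(checked_boxes(None))    # None
```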

1. Here, we have an IP Profile titled "Layout Data".
2. It has a Box Detection command as its first step.
3. You can verify boxes are detected using the "Boxes" diagnostic image.
4. Detected checked boxes are green.
5. Detected unchecked boxes are red.
