Labeled OMR (Value Extractor): Difference between revisions
Dgreenwood (talk | contribs) Created page with "<blockquote style="font-size:14pt"> ''Labeled OMR'' is an extractor to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not and, if checked..." |
Dgreenwood (talk | contribs) No edit summary |
Revision as of 14:21, 16 October 2020

Labeled OMR is an extractor that outputs the labels of OMR checkboxes. It determines whether each labeled checkbox is checked and, if it is, outputs that checkbox's label as its result.
About
Documents use checkboxes to make our lives easier. They are particularly prevalent on structured forms. They let the person filling out the form check a box next to one of a series of options rather than typing in the information.
However, most of Grooper's extraction centers on regular expressions: matching text patterns and returning the result. There isn't necessarily a character to match for a checked checkbox, so a regular expression alone can't determine whether a box is checked.
This is where OMR comes into play. OMR stands for "Optical Mark Recognition". OMR determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it's unchecked. A checked (or marked) box, whether marked with an "x" (☒), a checkmark (☑), or a filled block (▣), will have more black pixels inside it than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside the detected box exceeds a threshold, the box is checked (or marked). If not, it's unchecked (or unmarked).
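The pixel-counting idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Grooper's implementation: the function names, the `border` parameter, and the 0.2 threshold are assumptions made for the example, and the box is assumed to have already been located and binarized (1 = black pixel, 0 = white).

```python
def fill_ratio(region, border=1):
    """Fraction of black pixels inside the box, ignoring the box's own lines.

    `region` is a list of rows of 0/1 values; `border` is the assumed
    thickness of the box outline in pixels (hypothetical parameter).
    """
    inner = [row[border:len(row) - border]
             for row in region[border:len(region) - border]]
    total = sum(len(row) for row in inner)
    if total == 0:
        return 0.0
    black = sum(sum(row) for row in inner)
    return black / total


def is_checked(region, threshold=0.2, border=1):
    """Treat the box as checked when enough of its interior is black.

    The 0.2 default is an arbitrary value chosen for this sketch.
    """
    return fill_ratio(region, border) >= threshold


def make_box(size, mark=False):
    """Build a synthetic square box, optionally with an "x" drawn inside."""
    box = [[1 if r in (0, size - 1) or c in (0, size - 1) else 0
            for c in range(size)]
           for r in range(size)]
    if mark:
        for i in range(1, size - 1):
            box[i][i] = 1                 # one diagonal of the "x"
            box[i][size - 1 - i] = 1      # the other diagonal
    return box


empty_box = make_box(7)
marked_box = make_box(7, mark=True)
print(is_checked(empty_box), is_checked(marked_box))
```

Excluding the outline before counting matters: otherwise the box's own lines would contribute black pixels and push an empty box toward the threshold.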
A simple example would be a document that asks a question and gives two boxes to check, “Yes” or “No.” For example, see the portion of the document below asking if the applicant is a U.S. Citizen. “Yes” and “No” are the labels. The field's final result is either “Yes” or “No”, depending on which box is checked. In this case, the result is "Yes".
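[Image: the portion of the form asking whether the applicant is a U.S. Citizen, with "Yes" and "No" checkboxes]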
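The Yes/No example can be expressed as a small sketch: once each checkbox has a label and a checked state, the result is the label of whichever box is checked. The function name and the (label, checked) pair layout are hypothetical and chosen only for illustration. They are not part of Grooper.

```python
def labeled_omr_result(boxes):
    """Return the labels of all checked boxes.

    `boxes` is a list of (label, checked) pairs. A list is returned
    because a form may allow more than one box to be checked.
    """
    return [label for label, checked in boxes if checked]


# The U.S. Citizen question: the "Yes" box is marked, the "No" box is not.
citizen_question = [("Yes", True), ("No", False)]
result = labeled_omr_result(citizen_question)
print(result)
```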

