2023:Labeled OMR (Value Extractor)

From Grooper Wiki
Revision as of 11:07, 18 October 2023 by Rpatton

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

An example of checkboxes.

Labeled OMR is an extractor that returns the labels of OMR checkboxes. It determines whether each labeled checkbox is checked and, if it is, outputs the label(s) as the result.

About

You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.


Documents use checkboxes to make our lives easier. They are particularly prevalent on structured forms. They let the person filling out the form simply check a box next to one of a series of options rather than typing in the information.

However, most of Grooper's extraction centers around regular expressions: matching text patterns and returning the result. There is no text character that reliably represents a checked checkbox, so regular expressions alone cannot determine whether a box is checked.

This is where OMR comes into play. OMR stands for "Optical Mark Recognition", and it determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it is unchecked. Checked (or marked) boxes, whether marked with an "x", a checkmark, or a check block, will have more black pixels inside the box than an unchecked (or unmarked) one. If the proportion of black pixels inside the detected box exceeds a threshold, the box is checked (or marked). If not, it is unchecked (or unmarked).
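To make the pixel-counting idea concrete, here is a minimal Python sketch. It is not Grooper's implementation: it assumes the box interior has already been located and binarized into a grid of 1s (black) and 0s (white), and the 0.2 threshold is an arbitrary illustrative value.

```python
# Minimal sketch of the OMR fill check described above (NOT Grooper's code).
# Assumptions: the box interior is a 2D grid where 1 = black pixel, 0 = white,
# and the default threshold of 0.2 is purely illustrative.
from typing import List

Pixels = List[List[int]]


def fill_ratio(box_interior: Pixels) -> float:
    """Return the fraction of black (1) pixels inside a detected box."""
    total = sum(len(row) for row in box_interior)
    if total == 0:
        return 0.0  # an empty region cannot contain a mark
    black = sum(sum(row) for row in box_interior)
    return black / total


def is_checked(box_interior: Pixels, threshold: float = 0.2) -> bool:
    """A box counts as checked (marked) when its fill ratio exceeds the threshold."""
    return fill_ratio(box_interior) > threshold


if __name__ == "__main__":
    empty_box = [[0] * 5 for _ in range(5)]
    # An "x" drawn along both diagonals of a 5x5 interior: 9 of 25 pixels are black.
    x_mark = [[1 if c == r or c == 4 - r else 0 for c in range(5)] for r in range(5)]
    print(is_checked(empty_box))  # False
    print(is_checked(x_mark))     # True (9 / 25 = 0.36)
```

A real OMR engine also has to find the box lines, ignore the border pixels themselves, and cope with noise, which is why a threshold is used rather than "any black pixel at all".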

A simple example is a document that asks a question and gives two boxes to check, “Yes” or “No.” For example, see the portion of the document below asking whether the applicant is a U.S. citizen. “Yes” and “No” are the labels, and whichever box is checked becomes the field's final result. In this case, “Yes”.

In general, what you want to extract is the text of the checked label. The Labeled OMR extractor allows you to do just that.

First, you will set up an extractor to locate the text labels.

Then, Grooper's OMR detection will determine if there is a box next to the label, and whether or not that box is checked.

Last, if the label is checked, the label is returned as the extractor's result.
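The three steps above can be sketched in Python as follows. This is a hypothetical illustration rather than Grooper's internal logic: the `Word` and `Box` structures, the rule that a box sits just to the left of its label on the same line, and the `max_gap` and `line_tolerance` distances are all assumptions made for the example.

```python
# Hypothetical sketch of Labeled OMR's three steps (not Grooper's API):
# 1) find label text, 2) pair each label with a nearby box, 3) return checked labels.
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # vertical center of the word


@dataclass
class Box:
    x: float       # right edge of the box
    y: float       # vertical center of the box
    checked: bool  # outcome of the OMR fill check


def find_labels(words: List[Word], pattern: str) -> List[Word]:
    """Step 1: keep words whose entire text matches the label pattern."""
    rx = re.compile(pattern)
    return [w for w in words if rx.fullmatch(w.text)]


def box_for_label(label: Word, boxes: List[Box], max_gap: float = 30.0,
                  line_tolerance: float = 5.0) -> Optional[Box]:
    """Step 2: the closest box to the left of the label on roughly the same line."""
    candidates = [
        b for b in boxes
        if b.x <= label.x
        and label.x - b.x <= max_gap
        and abs(b.y - label.y) <= line_tolerance
    ]
    return min(candidates, key=lambda b: label.x - b.x, default=None)


def labeled_omr(words: List[Word], boxes: List[Box], pattern: str) -> List[str]:
    """Step 3: return the text of every label whose paired box is checked."""
    results = []
    for label in find_labels(words, pattern):
        box = box_for_label(label, boxes)
        if box is not None and box.checked:
            results.append(label.text)
    return results


if __name__ == "__main__":
    words = [Word("Citizen", 20, 50), Word("Yes", 120, 50), Word("No", 220, 50)]
    boxes = [Box(110, 50, True), Box(210, 50, False)]
    print(labeled_omr(words, boxes, r"Yes|No"))  # ['Yes']
```

Keeping label detection (a text pattern) separate from box detection (an image check) mirrors the division of labor described above: regular expressions find the labels, and OMR decides which of them are checked.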

FYI

Labeled OMR has multiple extraction modes depending on how checkboxes behave on the document. There is also a Boolean mode to simply output "True" or "False" if a single checkbox is checked or not. We will discuss the different extraction modes further in the How To section of this article.
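As a rough illustration of the difference between returning labels and returning a Boolean, the Python sketch below contrasts the two kinds of output. The function names are hypothetical and do not correspond to Grooper property names or mode names.

```python
# Hypothetical contrast between label output and Boolean output (not Grooper's API).
from typing import Dict, List


def label_results(states: Dict[str, bool]) -> List[str]:
    """Label-style output: every label whose checkbox is checked, in page order."""
    return [label for label, checked in states.items() if checked]


def boolean_result(checked: bool) -> str:
    """Boolean-style output for a single checkbox: the string "True" or "False"."""
    return "True" if checked else "False"


if __name__ == "__main__":
    print(label_results({"DOMESTIC": False, "FARM": True, "SHOW": False}))  # ['FARM']
    print(boolean_result(True))  # True
```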

How To

Assign the Extractor

The Labeled OMR extractor can be utilized in two ways:

  1. As a Value Reader's extractor type.
  2. As an object's extractor property configuration. For example:
    • As a Data Field's Value Extractor property's extractor configuration.
    • As a Data Type's Local Extractor property's extractor configuration.
    • As a Document Type's Positive Extractor property's extractor configuration.
    • And more!

Value Reader

The Labeled OMR extractor is one of the extractor types available to the Value Reader extractor object.

  1. This is a Value Reader object.
  2. For its Extractor Type property, Labeled OMR is selected.
  3. The Labeled OMR extractor's configuration is set up in this property grid.
  4. This Labeled OMR extractor is configured to determine which of three options on the document (DOMESTIC, FARM, or SHOW) is checked.
  5. The Value Reader returns the label with a checked box next to it (FARM on this document).

Extractor Property

You may also configure a Labeled OMR extractor when configuring an extractor property. Many Grooper objects have some kind of extractor property in their property grids. Labeled OMR is one of the options that can be selected as the extractor type.

For example, Data Field objects have a Value Extractor property, which collects a result when the Data Model is extracted during the Extract activity.

  1. This is a Data Field object.
  2. For its Value Extractor property, Labeled OMR is selected.
  3. The Labeled OMR extractor's configuration is set up in the Value Extractor property's sub-property grid, OR can be configured using the Extractor Editor window by pressing this ellipsis button.
  4. This Labeled OMR extractor is configured to determine which of three options on the document (DOMESTIC, FARM, or SHOW) is checked.
  5. When the Data Field is collected during the Extract activity, the label with a checked box next to it (FARM on this document) is returned.