2023:Labeled OMR (Value Extractor): Difference between revisions

From Grooper Wiki

Revision as of 14:30, 19 October 2023

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

An example of checkboxes.

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not. If checked, it outputs the label(s) as the result.

About

You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.


Documents use checkboxes to make our lives easier. They are particularly prevalent on structured forms. They give the person filling out the form the ability to simply check a box next to one of a series of options rather than typing in the information.

However, most of Grooper's extraction centers around regular expressions: matching text patterns and returning the result. There isn't necessarily a character that represents a checked checkbox, so a regular expression alone cannot determine whether a box is checked.

This is where OMR comes into play. OMR stands for "Optical Mark Recognition". OMR determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it is unchecked. A checked (or marked) box, whether marked with an "x", a checkmark, or a filled block, will have more black pixels inside it than an unchecked (or unmarked) one. If the number of black pixels inside the detected box exceeds a threshold, the box is checked (or marked). If not, it is unchecked (or unmarked).
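The black-pixel idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not Grooper's implementation: the function names, the 0/1 pixel grid, and the 0.15 threshold are all assumptions made for the example.

```python
from typing import List

# Hypothetical representation: the interior of a detected box as rows of
# pixels, where 1 is a black pixel and 0 is a white pixel.
PixelGrid = List[List[int]]


def fill_ratio(box: PixelGrid) -> float:
    """Return the fraction of pixels inside the box that are black."""
    total = sum(len(row) for row in box)
    if total == 0:
        raise ValueError("box has no pixels")
    black = sum(sum(row) for row in box)
    return black / total


def is_checked(box: PixelGrid, threshold: float = 0.15) -> bool:
    """Treat the box as checked when enough of its interior is black.

    The default threshold is an arbitrary value chosen for this sketch.
    """
    return fill_ratio(box) >= threshold


# An empty box and a box marked with an "x".
unchecked_box = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
x_marked_box = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]

print(is_checked(unchecked_box))  # False
print(is_checked(x_marked_box))   # True
```

In practice the threshold would need tuning, since scanner noise or a stray line can add black pixels to an unmarked box.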

A simple example would be a document asking a question and giving two boxes to check “Yes” or “No.” For example, see the portion of the document below asking if the applicant is a U.S. Citizen. “Yes” or “No” would be the labels. Either “Yes” or “No” would be the field's final result, depending on which box is checked. In this case, the result is “Yes”.

In general, what you want to extract is the text of the checked label. The Labeled OMR extractor allows you to do just that.

First, you will set up an extractor to locate the text labels.

Then, Grooper's OMR detection will determine if there is a box next to the label, and whether or not that box is checked.

Last, if the label is checked, the label is returned as the extractor's result.
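The three steps above can be sketched as follows. This is a conceptual illustration only, not Grooper's API: the function names are hypothetical, the labels are located with an ordinary Python regular expression, and the checkbox states are passed in as a plain dictionary standing in for the result of OMR detection.

```python
import re
from typing import Dict, List


def find_labels(text: str, label_pattern: str) -> List[str]:
    """Step 1: locate the text labels with a regular expression."""
    return [match.group(0) for match in re.finditer(label_pattern, text)]


def checked_labels(
    text: str, label_pattern: str, box_states: Dict[str, bool]
) -> List[str]:
    """Steps 2 and 3: keep only the labels whose box is checked.

    box_states is a stand-in for OMR detection. It maps each label to
    True (checked) or False (unchecked). A label with no detected box
    is treated as unchecked.
    """
    return [
        label
        for label in find_labels(text, label_pattern)
        if box_states.get(label, False)
    ]


# The U.S. Citizen example: the "Yes" box is checked.
page_text = "U.S. Citizen?  Yes  No"
box_states = {"Yes": True, "No": False}

result = checked_labels(page_text, r"\b(?:Yes|No)\b", box_states)
print(result)  # ['Yes']
```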

FYI

Labeled OMR has multiple extraction modes depending on how checkboxes behave on the document. There is also a Boolean mode that simply outputs "True" or "False" depending on whether a single checkbox is checked. We will discuss the different extraction modes further in the How To section of this article.
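As a rough illustration of the difference between returning a label and returning a Boolean result, consider the sketch below. The function names are hypothetical and this is not how Grooper is configured. It only contrasts the two kinds of output described above for a single checkbox.

```python
from typing import Optional


def label_output(label: str, checked: bool) -> Optional[str]:
    """Label-style output: return the label only when its box is checked."""
    return label if checked else None


def boolean_output(checked: bool) -> str:
    """Boolean-style output: always return a result, "True" or "False"."""
    return "True" if checked else "False"


print(label_output("Yes", True))   # Yes
print(label_output("Yes", False))  # None
print(boolean_output(True))        # True
print(boolean_output(False))       # False
```

The practical difference is that the Boolean style always produces a value, while the label style produces nothing when the box is unchecked.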

How To

Assign the Extractor

The Labeled OMR extractor can be utilized in two ways:

  1. As a Value Reader's extractor type.
  2. As an object's extractor property configuration. For example:
    • As a Data Field's Value Extractor property's extractor configuration.
    • As a Data Type's Local Extractor property's extractor configuration.
    • As a Document Type's Positive Extractor property's extractor configuration.
    • And more!

Value Reader

The Labeled OMR extractor is one of the extractor types available to the Value Reader extractor object.

  1. This is a Value Reader object.
  2. For its Extractor Type property, Labeled OMR is selected.
  3. The Labeled OMR extractor's configuration is set up in this property grid.

  1. This Labeled OMR extractor is configured to determine which of three options on the document is checked (DOMESTIC, FARM, or SHOW).
  2. The Value Reader returns the label with a checked box next to it (DOMESTIC on this document).

Extractor Property

You may also configure a Labeled OMR extractor when configuring an extractor property. Many Grooper objects have some kind of extractor property in their property grids. Labeled OMR is one of the options that can be selected as the extractor type.

For example, Data Field objects have a Value Extractor property, which collects a result when the Data Model is extracted during the Extract activity.

  1. This is a Data Field object.
  2. For its Value Extractor property, Labeled OMR is selected.
  3. The Labeled OMR extractor's configuration is set up in the Value Extractor property's sub-property grid, or it can be configured in the Extractor Editor window by pressing the ellipsis button next to the Value Extractor property.

  1. This Labeled OMR extractor is configured to determine which of three options on the document is checked (DOMESTIC, FARM, or SHOW).
  2. When the Data Field is collected during the Extract activity, the label with a checked box next to it (DOMESTIC on this document) is returned.