Anchored OMR

From Grooper Wiki

Anchored OMR determines whether labeled checkboxes are checked or not and outputs the label as its result.

Anchored OMR is a data extraction method that determines the check state of checkboxes on documents. It uses OMR (Optical Mark Recognition) to decide whether the box next to a text label (the "anchor") is checked or unchecked.

About

A simple example is a document that asks a question and provides two boxes to check, "Yes" or "No." The portion of the document below asks whether the applicant is a U.S. Citizen. "Yes" and "No" are the labels, and whichever box is checked becomes the field's final result. In this case, the result is "Yes".

1573055869908-200.png
1573153126264-914.png

Version Differences

In version 2.90, this extractor was renamed Labeled OMR. Its configuration and functionality are similar to Anchored OMR. Visit the Labeled OMR article for information on this feature in version 2.90.

Prior to version 2.80, this capability required setting up a Data Element Profile on a Content Type object. The Anchored OMR extractor offers a simpler setup: you do not need to draw individual zones around the checkboxes, and you do not need to train the Content Type on at least one example document first. As a result, this method can extract data across multiple Content Types without configuring OMR extraction logic for each one.

Use Cases

Any document that uses checkboxes can take advantage of this functionality. Use cases include application forms, surveys, and questionnaires.

How To: Configure the extractor

Before you begin

Anchored OMR is an extraction method used to populate Data Fields or Data Columns in a Data Model. You will therefore need a Content Model containing a Data Model that has at least one Data Field or Data Column.

The anchor label is found using a data extractor, so the document needs text to search: either OCR text or native text extracted from a PDF. To get this text, run your document through the Recognize activity.

This extraction method also depends on Grooper knowing where the checkboxes are on the page and whether they are checked. To provide this, add a Box Detection command to an IP Profile and run that profile during a Recognize or Image Processing activity. These activities must be run at the Page level.

Set the Value Extractor to Anchored OMR

Navigate to the Data Field or Data Column you wish to populate in your Data Model. Select the "Value Extractor" property. From the dropdown list, choose "Anchored OMR".


1573142680778-971.png


Set the Label Extractor

1. Expand Anchored OMR's properties by double-clicking "Value Extractor".

2. Select the "Label Extractor" property. The value(s) this extractor returns will be the label(s) next to the checkboxes on the page. Set the Label Extractor to "Text Pattern" to write a regular expression local to the Data Field, or to "Reference" to point to the results of a Data Extractor elsewhere in the Node Tree.


1573152779582-129.png


2a. Choose "Text Pattern" to use the Pattern Editor to construct a simple extractor local to the data element. Select the ellipsis button at the end of the "Pattern" property to bring up the Pattern Editor.


1573142685744-974.png


2b. Alternatively, choose "Reference" to use a Data Type or Field Class extractor elsewhere in the Node Tree. Select the "Extractor" property and use the dropdown menu to point to the extractor's location in the Node Tree.


1573142691354-302.png


Whether you use a Text Pattern or a referenced extractor, the regex pattern is commonly a list of the checkbox labels on the document, separated by the vertical bar character ("|"). With two options, "yes" and "no", the pattern could simply be "yes|no". The example below shows a form asking about citizenship status. In the Value Pattern editor, "yes|no" is set as the regex pattern, and a look ahead pattern limits the results to just the "yes" and "no" near the "U.S. Citizen" label.


1573142137555-306.png
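The label pattern behaves like an ordinary regular expression alternation. The Python sketch below is only an analogy, not Grooper's extraction engine: the OCR text is made up, and the hypothetical find_labels helper approximates the look ahead restriction by searching only lines that contain a context phrase, which is not necessarily how Grooper evaluates its look ahead pattern.

```python
import re
from typing import List, Optional

# Made-up OCR text for illustration only.
TEXT = (
    "Applicant Name: Jane Doe\n"
    "U.S. Citizen  [x] Yes  [ ] No\n"
    "Veteran  [ ] Yes  [x] No\n"
)

# "yes|no" as an alternation; word boundaries and IGNORECASE are choices made for this sketch.
LABEL_PATTERN = re.compile(r"\b(?:yes|no)\b", re.IGNORECASE)


def find_labels(text: str, context: Optional[str] = None) -> List[str]:
    """Return every label match, optionally only from lines containing `context`."""
    if context is not None:
        # Hypothetical stand-in for a look ahead restriction:
        # keep only the lines that mention the context phrase.
        text = "\n".join(line for line in text.splitlines() if context in line)
    return [match.group(0) for match in LABEL_PATTERN.finditer(text)]


print(find_labels(TEXT))                  # ['Yes', 'No', 'Yes', 'No']
print(find_labels(TEXT, "U.S. Citizen"))  # ['Yes', 'No']
```

Without the context restriction, the pattern returns every "Yes" and "No" on the page, which is why narrowing the search matters when the same labels repeat.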


Configure the checkbox's location

1573151755471-255.png


1. Set the "Box Location" property according to where the checkboxes sit relative to the labels: West, East, North, or South. Multiple options may be selected if checkboxes appear next to labels in more than one cardinal direction. For example, if the checkbox is to the left of its label, select "West".

1573152740416-824.png
1573152262412-224.png


2. Set the maximum allowable distance between a checkbox and its label with the "Max Distance" property. This keeps Grooper from associating a label with a checkbox farther away, which could produce a false positive result.


1573152275705-540.png 1573152180988-736.png
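To make Box Location and Max Distance concrete, here is a minimal geometric sketch. It assumes rectangles measured in inches with y increasing down the page, measures the edge-to-edge gap between label and box, and requires the two to overlap along the other axis. The Rect type, the find_box helper, and this distance rule are illustrative assumptions; Grooper's own measurement may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in inches; y grows downward, as on a page image."""
    left: float
    top: float
    right: float
    bottom: float


def _overlap(a0: float, a1: float, b0: float, b1: float) -> bool:
    """True if the intervals [a0, a1] and [b0, b1] overlap."""
    return a0 < b1 and b0 < a1


def gap(label: Rect, box: Rect, direction: str) -> Optional[float]:
    """Edge-to-edge distance if `box` lies on the `direction` side of `label`, else None."""
    if direction in ("West", "East"):
        if not _overlap(label.top, label.bottom, box.top, box.bottom):
            return None  # not on the same line as the label
        d = label.left - box.right if direction == "West" else box.left - label.right
    elif direction in ("North", "South"):
        if not _overlap(label.left, label.right, box.left, box.right):
            return None  # not in the same column as the label
        d = label.top - box.bottom if direction == "North" else box.top - label.bottom
    else:
        raise ValueError(f"unknown direction: {direction}")
    return d if d >= 0 else None


def find_box(label: Rect, boxes: Iterable[Rect],
             locations: Sequence[str] = ("West",),
             max_distance: float = 0.25) -> Optional[Rect]:
    """Return the nearest box within max_distance in any allowed direction."""
    best = None
    for box in boxes:
        for direction in locations:
            d = gap(label, box, direction)
            if d is not None and d <= max_distance and (best is None or d < best[0]):
                best = (d, box)
    return None if best is None else best[1]


label = Rect(1.00, 1.00, 1.50, 1.20)  # e.g. the word "Yes"
near = Rect(0.75, 1.02, 0.90, 1.17)   # 0.10 in to the left
far = Rect(0.20, 1.02, 0.35, 1.17)    # 0.65 in to the left
print(find_box(label, [far, near]) == near)  # True
print(find_box(label, [far]))                # None: beyond the 0.25 in limit
```

Passing several directions, such as ("West", "East"), mirrors selecting more than one Box Location option; the nearest qualifying box wins.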


Set the extraction mode

The "Mode" property should match how checkboxes are used on the document. On some documents, checkboxes select one choice from a list. On others, they select multiple choices from a list. On still others, all that matters is whether a box is checked (such as "Yes/No" or "True/False" responses).

The three extraction modes, CheckOne, CheckMulti, and Boolean, correspond to these three usages.


1573153264910-378.png


Use "CheckOne" when there are multiple checkboxes but only one can be checked. The output is the label of the single checked box.


Property Panel of the "U.S. Citizen" field

1573154918462-714.png
Results seen in Document View

1573153730510-776.png
Extraction results

1573155270802-161.png
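The CheckOne rule can be sketched as follows, assuming each label has already been paired with its checkbox state. The check_one name is hypothetical, and returning None when zero or several boxes are checked is a choice made for this sketch; this article does not describe how Grooper handles those cases.

```python
from typing import Optional, Sequence, Tuple


def check_one(boxes: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """Return the label of the only checked box, or None if zero or several are checked."""
    checked = [label for label, is_checked in boxes if is_checked]
    return checked[0] if len(checked) == 1 else None


print(check_one([("Yes", True), ("No", False)]))  # Yes
```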


Use "CheckMulti" when there are multiple checkboxes and any number of them can be checked. The output is a single concatenated list of all checked labels. You can delimit that list with the "Separator String" property. For example, entering ", " as the "Separator String" puts a comma and a space between values.


Property Panel of the "Requirements" field

1573157252743-611.png
Results seen in the Document Viewer

1573157853455-402.png
Notice that the three checked values are separated by "|" (the vertical bar character), set with the "Separator String" property.

1573157263632-110.png
Without the "Separator String" property, the results would be concatenated with nothing between them.

1573158164886-877.png
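CheckMulti output amounts to joining the checked labels with the Separator String. This sketch assumes the labels arrive in reading order, paired with their check states; the check_multi name and the sample labels are hypothetical, and the empty default separator reflects the concatenation behavior described above.

```python
from typing import Sequence, Tuple


def check_multi(boxes: Sequence[Tuple[str, bool]], separator: str = "") -> str:
    """Join the labels of all checked boxes; an empty separator concatenates them."""
    return separator.join(label for label, is_checked in boxes if is_checked)


# Hypothetical labels, not taken from the screenshots.
boxes = [("Resume", True), ("Cover Letter", False), ("References", True), ("Transcript", True)]
print(check_multi(boxes, "|"))  # Resume|References|Transcript
print(check_multi(boxes))       # ResumeReferencesTranscript
```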


Use "Boolean" when it only matters whether the box is checked, corresponding to a boolean "true" or "false" value. By default, the output is "true" if the box is checked and "false" if it is not. You can change these outputs with the "Value If Checked" and "Value If Unchecked" properties.


Property Panel of the "US Citizen?" field

1573155596280-953.png
Results seen in Document View

1573155591024-524.png
Extraction results
Notice that the output is the text entered in the "Value If Checked" property.

1573155941591-182.png
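In Boolean mode the label only locates the checkbox; the output comes from the box's state. A minimal sketch, with the function and parameter names chosen here for illustration and defaults mirroring the "true" and "false" defaults described above:

```python
def boolean_value(is_checked: bool,
                  value_if_checked: str = "true",
                  value_if_unchecked: str = "false") -> str:
    """Map a checkbox state to the configured output strings."""
    return value_if_checked if is_checked else value_if_unchecked


print(boolean_value(True))                          # true
print(boolean_value(True, value_if_checked="Yes"))  # Yes
```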


Property Details

Property | Default Value | Information
Label Extractor | (none) | The value(s) this extractor returns will be the label(s) next to the checkboxes. You may choose "Text Pattern" or "Reference" from the dropdown menu.
Box Location | West | Sets where the checkboxes are in relation to the labels: West (left of the label), East (right of the label), North (above the label), or South (below the label). Multiple options may be selected if checkboxes appear next to labels in more than one cardinal direction.
Max Distance | 0.25in | Determines the maximum allowable distance between checkboxes and labels (in inches, points, millimeters, or centimeters).
Mode | CheckOne | Determines how labels are extracted, depending on how checkboxes work on the document. It can be one of three values:
  • CheckOne - Only one box may be checked. Only one label is output.
  • CheckMulti - Multiple boxes may be checked. The result is a single concatenated list of all checked labels, which may be delimited with the "Separator String" property.
  • Boolean - Returns a "true" or "false" value depending on whether the label's checkbox is checked. By default, the output is "true" if the box is checked and "false" if it is not. These outputs can be changed with the "Value If Checked" and "Value If Unchecked" properties.

Additional Mode - CheckMulti Properties

Property | Default Value | Information
Separator String | | Defines a delimiter string placed between each label value. This can be as simple as a single space character, which puts a space between the checked label values.

Additional Mode - Boolean Properties

Property | Default Value | Information
Value If Checked | True | Specifies the output value if a box is checked.
Value If Unchecked | False | Specifies the output value if a box is left unchecked.