Anchored OMR determines whether labeled checkboxes are checked and outputs the label of the checked box as its result.
It is a data extraction method for determining the check state of checkboxes on documents. OMR (Optical Mark Recognition) is used to determine whether a box next to a text label (or “anchor”) is checked or unchecked.
A simple example is a document that asks a question and provides two boxes to check: “Yes” or “No.” See the portion of the document below, which asks whether the applicant is a U.S. Citizen. “Yes” and “No” are the labels, and whichever box is checked becomes the field's final result. In this case, the result is "Yes".
⚠ In version 2.90, this extractor was renamed Labeled OMR. Its configuration and functionality are similar to Anchored OMR. Visit the Labeled OMR article for information on this feature in version 2.90.
Prior to version 2.80, this capability was available by setting up a Data Element Profile in a Content Type object. The Anchored OMR extractor provides a simpler setup: you do not have to draw individual zones around the checkboxes, and you do not have to train the Content Type on at least one example document first. As a result, this method can extract data across multiple Content Types without requiring OMR extraction logic to be configured for each one.
Any document using check-boxes can take advantage of this functionality. There is a wide variety of use cases, including application forms, surveys, and questionnaires.
How To: Configure the extractor
Before you begin
Anchored OMR is an extraction method for populating Data Fields or Data Columns in a Data Model. As such, you will need a Content Model containing a Data Model with a Data Field or Data Column to populate.
The anchor label is found using a data extractor, which needs text to search: either OCR text or native text extracted from a PDF. To obtain this text, run your document through the Recognize activity.
Critical to this extraction method, Grooper must know where the checkboxes are on the page and whether they are checked. To provide this information, add a Box Detection command to an IP Profile and run it during a Recognize or Image Processing activity. These activities must be run at the Page level.
Set the Value Extractor to Anchored OMR
Navigate to the Data Field or Data Column you wish to populate in your Data Model. Select the "Value Extractor" property and, from the dropdown list, choose "Anchored OMR".
Set the Label Extractor
1. Expand Anchored OMR's properties by double-clicking "Value Extractor".
2. Select the "Label Extractor" property. The value(s) this extractor returns will be the label(s) next to the checkboxes on the page. The Label Extractor can be set to "Text Pattern" to write a regular expression local to the Data Field, or to "Reference" to point to the results of a Data Extractor elsewhere in the Node Tree.
2a. Choose "Text Pattern" to use the Pattern Editor to construct a simple extractor local to the data element. Select the ellipsis button at the end of the "Pattern" property to bring up the Pattern Editor.
2b. Or, choose "Reference" to use a Data Type or Field Class extractor in the Node Tree. Then select the "Extractor" property and use the dropdown menu to point to the extractor's location in the Node Tree.
Commonly, whether in a Text Pattern or a referenced extractor, the regex pattern is set to a list of the checkbox labels on the document, separated by a vertical bar character (“|”). If you had two options, "yes" or "no", the regex pattern could simply be "yes|no". The example below is a form asking about citizenship status. In the Value Pattern editor, "yes|no" is set as the regex pattern, and a look-ahead pattern is set to limit the results to just the "yes" and "no" near the "U.S. Citizen" label.
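To illustrate how an alternation pattern such as `yes|no` behaves, here is a minimal Python sketch using the standard `re` module on hypothetical OCR text. It is not Grooper's Pattern Editor. The `find_labels_near` helper (a hypothetical name) keeps only matches from the line containing the anchor text; it stands in for the look-ahead pattern described above to show why some context restriction is usually needed when the same labels appear more than once on a page.

```python
import re

# Hypothetical OCR text: two questions on the same form both offer Yes/No boxes.
ocr_text = (
    "U.S. Citizen?  Yes  No\n"
    "Previously employed here?  Yes  No\n"
)

# The label pattern: an alternation of checkbox labels, matched as whole words,
# case-insensitively (so "yes|no" also matches "Yes" and "No").
LABEL_PATTERN = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def find_labels(text: str) -> list[str]:
    """Return every label match in the text, in reading order."""
    return [m.group(1) for m in LABEL_PATTERN.finditer(text)]


def find_labels_near(text: str, context: str) -> list[str]:
    """Return label matches only from lines that contain the context text.

    A simple stand-in for narrowing results by context; this is an
    assumption for illustration, not Grooper's look-ahead implementation.
    """
    results: list[str] = []
    for line in text.splitlines():
        if context in line:
            results.extend(find_labels(line))
    return results


print(find_labels(ocr_text))                       # ['Yes', 'No', 'Yes', 'No']
print(find_labels_near(ocr_text, "U.S. Citizen"))  # ['Yes', 'No']
```

Without a context restriction, the pattern returns four labels from two unrelated questions; restricting by the anchor text leaves only the two that belong to the "U.S. Citizen" question.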
Configure the checkbox's location
1. Set the "Box Location" property according to where the checkboxes are in relation to the labels: West, East, North, or South of the labels. Multiple options may be selected if checkboxes appear next to labels in more than one cardinal direction. For example, if the checkbox is to the left of a label, select "West".
2. Set the maximum space allowed between a checkbox and its label with the "Max Distance" property. This keeps Grooper from associating a label with a checkbox that is too far away, which could otherwise produce a false positive.
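The following Python sketch shows one plausible way the "Box Location" and "Max Distance" settings could work together: only checkboxes on an allowed side of the label, and no farther away than the maximum distance, are considered. The geometry model, the `Box` type, and the `find_checkbox` function are all hypothetical assumptions for illustration; this is not Grooper's actual matching algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Box:
    """Rectangle in page inches; origin at top-left, y increases downward (assumed)."""
    left: float
    top: float
    right: float
    bottom: float


def gap(label: Box, box: Box, direction: str) -> Optional[float]:
    """Return the gap between label and box if the box lies in `direction`, else None."""
    # A box to the West/East must share some vertical extent with the label;
    # a box to the North/South must share some horizontal extent.
    overlaps_rows = box.top < label.bottom and box.bottom > label.top
    overlaps_cols = box.left < label.right and box.right > label.left
    if direction == "West" and overlaps_rows and box.right <= label.left:
        return label.left - box.right
    if direction == "East" and overlaps_rows and box.left >= label.right:
        return box.left - label.right
    if direction == "North" and overlaps_cols and box.bottom <= label.top:
        return label.top - box.bottom
    if direction == "South" and overlaps_cols and box.top >= label.bottom:
        return box.top - label.bottom
    return None


def find_checkbox(label: Box, boxes: Iterable[Box], directions: Iterable[str],
                  max_distance: float) -> Optional[Box]:
    """Return the nearest box in any allowed direction within max_distance, or None."""
    allowed = tuple(directions)  # materialize so it can be reused for every box
    best: Optional[Box] = None
    best_gap = max_distance
    for box in boxes:
        for direction in allowed:
            g = gap(label, box, direction)
            if g is not None and g <= best_gap:
                best, best_gap = box, g
    return best


label = Box(left=1.30, top=2.00, right=1.60, bottom=2.15)  # the word "Yes"
near = Box(left=1.05, top=2.00, right=1.20, bottom=2.15)   # 0.10in to the West
far = Box(left=0.40, top=2.00, right=0.55, bottom=2.15)    # 0.75in to the West

print(find_checkbox(label, [far, near], ["West"], 0.25) is near)  # True
print(find_checkbox(label, [far, near], ["East"], 0.25))          # None
```

With a 0.25in limit, the distant box is ignored even though it is also to the West, which is the kind of false positive the "Max Distance" property guards against.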
Set the extraction mode
The "Mode" property is determined by how checkboxes are used on a document. On some documents, checkboxes are used to select one choice out of a list. On others, checkboxes are used to select multiple choices from a list. On still others, it really only matters if the box is checked or not (such as "Yes/No" or "True/False" type responses).
The different extraction modes correspond to how checkboxes are used on the document in hand. The "Mode" can be CheckOne, CheckMulti, or Boolean.
Use "CheckOne" in situations where there are multiple checkboxes, but only one can be checked. The output will be the label of the single checked box.
(Figures: Property Panel of the "U.S. Citizen" field; Results seen in Document View)
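As a conceptual sketch, "CheckOne" mode can be modeled in Python as picking the one checked label from a set of label/check-state pairs. The `check_one` function is hypothetical, and how Grooper handles zero or several checked boxes in this mode is not described here; returning `None` in those cases is an assumption made only for illustration.

```python
from typing import Optional


def check_one(states: dict[str, bool]) -> Optional[str]:
    """Return the label of the single checked box.

    Returns None when zero or several boxes are checked (an assumed
    behavior for this sketch, not documented Grooper behavior).
    """
    checked = [label for label, is_checked in states.items() if is_checked]
    return checked[0] if len(checked) == 1 else None


print(check_one({"Yes": True, "No": False}))  # Yes
```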
Use "CheckMulti" in situations where there are multiple checkboxes, and any number of those boxes can be checked. The output will be a single concatenated list of all checked labels. You may separate the values in that list with the "Separator String" property. For example, you could put a comma and a space between every value by entering ", " for the "Separator String" property.
(Figures: Property Panel of the "Requirements" field; Results seen in the Document Viewer)
Notice the three checked values are separated by a "|" (the vertical bar character), set with the "Separator String" property. Had the "Separator String" property not been used, the results would have been concatenated with no separation between them.
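Conceptually, "CheckMulti" mode joins the labels of every checked box with the separator string, and an empty separator reproduces the run-together result described above. The minimal Python sketch below models this with a hypothetical `check_multi` function; the labels are invented examples, and the dictionary's insertion order stands in for the order in which labels appear on the document.

```python
def check_multi(states: dict[str, bool], separator: str = "") -> str:
    """Join the labels of all checked boxes, in order, with the separator."""
    return separator.join(label for label, is_checked in states.items() if is_checked)


# Hypothetical checkbox labels and check states.
requirements = {
    "Photo ID": True,
    "Birth Certificate": False,
    "Passport": True,
    "Proof of Address": True,
}

print(check_multi(requirements))        # Photo IDPassportProof of Address
print(check_multi(requirements, "|"))   # Photo ID|Passport|Proof of Address
print(check_multi(requirements, ", "))  # Photo ID, Passport, Proof of Address
```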
Use "Boolean" in situations where it only matters whether the box is checked, corresponding to a boolean "true" or "false" value. By default, the output is "True" if the box is checked and "False" if it is not. However, the output can be changed using the "Value If Checked" and "Value If Unchecked" properties.
(Figures: Property Panel of the "US Citizen?" field; Results seen in Document View)
Notice the output is the text entered in the "Value If Checked" property.
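"Boolean" mode amounts to mapping a single check state to one of two output strings. The small Python sketch below models it with a hypothetical `boolean_mode` function whose defaults match the "True" and "False" defaults of the "Value If Checked" and "Value If Unchecked" properties; the "Yes"/"No" overrides in the usage lines are invented example values.

```python
def boolean_mode(is_checked: bool,
                 value_if_checked: str = "True",
                 value_if_unchecked: str = "False") -> str:
    """Map a checkbox state to the configured output string."""
    return value_if_checked if is_checked else value_if_unchecked


print(boolean_mode(True))               # True
print(boolean_mode(False))              # False
print(boolean_mode(True, "Yes", "No"))  # Yes
```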
| Property | Default Value | Description |
|---|---|---|
| Label Extractor | (none) | The value this extractor returns will be the label(s) next to the checkboxes. You may choose "Text Pattern" or "Reference" from the dropdown menu. |
| Box Location | West | Sets where the checkboxes are in relation to the labels: West (left of the label), East (right of the label), North (above the label), or South (below the label). Multiple options may be selected if checkboxes appear next to labels in more than one cardinal direction. |
| Max Distance | 0.25in | Determines the maximum space allowed between checkboxes and labels (in inches, points, millimeters, or centimeters). |
| Mode | CheckOne | Determines how labels are extracted, depending on how checkboxes work on the document. It can be one of three values: CheckOne, CheckMulti, or Boolean. |
Additional Mode - CheckMulti Properties
| Property | Description |
|---|---|
| Separator String | Defines a delimiter string to be used between each label value. This could be as simple as a space character, entered to put a space between the checked label values. |
Additional Mode - Boolean Properties
| Property | Default Value | Description |
|---|---|---|
| Value If Checked | True | Specifies the output value if a box is checked. |
| Value If Unchecked | False | Specifies the output value if a box is left unchecked. |