2.80:OMR Reader (Result Processor)

From Grooper Wiki
Revision as of 13:05, 23 December 2019 by Configadmin (talk | contribs)
The OMR Reader post processor selected on a Data Type's property panel.

The OMR Reader result processor allows Data Type extractors to read checkbox states near a label.

The Data Type is first configured to return the labels near the checkboxes. The extractor can then "see" checkboxes near these labels and determine whether or not they are filled.

There are relatively few configuration options, making setup fairly simple.

Version Differences

The OMR Reader post processor is a new configurable property on Data Type extractors as of version 2.72. Prior to version 2.72, OMR checkboxes could be read only by configuring a Data Element Profile in a Document Type object. Setting up a Data Type to read OMR checkboxes is much simpler. Furthermore, since the result is returned to a Data Type, this data can be used anywhere an extractor is used in Grooper, not just to populate a field in a data model.

Example

From a list of multiple checkboxes, we need to determine which are checked.

The idea for this scenario is to first extract the labels for the checkboxes (using a Data Type), and then set the "Post Processing" property of that Data Type to run OMR Reader to look for checked boxes near the labels.

Steps

Create the Data Type

First, we want to create the Data Type that will extract all possible checkbox labels. We'll set the Pattern of the Data Type to extract the labels. This example uses a static list, but the same result can be achieved in a number of ways.

A. New|
B. Renewal|
C. Upgrade|
D. Multi-Unit|
E. Reapplication
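
As a rough illustration of what this static list does, the sketch below joins the five labels into one regular-expression alternation and searches a line of text with it. It uses Python's `re` module as a stand-in for Grooper's own pattern engine, which may behave differently. The helper `find_labels` and the sample text are hypothetical, and the periods are escaped here so that they match only a literal "." (an assumption, since the list above leaves them unescaped).

```python
import re

# The five labels from the example, joined into one alternation.
# Assumption: periods are escaped so "." matches only a literal period.
LABEL_PATTERN = re.compile(
    r"A\. New|B\. Renewal|C\. Upgrade|D\. Multi-Unit|E\. Reapplication"
)


def find_labels(page_text: str) -> list[tuple[str, int]]:
    """Return (label, character offset) for every label found in the text."""
    return [(m.group(0), m.start()) for m in LABEL_PATTERN.finditer(page_text)]


# Hypothetical line of recognized text containing three of the labels.
sample = "[ ] A. New  [x] B. Renewal  [ ] C. Upgrade"
labels = find_labels(sample)
print(labels)
```

Each hit carries a position, which is the piece of information a later step needs in order to look for a checkbox near the label.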



Enable the Post Processor

In the "Output" section of the Data Type's property panel, we'll set the "Post Processing" property to "OMR Reader".



Configure the Post Processor

Once we've chosen "OMR Reader", we can expand it to reveal its configurable properties. For this example, we don't need to change anything, so we'll leave these at their default values.



Test Extraction

Once we Save and Run Extraction, the Post Processor runs after the initial pattern (the one that finds the checkbox labels), looks for checkboxes to the West of the labels, and outputs the results with a confidence percentage.
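
The "look West of the label" step can be pictured with a small geometric sketch. This is not Grooper's implementation. The `Rect` type, the functions `gap_west` and `nearest_box_west`, the coordinates, and the rule that a box must share the label's row are all assumptions made for illustration. The `max_distance` argument mirrors the Max Distance property, measured between the closest edges of the label and the box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; units are arbitrary (hypothetical)."""
    left: float
    top: float
    right: float
    bottom: float


def gap_west(label: Rect, box: Rect):
    """Gap between the box's right edge and the label's left edge.

    Returns None when the box is not to the left of the label or does
    not overlap the label vertically (assumed "same row" rule).
    """
    overlaps_vertically = box.top < label.bottom and box.bottom > label.top
    if box.right <= label.left and overlaps_vertically:
        return label.left - box.right
    return None


def nearest_box_west(label: Rect, boxes, max_distance: float):
    """Return the closest box west of the label within max_distance, or None."""
    candidates = []
    for box in boxes:
        gap = gap_west(label, box)
        if gap is not None and gap <= max_distance:
            candidates.append((gap, box))
    if not candidates:
        return None
    return min(candidates, key=lambda pair: pair[0])[1]


# Hypothetical layout: one label and three candidate boxes.
label = Rect(1.0, 2.0, 2.0, 2.2)
near = Rect(0.7, 2.0, 0.9, 2.2)       # just left of the label
far = Rect(0.1, 2.0, 0.3, 2.2)        # same row, but too far away
other_row = Rect(0.7, 3.0, 0.9, 3.2)  # left of the label, different row
match = nearest_box_west(label, [far, other_row, near], max_distance=0.5)
```

With these made-up coordinates only `near` qualifies. Tightening `max_distance` enough excludes it as well, in which case no box is associated with the label.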



Notice the confidence percentage for our checkbox is only **50%**. This is because the text our pattern matches appears on multiple pages, so multiple positive values are found.

To bring this back up to 100%, we can use the "Result Filter" property on the Data Type to limit the extractor's scope to only the first page.

Properties

Box Location: Defines the spatial relationship between labels and OMR boxes. A combination of the following flags:
  • West: The OMR zone is located to the left of the label.
  • East: The OMR zone is located to the right of the label.
  • North: The OMR zone is located above the label.
  • South: The OMR zone is located below the label.
Max Distance: The maximum distance between the label and the OMR box, measured from their closest edges.
Mode: Specifies the OMR extraction mode. Can be one of the following values:
  • CheckOne: The label extractor targets multiple OMR boxes, but only one may be checked. The output will include one instance for each label, sorted in descending order by confidence. The confidence of each output instance indicates the likelihood that it is the correct result.
  • CheckMulti: The label extractor targets multiple OMR boxes, and any number may be checked. The output will include a single instance, containing a concatenated list of values for the checked boxes, delimited by the Separator String.
  • Boolean: The label extractor targets a single OMR box representing a Boolean state. The output will include a single instance with its value set to "Value If Checked" or "Value If Unchecked", depending on whether or not the box is checked. If the label extractor produces multiple hits, then labels where a box is detected will be prioritized over labels where no box is detected.
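
To make the difference between the three modes concrete, here is a minimal sketch of the output shape each one describes. It is an illustration only, not Grooper's code. The `BoxReading` type, the function names, the confidence numbers, and the default separator and checked/unchecked values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxReading:
    """One label hit plus what was read near it (hypothetical structure)."""
    label: str
    checked: bool
    confidence: float       # assumed score between 0 and 1
    box_found: bool = True  # False when no box was detected near the label


def check_one(readings):
    """CheckOne: one instance per label, sorted by descending confidence."""
    ranked = sorted(readings, key=lambda r: r.confidence, reverse=True)
    return [(r.label, r.confidence) for r in ranked]


def check_multi(readings, separator=", "):
    """CheckMulti: a single instance joining the labels of all checked boxes."""
    return separator.join(r.label for r in readings if r.checked)


def boolean(readings, value_if_checked="True", value_if_unchecked="False"):
    """Boolean: a single instance; hits with a detected box take priority."""
    if not readings:
        return None
    # sorted() is stable, so hits with a detected box move to the front.
    best = sorted(readings, key=lambda r: not r.box_found)[0]
    return value_if_checked if best.checked else value_if_unchecked


# Hypothetical readings for three of the labels in the example.
readings = [
    BoxReading("A. New", checked=False, confidence=0.1),
    BoxReading("B. Renewal", checked=True, confidence=0.8),
    BoxReading("C. Upgrade", checked=True, confidence=0.6),
]
one = check_one(readings)
multi = check_multi(readings, separator="; ")
flag = boolean([
    BoxReading("Approved", checked=False, confidence=0.0, box_found=False),
    BoxReading("Approved", checked=True, confidence=0.9),
])
```

In this sketch CheckOne returns every label ranked by confidence, CheckMulti returns one joined string, and Boolean returns one of two fixed values, ignoring a label hit that has no box when another hit does.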