2023:Ordered OMR (Value Extractor)

Revision as of 09:57, 13 February 2023

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Ordered OMR is an extractor type similar to a Labeled OMR in that it is used to return OMR check box information. Rather than relying on a label for the extraction, the Ordered OMR returns information from the boxes based on the order of the check boxes.

About

Check boxes on a form can be extremely useful. They give us quick information at a glance. However, there is no expression we can put into a text extractor, such as a Pattern Match or List Match, to find checked and unchecked boxes. Instead, we must use one of the OMR extractors.

OMR stands for "Optical Mark Recognition". OMR first detects the check boxes on a document and then determines whether each box is checked or unchecked. The most common ways to check a box are with a checkmark, a filled-in (black) box, or an "X".

There are three types of OMR recognition in Grooper: Labeled OMR, Ordered OMR, and Zonal OMR.

NOTE: For any OMR detection, documents in Grooper first need to be recognized and go through the box detection step from either OCR or an IP Profile. Please see the OCR and IP Profile wiki articles for more information.

Ordered OMR determines which boxes are checked and unchecked and then returns values based on the order of the boxes. Before extraction, each box has to be given an Output Value, which assigns a specific value to that box. So, what does this mean?

How Does It Work?

Understanding Ordered OMR

If you look at the image on the right, you will see a check box list. You can see that Baseball is checked as NO, Basketball is checked as YES, and so on down the list. Grooper uses the pixel count inside a box to determine whether it is checked. A checked box contains more dark (marked) pixels than an unchecked box.
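
The pixel-count idea can be sketched in a few lines of Python. This is only an illustration, not Grooper's implementation: the function names, the binarized-pixel representation (1 = dark, 0 = light), and the 15% threshold are all assumptions made for the example.

```python
from typing import List

# Hypothetical threshold: fraction of dark pixels above which a box
# counts as checked. This value is an assumption, not a Grooper setting.
CHECKED_THRESHOLD = 0.15


def fill_ratio(box: List[List[int]]) -> float:
    """Return the fraction of dark pixels (1 = dark, 0 = light) in a box interior."""
    total = sum(len(row) for row in box)
    dark = sum(sum(row) for row in box)
    return dark / total if total else 0.0


def is_checked(box: List[List[int]], threshold: float = CHECKED_THRESHOLD) -> bool:
    """Treat a box as checked when enough of its interior pixels are dark."""
    return fill_ratio(box) >= threshold


# A 5x5 empty box interior and one marked with an "X".
empty_box = [[0] * 5 for _ in range(5)]
x_box = [
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1],
]

print(is_checked(empty_box), is_checked(x_box))
```

In this sketch the "X" box has 9 dark pixels out of 25, well above the assumed threshold, while the empty box has none.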

A Labeled OMR extractor uses labels to determine which check box values to return. However, with an Ordered OMR extractor, the labels next to the check boxes mean very little. Instead, the order of the boxes is important.

In this example, the check boxes are arranged in a grid. There are two columns labeled "YES" and "NO" and eleven rows numbered 1-11, for a total of 22 check boxes. Selecting either a Vertical or Horizontal Flow Direction instructs Grooper on how to read the boxes. Grooper will then return results based on the checked boxes.

Vertical Flow

With a Vertical Flow Direction selected, Grooper would first look at the box for YES in row 1 (Y1) and determine whether or not it is checked. Then it would look at Y2 and determine whether or not that one is checked, and so on down the column. At the end of the first column, the Ordered OMR extractor would start again at the top of the second column at N1 and go down that column, determining whether or not each box is checked.

So in this example, if we symbolize "YES" with a Y and "NO" with an N, Grooper would return the following values in this order: 2Y 6Y 7Y 10Y 1N 3N 4N 5N 8N 9N 11N
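
The vertical reading order can be reproduced with a small Python sketch. It is a model of the ordering described here, not Grooper code: the names `YES_ROWS`, `is_checked`, and `vertical_flow` are hypothetical, and the checked state is taken from the example grid (rows 2, 6, 7, and 10 checked YES, every other row checked NO).

```python
# Rows whose YES box is checked in the example; all other rows are checked NO.
YES_ROWS = {2, 6, 7, 10}
COLUMNS = ["Y", "N"]   # left-to-right column order
ROWS = range(1, 12)    # rows 1 through 11


def is_checked(column: str, row: int) -> bool:
    """Look up the checked state of one box in the example grid."""
    return (row in YES_ROWS) == (column == "Y")


def vertical_flow() -> list:
    """Read each column top to bottom before moving to the next column."""
    results = []
    for column in COLUMNS:
        for row in ROWS:
            if is_checked(column, row):
                results.append(f"{row}{column}")
    return results


print(" ".join(vertical_flow()))
```

Because the outer loop walks the columns, every checked YES box is returned before any NO box.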

Horizontal Flow

With a Horizontal Flow Direction selected, Grooper would first look at the box for YES in row 1 (Y1) and determine whether or not it is checked. Then it would look at N1 and determine whether or not that one is checked. Then it would return to the first column at Y2, move to N2, and so on. It moves back and forth between the two columns, reading each row horizontally before moving on to the next row. If there were a third column, Grooper would look at the first row and extract the values for the first, second, and third columns before moving to the next row.

So in this example, if we symbolize "YES" with a Y and "NO" with an N, Grooper would return the following values in this order: 1N 2Y 3N 4N 5N 6Y 7Y 8N 9N 10Y 11N
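
A short Python sketch of the horizontal reading order, under the same assumptions as the example grid (rows 2, 6, 7, and 10 checked YES, the rest checked NO). The names `GRID`, `COLUMN_LABELS`, and `horizontal_flow` are hypothetical and only model the ordering; they are not part of Grooper.

```python
from typing import List, Tuple

COLUMN_LABELS = ["Y", "N"]  # left-to-right column order
YES_ROWS = {2, 6, 7, 10}    # rows checked YES in the example

# One (yes_checked, no_checked) pair per row, for rows 1 through 11.
GRID: List[Tuple[bool, bool]] = [
    (row in YES_ROWS, row not in YES_ROWS) for row in range(1, 12)
]


def horizontal_flow(grid: List[Tuple[bool, ...]]) -> List[str]:
    """Read every column of a row, left to right, before moving to the next row."""
    results = []
    for row_number, row in enumerate(grid, start=1):
        for label, checked in zip(COLUMN_LABELS, row):
            if checked:
                results.append(f"{row_number}{label}")
    return results


print(" ".join(horizontal_flow(GRID)))
```

Here the outer loop walks the rows, so the results come back in row order regardless of which column holds the check. A third column would only require a third label and a third value per row.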

How To

So how do we set this up in Grooper? An Ordered OMR can be selected anywhere an extractor is used.

Configuring on a Value Reader
In your Node Tree, create or select a Value Reader. Visit the Value Reader wiki page for instructions on how to create a Value Reader. Select the "Value Reader" tab. Click the drop-down list next to Extractor and select Ordered OMR.

Configuring on a Data Type
In your Node Tree, create or select a Data Type. Visit the Data Type wiki page for instructions on how to create a Data Type. Select the "Data Type" tab. Click the drop-down list next to Local Extractor and select Ordered OMR.

Configuring on Other Object Types

The Ordered OMR extractor can be used on a multitude of object types. Any object that has an extractor property can be configured with an Ordered OMR.

The configuration process on other objects is the same as for the Value Reader and Data Type objects. Simply select Ordered OMR as your extractor type.

Examples where you can use an Ordered OMR include:

A Data Type's Value Extractor property
A Document Type's Positive Extractor property
The Labeled Value extractor's Label Extractor property
The Pattern-Based Separation Provider's Value Extractor property
