2023:Ordered OMR (Value Extractor): Difference between revisions

From Grooper Wiki
No edit summary
No edit summary
Line 23: Line 23:
There are three types of OMR recognition in Grooper: '''''Labeled OMR''''', '''''Ordered OMR''''', and '''''Zonal OMR'''''.  
There are three types of OMR recognition in Grooper: '''''Labeled OMR''''', '''''Ordered OMR''''', and '''''Zonal OMR'''''.  


::'''NOTE:''' For any OMR detection, documents in Grooper first need to be recognized and go through the box detection step from either '''OCR''' or an '''IP Profile'''. Please see the [[OCR]] and [[IP Profile]] wiki articles for more information.  
::'''NOTE:''' For any OMR detection, documents in Grooper first need to be recognized and go through the '''''Box Detection''''' or '''''Box Removal''''' step from either an '''OCR Profile''' or an '''IP Profile'''. Please see the [[OCR]] and [[IP Profile]] wiki articles for more information.  


'''''Ordered OMR''''' determines which boxes are checked and unchecked and then returns values based on the order of the boxes. Before extraction, the boxes have to be given an '''''Output Value''''' to assign a specific value to each box. So, what does this mean?  
'''''Ordered OMR''''' determines which boxes are checked and unchecked and then returns values based on the order of the boxes. Before extraction, the boxes have to be given an '''''Output Value''''' to assign a specific value to each box. So, what does this mean?  
Line 52: Line 52:
</tab>
</tab>


<tab Name="Vertical Flow" style="margin:20px">
<tab Name="Horizontal Flow" style="margin:20px">
 
{|cellpadding=10 cellspacing=5
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
|valign=top style="width:40%"|


====Vertical Flow====
====Horizontal Flow====




Line 64: Line 65:
|valign=top style="width:60%"|
|valign=top style="width:60%"|


With a ''Vertical'' '''''Flow Direction''''' selected, Grooper would first look at box for YES and 1 or Y1 and determine whether or not it is checked. Then it would look at Y2 and determine whether or not that one was checked, and so on down the line. At the end of the first column, the '''''Ordered OMR''''' extractor would start again at the top of the second column at N1 and go down that column determining whether or not the boxes are checked.  
With a ''Horizontal'' '''''Flow Direction''''' selected, Grooper would first look at box for YES and 1 or Y1 and determine whether or not it is checked. Then it would look at N1 and determine whether or not that one was checked. Then it would return to the first column at Y2 and then move to N2 and so on. It jumps back and forth between the two columns, first looking horizontally for information before moving on to the next row. If there were a third column, Grooper would look at the first row and extract the values for the first, second, and third columns before moving to the next row.  


So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 2Y 6Y 7Y 10Y 1N 3N 4N 5N 8N 9N 11N
So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 1N 2Y 3N 4N 5N 6Y 7Y 8N 9N 10Y 11N


|
|
[[File:2023-Ordered OMR-About 02.png]]
[[File:2023-Ordered OMR-About 03.png]]
|}


|}
</tab>
</tab>
<tab Name="Horizontal Flow" style="margin:20px">
<tab Name="Vertical Flow" style="margin:20px">
 
{|cellpadding=10 cellspacing=5
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
|valign=top style="width:40%"|


====Horizontal Flow====
====Vertical Flow====




Line 85: Line 87:
|valign=top style="width:60%"|
|valign=top style="width:60%"|


With a ''Horizontal'' '''''Flow Direction''''' selected, Grooper would first look at box for YES and 1 or Y1 and determine whether or not it is checked. Then it would look at N1 and determine whether or not that one was checked. Then it would return to the first column at Y2 and then move to N2 and so on. It jumps back and forth between the two columns, first looking horizontally for information before moving on to the next row. If there were a third column, Grooper would look at the first row and extract the values for the first, second, and third columns before moving to the next row.  
With a ''Vertical'' '''''Flow Direction''''' selected, Grooper would first look at box for YES and 1 or Y1 and determine whether or not it is checked. Then it would look at Y2 and determine whether or not that one was checked, and so on down the line. At the end of the first column, the '''''Ordered OMR''''' extractor would start again at the top of the second column at N1 and go down that column determining whether or not the boxes are checked.  


So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 1N 2Y 3N 4N 5N 6Y 7Y 8N 9N 10Y 11N
So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 2Y 6Y 7Y 10Y 1N 3N 4N 5N 8N 9N 11N


|
|
[[File:2023-Ordered OMR-About 03.png]]
[[File:2023-Ordered OMR-About 02.png]]
|}


|}
</tab>
</tab>


Line 163: Line 165:
* The '''''Labeled Value''''' extractor's '''''Label Extractor''''' property
* The '''''Labeled Value''''' extractor's '''''Label Extractor''''' property
* The '''''Pattern-Based Separation Provider''''''s '''''Value Extractor''''' property
* The '''''Pattern-Based Separation Provider''''''s '''''Value Extractor''''' property
|}
</tab>
</tabs>
Once you have '''''Ordered OMR''''' selected as the extractor type, there are several properties that need to be configured.
First, the extractor's '''''Mode''''' needs to be set. The options are ''CheckOne'', ''CheckMulti'', and ''Boolean''. The ''CheckMulti'' option is going to be most commonly used for an '''''Ordered OMR''''' extractor. For the example below, we will be using ''CheckMulti'' as the '''''Mode'''''. For more information on the other two options, please visit the [[Labeled OMR - 2021]] Wiki Page.
Once the '''''Mode''''' is set, you will need to select the '''''Location''''' option. This tells Grooper which area of the document to search for OMR boxes. There are four options for the '''''Location''''': ''Fixed Region'', ''Relative Region'', ''Shape Region'', and ''Text Region''. For the example below, we will be using ''Fixed Region'' as the '''''Location'''''. For more information on the other three options, please visit the [[Labeled OMR - 2021]] Wiki Page.
Next you will need to configure the '''''Output Values''''' to give the check boxes a value to be extracted. The order of your '''''Output Values''''' depends on whether you decide to use a ''Horizontal'' or ''Vertical'' '''''Flow Direction'''''.
Finally, you will need to select either a ''Horizontal'' or ''Vertical'' '''''Flow Direction'''''. If you do not select the proper '''''Flow Direction''''' that matches with your '''''Output Values''''', Grooper will not extract the correct information. Please reference the "About" section of this article to determine which '''''Flow Direction''''' fits your needs.
<tabs style="margin:20px">
<tab Name="Mode" style="margin:20px">
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
====Mode====
|-
|valign=top style="width:40%"|
# After you have created your object type and set '''''Ordered OMR''''' as your extractor, select the "Tester" tab.
# Look for the '''''Mode''''' property and click the three stacked lines on the right to open up the drop down list.
# Select the preferred '''''Mode''''' to be used. For this example, we will be selecting the ''CheckMulti'' '''''Mode'''''.
#* Generally an '''''Ordered OMR''''' is used when multiple items can be checked.
# When selecting ''CheckMulti'' you can enter a '''''Separator String'''''. By default, this will enter a space between each result returned to separate the results and make them easier to read. If desired you can insert a comma, pipe, forward slash, or any other separator you would like. In this case, we will use a comma as our separator.
|
< Screenshot Here >
|}
</tab>
<tab Name="Location" style="margin:20px">
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
====Location====
|-
|valign=top style="width:40%"|
# After selecting your '''''Mode''''', click on the drop down next to '''''Location'''''.
# Select the preferred '''''Location''''' method to be used. In this example we will use ''Fixed Region''.
# Click the arrow to the left of the word '''''Location''''' to open sub-properties.
# Now you will need to configure the '''''Location''''' method you have chosen. Since we are using ''Fixed Region'', we need to give the extractor '''''Bounds''''' to look within. Click the ellipsis button next to the '''''Bounds''''' property.
# A new window will open. Click the marquee selection tool located just above your batch viewer display.
# On your document, draw a box around all of the boxes you wish to be part of the extraction.
#* You will notice that on the top left side of the window there are properties that say '''''Left''''', '''''Top''''', '''''Width''''', and '''''Height'''''. Once you select an area with the marquee selection tool, these properties will automatically be updated with the bounds you selected. You can edit each of these by typing in different values if you wish.
# Click "OK" in the top right hand corner to set the '''''Bounds'''''.
# If you have multiple pages, use the '''''Page Filter''''' property to let Grooper know which page of each document the ''Fixed Region'' applies to.
# '''''Auto Snap''''' is an optional property you can set if you have lines detected on your document. When Enabled, the bounds you selected will automatically "snap" to the lines around the selected region. For this example, '''''Auto Snap''''' is not needed, but to ''Enable'' the property simply click the checkbox on the right of the property.
|
< Screenshot Here >
|}
</tab>
<tab Name="Output Values" style="margin:20px">
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
====Output Values====
|-
|valign=top style="width:40%"|
# Decide what '''''Flow Direction''''' you intend to use and take a look at your document. You will need to assign a value to each check box for Grooper to return.
#*In this example we will be using a ''Horizontal'' '''''Flow Direction'''''. We are going to say "Y" symbolizes "YES" and "N" symbolizes "NO". So, for the top left check box, we will assign it a value of "1Y" for being a "YES" answer to question #1. The box next to it will be assigned a value of "1N" for being a "NO" answer to question #1. From there we will go down to the "YES" box for question #2 and assign it a value of "2Y" and so on down the list.
# Enter the values in order following your '''''Flow Direction''''' in the '''''Output Values''''' text box.
|
< Screenshot Here >
|}
</tab>
<tab Name="Flow Direction" style="margin:20px">
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
====Flow Direction====
|-
|valign=top style="width:40%"|
# Finally, after you set your '''''Output Values''''', use the drop down next to the '''''Flow Direction''''' property and select either ''Vertical'' or ''Horizontal''.
# Save and Test your extraction.
|
< Screenshot Here >


|}
|}

Revision as of 17:58, 13 February 2023

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Ordered OMR is an extractor type similar to a Labeled OMR in that it is used to return OMR check box information. Rather than relying on a label for the extraction, the Ordered OMR returns information from the boxes based on the order of the check boxes.


About

Check boxes on a form can be extremely useful. They give us quick information at a glance. However, there is not an expression we can put into a text extractor, such as a Pattern Match or List Match, to find checked and unchecked boxes. Instead we must use one of the OMR extractors.

OMR stands for "Optical Mark Recognition". OMR first detects the check boxes on a document and then determines whether each box is checked or unchecked. The most common ways a box can be checked are with a checkmark, a filled-in (black) box, or an "X".

There are three types of OMR recognition in Grooper: Labeled OMR, Ordered OMR, and Zonal OMR.

NOTE: For any OMR detection, documents in Grooper first need to be recognized and go through the Box Detection or Box Removal step from either an OCR Profile or an IP Profile. Please see the OCR and IP Profile wiki articles for more information.

Ordered OMR determines which boxes are checked and unchecked and then returns values based on the order of the boxes. Before extraction, the boxes have to be given an Output Value to assign a specific value to each box. So, what does this mean?

How Does It Work?

Understanding Ordered OMR

If you look at the image on the right, you will see a check box list. You can see that Baseball is checked as NO, Basketball is checked as YES, and so on down the list. Grooper uses the pixel count inside a box to determine whether it is checked: a checked box contains more marked pixels than an unchecked box.
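The pixel-count idea can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the function names, the 0/1 pixel grid, and the 0.2 threshold are all assumptions chosen only to show how counting marked pixels separates a checked box from an empty one.

```python
from typing import List

def fill_ratio(box: List[List[int]]) -> float:
    """Fraction of marked pixels (1 = marked, 0 = blank) inside a box."""
    total = sum(len(row) for row in box)
    marked = sum(sum(row) for row in box)
    return marked / total if total else 0.0

def is_checked(box: List[List[int]], threshold: float = 0.2) -> bool:
    """Treat a box as checked when enough of its interior is marked.

    The threshold is a hypothetical value for illustration only.
    """
    return fill_ratio(box) >= threshold

# A blank 4x4 box interior and one marked with an "X".
empty_box = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
x_box = [[1, 0, 0, 1],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [1, 0, 0, 1]]

print(is_checked(empty_box), is_checked(x_box))
```

In practice a threshold like this would need to tolerate scanner noise and stray marks, which is one reason the box detection step has to run before any OMR extraction.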

A Labeled OMR extractor uses labels to determine which check box values to return. However, with an Ordered OMR extractor, the labels next to the check boxes mean very little. Instead, the order of the boxes is important.

In this example, the check boxes are arranged in a grid. There are two columns labeled "YES" and "NO" and eleven rows numbered 1-11, for a total of 22 check boxes. Selecting either a Vertical or Horizontal Flow Direction instructs Grooper on how to read the boxes. Grooper will then return results based on the checked boxes.

Horizontal Flow

With a Horizontal Flow Direction selected, Grooper would first look at the box for YES in row 1 (Y1) and determine whether or not it is checked. Then it would look at N1 and determine whether or not that one is checked. Then it would return to the first column at Y2, move to N2, and so on. It jumps back and forth between the two columns, reading each row horizontally before moving on to the next row. If there were a third column, Grooper would look at the first row and extract the values for the first, second, and third columns before moving to the next row.

So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 1N 2Y 3N 4N 5N 6Y 7Y 8N 9N 10Y 11N

Vertical Flow

With a Vertical Flow Direction selected, Grooper would first look at the box for YES in row 1 (Y1) and determine whether or not it is checked. Then it would look at Y2 and determine whether or not that one is checked, and so on down the column. At the end of the first column, the Ordered OMR extractor would start again at the top of the second column at N1 and go down that column determining whether or not the boxes are checked.

So in this example if we were to symbolize "YES" with a Y and a "NO" with an N, Grooper would return the following values in this order: 2Y 6Y 7Y 10Y 1N 3N 4N 5N 8N 9N 11N
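The two reading orders can be sketched in a few lines of Python. This is an illustrative model only, not Grooper code: `reading_order` and `ordered_omr` are hypothetical names, and the grid of checked boxes is reconstructed from the results listed in this section (YES checked for rows 2, 6, 7, and 10; NO checked for all other rows).

```python
from typing import List, Tuple

def reading_order(rows: int, cols: int, flow: str) -> List[Tuple[int, int]]:
    """Return (row, column) positions in the order the boxes are visited."""
    if flow == "Horizontal":
        # Finish each row before moving down to the next one.
        return [(r, c) for r in range(rows) for c in range(cols)]
    if flow == "Vertical":
        # Finish each column before moving right to the next one.
        return [(r, c) for c in range(cols) for r in range(rows)]
    raise ValueError(f"unknown flow direction: {flow}")

def ordered_omr(checked: List[List[bool]], output_values: List[str], flow: str) -> List[str]:
    """Pair the n-th visited box with the n-th output value; keep checked ones."""
    order = reading_order(len(checked), len(checked[0]), flow)
    return [value for (r, c), value in zip(order, output_values) if checked[r][c]]

# Columns are [YES, NO]; rows are questions 1-11.
yes_rows = {2, 6, 7, 10}
checked = [[q in yes_rows, q not in yes_rows] for q in range(1, 12)]

# Output values must be listed in the same order the boxes are visited.
horizontal_values = [f"{q}{a}" for q in range(1, 12) for a in "YN"]
vertical_values = [f"{q}{a}" for a in "YN" for q in range(1, 12)]

print(" ".join(ordered_omr(checked, horizontal_values, "Horizontal")))
# 1N 2Y 3N 4N 5N 6Y 7Y 8N 9N 10Y 11N
print(" ".join(ordered_omr(checked, vertical_values, "Vertical")))
# 2Y 6Y 7Y 10Y 1N 3N 4N 5N 8N 9N 11N
```

The same checked boxes produce the same set of values in both cases; only the order in which they are returned changes.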



How To

So how do we set this up in Grooper? An Ordered OMR can be selected anywhere an extractor is used.


Configuring on a Value Reader

  1. In your Node Tree, create or select a Value Reader.
    • Visit the Value Reader Wiki Page for instructions on how to create a Value Reader.
  2. Select the "Value Reader" tab.
  3. Click the drop down list next to Extractor and select Ordered OMR.

Configuring on a Data Type

  1. In your Node Tree, create or select a Data Type.
    • Visit the Data Type Wiki Page for instructions on how to create a Data Type.
  2. Select the "Data Type" tab.
  3. Click the drop down list next to Local Extractor and select Ordered OMR.

Configuring on Other Object Types

The Ordered OMR extractor can be used on a multitude of object types. Any object that has an extractor property can be configured with an Ordered OMR.

The configuration process on other objects is identical to that for the Value Reader and Data Type objects. Simply select Ordered OMR as your extractor type.


Examples where you can use an Ordered OMR include:

  • A Data Type's Value Extractor property
  • A Document Type's Positive Extractor property
  • The Labeled Value extractor's Label Extractor property
  • The Pattern-Based Separation Provider's Value Extractor property



Once you have Ordered OMR selected as the extractor type, there are several properties that need to be configured.

First, the extractor's Mode needs to be set. The options are CheckOne, CheckMulti, and Boolean. The CheckMulti option is going to be most commonly used for an Ordered OMR extractor. For the example below, we will be using CheckMulti as the Mode. For more information on the other two options, please visit the Labeled OMR - 2021 Wiki Page.

Once the Mode is set, you will need to select the Location option. This tells Grooper which area of the document to search for OMR boxes. There are four options for the Location: Fixed Region, Relative Region, Shape Region, and Text Region. For the example below, we will be using Fixed Region as the Location. For more information on the other three options, please visit the Labeled OMR - 2021 Wiki Page.

Next you will need to configure the Output Values to give the check boxes a value to be extracted. The order of your Output Values depends on whether you decide to use a Horizontal or Vertical Flow Direction.

Finally, you will need to select either a Horizontal or Vertical Flow Direction. If you do not select the proper Flow Direction that matches with your Output Values, Grooper will not extract the correct information. Please reference the "About" section of this article to determine which Flow Direction fits your needs.


Mode

  1. After you have created your object type and set Ordered OMR as your extractor, select the "Tester" tab.
  2. Look for the Mode property and click the three stacked lines on the right to open up the drop down list.
  3. Select the preferred Mode to be used. For this example, we will be selecting the CheckMulti Mode.
    • Generally an Ordered OMR is used when multiple items can be checked.
  4. When selecting CheckMulti you can enter a Separator String. By default, this will enter a space between each result returned to separate the results and make them easier to read. If desired you can insert a comma, pipe, forward slash, or any other separator you would like. In this case, we will use a comma as our separator.
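The effect of the Separator String can be pictured as a simple join over the values returned for the checked boxes. The sketch below is only an analogy in Python; `format_results` is a hypothetical helper, and how Grooper applies the separator internally is an assumption.

```python
from typing import List

def format_results(values: List[str], separator: str = " ") -> str:
    """Join the values of the checked boxes using the chosen separator."""
    return separator.join(values)

checked_values = ["1N", "2Y", "3N"]

print(format_results(checked_values))        # default: a single space
print(format_results(checked_values, ","))   # comma, as used in this example
```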

< Screenshot Here >

Location

  1. After selecting your Mode, click on the drop down next to Location.
  2. Select the preferred Location method to be used. In this example we will use Fixed Region.
  3. Click the arrow to the left of the word Location to open sub-properties.
  4. Now you will need to configure the Location method you have chosen. Since we are using Fixed Region, we need to give the extractor Bounds to look within. Click the ellipsis button next to the Bounds property.
  5. A new window will open. Click the marquee selection tool located just above your batch viewer display.
  6. On your document, draw a box around all of the boxes you wish to be part of the extraction.
    • You will notice that on the top left side of the window there are properties that say Left, Top, Width, and Height. Once you select an area with the marquee selection tool, these properties will automatically be updated with the bounds you selected. You can edit each of these by typing in different values if you wish.
  7. Click "OK" in the top right hand corner to set the Bounds.
  8. If you have multiple pages, use the Page Filter property to let Grooper know which page of each document the Fixed Region applies to.
  9. Auto Snap is an optional property you can set if you have lines detected on your document. When Enabled, the bounds you selected will automatically "snap" to the lines around the selected region. For this example, Auto Snap is not needed, but to Enable the property simply click the checkbox on the right of the property.
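Conceptually, a Fixed Region limits the extractor to the boxes that fall inside the rectangle described by Left, Top, Width, and Height. The Python sketch below models that idea only; the `Rect` class, the `boxes_in_region` helper, the coordinate values, and the "fully inside" rule are assumptions made for illustration and are not taken from Grooper.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    def contains(self, other: "Rect") -> bool:
        """True when `other` lies completely inside this rectangle."""
        return (self.left <= other.left
                and self.top <= other.top
                and other.left + other.width <= self.left + self.width
                and other.top + other.height <= self.top + self.height)

def boxes_in_region(bounds: Rect, boxes: List[Rect]) -> List[Rect]:
    """Keep only the detected boxes that sit inside the drawn bounds."""
    return [box for box in boxes if bounds.contains(box)]

# Hypothetical bounds and detected check boxes (units are arbitrary here).
bounds = Rect(left=1.0, top=2.0, width=3.0, height=4.0)
boxes = [
    Rect(1.5, 2.5, 0.2, 0.2),  # inside the region
    Rect(3.5, 5.5, 0.2, 0.2),  # inside the region
    Rect(5.0, 2.5, 0.2, 0.2),  # to the right of the region
]

selected = boxes_in_region(bounds, boxes)
print(len(selected))
```

Because only the boxes inside the region are counted, drawing the bounds too tightly or too loosely changes how many boxes line up with the Output Values.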

< Screenshot Here >

Output Values

  1. Decide what Flow Direction you intend to use and take a look at your document. You will need to assign a value to each check box for Grooper to return.
    • In this example we will be using a Horizontal Flow Direction. We are going to say "Y" symbolizes "YES" and "N" symbolizes "NO". So, for the top left check box, we will assign it a value of "1Y" for being a "YES" answer to question #1. The box next to it will be assigned a value of "1N" for being a "NO" answer to question #1. From there we will go down to the "YES" box for question #2 and assign it a value of "2Y" and so on down the list.
  2. Enter the values in order following your Flow Direction in the Output Values text box.
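For a grid like the one in this example, the list of Output Values can be written out mechanically once the Flow Direction is decided. The Python sketch below shows both orderings; `output_values` is a hypothetical helper, and the exact format expected by the Output Values text box is not covered here.

```python
from typing import List

def output_values(questions: int, answers: str, flow: str) -> List[str]:
    """List one value per check box in the order the boxes will be read."""
    numbers = range(1, questions + 1)
    if flow == "Horizontal":
        # Row by row: 1Y, 1N, 2Y, 2N, ...
        return [f"{q}{a}" for q in numbers for a in answers]
    if flow == "Vertical":
        # Column by column: 1Y, 2Y, ... 11Y, then 1N, 2N, ...
        return [f"{q}{a}" for a in answers for q in numbers]
    raise ValueError(f"unknown flow direction: {flow}")

horizontal = output_values(11, "YN", "Horizontal")
vertical = output_values(11, "YN", "Vertical")

print(horizontal[:4])  # ['1Y', '1N', '2Y', '2N']
print(vertical[:4])    # ['1Y', '2Y', '3Y', '4Y']
```

Both lists contain the same 22 values, but position by position they differ almost everywhere. This is why a list written for one Flow Direction returns incorrect values if the other Flow Direction is selected.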

< Screenshot Here >


Flow Direction

  1. Finally, after you set your Output Values, use the drop down next to the Flow Direction property and select either Vertical or Horizontal.
  2. Save and Test your extraction.

< Screenshot Here >