2023:Ordered OMR (Value Extractor)
WIP: This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.
Ordered OMR is an extractor type similar to a Labeled OMR in that it is used to return OMR check box information. Rather than relying on a label for the extraction, the Ordered OMR returns information from the boxes based on the order of the check boxes.
About
Check boxes on a form can be extremely useful. They give us quick information at a glance. However, there is no expression we can put into a text extractor, such as a Pattern Match or List Match, to find checked and unchecked boxes. Instead, we must use one of the OMR extractors.
OMR stands for "Optical Mark Recognition". OMR first detects the check boxes on a document and then determines whether each box is checked or unchecked. The most common ways to check a box are with a checkmark, a filled-in (black) box, or an "X".
There are three types of OMR recognition in Grooper: Labeled OMR, Ordered OMR, and Zonal OMR.
- NOTE: For any OMR detection, documents in Grooper first need to be recognized and go through the box detection step from either OCR or an IP Profile. Please see the OCR and IP Profile wiki articles for more information.
Ordered OMR determines which boxes are checked and unchecked and then returns values based on the order of the boxes. Before extraction, the boxes have to be given an Output Value to assign a specific value to each box. So, what does this mean?
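As a rough illustration of the idea, the sketch below pairs boxes with values purely by position. It is a conceptual Python example, not Grooper's implementation or API: the function name `ordered_omr_values`, the `checked` list, and the sample values are all hypothetical, and it assumes that the values returned are those of the checked boxes.

```python
from typing import List, Sequence


def ordered_omr_values(checked: Sequence[bool], output_values: Sequence[str]) -> List[str]:
    """Return the output value of every checked box, matched by position.

    Hypothetical helper for illustration only. `checked` holds one flag per
    detected box, in reading order; `output_values` holds the value assigned
    to the box at the same position.
    """
    if len(checked) != len(output_values):
        raise ValueError("need exactly one output value per detected box")
    # Labels near the boxes play no role here: only the position matters.
    return [value for is_checked, value in zip(checked, output_values) if is_checked]


# Made-up example: four boxes read in order, the second and third are checked.
boxes = [False, True, True, False]
values = ["Red", "Green", "Blue", "Yellow"]
print(ordered_omr_values(boxes, values))  # ['Green', 'Blue']
```

Because the pairing is positional, the list of output values has to be written in the same order in which the boxes are read, with one value per box.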
How Does It Work?
If you look at the image on the right, you will see a check box list. Baseball is checked "NO", Basketball is checked "YES", and so on down the list. Grooper uses the pixel count inside a box to determine whether it is checked: a checked box contains more marked pixels than an unchecked box.
A Labeled OMR extractor uses labels to determine which check box values to return. With an Ordered OMR extractor, however, the labels next to the check boxes mean very little. Instead, the order of the boxes is what matters.
In the example to the right, the check boxes are arranged in a grid with two columns labeled "YES" and "NO" and eleven rows numbered 1-11. Using an "Ordered OMR" extractor,
