Main Page: Difference between revisions

From Grooper Wiki
|
|
<blockquote style="font-size:14pt">
<blockquote style="font-size:14pt">
''[[Read Zone]]''
''[[Labeled OMR]]''
</blockquote>
</blockquote>
''Read Zone'' is a '''''Value Extractor''''' option available to '''[[Data Field]]s''' in a '''[[Data Model]]'''.
[[File:Labeled-omr-about-01.png|thumb|200px|An example of checkboxes.]]


''Read Zone'' allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on every document, or a zone relative to an extracted text anchor or shape location on the document.
''Labeled OMR'' is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not and, if checked, outputs the label as its result.


Highly structured documents organize information into a series of data fields. These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields will change from document to document, their position on the document will remain constant.
Documents use checkboxes to make our life easier. They are particularly prevalent on structured forms. They give the person filling out the form the ability to just check a box next to a series of options rather than typing in the information.


The ''Read Zone'' extractor extracts data using this feature of document layouts.   
However, most of Grooper's extraction centers around regular expression, matching text patterns and returning the result.  There isn't necessarily a character to match a checked checkbox.  Regular expression isn't going to cut it to determine if a box is checked or not.   


As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location.   
This is where OMR comes into play. OMR stands for "Optical Mark Recognition". OMR determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it's not. Checked (or marked) boxes, whether a checked "x" (<span style="font-size:120%">&#9746;</span>), a checkmark (<span style="font-size:120%">&#9745;</span>), or a check block (<span style="font-size:120%">&#9635;</span>), will have more black pixels inside the box than an unchecked (or unmarked) one (<span style="font-size:120%">&#9744;</span>). If the proportion of black pixels inside the detected box is above a set threshold, it's checked (or marked). If not, it's unchecked (or unmarked).
|
|
You can now manually manipulate the confidence of an extraction result. The '''''[[Confidence Multiplier and Output Confidence]]''''' properties of '''[[Data Type]]''' and '''[[Data Format]]''' extractors allow you to change the confidence score of extraction results. No longer are you forced to accept the score Grooper provides. These properties give you more control when it comes to what confidence a result ''should'' have.
The earliest examples of OCR (Optical Character Recognition) can be traced back to the 1870s. Early OCR devices were actually invented to aid the blind. This included "text-to-speech" devices that would scan black print and produce sounds a blind person could interpret, as well as "text-to-tactile" machines which would convert luminous sensations into tactile sensations. Machines such as these would allow a blind person to read printed text not yet converted to Braille.


This allows you to prioritize certain results over others.  You can create a kind of "fall back" or "safety net" result by using this property.  You can even ''increase'' the confidence of an extractor's result, allowing you to give more weight to a fuzzy extractor's result over a non-fuzzy one, for example.
The first business to install an OCR reader was the magazine ''Reader's Digest'' in 1954.  The company used it to convert typewritten sales reports into machine readable punch cards.
 
For more information, visit the [[Confidence Multiplier and Output Confidence]] article.
|}
|}
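To make the Read Zone idea concrete, here is a minimal Python sketch of reading text from a zone. It is not Grooper's implementation. It assumes the page has already been OCR'd into words with bounding boxes (in inches from the top-left corner, an assumed unit), and every name in it (`Word`, `Zone`, `read_zone`, `find_anchor`, `relative_zone`) is hypothetical. A word counts as inside the zone when its center falls within the rectangle; a relative zone is simply a rectangle placed at an offset from an anchor word, such as a "Name:" label, instead of at a fixed page position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """An OCR'd word and its bounding box (units assumed to be inches from the top-left corner)."""
    text: str
    x: float
    y: float
    w: float
    h: float


@dataclass
class Zone:
    """A rectangular extraction zone, in the same units as Word."""
    x: float
    y: float
    w: float
    h: float

    def contains_center(self, word: Word) -> bool:
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return self.x <= cx <= self.x + self.w and self.y <= cy <= self.y + self.h


def read_zone(words: List[Word], zone: Zone) -> str:
    """Join the words whose centers fall in the zone, top to bottom, then left to right."""
    hits = [wd for wd in words if zone.contains_center(wd)]
    # Rounding y groups words on the same text line before sorting by x.
    hits.sort(key=lambda wd: (round(wd.y, 1), wd.x))
    return " ".join(wd.text for wd in hits)


def find_anchor(words: List[Word], label: str) -> Optional[Word]:
    """Return the first word matching the anchor text, or None."""
    return next((wd for wd in words if wd.text == label), None)


def relative_zone(anchor: Word, dx: float, dy: float, w: float, h: float) -> Zone:
    """Place a zone at an offset from the anchor's top-left corner."""
    return Zone(anchor.x + dx, anchor.y + dy, w, h)


words = [
    Word("Name:", 1.0, 2.0, 0.5, 0.2),
    Word("John", 1.7, 2.0, 0.4, 0.2),
    Word("Doe", 2.2, 2.0, 0.4, 0.2),
    Word("Date:", 1.0, 3.0, 0.5, 0.2),
]

# Fixed zone: always the same spot on the page.
print(read_zone(words, Zone(1.6, 1.9, 2.0, 0.4)))  # John Doe

# Relative zone: just to the right of the "Name:" anchor.
anchor = find_anchor(words, "Name:")
if anchor is not None:
    print(read_zone(words, relative_zone(anchor, 0.6, -0.1, 2.0, 0.4)))  # John Doe
```

Because the relative zone moves with its anchor, in this sketch it would still find "John Doe" if the whole form were shifted on the page, which a fixed zone would not.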

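The prioritization described in the Confidence Multiplier text above can be modeled with a short Python sketch. This is a hypothetical model, not Grooper's API: `Result`, `adjust`, and `best` are illustrative names, and it assumes that a fixed output confidence overrides the multiplier and that adjusted scores are clamped to the 0.0 to 1.0 range. It shows a low-scoring "safety net" result that only wins when nothing else matched, and a fuzzy result boosted above a non-fuzzy one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    value: str
    confidence: float  # 0.0 (no confidence) to 1.0 (full confidence)
    source: str        # label for the extractor that produced the result


def adjust(result: Result, multiplier: float = 1.0,
           output_confidence: Optional[float] = None) -> Result:
    """Return a copy of the result with its confidence scaled or replaced.

    Assumed behavior: a fixed output confidence takes precedence over the
    multiplier, and the final score is clamped to the 0.0-1.0 range.
    """
    if output_confidence is not None:
        score = output_confidence
    else:
        score = result.confidence * multiplier
    return Result(result.value, max(0.0, min(1.0, score)), result.source)


def best(results: List[Result]) -> Result:
    """Pick the highest-confidence result."""
    return max(results, key=lambda r: r.confidence)


exact = Result("INV-1001", 0.90, "exact pattern")
fuzzy = Result("INV-1007", 0.80, "fuzzy pattern")
fallback = Result("UNKNOWN", 1.00, "fallback")

# Safety net: scaled down so it only wins when nothing else matched.
net = adjust(fallback, 0.1)
print(best([exact, fuzzy, net]).value)               # INV-1001
print(best([net]).value)                             # UNKNOWN

# Boosting the fuzzy extractor lets it outrank the non-fuzzy one.
print(best([exact, adjust(fuzzy, 1.2), net]).value)  # INV-1007
```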


Revision as of 09:25, 19 October 2020

Getting Started

Grooper is a software application that helps organizations innovate workflows by integrating difficult data.

Grooper empowers rapid innovation for organizations processing and integrating large quantities of difficult data. Created by a team of courageous developers frustrated by limitations in existing solutions, Grooper is an intelligent document and digital data integration platform. Grooper combines patented and sophisticated image processing, capture technology, machine learning, and natural language processing. Grooper – intelligent document processing; limitless, template-free data integration.

Getting Started
Install and Setup
2.90 Reference Documentation


Featured Articles Did you know?

Labeled OMR

An example of checkboxes.

Labeled OMR is an extractor used to output OMR checkbox labels. It determines whether labeled checkboxes are checked or not and, if checked, outputs the label as its result.
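Once each box's state is known, the Labeled OMR output rule is simple to express. The following Python sketch assumes checkbox states have already been determined and that each box is paired with the label printed next to it; `LabeledCheckbox` and `labeled_omr_output` are hypothetical names, not Grooper objects. Checked boxes contribute their label as a result, and unchecked boxes contribute nothing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledCheckbox:
    label: str     # text printed next to the box, e.g. "Married"
    checked: bool  # state already determined by OMR


def labeled_omr_output(boxes: List[LabeledCheckbox]) -> List[str]:
    """Return the label of every checked box; unchecked boxes produce nothing."""
    return [box.label for box in boxes if box.checked]


boxes = [
    LabeledCheckbox("Single", False),
    LabeledCheckbox("Married", True),
    LabeledCheckbox("Divorced", False),
]
print(labeled_omr_output(boxes))  # ['Married']
```

The sketch returns every checked label; how Grooper itself handles several checked boxes in one group is not covered here.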

Documents use checkboxes to make our life easier. They are particularly prevalent on structured forms. They give the person filling out the form the ability to just check a box next to a series of options rather than typing in the information.

However, most of Grooper's extraction centers around regular expression, matching text patterns and returning the result. There isn't necessarily a character to match a checked checkbox. Regular expression isn't going to cut it to determine if a box is checked or not.

This is where OMR comes into play. OMR stands for "Optical Mark Recognition". OMR determines checkbox states. The basic idea behind it is very simple. First, find a box. A box is just four lines connected to each other in a square-like fashion. If that box has a mark of some kind inside it, it is checked. If not, it's not. Checked (or marked) boxes, whether a checked "x" (☒), a checkmark (☑), or a check block (▣), will have more black pixels inside the box than an unchecked (or unmarked) one (☐). If the proportion of black pixels inside the detected box is above a set threshold, it's checked (or marked). If not, it's unchecked (or unmarked).
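The fill-ratio test described above can be sketched in a few lines of Python. This is a simplified illustration, not Grooper's OMR engine: it assumes the page is already binarized (1 for a black pixel, 0 for white) and that the box has already been located, so the line-finding step is skipped. `fill_ratio`, `is_checked`, and the 0.1 threshold are hypothetical choices. The box's own border is excluded so the four lines that form the box are not mistaken for a mark.

```python
from typing import List, Tuple

# Binarized page image: 1 = black pixel, 0 = white pixel.
Image = List[List[int]]
# Hypothetical box format: (top, left, bottom, right), inclusive pixel
# coordinates of the box's outer border.
Box = Tuple[int, int, int, int]


def fill_ratio(image: Image, box: Box, border: int = 1) -> float:
    """Fraction of black pixels inside the box, ignoring its border lines."""
    top, left, bottom, right = box
    # Shrink the region by the border thickness so the four lines that
    # form the box are not counted as part of a mark.
    rows = range(top + border, bottom - border + 1)
    cols = range(left + border, right - border + 1)
    total = len(rows) * len(cols)
    if total == 0:
        return 0.0
    black = sum(image[r][c] for r in rows for c in cols)
    return black / total


def is_checked(image: Image, box: Box, threshold: float = 0.1) -> bool:
    """Treat the box as checked when its interior fill exceeds the threshold."""
    return fill_ratio(image, box) > threshold


# 5x5 examples: an empty box and a box marked with an "x".
empty = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
marked = [
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
box = (0, 0, 4, 4)
print(is_checked(empty, box))   # False
print(is_checked(marked, box))  # True
```

Excluding the border matters: counting it would give even the empty 5x5 box a fill ratio of 16/25, well above the threshold.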

The earliest examples of OCR (Optical Character Recognition) can be traced back to the 1870s. Early OCR devices were actually invented to aid the blind. This included "text-to-speech" devices that would scan black print and produce sounds a blind person could interpret, as well as "text-to-tactile" machines which would convert luminous sensations into tactile sensations. Machines such as these would allow a blind person to read printed text not yet converted to Braille.

The first business to install an OCR reader was the magazine Reader's Digest in 1954. The company used it to convert typewritten sales reports into machine readable punch cards.

New in 2.9 Featured Use Case

Welcome to Grooper 2.9!
Below you will find helpful links to all the articles about the new/changed functionality in this version of Grooper.

  • Compile Stats
  • Microsoft Office Integration
  • Document Viewer
  • Separation and Separation Review
  • Data Review
  • Confidence Multiplier
  • Data Element Overrides
  • Database Export
  • CMIS Lookup
  • Content Type Filter
  • Output Extractor Key
  • Box (CMIS Binding)
  • LINQ to Grooper Objects

They’re Saving Over 5,000 Hours Every Year in Data Discovery and Processing


American Airlines Credit Union has transformed their data workflows, quickly saving thousands of hours in electronic data discovery, resulting in much greater efficiency and improved member services.

Discover how they:

  • Quickly found 40,000 specific files among one billion
  • Easily integrated with data silos and content management systems when no other solution would
  • Have cut their mortgage processing time in half (and they process mortgages for 47 branch offices!)
  • Learn from the document and electronic data discovery experts at BIS!

You can access the full case study by clicking this link.

Feedback


We value your feedback!

Help us improve our product by leaving us a review on Gartner.com.

Click "Submit a review" on the image to the left to start a review.


Other Resources