Main Page: Difference between revisions

From Grooper Wiki
No edit summary
No edit summary
Line 22: Line 22:
|
|
<blockquote style="font-size:14pt">
<blockquote style="font-size:14pt">
'''[[OCR Synthesis]]'''
''[[Read Zone]]''
</blockquote>
</blockquote>
''Read Zone'' is a '''''Value Extractor''''' option available to '''[[Data Field]]s''' in a '''[[Data Model]]'''.


The Synthesis functionality is Grooper's unique method of pre-processing and re-processing raw results from the [[OCR Engine|OCR engine]] to get better results out of it.  Using Synthesis, portions of the document can be OCR'd independently from the full text OCR.  Portions of the image dropped out from the first OCR pass can be re-run.  And, certain results can be reprocessed.  The results from the Synthesis operation are combined with (or in some cases replace) the full text OCR results from the OCR Engine into a single text flow.
''Read Zone'' allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document.  This can be a fixed zone, extracting text from the same location on a document, or a zone relative to an extracted text anchor or shape location on the document.


Synthesis is a collection of five separate OCR processing operations:
Highly structured documents organize information into a series of data fields.  These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe".  While the values for these fields will change from document to document, their position on the document will remain constant.


* [[OCR Synthesis#Font Pitch Detection|Font Pitch Detection]]
The ''Read Zone'' extractor extracts data using this feature of document layouts. 
* [[OCR Synthesis#Bound Region Processing|Bound Region Processing]]
* [[OCR Synthesis#Iterative Processing|Iterative Processing]]
* [[OCR Synthesis#Cell Validation|Cell Validation]]
* [[OCR Synthesis#Segment Reprocessing|Segment Reprocessing]]


As separate operations, the user can choose to enable all five operations, choose to use only one, or a combination.  Synthesis is enabled on '''[[OCR Profile]]s''', using the '''''Synthesis''''' property.  This property is enabled by default on OCR Profiles (and can be disabled if you so choose).  However, each Synthesis operation needs to be configured independently in order to function.
As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location.   
|
|
You can now manually manipulate the confidence of an extraction result.  The '''''[[Confidence Multiplier and Output Confidence]]''''' properties of '''[[Data Type]]''' and '''[[Data Format]]''' extractors allow you to change the confidence score of extraction results.  No longer are you forced to accept the score Grooper provides.  These properties give you more control when it comes to what confidence a result ''should'' be.

Revision as of 08:44, 13 October 2020

Getting Started

Grooper is a software application that helps organizations innovate workflows by integrating difficult data.

Grooper empowers rapid innovation for organizations processing and integrating large quantities of difficult data. Created by a team of courageous developers frustrated by limitations in existing solutions, Grooper is an intelligent document and digital data integration platform. Grooper combines patented and sophisticated image processing, capture technology, machine learning, and natural language processing. Grooper – intelligent document processing; limitless, template-free data integration.

Getting Started
Install and Setup
2.90 Reference Documentation


Featured Articles Did you know?

Read Zone

Read Zone is a Value Extractor option available to Data Fields in a Data Model.

Read Zone allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on a document, or a zone relative to an extracted text anchor or shape location on the document.

Highly structured documents organize information into a series of data fields. These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields will change from document to document, their position on the document will remain constant.

The Read Zone extractor extracts data using this feature of document layouts.

As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location.
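The zone idea described above can be illustrated with a minimal Python sketch. This is not Grooper's implementation or API: the `Rect`, `Word`, and `read_zone` names are hypothetical, and the sketch assumes OCR output is available as words with bounding boxes in page coordinates. A word counts as "in the zone" when its center point falls inside the rectangle, which is one simple rule among several possible ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def offset(self, dx: float, dy: float) -> "Rect":
        return Rect(self.left + dx, self.top + dy, self.right + dx, self.bottom + dy)


@dataclass(frozen=True)
class Word:
    """One OCR word and its bounding box (assumed input format)."""
    text: str
    box: Rect


def read_zone(words: List[Word], zone: Rect, anchor: Optional[Rect] = None) -> str:
    """Return the text of all words whose center lies inside the zone.

    With no anchor, the zone is fixed on the page. With an anchor, the zone
    coordinates are treated as offsets from the anchor's top-left corner.
    """
    if anchor is not None:
        zone = zone.offset(anchor.left, anchor.top)
    hits = [
        w for w in words
        if zone.contains((w.box.left + w.box.right) / 2, (w.box.top + w.box.bottom) / 2)
    ]
    # Read top-to-bottom, then left-to-right.
    hits.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in hits)


# Example page: a "Name" label followed by its value, plus an unrelated word.
words = [
    Word("Name", Rect(10, 10, 50, 20)),
    Word("John", Rect(60, 10, 90, 20)),
    Word("Doe", Rect(95, 10, 120, 20)),
    Word("Date", Rect(10, 40, 50, 50)),
]

# Fixed zone: the same page location on every document.
fixed_value = read_zone(words, Rect(55, 5, 130, 25))

# Relative zone: offsets measured from the located "Name" label.
label_box = words[0].box
relative_value = read_zone(words, Rect(45, -5, 120, 15), anchor=label_box)
```

In this sketch both calls pick up the same two words, because the relative zone shifted by the label's position lands on the same rectangle as the fixed zone. The relative form would keep working if the whole field moved on the page, as long as the label could still be located.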

You can now manually manipulate the confidence of an extraction result. The Confidence Multiplier and Output Confidence properties of Data Type and Data Format extractors allow you to change the confidence score of extraction results. No longer are you forced to accept the score Grooper provides. These properties give you more control when it comes to what confidence a result should be.

This allows you to prioritize certain results over others. You can create a kind of "fall back" or "safety net" result by using this property. You can even increase the confidence of an extractor's result, allowing you to give more weight to a fuzzy extractor's result over a non-fuzzy one, for example.

For more information, visit the Confidence Multiplier and Output Confidence article.
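The "fall back" idea can be sketched in a few lines of Python. This is an illustration only, not Grooper's scoring logic: the `Result`, `apply_multiplier`, and `best_result` names are hypothetical, confidences are assumed to lie between 0.0 and 1.0, and clamping the scaled score to that range is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Result:
    """One extraction result with a confidence score (hypothetical type)."""
    value: str
    confidence: float  # assumed range: 0.0 to 1.0


def apply_multiplier(result: Result, multiplier: float) -> Result:
    """Scale a result's confidence, clamped to the assumed 0.0-1.0 range."""
    scaled = min(1.0, max(0.0, result.confidence * multiplier))
    return Result(result.value, scaled)


def best_result(results: List[Result]) -> Result:
    """Pick the result with the highest confidence."""
    return max(results, key=lambda r: r.confidence)


# A primary extractor and a looser one used as a safety net.
primary = Result("INV-1001", 0.80)
loose = Result("INV-1OO1", 0.95)

# Halving the loose extractor's confidence makes it win only when the
# primary extractor returns nothing or scores very low.
fallback = apply_multiplier(loose, 0.5)
winner = best_result([primary, fallback])
```

Here the loose result's raw score is higher, but after the multiplier the primary result is selected. A multiplier above 1 would have the opposite effect, giving one extractor's results more weight than another's.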

New in 2.9 Featured Use Case

Welcome to Grooper 2.9!
Below you will find helpful links to all the articles about the new/changed functionality in this version of Grooper.

Compile Stats Microsoft Office Integration Document Viewer Separation and Separation Review
Data Review Confidence Multiplier Data Element Overrides Database Export
CMIS Lookup Content Type Filter Output Extractor Key Box (CMIS Binding)
LINQ to Grooper Objects

They’re Saving Over 5,000 Hours Every Year in Data Discovery and Processing


American Airlines Credit Union has transformed its data workflows, quickly saving thousands of hours in electronic data discovery, resulting in much greater efficiency and improved member services.

Discover how they:

  • Quickly found 40,000 specific files among one billion
  • Easily integrated with data silos and content management systems when no other solution would
  • Have cut their mortgage processing time in half (and they process mortgages for 47 branch offices!)
  • Learn from the document and electronic data discovery experts at BIS!

You can access the full case study by clicking this link.

Feedback


We value your feedback!

Help us improve our product by leaving us a review on Gartner.com.

Click "Submit a review" on the image to the left to start a review.


Other Resources