2.80:Layered OCR (OCR Engine)
Revision as of 11:49, 9 July 2020
Layered OCR enables you to run secondary OCR Profiles on a single page. The OCR results from these secondary OCR Profiles are merged with (or layered on top of) the primary OCR Profile's results.
About
You can use Layered OCR by selecting it as the OCR Engine in an OCR Profile. Layered OCR is not itself an OCR engine like Transym or Tesseract. Instead, it lets you obtain OCR text from multiple OCR Profiles, each using its own OCR engine.

This is useful because some OCR engines have advantages over others in specific cases. Transym performs well in most cases, but it does not do well with certain specialized print types, such as MICR or handwriting, where another engine may perform better. Microsoft's Azure Computer Vision recognizes handwriting better than most OCR engines (but requires a licence key from Microsoft). Google's Tesseract can be trained to recognize specific fonts. Grooper ships with both Transym and Tesseract as selectable OCR engines, and training files for the MICR, OCR-A, and OCR-B fonts are also included.
Layering can greatly improve your OCR results. Secondary layers target segments of text that a different OCR Profile recognizes better, and merge those results with the results of your Main OCR Profile.
How It Works
Layered OCR has three basic steps.
- The Main OCR Profile property establishes the primary OCR Profile. Here, you will point to a configured OCR Profile you want to use as your baseline OCR.
- The Layers property allows you to use secondary OCR Profiles. Here, you will add one or more Layers, each pointing to another configured OCR Profile and an Extractor.
- Each Layer's Extractor returns segments of text recognized by that Layer's OCR Profile. These segments replace the corresponding results from the Main OCR Profile.
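Important: There are specific requirements for which results from a Layer's Extractor can be merged with the main OCR results. The Extractor must meet these requirements, or its results will not replace the results from the Main OCR Profile. The FuzzyRegEx and FuzzyList match modes are the only ways Layered OCR can modify the results of a secondary OCR Profile before merging them with the Main OCR Profile's results.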
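The three steps in How It Works can be pictured with a minimal Python model. Everything in it is hypothetical and is not Grooper's API: `Box`, `Word`, `Layer`, `layered_ocr`, `main_profile`, `micr_profile`, and `digits_only` are illustrative names. An "OCR Profile" is modeled as a function that returns recognized words with bounding boxes, and an "Extractor" as a function that selects segments from a layer's words. The sketch also assumes that "replaces" means main-OCR words whose boxes overlap an extracted segment are dropped in favor of that segment, and it does not model the Extractor requirements noted above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical model)."""
    left: float
    top: float
    right: float
    bottom: float

    def overlaps(self, other: "Box") -> bool:
        # Two boxes overlap when they intersect on both axes.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class Word:
    """One recognized segment of text and where it sits on the page."""
    text: str
    box: Box


# Hypothetical stand-ins, not Grooper types: an OCR Profile turns a page into
# words, and an Extractor selects the segments of a layer's output to keep.
OcrProfile = Callable[[str], List[Word]]
Extractor = Callable[[List[Word]], List[Word]]


@dataclass
class Layer:
    profile: OcrProfile
    extractor: Extractor


def layered_ocr(page: str, main_profile: OcrProfile, layers: List[Layer]) -> List[Word]:
    # Step 1: the Main OCR Profile produces the baseline results.
    result = list(main_profile(page))
    for layer in layers:
        # Step 2: each layer runs its own OCR Profile on the same page,
        # independently of the main profile, and its Extractor picks segments.
        segments = layer.extractor(layer.profile(page))
        if not segments:
            continue  # Nothing extracted: the main results stay as they are.
        # Step 3 (assumed merge rule): extracted segments replace any
        # main-OCR words whose boxes overlap them.
        result = [w for w in result
                  if not any(w.box.overlaps(s.box) for s in segments)]
        result.extend(segments)
    # Restore a simple reading order: top to bottom, then left to right.
    result.sort(key=lambda w: (w.box.top, w.box.left))
    return result


# Toy data for a check: the main profile garbles the MICR line, while a
# hypothetical MICR-trained profile reads it but garbles ordinary text.
def main_profile(page: str) -> List[Word]:
    return [Word("Pay", Box(10, 10, 40, 20)), Word("l2E4S6", Box(10, 90, 80, 100))]


def micr_profile(page: str) -> List[Word]:
    return [Word("Pav", Box(10, 10, 40, 20)), Word("123456", Box(10, 90, 80, 100))]


def digits_only(words: List[Word]) -> List[Word]:
    # Hypothetical extractor: keep only all-digit segments (the MICR line).
    return [w for w in words if w.text.isdigit()]


if __name__ == "__main__":
    merged = layered_ocr("check.tif", main_profile, [Layer(micr_profile, digits_only)])
    print([w.text for w in merged])  # ['Pay', '123456']
```

In this toy run the MICR profile misreads the ordinary word but reads the digits correctly. Because the extractor keeps only the digit segment, the main profile's "Pay" survives while its garbled MICR line is replaced. Running each layer independently and then filtering its output with an Extractor is what lets a specialized profile correct one region without degrading the rest of the page.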



