2.80:Layered OCR (OCR Engine): Difference between revisions

Revision as of 11:48, 9 July 2020

Layered OCR enables you to run secondary OCR Profiles on a single page. The OCR results from these secondary OCR Profiles are merged with (or layered on top of) the primary OCR Profile's results.

About

You can use Layered OCR by selecting it as your OCR Engine in an OCR Profile. Although it is not itself an OCR engine in the way Transym or Tesseract is, it allows you to obtain OCR text from multiple OCR Profiles, each using its own OCR engine.

For example, certain OCR engines have advantages over others in specific cases. Transym performs well in most cases. However, it does not do well with certain specialized print types, such as MICR or handwriting. Another engine may perform better in these cases. Microsoft's Azure Computer Vision does better than most OCR engines at recognizing handwriting (but requires a licence key from Microsoft). Google's Tesseract can be trained to recognize specific fonts. Grooper ships with both Transym and Tesseract as selectable OCR engines, and includes Tesseract training files for the MICR, OCR-A, and OCR-B fonts.

For the check below, an OCR Profile using Transym performed well generally, but failed to read the MICR line at the bottom. Tesseract got the MICR line, but had issues recognizing other parts of the check.

Transym accurately reads the check's text except for the MICR line.
Tesseract has issues with other parts of the check.

With Layered OCR, you can use an OCR Profile using Transym to produce your primary (baseline) OCR results (seen in teal), and target the MICR line with an extractor to pull in the results from an OCR Profile using Tesseract (seen in orange).

This can greatly improve your OCR results. The secondary layers can target segments of text better recognized by different OCR Profiles and merge the results with your main OCR Profile.

How It Works

Layered OCR has three basic steps.

The Main OCR Profile property establishes the primary OCR Profile. Here, you will point to a configured OCR Profile you want to use as your baseline OCR.
The Layers property allows you to use secondary OCR Profiles. Here, you will add one or more Layers, each pointing to a secondary configured OCR Profile and an Extractor.
The Extractor returns segments of text recognized by the secondary (or layer) OCR Profile. These segments replace the corresponding results from the Main OCR Profile.
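
The three steps above can be pictured with a small sketch. It is an illustration only and not Grooper's implementation: the <code>LayerHit</code> structure, the <code>merge_layers</code> function, and the sample strings are all hypothetical. It assumes that each layer's extractor reports the line and character span it matched, and that the merge replaces that span in the main text with the text read by the secondary OCR Profile. Results containing a line break are skipped, mirroring the single-line requirement for Layer Extractors.

```python
from dataclasses import dataclass


@dataclass
class LayerHit:
    """One span returned by a layer's extractor (hypothetical structure)."""
    line: int    # index of the text line the hit sits on
    start: int   # first column of the span in the main OCR line
    end: int     # column just past the span (exclusive)
    text: str    # text recognized by the secondary OCR Profile


def merge_layers(main_lines, hits):
    """Return a copy of main_lines with each hit's span replaced by layer text."""
    merged = list(main_lines)
    # Work right-to-left within each line so earlier column offsets stay valid.
    for hit in sorted(hits, key=lambda h: (h.line, -h.start)):
        if "\r" in hit.text or "\n" in hit.text:
            continue  # assumption: multi-line results are never merged
        line = merged[hit.line]
        merged[hit.line] = line[:hit.start] + hit.text + line[hit.end:]
    return merged


# Made-up baseline text: the main profile reads the top line but garbles the last one.
main = ["PAY TO THE ORDER OF Jane Doe", "?:l2E4S?  00l2E"]
# Made-up layer result covering the whole garbled line.
hits = [LayerHit(line=1, start=0, end=len(main[1]), text="A123456780A 00123C")]

merged = merge_layers(main, hits)
print("\n".join(merged))
```

Applying the replacements from right to left within a line is a design choice of this sketch: it keeps the column offsets of any remaining hits on that line valid after a replacement changes the line's length.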

@@ Line 13: / Line 13: @@
 For example, certain OCR engines have advantages over others in specific cases.  Transym performs well in most cases.  However, it does not do well with certain specialized print types, such as [https://en.wikipedia.org/wiki/Magnetic_ink_character_recognition MICR] or handwriting.  Another engine may perform better in these cases.  Microsoft's [https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/ Azure Computer Vision] does better than most OCR engines at recognizing handwriting (but requires a licence key from Microsoft).  Google's [https://en.wikipedia.org/wiki/Tesseract_(software) Tesseract] has the capability to train fonts.  Grooper ships with both Transym and Tesseract as selectable OCR engines.  Furthermore, training files for the MICR, OCR-A, and OCR-B fonts are included.
-{|style="margin:auto; text-align:center"
+{|style="margin:auto; text-align:center" cellpadding=10 cellspacing=5
 |colspan=2|For the check below, an OCR Profile using Transym performed well generally, but failed to read the MICR line at the bottom.  Tesseract got the MICR line, but had issues recognizing other parts of the check.
 |-
@@ Line 25: / Line 26: @@
 |-
 |colspan=2|[[file:Layered OCR Checks 04.png|border]]
+|}
+This can greatly improve your OCR results.  The secondary layers can target segments of text better recognized by different OCR Profiles and merge the results with your main OCR Profile.
+=== How It Works ===
 Layered OCR has three basic steps.
-#
+# The '''''Main OCR Profile''''' property establishes the primary OCR Profile.  Here, you will point to a configured OCR Profile you want to use as your baseline OCR.
+# The '''''Layers''''' property allows you to use secondary OCR Profiles.  Here, you will add one or more Layers, each pointing to a secondary configured OCR Profile and an Extractor.
+# The Extractor returns segments of text recognized by the secondary (or layer) OCR Profiles, and replaces the results from the Main OCR Profile.
 <!--
@@ Line 39: / Line 48: @@
 # Each subsequent layer is configured to run an additional OCR Profile and an "Extractor".  This is key, as these are run independent of the "Main OCR Profile", so specialized OCR Profiles can be created and utilized to ensure the desired data is extracted properly.
 # The extraction results of each subsequent layer are then merged into the final OCR output.
+{|cellpadding="10" cellspacing="5"
+|-style="background-color:#f89420; color:white"
+|style="font-size:14pt"|'''!'''||There are some specific requirements for what results from a Layer Extractor can be merged with the main OCR results.  The extractor MUST meet these requirements, or it will not replace the results from the Main OCR Profile.
+* The extractor must return a '''contiguous sequence of characters''' on a '''single line of text.'''
+** This is the most important consideration to keep in mind.  This means you '''cannot''' merge OCR Results if the extractor returns results on multiple lines.  You cannot, for example, create an OCR Layer that merges the results of a full paragraph.
+** Data Type Collation Providers that produce results on multiple lines are not suitable for use, such as Arrays and Ordered Arrays in Vertical Mode.
+** Collation Providers that produce non-contiguous output values, such as Arrays and Ordered Arrays, can also cause problems.  Results will not merge properly unless the array elements are '''contiguous''', meaning no characters are skipped between the text of one array element and the text of the next.
+* FuzzyRegEx and FuzzyList modes '''CAN''' be used to correct output.  This can be a great way to repair labels recognized with minor OCR errors.
+** Keep in mind, the <code>\r\n</code> characters at the end of a line can be swapped in a fuzzy match.  Be sure to list these characters as immutable in your Fuzzy Match Weightings, using the syntax <code>Immutable=\r\n</code>.  This will prevent unintentional matches across line breaks.
+* '''DO NOT''' use fuzzy lexicon lookups, output formats, or lexicon translation to modify output values.
+** The result will not replace the main OCR results if you do.  The FuzzyRegEx and FuzzyList match modes are the only ways Layered OCR can modify the results of the secondary OCR Profile before merging with the main OCR Profile's results.
+|}
 == Use Cases ==