OCR Reader (Result Post Processor)
Revision as of 16:59, 20 December 2019

The OCR Reader post processor runs additional OCR on a region near a label (one returned as the result of a Data Type extractor) and returns the OCR'd text.
This is especially useful on documents where data is printed in a special font, or where the label and the value are printed in entirely different fonts.
This region can be defined relative to the initial Data Type extractor's result, or configured to automatically "snap" to a bounding box.
Example
We want to extract key values from a form. The problem is that this form contains mixed fonts, and the values we want are displayed in the OCR-A font, which is sometimes troublesome for standard OCR engines.
The idea for this scenario is to first extract the label for the value we want (here, "2. First Name") using a Data Type extractor, and then set the "Post Processing" property of that Data Type to run a special OCR Profile designed to extract the value associated with that label (here, "Benjamin").
Steps
Create the Data Type
First, we create the "Data Type" that will extract the label, using the following "Pattern":
(\d\.\s)?first name
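
To see what this pattern matches, here is a minimal sketch using Python's `re` module. Python only stands in for whatever regex engine the product uses, and the case-insensitive flag is an assumption (the label on the form reads "2. First Name" while the pattern is lowercase). The helper name `find_label` and the sample strings are hypothetical.

```python
import re

# Pattern from the step above: an optional "<digit>. " prefix, then the label text.
# ASSUMPTION: matching is case-insensitive.
LABEL_PATTERN = re.compile(r"(\d\.\s)?first name", re.IGNORECASE)


def find_label(text):
    """Return the matched label text, or None if the label is absent."""
    match = LABEL_PATTERN.search(text)
    return match.group(0) if match else None


# Made-up sample lines, used only to exercise the pattern.
numbered = find_label("2. First Name   Benjamin")
bare = find_label("First Name: Benjamin")
missing = find_label("Last Name: Franklin")
```

Because the numeric prefix is optional, the same pattern finds the label whether or not the form numbers its fields.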

Enable the Post Processor
In the "Output" section of the Data Type, we'll set the "Post Processing" property to "OCR Reader".

Configure the Post Processor
Once we've chosen "OCR Reader", we can expand it to reveal its configurable properties.
- Set the OCR Profile to a profile designed specifically to deal with OCR-A fonts.
- Set Auto Snap Distance to "1.5in".
- Set Auto Snap Margin to "1pt, 12pt, 2pt, 1pt".
- Set Output Full Region to "True".
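
The effect of the Auto Snap Margin setting can be pictured as shrinking the snapped zone inward on each edge. The following is only a rough sketch of that arithmetic, not the product's implementation. It assumes the four margin values are ordered left, top, right, bottom, that coordinates are in points with the origin at the top-left of the page, and that the snapped zone has the made-up coordinates shown. The names `Zone`, `shrink`, and `to_points` are hypothetical.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # one inch is 72 typographic points


def to_points(value):
    """Convert a string such as '1.5in' or '12pt' to points."""
    value = value.strip()
    if value.endswith("in"):
        return float(value[:-2]) * POINTS_PER_INCH
    if value.endswith("pt"):
        return float(value[:-2])
    raise ValueError(f"unsupported unit in {value!r}")


@dataclass
class Zone:
    left: float
    top: float
    right: float
    bottom: float


def shrink(zone, margin):
    """Shrink a zone inward by a per-edge margin.

    ASSUMPTION: the four values are ordered left, top, right, bottom.
    """
    l, t, r, b = (to_points(part) for part in margin.split(","))
    return Zone(zone.left + l, zone.top + t, zone.right - r, zone.bottom - b)


# Made-up coordinates (in points) for a zone already snapped to lines on the form.
snapped = Zone(left=100.0, top=200.0, right=300.0, bottom=240.0)
ocr_zone = shrink(snapped, "1pt, 12pt, 2pt, 1pt")

# The Auto Snap Distance from the step above, expressed in points.
max_snap = to_points("1.5in")
```

Under these assumptions, the margin keeps the OCR zone clear of the lines it snapped to, so the lines themselves are not read as characters.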

Properties
| OCR Profile | The OCR Profile to use for character extraction. |
| Region | Specifies a region, relative to each output instance, where OCR should be performed. |
| Auto Snap Distance | Specifies the maximum distance for an auto snap operation, which automatically aligns the edges of the zone to lines on the document. |
| Auto Snap Margin | When the auto snap feature is in use, specifies an additional amount to shrink the zone on each edge. |
| Value Extractor | An optional extractor to be executed against the OCR content. |
| Line Separator | When capturing multiple lines of text, specifies how line breaks will be represented in the output. |
| Output Full Region | Specifies whether the highlight region of each output instance will reflect the full OCR area, or only the area containing text. |
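
As a rough illustration of how Line Separator and Output Full Region might shape a result, the sketch below joins OCR'd lines with a chosen separator and returns either the full OCR area or only the area containing text. It is a simplified model under stated assumptions, not the product's API: the function `build_output`, the tuple layout `(left, top, right, bottom)`, and all sample text and coordinates are hypothetical.

```python
def build_output(lines, full_region, line_separator=" ", output_full_region=True):
    """Combine OCR'd lines into one value and pick the highlight region.

    lines: list of (text, (left, top, right, bottom)) tuples from a
    hypothetical OCR pass over the zone.
    """
    text = line_separator.join(t for t, _ in lines)
    if output_full_region or not lines:
        # Report the whole area that OCR was run on.
        region = full_region
    else:
        # Report only the smallest box enclosing every line of text.
        boxes = [box for _, box in lines]
        region = (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
    return text, region


# Made-up OCR results for a two-line value inside a made-up zone.
lines = [("Benjamin", (110, 215, 180, 225)), ("Franklin", (110, 227, 175, 237))]
full = (101, 212, 298, 239)

joined_full = build_output(lines, full, line_separator="\n", output_full_region=True)
joined_tight = build_output(lines, full, line_separator=" ", output_full_region=False)
```

In this model, the choice of region affects only which area is highlighted for each output instance; the extracted text is the same either way.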

