2.90:OCR Reader (Result Post Processor)

The OCR Reader post processor selected on a Data Type's property panel.

OCR Reader is a Post Processing option for Data Type extractors. This allows you to define a rectangular region (called an "extraction zone" or just "zone") on a page relative to the Data Type's extraction result. Instead of just the original result, all text falling within the zone (obtained from the Recognize activity) will be returned as the result.

OCR Reader has some additional functionality as well. It has capability to exclude the original result using the Exclude Anchor property, returning everything else in the zone. It can take advantage of Grooper's "Auto Snap" functionality if lines are present on the document to draw the zone without any configuration. The text can be optionally re-processed with a different OCR Profile for highly targeted OCR results.

About

Highly structured documents organize information into a series of data fields. These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields will change from document to document, their position on the document will remain constant.

The OCR Reader result post-processor extracts data using this feature of document layouts.

As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location.