2.90:Read Zone (Value Extractor)
Read Zone extracts text data from a rectangular region (called an "extraction zone" or just "zone") on a document. The zone can be fixed, extracting text from the same location on every document, or placed relative to an extracted text anchor or shape location on the document.
Read Zone is a Value Extractor option available to Data Fields in a Data Model.
About
|
Highly structured documents organize information into a series of data fields. These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields will change from document to document, their position on the document will remain constant. |
|
The Read Zone extractor extracts data using this feature of document layouts. As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location. |
You configure Read Zone by drawing a rectangle at a location on a page. Whatever text the Recognize activity obtained (either via OCR or native text extraction) that falls within the boundaries of that rectangle (or "zone") populates the Data Field.
| For the zone drawn on the document... | ...the text data falling within that zone will be extracted. |
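The idea of "text falling within a zone" can be sketched in a few lines of Python. This is only an illustration of the concept, not Grooper's implementation: the `Word` and `Zone` classes, the page units, and the rule that a word belongs to the zone when its center point is inside the rectangle are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and its bounding box (hypothetical layout data)."""
    text: str
    left: float
    top: float
    width: float
    height: float


@dataclass(frozen=True)
class Zone:
    """A rectangular extraction zone in the same page units as the words."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, word: Word) -> bool:
        # Assumption: a word is "in the zone" when its center point is inside.
        cx = word.left + word.width / 2
        cy = word.top + word.height / 2
        return (self.left <= cx <= self.left + self.width
                and self.top <= cy <= self.top + self.height)


def read_zone(words: list[Word], zone: Zone) -> str:
    """Join the text of every word inside the zone, top to bottom, left to right."""
    hits = [w for w in words if zone.contains(w)]
    hits.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in hits)


# Made-up recognized text for a form with a "1. Last Name" field.
page = [
    Word("1.", 1.0, 2.0, 0.2, 0.2),
    Word("Last", 1.3, 2.0, 0.4, 0.2),
    Word("Name", 1.8, 2.0, 0.5, 0.2),
    Word("Cleaugh", 1.0, 2.3, 0.9, 0.2),
]
last_name_zone = Zone(left=0.9, top=2.25, width=2.0, height=0.3)
print(read_zone(page, last_name_zone))
```

With this sample data only the value under the label has its center inside the zone, so the label words are ignored and the value alone is returned.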
Read Zone can also anchor the extraction zone to another location on the document. This matters because printing or scanning issues can shift the location of a value from document to document. A fixed zone may extract the data correctly on one document but be slightly off on another.
| The margins here are different from the document above... | ...resulting in the wrong extracted data. |
Several configuration options allow you to place the extraction zone relative to another piece of information. This serves as an "anchor" for the zone. Instead of sitting at a fixed position on all documents, the zone is placed relative to this anchor's position. For example, in this case the label "1. Last Name" could be an anchor. If you can pattern match that field label with a regular expression, the zone you draw on the document will extract the value relative to that label's position.
| Anchored off the field label... | ...the zone falls on the right page location. |
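The difference between a fixed zone and an anchored zone can be illustrated with a short Python sketch. It is not Grooper's API: the `Line` class, the helper functions `anchored_zone` and `text_in_zone`, the offsets, and the sample coordinates are all hypothetical, and a line is treated as inside a zone when its top-left corner falls within the rectangle.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """One recognized text line and the position of its top-left corner."""
    text: str
    left: float
    top: float


def anchored_zone(lines, label_pattern, dx, dy, width, height):
    """Return (left, top, width, height) of a zone placed relative to an anchor.

    The anchor is the first line whose text matches label_pattern. dx and dy
    are offsets from the anchor's top-left corner to the zone's top-left
    corner. Returns None when no line matches.
    """
    for line in lines:
        if re.search(label_pattern, line.text):
            return (line.left + dx, line.top + dy, width, height)
    return None


def text_in_zone(lines, zone):
    """Join the text of lines whose top-left corner falls inside the zone."""
    left, top, width, height = zone
    return " ".join(
        ln.text for ln in lines
        if left <= ln.left <= left + width and top <= ln.top <= top + height
    )


# Two made-up scans of the same form; the second is shifted by its margins.
doc_a = [Line("1. Last Name", 1.0, 2.0), Line("Cleaugh", 1.0, 2.3)]
doc_b = [Line("1. Last Name", 1.4, 2.5), Line("Doe", 1.4, 2.8)]

fixed = (0.9, 2.25, 2.0, 0.3)
label = r"^1\.\s*Last Name"

for doc in (doc_a, doc_b):
    zone = anchored_zone(doc, label, dx=-0.1, dy=0.25, width=2.0, height=0.3)
    print("fixed:", text_in_zone(doc, fixed), "| anchored:", text_in_zone(doc, zone))
```

On the shifted document the fixed zone lands on the label instead of the value, while the anchored zone moves with the label and still picks up the value.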
| FYI | Read Zone is new in version 2.90. Similar functionality was provided by Zonal Extract and Anchored Extract in version 2.80, or by "Data Element Profiles" in older versions. |
How To
Enable Read Zone
|
Read Zone is an option for the Value Extractor property of a Data Field.
|
|
|
The Read Zone extractor has four Location property options. You must choose one of these options in order for Read Zone to function.
Each one has slightly different functionality and configurations. The four Location options are as follows:
Each option is detailed in the How To sections below. |
Fixed Region
Draw The Zone
|
The Fixed Region option is the simplest to set up. As the name implies, the extraction zone will be fixed on the page. It will stay in the same coordinates for every document. All you need to do is draw the box where you want to extract data.
|
|
|
You will place a green box on the page. Any text falling within this box will be extracted. You can move the box around the page and use the transform controls on the corners and edges of the box to edit its width and height. You can also adjust the box with the Left, Top, Width, and Height properties.
|
Test Extraction
Success! The last name "Cleaugh" is extracted from the OCR text of this document.
|
|
|













