2.90:Read Zone (Value Extractor): Difference between revisions
Dgreenwood (talk | contribs) |
Revision as of 08:38, 12 October 2020
Read Zone allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on every document, or a zone placed relative to an extracted text anchor or shape location on the document.
Read Zone is a Value Extractor option available to Data Fields in a Data Model.
About
Highly structured documents organize information into a series of data fields. These fields have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields change from document to document, their position on the document remains constant.
The Read Zone extractor takes advantage of this feature of document layouts. As long as you can be reasonably sure the data you want will be in the same spot from document to document, you don't need anything fancier than extracting whatever text is in that known location.
Read Zone populates data in Data Fields from a rectangle you draw at a location on a page. Whatever text the Recognize activity obtained (via OCR or native text extraction) that falls within the boundaries of that rectangle (or "zone") populates the Data Field.
| For the zone drawn on the document... | ...the text data falling within that zone will be extracted. |
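To make the extraction rule concrete, here is a minimal Python sketch. It is not Grooper's implementation: it assumes the Recognize step has produced a list of words, each with a bounding box in page units, and it treats a word as "falling within" the zone when the word's center point lies inside the rectangle. The `Word`, `Zone`, and `read_zone` names and the center-point rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word and its bounding box (hypothetical structure)."""
    text: str
    left: float
    top: float
    width: float
    height: float


@dataclass
class Zone:
    """A rectangular extraction zone, in the same page units as the words."""
    left: float
    top: float
    width: float
    height: float

    def contains_point(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def read_zone(words: list[Word], zone: Zone) -> str:
    """Return the text of every word whose center falls inside the zone.

    Assumption: "falls within" means the word's center point is inside
    the zone; a real product may use a different overlap rule.
    """
    hits = [
        w for w in words
        if zone.contains_point(w.left + w.width / 2, w.top + w.height / 2)
    ]
    # Sort top-to-bottom, then left-to-right, before joining.
    hits.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in hits)


# Usage: a field label and its value on the same line.
page = [
    Word("1.", 0.5, 1.0, 0.2, 0.2),
    Word("Last", 0.75, 1.0, 0.4, 0.2),
    Word("Name", 1.2, 1.0, 0.5, 0.2),
    Word("Cleugh", 2.5, 1.0, 0.8, 0.2),
]
print(read_zone(page, Zone(left=2.3, top=0.9, width=1.5, height=0.4)))
# prints: Cleugh
```

Only the value's center lands inside the rectangle, so the label words to its left are ignored.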
Read Zone can also anchor the extraction zone to another location on the document. This matters because printing or scanning issues can shift the location of a value from document to document: a zone that extracts the data correctly on one document may be slightly off on another.
| The margins here are different from the document above... | ...resulting in the wrong extracted data. |
Several configuration options let you place the extraction zone relative to another piece of information, which serves as an "anchor" for the zone. Instead of sitting at a fixed position on every document, the zone is placed relative to the anchor's position. In this example, the label "1. Last Name" could serve as the anchor: if you can match that field label with a regular expression, the zone you draw will extract the value relative to the label's position.
| Anchored off the field label... | ...the zone falls on the right page location. |
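The anchoring idea can be sketched the same way. This minimal Python example is a hypothetical model, not how Grooper is configured or implemented: it joins words that share the same `top` into one line of text, finds the first regular expression match (here, the label "1. Last Name"), and places the zone at an offset from the top-left corner of the matched words. The helper names, the same-`top` line grouping, and the offset parameters `dx`/`dy` are assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word and its bounding box (hypothetical structure)."""
    text: str
    left: float
    top: float
    width: float
    height: float


def find_anchor(words, pattern):
    """Return (left, top) of the first run of words matching `pattern`.

    Words that share the same `top` value are treated as one line and joined
    with single spaces, so a multi-word label such as "1. Last Name" can match.
    Returns None when nothing matches.
    """
    lines = {}
    for w in words:
        lines.setdefault(w.top, []).append(w)
    for top in sorted(lines):
        line = sorted(lines[top], key=lambda w: w.left)
        # Record the character span each word occupies in the joined line.
        spans, pos = [], 0
        for w in line:
            spans.append((pos, pos + len(w.text)))
            pos += len(w.text) + 1
        m = re.search(pattern, " ".join(w.text for w in line))
        if m is None:
            continue
        matched = [w for w, (start, end) in zip(line, spans)
                   if start < m.end() and end > m.start()]
        if matched:
            return min(w.left for w in matched), min(w.top for w in matched)
    return None


def read_anchored_zone(words, pattern, dx, dy, width, height):
    """Extract text from a zone placed at offset (dx, dy) from the anchor."""
    anchor = find_anchor(words, pattern)
    if anchor is None:
        return None  # Without an anchor, the zone cannot be placed.
    left, top = anchor[0] + dx, anchor[1] + dy
    hits = [w for w in words
            if left <= w.left + w.width / 2 <= left + width
            and top <= w.top + w.height / 2 <= top + height]
    hits.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in hits)


page_a = [
    Word("1.", 0.5, 1.0, 0.2, 0.2),
    Word("Last", 0.75, 1.0, 0.4, 0.2),
    Word("Name", 1.2, 1.0, 0.5, 0.2),
    Word("Cleugh", 2.5, 1.0, 0.8, 0.2),
]
# The same page printed with different margins: every word moves.
page_b = [Word(w.text, w.left + 0.6, w.top + 0.3, w.width, w.height)
          for w in page_a]

label = r"1\. Last Name"
print(read_anchored_zone(page_a, label, 1.8, -0.1, 1.5, 0.4))  # prints: Cleugh
print(read_anchored_zone(page_b, label, 1.8, -0.1, 1.5, 0.4))  # prints: Cleugh
```

Because the zone moves with the label, the same offsets extract "Cleugh" from both the original page and the copy with different margins.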
| FYI | Read Zone is new in version 2.90. Similar functionality was provided by Zonal Extract and Anchored Extract in version 2.80, and by "Data Element Profiles" in older versions. |
How To
Enable Read Zone
Read Zone is an option for the Value Extractor property of a Data Field.
The Read Zone extractor has four Location property options. You must choose one of these options in order for Read Zone to function.
Each one has slightly different functionality and configurations. The four Location options are as follows:
Each option is detailed in the How To sections below.
Fixed Region
Draw The Zone
The Fixed Region option is the simplest to set up. As the name implies, the extraction zone will be fixed on the page. It will stay in the same coordinates for every document. All you need to do is draw the box where you want to extract data.
You will place a green box on the page. Any text falling within this box will be extracted. You can move the box around the page and resize it with the transform controls on its corners and edges, or set its position and size directly with the Left, Top, Width, and Height properties.
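The Left, Top, Width, and Height properties and the on-screen transform controls describe the same rectangle. The sketch below assumes common rectangle-editor behavior and is not taken from Grooper's documentation: moving the box changes only Left and Top, dragging the top-left corner handle keeps the opposite corner fixed, and dragging the right edge changes only Width. All names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Zone:
    """Zone geometry expressed with Left, Top, Width, and Height values."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height


def move(zone: Zone, dx: float, dy: float) -> Zone:
    """Dragging the whole box changes Left and Top only."""
    return replace(zone, left=zone.left + dx, top=zone.top + dy)


def drag_top_left(zone: Zone, dx: float, dy: float) -> Zone:
    """Dragging the top-left handle keeps the bottom-right corner fixed."""
    return Zone(left=zone.left + dx, top=zone.top + dy,
                width=zone.width - dx, height=zone.height - dy)


def drag_right_edge(zone: Zone, dx: float) -> Zone:
    """Dragging the right edge changes Width only."""
    return replace(zone, width=zone.width + dx)


z = Zone(left=2.0, top=1.0, width=1.5, height=0.5)
resized = drag_top_left(z, 0.5, 0.25)
print(resized)  # Zone(left=2.5, top=1.25, width=1.0, height=0.25)
print(resized.right == z.right, resized.bottom == z.bottom)  # True True
```

This sketch does not guard against dragging a handle past the opposite edge, which would produce a negative width or height.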
Test Extraction
Success! The last name "Cleugh" is extracted from the OCR text of this document.
A Word Of Caution
Remember, the Fixed Region location's extraction zone stays in the same physical location on the page from document to document. If the text you're trying to extract shifts location due to scanning irregularities or a new document format, this method can extract the wrong data.
For example, take the two documents here. They are the same document, but one has very different margins than the other. While the extraction zone we configured earlier falls on the last name in the document on the left, it does not in the document on the right.
Whatever text falls within the extraction zone is extracted. As you can see, for the second document, the text "1. Last Name" is extracted instead of "Cleugh".
If your documents are not totally uniform and you're running into issues like this, you may want to explore the other Location options detailed in the tutorials below.
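The failure described above can be reproduced with a toy model. In this hypothetical Python sketch, which uses a center-point containment rule as a simplifying assumption, a fixed zone that reads "Cleugh" from one page reads the field label instead once every word on the page is shifted right by a wider margin. The coordinates are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word and its bounding box (hypothetical structure)."""
    text: str
    left: float
    top: float
    width: float
    height: float


def read_fixed_zone(words, left, top, width, height):
    """Join the words whose center point lies inside a fixed rectangle."""
    hits = [w for w in words
            if left <= w.left + w.width / 2 <= left + width
            and top <= w.top + w.height / 2 <= top + height]
    hits.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in hits)


original = [
    Word("1.", 0.5, 1.0, 0.2, 0.2),
    Word("Last", 0.75, 1.0, 0.4, 0.2),
    Word("Name", 1.2, 1.0, 0.5, 0.2),
    Word("Cleugh", 2.5, 1.0, 0.8, 0.2),
]
# Same content, printed with a wider left margin (every word 1.8 units right).
wide_margin = [Word(w.text, w.left + 1.8, w.top, w.width, w.height)
               for w in original]

zone = (2.3, 0.9, 1.5, 0.4)  # left, top, width, height: never moves
print(read_fixed_zone(original, *zone))     # prints: Cleugh
print(read_fixed_zone(wide_margin, *zone))  # prints: 1. Last Name
```

The zone's coordinates never change, so whatever the shifted layout puts there is returned, which is the same behavior the second document shows.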