2.90:Read Zone (Value Extractor): Difference between revisions

Revision as of 16:07, 9 October 2020

Read Zone allows you to extract text data in a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, extracting text from the same location on every document, or a zone placed relative to an extracted text anchor or shape location on the document.

Read Zone is a Value Extractor option available to Data Fields in a Data Model.

About

Highly structured documents organize information into a series of data fields. These fields will have a label identifying what the field contains, such as "Name", and a corresponding value, such as "John Doe". While the values for these fields will change from document to document, their position on the document will remain constant.

The Read Zone extractor extracts data using this feature of document layouts.

As long as you can be reasonably assured the data you want to find will be in the same spot from document to document, you don't necessarily need anything fancier than extracting whatever text is in that known location.

Read Zone populates data in Data Fields by drawing a rectangle on a location on a page. Whatever text was obtained from the Recognize activity (either via OCR or native text extraction) that falls within the boundaries of that rectangle (or "zone") populates the Data Field.
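The idea of "whatever text falls within the rectangle" can be sketched in a few lines of Python. This is only an illustration of the concept, not Grooper's implementation: the `Rect`, `Word`, and `read_zone` names are hypothetical, the word boxes are made-up sample values, and the rule that a word must lie entirely inside the zone is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in page coordinates (origin at top-left)."""
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self) -> float:
        return self.left + self.width

    @property
    def bottom(self) -> float:
        return self.top + self.height

    def contains(self, other: "Rect") -> bool:
        """True when `other` lies entirely inside this rectangle."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and other.right <= self.right
            and other.bottom <= self.bottom
        )


@dataclass(frozen=True)
class Word:
    """A recognized word and the box it occupies on the page (hypothetical)."""
    text: str
    box: Rect


def read_zone(words: list[Word], zone: Rect) -> str:
    """Join, in reading order, the words whose boxes fall inside the zone."""
    hits = [w for w in words if zone.contains(w.box)]
    hits.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in hits)


# Made-up recognition results for a label and the value printed below it.
words = [
    Word("1.", Rect(50, 100, 15, 12)),
    Word("Last", Rect(70, 100, 30, 12)),
    Word("Name", Rect(105, 100, 40, 12)),
    Word("Cleaugh", Rect(70, 120, 60, 12)),
]
zone = Rect(60, 115, 200, 25)  # the rectangle drawn around the value

print(read_zone(words, zone))  # prints "Cleaugh"
```

Only the value sits inside the zone, so the label words are ignored even though they are on the same page.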

For the zone drawn on the document...	...the text data falling within that zone will be extracted.

Read Zone can also anchor this extraction zone to another location on the document. For example, due to issues with printing or scanning, the location of the value may shift from document to document. It is entirely possible that a fixed zone extracts the data correctly on one document but is slightly off on another.

The margins here are different from the document above...	...resulting in the wrong extracted data.

Several configuration options allow you to place the extraction zone relative to another piece of information. This serves as an "anchor" for the zone. Instead of sitting at a fixed position on all documents, the zone is placed relative to this anchor's position. For example, in this case the label "1. Last Name" could be an anchor. If you can pattern match that field label with a regular expression, the zone you draw on the document will extract the value relative to that label's position.

Anchored off the field label...	...the zone falls on the right page location.
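As a rough sketch of the anchoring idea, the Python below finds a label with a regular expression and then places the zone at a fixed offset from wherever that label was found. Everything here is an assumption made for illustration (the `find_anchor` and `place_zone` helpers, the tuple layouts, the coordinates); it does not reflect how Read Zone is implemented or configured.

```python
import re


def find_anchor(lines, pattern):
    """Return the (left, top) of the first text line matching the pattern, or None.

    `lines` is a hypothetical list of (text, left, top) tuples from recognition.
    """
    regex = re.compile(pattern)
    for text, left, top in lines:
        if regex.search(text):
            return (left, top)
    return None


def place_zone(anchor, offset, size):
    """Translate a zone defined relative to the anchor into page coordinates."""
    left = anchor[0] + offset[0]
    top = anchor[1] + offset[1]
    return (left, top, size[0], size[1])


OFFSET = (0, 20)   # zone starts 20 units below the label's top-left corner
SIZE = (200, 25)   # zone width and height

# The same form scanned twice; the second scan has shifted margins.
doc_a = [("1. Last Name", 50, 100), ("Cleaugh", 50, 122)]
doc_b = [("1. Last Name", 62, 131), ("Cleaugh", 62, 153)]

for doc in (doc_a, doc_b):
    anchor = find_anchor(doc, r"1\.\s*Last Name")
    print(place_zone(anchor, OFFSET, SIZE))
```

Because the zone is computed from the label's position, it moves with the page content: the second document's zone is shifted by the same amount as its label.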

FYI

Read Zone is new to version 2.90. Similar functionality was performed by Zonal Extract and Anchored Extract in version 2.80 or using "Data Element Profiles" in older versions.

How To

Enable Read Zone

Read Zone is an option for the Value Extractor property of a Data Field.

To use this extractor, select a Data Field in a Data Model.
Select the Value Extractor property.
Choose Read Zone from the dropdown menu.

The Read Zone extractor has four Location property options. You must choose one of these options in order for Read Zone to function.

Expand the Read Zone sub-properties.
Choose your Location option.

Each one has slightly different functionality and configurations. The four Location options are as follows:

Fixed Region
Relative Region
Shape Region
Text Region

Each option is detailed in the How To sections below.

Fixed Region


Draw The Zone

The Fixed Region option is the simplest to set up. As the name implies, the extraction zone will be fixed on the page. It will stay in the same coordinates for every document. All you need to do is draw the box where you want to extract data.

Expand out the Location sub-properties and select the Bounds property.
Press the ellipsis button at the end.
This will bring up the "Edit Zone" window. Press the "Select Region" button, if it is not already selected.
With your mouse, draw a box around the text you want to select. Remember, any text falling inside of this box will be extracted. Any outside of the box will be missed. Make sure your box is the appropriate size to capture all field values for this document.

You will place a green box on the page. Any text falling within this box will be extracted. You can move the box around the page and use the transform controls on its corners and edges to change its width and height. You can also set the Left, Top, Width, and Height properties directly.

Press the "Ok" button when finished placing the zone.

Test Extraction

With a Document Folder selected, press the Test Extraction button to verify the results.

Success! The last name "Cleaugh" is extracted from the OCR text of this document.

Notice the green box around "Cleaugh" on the page only extends to the size of the text value extracted. While testing your Read Zone configurations, this can lead to some confusion as to what is or is not being extracted and why. When configuring Read Zone, it can be useful to see the full size of the box you drew earlier. That is why the Output Full Region property exists.

Set the Output Full Region property to True.
Press the "Test Extraction" button again.
This changes absolutely nothing in terms of what data is extracted, but can be useful in your configuration testing. We will keep this property set to True for this example and the other Location option examples.
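The difference between the two display modes can be pictured with a small Python sketch. It assumes, purely for illustration, that the highlighted region is either the tight box around the extracted words or the full zone as drawn; `union_box` and `reported_region` are hypothetical helpers, not Grooper functions, and boxes are (left, top, width, height) tuples with made-up values.

```python
def union_box(boxes):
    """Smallest (left, top, width, height) rectangle covering all boxes."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    return (left, top, right - left, bottom - top)


def reported_region(zone, word_boxes, output_full_region):
    """Pick which rectangle to highlight; the extracted text is unaffected."""
    if output_full_region or not word_boxes:
        return zone
    return union_box(word_boxes)


zone = (60, 115, 200, 25)         # the box drawn in the Edit Zone window
word_boxes = [(70, 120, 60, 12)]  # box of the one word found inside it

print(reported_region(zone, word_boxes, False))  # tight box around the word
print(reported_region(zone, word_boxes, True))   # full zone as drawn
```

In both cases the same words are extracted; only the rectangle shown during testing differs.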

@@ Line 3: / Line 3: @@
 </blockquote>
-''Read Zone'' is a ''''''Value Extractor''''' option available to '''[[Data Field]]s''' in a '''[[Data Model]]'''.
+''Read Zone'' is a '''''Value Extractor''''' option available to '''[[Data Field]]s''' in a '''[[Data Model]]'''.
 == About ==
@@ Line 63: / Line 63: @@
 |style="font-size:14pt"|'''FYI'''||''Read Zone'' is new to version 2.90.  Similar functionality was performed by [[Zonal Extract]] and [[Anchored Extract]] in version 2.80 or using "Data Element Profiles" in older versions.
 |}
+== How To ==
+=== Enable Read Zone ===
+{|cellpadding=10 cellspacing=5
+|style="width:40%" valign=top|
+''Read Zone'' is an option for the '''''Value Extractor''''' property of a '''Data Field'''.
+# To use this extractor, select a '''Data Field''' in a '''Data Model'''.
+# Select the '''''Value Extractor''''' property.
+# Choose ''Read Zone'' from the dropdown menu.
+|
+[[File:Read-zone-how-to-01.png]]
+|-
+|valign=top|
+The ''Read Zone'' extractor has four '''''Location''''' property options.  You must choose one of these options in order for ''Read Zone'' to function.
+# Expand the ''Read Zone'' sub-properties.
+# Choose your '''''Location''''' option.
+Each one has slightly different functionality and configurations.  The four '''''Location''''' options are as follows:
+* ''Fixed Region''
+* ''Relative Region''
+* ''Shape Region''
+* ''Text Region''
+Each option is detailed in the How To sections below.
+|
+[[File:Read-zone-how-to-02.png]]
+|}
+=== Fixed Region ===
+<tabs style="margin:20px">
+<tab name="Draw the Zone" style="margin:20px">
+=== Draw The Zone ===
+{|cellpadding=10 cellspacing=5
+|style="width:40%" valign=top|
+The ''Fixed Region'' option is the simplest to set up.  As the name implies, the extraction zone will be fixed on the page.  It will stay in the same coordinates for every document.  All you need to do is draw the box where you want to extract data.
+# Expand out the '''''Location''''' sub-properties and select the '''''Bounds''''' property.
+# Press the ellipsis button at the end.
+# This will bring up the "Edit Zone" window.  Press the "Select Region" button, if it is not already selected.
+# With your mouse, draw a box around the text you want to select.  Remember, any text falling inside of this box will be extracted.  Any outside of the box will be missed.  Make sure your box is the appropriate size to capture all field values for this document.
+|
+[[File:Read-zone-how-to-03.png]]
+|-
+|valign=top|
+You will place a green box on the page.  Any text falling within this box will be extracted.  You can move the box around the page and use the transform controls on its corners and edges to change its width and height.  You can also set the '''''Left''''', '''''Top''''', '''''Width''''', and '''''Height''''' properties directly.
+# Press the "Ok" button when finished placing the zone.
+|
+[[File:Read-zone-how-to-04.png]]
+|}
+</tab>
+<tab name="Test Extraction" style="margin:20px">
+{|cellpadding=10 cellspacing=5
+|style="width:40%" valign=top|
+# With a Document Folder selected, press the Test Extraction button to verify our results.
+Success!  The last name "Cleaugh" is extracted from the OCR text of this document.
+#<li value=2> Notice the green box around "Cleaugh" on the page only extends to the size of the text value extracted.  While testing your ''Read Zone'' configurations, this can lead to some confusion as to what is or is not being extracted and why.  When configuring ''Read Zone'', it can be useful to see the full size of the box you drew earlier.  That is why the '''''Output Full Region''''' property exists.
+|
+[[File:Read-zone-how-to-05.png]]
+|-
+|valign=top|
+# Set the '''''Output Full Region''''' property to ''True''.
+# Press the "Test Extraction" button again.
+# This changes absolutely nothing in terms of what data is extracted, but can be useful in your configuration testing.  We will keep this property set to ''True'' for this example and the other '''''Location''''' option examples.
+|
+[[File:Read-zone-how-to-06.png]]
+|}
+</tab>
+</tabs>
+=== Relative Region ===
+=== Shape Region ===
+=== Text Region ===
+=== Auto Snap ===
+=== Re-OCRing the Zone ===