2023:Read Zone (Value Extractor)
WIP: This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.
The Read Zone extractor extracts text data from a rectangular region (called an "extraction zone" or just "zone") on a document. This can be a fixed zone, which extracts text from the same location on every document, or a zone positioned relative to an extracted text anchor or a shape location on the document.
About
The Location property controls where Read Zone places the extraction zone. It can be set to one of four options:
- Fixed Region
- Relative Region
- Text Region
- Shape Region
The Read Zone extractor can optionally re-process the text data with an OCR Profile. This can be used to perform custom OCR on the extracted text.
The text in the zone can itself be processed by a Value Extractor. This lets you break the document into a smaller portion and run an extractor on just the zone instead of the full document. Essentially, the Read Zone extractor creates a smaller data instance (from the larger document data instance), and its Value Extractor property returns data from that smaller data instance.
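The idea of reading a zone and then running an extractor on just that zone can be sketched in a few lines of Python. This is only an illustration of the concept, not Grooper's implementation or API: the `Word` and `Rect` types, the `words_in_zone` and `read_zone` functions, the "word center inside the zone" rule, and the sample page are all assumptions made for the example, and a regular expression stands in for the Value Extractor.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical OCR word with a pixel bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


class Rect(NamedTuple):
    """Hypothetical extraction zone in the same pixel coordinates."""
    x: float
    y: float
    w: float
    h: float


def words_in_zone(words, zone):
    """Keep words whose center point falls inside the zone (an assumed rule)."""
    kept = []
    for word in words:
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        if zone.x <= cx <= zone.x + zone.w and zone.y <= cy <= zone.y + zone.h:
            kept.append(word)
    return kept


def read_zone(words, zone, pattern=None):
    """Return the zone's text, optionally narrowed by a regex 'extractor'."""
    ordered = sorted(words_in_zone(words, zone), key=lambda wd: (wd.y, wd.x))
    zone_text = " ".join(wd.text for wd in ordered)
    if pattern is None:
        return zone_text
    match = re.search(pattern, zone_text)
    return match.group(0) if match else None


# Made-up page content with two dates; only one sits inside the zone.
page = [
    Word("Invoice", 10, 10, 60, 12),
    Word("Date:", 10, 40, 40, 12),
    Word("01/15/2023", 60, 40, 80, 12),
    Word("Due:", 10, 100, 40, 12),
    Word("02/15/2023", 60, 100, 80, 12),
]
zone = Rect(0, 30, 200, 30)

zone_text = read_zone(page, zone)
zone_date = read_zone(page, zone, r"\d{2}/\d{2}/\d{4}")
print(zone_text)
print(zone_date)
```

Because the date pattern only sees the zone's text, it cannot match the second date elsewhere on the page, which is the practical benefit of narrowing the data instance first.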
How To
The Location Property
Fixed Region
This option is the simplest to set up. As the name implies, the extraction zone is fixed on the page and stays at the same coordinates for every document. All you need to do is draw the box where you want to extract data.
Relative Region
Instead of setting the extraction zone in a fixed location for every document, the Relative Region mode will anchor the zone to a text label on the document. The extraction zone's position will change relative to the label's position on the document, but will still have the same drawn dimensions.
This option is useful for overcoming issues that arise when scanning printed documents. The position of a value can vary slightly when a document is printed or scanned, even for very structured documents. This can cause problems when drawing a single fixed region for the extraction zone. However, if you can anchor the zone to an extractable text value, the zone's position will shift according to that anchor's position.
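The arithmetic behind an anchored zone is simple enough to sketch. The following Python is an assumption-laden illustration rather than Grooper's actual logic: the `Rect` type, the `relative_zone` function, and the idea of storing the anchor's design-time position are all hypothetical names and choices made for the example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle in page pixel coordinates."""
    x: float
    y: float
    w: float
    h: float


def relative_zone(anchor: Rect, drawn_anchor: Rect, drawn_zone: Rect) -> Rect:
    """Shift the drawn zone by however far the anchor moved on this page.

    The zone keeps its drawn width and height; only its position changes.
    """
    dx = anchor.x - drawn_anchor.x
    dy = anchor.y - drawn_anchor.y
    return Rect(drawn_zone.x + dx, drawn_zone.y + dy, drawn_zone.w, drawn_zone.h)


# Where the label and the zone sat on the sample page used when drawing.
drawn_anchor = Rect(100, 200, 50, 12)
drawn_zone = Rect(160, 195, 120, 22)

# On a rescanned copy, the label landed 7 px to the right and 4 px lower.
found_anchor = Rect(107, 204, 50, 12)

shifted = relative_zone(found_anchor, drawn_anchor, drawn_zone)
print(shifted)
```

A fixed zone would have stayed at (160, 195) and risked clipping the value, whereas the anchored zone follows the label to (167, 199) with the same dimensions.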
Auto Snap
On many documents, such as the Application for Cow Ownership document used in these examples, a grid of lines encloses the data you want to return. Tables often have the same kind of enclosing lines.
Grooper can use these lines as guides to determine what needs to be extracted. You can use this feature by enabling the Auto Snap property.
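One way to picture snapping a zone to nearby lines is shown below. This article does not describe how Auto Snap works internally, so treat this purely as a conceptual sketch under stated assumptions: the `snap_edge` and `auto_snap` functions, the pixel `tolerance`, and the "move each edge to the nearest detected line" rule are all hypothetical.

```python
def snap_edge(value, line_positions, tolerance):
    """Move an edge to the closest line position if one is within tolerance."""
    if not line_positions:
        return value
    nearest = min(line_positions, key=lambda pos: abs(pos - value))
    return nearest if abs(nearest - value) <= tolerance else value


def auto_snap(zone, vertical_lines, horizontal_lines, tolerance=10):
    """Snap a (left, top, right, bottom) zone to detected grid lines.

    vertical_lines holds x coordinates and horizontal_lines holds y
    coordinates of the lines found on the page (assumed inputs).
    """
    left, top, right, bottom = zone
    return (
        snap_edge(left, vertical_lines, tolerance),
        snap_edge(top, horizontal_lines, tolerance),
        snap_edge(right, vertical_lines, tolerance),
        snap_edge(bottom, horizontal_lines, tolerance),
    )


# Made-up grid line positions and a roughly drawn zone around one cell.
vertical_lines = [50, 250, 450]
horizontal_lines = [100, 140, 180]
rough_zone = (56, 96, 244, 143)

snapped = auto_snap(rough_zone, vertical_lines, horizontal_lines)
print(snapped)
```

In this sketch an edge with no line within the tolerance is left where it was drawn, so a zone on a page without grid lines is unchanged.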
Text Region
The Text Region option creates an extraction zone using the logical boundaries of an extraction result. This can return all the text falling within the boundaries of the rectangle around the extractor's result.
This option can also be configured to work much like the Relative Region option, using text anchors located by an extractor to position the extraction zone. Both methods can therefore position the zone relative to a point that moves from document to document. The main difference is in how the zone is drawn.
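The notion of a zone drawn from "the rectangle around the extractor's result" can be illustrated with a bounding-box calculation. This is a sketch of the general idea only, not Grooper's implementation: the `bounding_rect` and `text_in_rect` functions, the (left, top, right, bottom) box layout, the "fully contained" rule, and the sample words are assumptions for the example.

```python
def bounding_rect(boxes):
    """Smallest (left, top, right, bottom) rectangle enclosing every box."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def text_in_rect(words, rect):
    """Join the text of every word that lies fully inside the rectangle."""
    left, top, right, bottom = rect
    inside = [
        text
        for text, (wl, wt, wr, wb) in words
        if wl >= left and wt >= top and wr <= right and wb <= bottom
    ]
    return " ".join(inside)


# Made-up words as (text, (left, top, right, bottom)) pairs.
words = [
    ("Name:", (10, 10, 50, 22)),
    ("Bessie", (60, 10, 110, 22)),
    ("Breed:", (10, 30, 55, 42)),
    ("Holstein", (60, 30, 130, 42)),
    ("Signature", (10, 200, 80, 212)),
]

# Suppose an extractor's result starts at "Name:" and ends at "Holstein".
result_boxes = [words[0][1], words[3][1]]

zone = bounding_rect(result_boxes)
zone_text = text_in_rect(words, zone)
print(zone)
print(zone_text)
```

The zone here is derived from the result's own boundaries rather than drawn by hand, so everything falling inside that rectangle is returned, including words the extractor did not match directly.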
Shape Region
The Shape Region option is extremely similar to the Text Region option. However, instead of using text to anchor the extraction zone, it uses a shape detected from a Shape Detection or Shape Removal IP Command.
This is the least commonly used method.