Extractor Type (Property)

From Grooper Wiki

This article is about the current version of Grooper (2025). Note that some content may still need to be updated.

An Extractor Type (shorthand for Value Extractor Type) is configured for numerous properties on a wide array of Grooper objects. Extractor Types are used to return "data instances" from documents for one purpose or another. An Extractor Type defines an operation that reads data from the text or visual content of a document and returns one or more results. Each Extractor Type uses its own specialized logic to return results. Extractor Types are consumed by higher-level objects such as Data Elements, extractor objects, Content Types and more.

About

Extractors are configured throughout Grooper. Around 100 different configuration properties allow you to configure an extractor to return data from a document. In older versions of Grooper, extraction was limited to simple regular expression pattern matching. As Grooper has evolved, we have developed a number of different mechanisms to extract information from a page or document. We call these different kinds of extractors "Extractor Types".

For any extractor property, you can choose one of the Extractor Types available in Grooper. For example, you might use the List Match extractor to match a state on a document against a list of US states. Or, you might use the Pattern Match extractor to extract dates written in various date/time formats.
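The difference between matching against a list and matching against a pattern can be illustrated outside of Grooper with a small Python sketch. This is not Grooper's implementation or API. The function names list_match and pattern_match, the sample text, the shortened state list and the date pattern are all assumptions made up for illustration.

```python
import re

# Hypothetical sample text; not taken from a real document.
TEXT = "Shipped from Tulsa, Oklahoma on 03/15/2024 and received in Texas on 2024-03-18."

# Shortened illustrative list; a real list would hold every US state.
STATES = ["Oklahoma", "Texas", "Kansas"]

# Illustrative pattern covering only MM/DD/YYYY and YYYY-MM-DD.
DATE_PATTERN = re.compile(r"\b(?:\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2})\b")


def list_match(text, entries):
    """Return (value, start_index) for each whole-word list entry found in text."""
    results = []
    for entry in entries:
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
            results.append((m.group(), m.start()))
    # Order results by where they appear in the text.
    return sorted(results, key=lambda r: r[1])


def pattern_match(text, pattern):
    """Return (value, start_index) for each regex match found in text."""
    return [(m.group(), m.start()) for m in pattern.finditer(text)]


states_found = [value for value, _ in list_match(TEXT, STATES)]
dates_found = [value for value, _ in pattern_match(TEXT, DATE_PATTERN)]
print(states_found)
print(dates_found)
```

In this sketch both functions return every hit along with its position, which loosely mirrors the idea that an extractor returns one or more results rather than a single value.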

Currently, Grooper's Extractor Types fall into the following categories:

Text Parsing Extractors

These Extractor Types primarily rely on regular expressions, lists of values (such as a Lexicon of field labels) or other forms of text parsing to return values.

FYI

Please note, regular expressions and other forms of text parsing are the "bread and butter" of how Grooper data extraction works. Other Extractor Types may also use regex or other forms of text parsing as part of their configuration. The Text Parsing Extractor Types just rely on it more heavily.

OMR Extractors

These Extractor Types allow you to return values using optical mark recognition (OMR). These are useful for extracting values from documents that use checkboxes to record information.
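As a rough illustration of the general idea behind optical mark recognition, the sketch below decides whether a checkbox is marked by measuring how much of its interior is dark. This is a generic technique and is not claimed to be how Grooper's OMR extractors work. The function name is_checked and both threshold values are assumptions chosen for the example.

```python
def is_checked(region, dark_threshold=128, fill_ratio=0.2):
    """Guess whether a checkbox is marked.

    region: rows of grayscale pixel values (0 = black, 255 = white)
            taken from inside the checkbox border.
    dark_threshold, fill_ratio: illustrative values, not tuned defaults.
    """
    pixels = [p for row in region for p in row]
    if not pixels:
        return False
    dark = sum(1 for p in pixels if p < dark_threshold)
    # The box counts as checked when enough of its interior is dark.
    return dark / len(pixels) >= fill_ratio


# Hypothetical 4x4 regions: an empty box and a box marked with an "X".
empty_box = [[255] * 4 for _ in range(4)]
x_marked_box = [
    [0, 255, 255, 0],
    [255, 0, 0, 255],
    [255, 0, 0, 255],
    [0, 255, 255, 0],
]

print(is_checked(empty_box))
print(is_checked(x_marked_box))
```

A real implementation would also need to locate the box on the page and cope with scan noise, which this sketch leaves out.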

Barcode Extractors

These Extractor Types allow you to return a value encoded in a barcode.

Zonal Extractors

These Extractor Types extract by drawing a logical rectangle somewhere on a document. These are useful for extracting values on highly structured documents where field values are consistently located in the same position on the page for every document.
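The idea of extracting from a rectangle can be sketched in Python as a filter over words that carry page coordinates. This is only an illustration under stated assumptions, not Grooper's implementation. The Word class, the extract_zone function, the coordinate units and the sample words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word with its bounding box (hypothetical page units)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def extract_zone(words, left, top, right, bottom):
    """Return the text of all words that lie fully inside the rectangle."""
    inside = [
        w for w in words
        if w.left >= left and w.top >= top
        and w.right <= right and w.bottom <= bottom
    ]
    # Read top-to-bottom, then left-to-right.
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


# Hypothetical words from a structured form.
page_words = [
    Word("Invoice", 50, 40, 110, 55),
    Word("No:", 115, 40, 140, 55),
    Word("12345", 400, 40, 450, 55),
    Word("Total", 50, 700, 90, 715),
]

# The zone covers the spot where the invoice number always appears.
invoice_number = extract_zone(page_words, left=390, top=30, right=470, bottom=60)
print(invoice_number)
```

Because the rectangle is fixed, this approach only works when the value sits in the same place on every document, which is why it suits highly structured forms.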

Miscellaneous Extractors

These Extractor Types have specialized uses and don't fit well into the other categories.