2021:Value Reader (Node Type)

From Grooper Wiki
Revision as of 12:33, 9 December 2020 by Dgreenwood

The Value Reader is a data extraction object in Grooper. It allows users to return values from a document in a variety of ways, including regular expression pattern matching, optical mark recognition, and barcode detection.

Value Readers are Grooper's "one stop shop" for data extraction. They return a single result or a list of results, numerical or text, from the text data of a page or document folder (obtained via OCR or native text extraction during the Recognize activity).
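As a rough illustration of what "a single result or a list of results" means for pattern matching, here is a minimal Python sketch. It is not Grooper code: the `Hit` class, the `read_values` function, and the sample text are hypothetical names made up for this example, and only Python's standard `re` module is used.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One extracted result (hypothetical structure, not a Grooper type)."""
    value: str   # the matched text
    start: int   # character offset of the match in the source text


def read_values(text: str, pattern: str) -> list[Hit]:
    """Return every non-overlapping match of `pattern` in `text`, in order."""
    return [Hit(m.group(0), m.start()) for m in re.finditer(pattern, text)]


# Stand-in for text obtained from OCR or native text extraction.
page_text = "Invoice 1042 dated 12/09/2020, due 01/08/2021."

# A list of results: every date-shaped value on the page.
dates = read_values(page_text, r"\b\d{2}/\d{2}/\d{4}\b")

# A single result: the first match, or None when nothing matched.
first_date = dates[0].value if dates else None
```

Whether a caller wants one value or all of them is a decision made after matching, which is why the sketch always builds the full list first.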

About

The Value Reader is a new extraction object introduced in Grooper 2021. It is designed to expand Grooper's extractor functionality beyond regular expression pattern matching to include newer capabilities, such as extracting the values next to OMR (optical mark recognition) checkboxes and reading barcode values. In previous versions, this functionality was split across multiple objects (or properties of multiple objects); the Value Reader object brings it together in one place.

Do you need to extract a date? A Value Reader can do that! Do you need to extract anything matching a list of values? A Value Reader can do that! Do you need to extract English-language unigrams (or bigrams, etc.)? A Value Reader can do that! Do you need to extract a value from a barcode? A Value Reader can do that! Do you need to extract the label next to a checked checkbox? A Value Reader can do that!

Do you need to find any value at all? You're going to use some kind of configuration of the Value Reader to do it.

Value Readers locate results using a variety of Extractor Types.
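One way to picture "one object, many Extractor Types" is as a reader that delegates to an interchangeable matching strategy. The Python sketch below is only an analogy under that assumption: `ValueReaderSketch`, `pattern_match`, and `list_match` are hypothetical names invented here, not Grooper's actual objects or property names.

```python
import re
from typing import Callable

# An "extractor type" here is just a function from text to matched strings.
Extractor = Callable[[str], list[str]]


def pattern_match(pattern: str) -> Extractor:
    """Hypothetical extractor type: regular expression matching."""
    compiled = re.compile(pattern)
    return lambda text: [m.group(0) for m in compiled.finditer(text)]


def list_match(entries: list[str]) -> Extractor:
    """Hypothetical extractor type: match any entry from a fixed list."""
    # Longest entries first so "New Mexico" is tried before a shorter overlap.
    ordered = sorted(entries, key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in ordered)
    compiled = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return lambda text: [m.group(0) for m in compiled.finditer(text)]


class ValueReaderSketch:
    """Illustrative stand-in: one reader object, configured with one strategy."""

    def __init__(self, extractor: Extractor) -> None:
        self.extractor = extractor

    def read(self, text: str) -> list[str]:
        return self.extractor(text)


text = "Ship to texas or Oklahoma by 12/09/2020."
state_reader = ValueReaderSketch(list_match(["Texas", "Oklahoma", "New Mexico"]))
date_reader = ValueReaderSketch(pattern_match(r"\b\d{2}/\d{2}/\d{4}\b"))

states = state_reader.read(text)
dates = date_reader.read(text)
```

The reader itself stays the same in both cases; only the configured strategy changes, which mirrors the idea of choosing an Extractor Type on a single kind of object.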

Value Readers vs Data Types

Before version 2021, the Data Type extractor object was considered the bread and butter of data extraction. For many Grooper users, a "data extractor" and a Data Type are synonymous. In some ways, this may remain the case. However, the introduction of the Value Reader object was (at least in part) designed to conceptually separate the Data Type from a "general purpose extractor" and so emphasize the Data Type's primary function in Grooper: data collation.

While both Value Readers and Data Types are considered "extractors", they really have two different jobs as far as Grooper is concerned. One way to think of this: a Value Reader is a "data finder" while a Data Type is a "data collator."
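The finder/collator split can be sketched in a few lines of Python. This is a conceptual illustration only: `find` and `collate` are hypothetical functions, and the collation shown (merge, de-duplicate, order by position) is a deliberately simple stand-in, not a description of how Grooper's Data Types actually collate results.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One found value and where it sits in the text (hypothetical type)."""
    value: str
    start: int


def find(text: str, pattern: str) -> list[Hit]:
    """The "data finder" job: locate raw values, nothing more."""
    return [Hit(m.group(0), m.start()) for m in re.finditer(pattern, text)]


def collate(*hit_lists: list[Hit]) -> list[Hit]:
    """The "data collator" job: combine results from several finders.

    Assumption for this sketch: collation means merging all hits,
    dropping exact duplicates, and returning them in document order.
    """
    merged = {hit for hits in hit_lists for hit in hits}
    return sorted(merged, key=lambda hit: hit.start)


text = "Paid 01/02/2021; billed 2020-12-09."

# Two independent finders, each with its own pattern.
slash_dates = find(text, r"\b\d{2}/\d{2}/\d{4}\b")
iso_dates = find(text, r"\b\d{4}-\d{2}-\d{2}\b")

# The collator decides how the separate result sets become one result set.
all_dates = collate(iso_dates, slash_dates)
```

Note that neither finder knows about the other. Everything about how their results relate to each other lives in the collation step.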

Because data collation is so important for a variety of extraction techniques, it's almost natural to equate collated data with extracted data. But there are really two parts to what's going on. First, values are extracted from the document's text data; then they are collated and finally returned by the Data Type. Values