Output Extractor Key

From Grooper Wiki
Revision as of 10:42, 30 April 2020 by Randallkinard

Another weapon in the arsenal of powerful Grooper classification techniques


Output Extractor Key is a property of the Data Type extractor, exposed when the Collation property is set to Individual. When set to True, each output value is set to a key representing the name of the extractor that produced the match, rather than the matched text itself. It is useful when extracting non-word classification features.

The main purpose of this property is to supplement the capabilities of Grooper's classification technology. When using Lexical classification, a Content Model must use an extractor to collect the lexical features during training. A common use case is to have the extractor collect words, which works well when the semantic content of a document varies among examples. However, this breaks down when a document consists mainly of repeated types of information. Take, for example, a bank statement. With no keywords present on the document, the only way to properly classify it is to recognize that it contains a high frequency of transaction line items. It would be highly impractical to train Grooper to understand every variation of a transaction line item, since each one carries its own date, description, and amount.

This is where the Output Extractor Key property comes into play. Using this property, one can establish an extractor which will match transaction line items on the document and return an output such as "feature_transaction" to the classification engine. With this approach, a document containing a high frequency of "transaction" features, say 50, will be treated as though it contained 50 separate occurrences of the phrase "feature_transaction".
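
The effect on the feature counts can be illustrated outside of Grooper with a short Python sketch. This is not Grooper's API or its implementation: the `EXTRACTORS` table, the `extract_features` function, the `output_extractor_key` flag, the regular expression, and the sample statement text are all hypothetical stand-ins chosen for illustration. The sketch only shows why replacing each match with the name of the extractor that produced it turns 50 unrelated strings into one feature seen 50 times.

```python
import re
from collections import Counter

# Hypothetical named extractors (name -> pattern). In Grooper these would be
# child extractors of a Data Type; here they are plain regular expressions.
EXTRACTORS = {
    "feature_transaction": re.compile(
        r"^\d{2}/\d{2}\s+.+?\s+-?\d+\.\d{2}$", re.MULTILINE
    ),
}


def extract_features(text, output_extractor_key):
    """Count the features found in text.

    With output_extractor_key=False each feature is the matched text.
    With output_extractor_key=True each feature is the name of the
    extractor that produced the match.
    """
    features = []
    for name, pattern in EXTRACTORS.items():
        for match in pattern.finditer(text):
            features.append(name if output_extractor_key else match.group(0))
    return Counter(features)


# Made-up statement body: 50 transaction lines, no two alike.
STATEMENT = "\n".join(
    f"01/{day:02d}  CARD PURCHASE {day * 37}  -{day}.99" for day in range(1, 51)
)

raw = extract_features(STATEMENT, output_extractor_key=False)
keyed = extract_features(STATEMENT, output_extractor_key=True)

print(len(raw), "distinct raw features, each seen once")
print(keyed["feature_transaction"], "occurrences of 'feature_transaction'")
```

Without the key, every line item is a different feature that is unlikely to recur on the next statement, so training has nothing stable to learn from. With the key, the same single feature appears on every statement, and its frequency becomes the signal.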