Field Class

From Grooper Wiki

Field Classes are Data Extractors that use supervised machine learning to find the correct value on a page. They are trained by marking examples of the correct value as positive candidates. A Field Class uses two Data Extractors to do this, a Value Extractor and a Feature Extractor:

The Value Extractor finds the specified output and may return multiple possible values (candidates). To capture the context that differentiates the right candidate from the wrong ones, the Feature Extractor is written to return words, phrases, or other labels that identify the value in question. From the list of value candidates, the correct value is trained as a positive candidate. The features around it that the Feature Extractor returns are given positive weightings using a TF-IDF algorithm. The Field Class then applies these feature weightings to other documents to identify the correct value.
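The weighting idea can be illustrated with a minimal Python sketch. This is not Grooper's implementation: the function names (`train_feature_weights`, `pick_best`), the sample invoice data, and the smoothed TF-IDF formula are all assumptions chosen for illustration. Each candidate is modeled as a value plus the list of features found around it.

```python
import math
from collections import Counter


def train_feature_weights(positive_contexts, all_contexts):
    """Give each feature seen near a positive candidate a TF-IDF style weight.

    positive_contexts: feature lists around candidates trained as positive.
    all_contexts: feature lists around every candidate (right or wrong).
    Hypothetical helper, assuming a smoothed IDF; not Grooper's algorithm.
    """
    n = len(all_contexts)
    # Document frequency: how many candidate contexts contain the feature.
    doc_freq = Counter()
    for features in all_contexts:
        doc_freq.update(set(features))
    # Term frequency: how often the feature appears near positive candidates.
    term_freq = Counter()
    for features in positive_contexts:
        term_freq.update(features)
    # Features shared by many candidates get a smaller weight.
    return {
        feature: count * (math.log((1 + n) / (1 + doc_freq[feature])) + 1)
        for feature, count in term_freq.items()
    }


def pick_best(candidates, weights):
    """Return the value whose surrounding features carry the most weight."""
    return max(
        candidates,
        key=lambda c: sum(weights.get(f, 0.0) for f in c[1]),
    )[0]


# Made-up training document: three dates returned by a value extractor,
# each paired with the features found near it.
training_candidates = [
    ("01/05/2024", ["invoice", "date"]),
    ("02/04/2024", ["due", "date"]),
    ("12/20/2023", ["order", "date"]),
]
# The first candidate is trained as the positive example.
weights = train_feature_weights(
    [training_candidates[0][1]],
    [features for _, features in training_candidates],
)

# A different document: the learned weights select the invoice date.
new_candidates = [
    ("03/01/2024", ["due", "date"]),
    ("02/15/2024", ["invoice", "date"]),
]
best = pick_best(new_candidates, weights)
print(best)  # 02/15/2024
```

In this sketch, "date" appears near every candidate, so it earns a lower weight than "invoice", which appears only near the trained value. That is the sense in which the features help differentiate the right candidate from the wrong ones.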

As with any extractor, data context can be critical to understanding your documents and building the Field Class extractor. For more information on this topic, visit the Data Context article.