Field Class (Node Type)

Field Classes are Data Extractors that use supervised machine learning to find the correct value on a page. They are trained on examples in which the correct value is marked as a positive candidate. A Field Class uses two Data Extractors to do this:

A Value Extractor
A Feature Extractor

The Value Extractor finds the specified output and may return several possible values (candidates). To capture the context that distinguishes the right candidate from the wrong ones, the Feature Extractor is written to return words, phrases, or other labels that identify the value in question. During training, the correct value is picked from the list of value candidates and marked as a positive candidate. The features around it, as returned by the Feature Extractor, are given positive weightings using a TF-IDF algorithm. The Field Class then applies these feature weightings to other documents to identify the correct value.
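
The training and extraction flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the product's actual implementation: the two regular expressions stand in for a Value Extractor and a Feature Extractor, "features around" a candidate is simplified to the words preceding it on the same line, and the weighting is a basic TF-IDF variant (feature count near positive candidates, multiplied by the log of how rare the feature is across all candidate contexts). All names here (VALUE_RE, FEATURE_RE, candidates, train, extract) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical extractors, chosen only for illustration.
VALUE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")  # stands in for the Value Extractor (date-like strings)
FEATURE_RE = re.compile(r"[A-Za-z]+")        # stands in for the Feature Extractor (any word)


def candidates(text):
    """Return (value, features) pairs.

    Assumption: a candidate's features are the words that precede it on its line.
    """
    found = []
    for line in text.splitlines():
        for match in VALUE_RE.finditer(line):
            context = line[:match.start()]
            features = [w.lower() for w in FEATURE_RE.findall(context)]
            found.append((match.group(), features))
    return found


def train(examples):
    """Learn feature weights from (text, correct_value) training examples."""
    positive_tf = Counter()  # how often each feature appears near a positive candidate
    context_df = Counter()   # how many candidate contexts contain each feature
    total = 0                # number of candidate contexts seen
    for text, correct in examples:
        for value, features in candidates(text):
            total += 1
            context_df.update(set(features))
            if value == correct:
                positive_tf.update(features)
    # TF-IDF style weight: features shared by every candidate get weight 0.
    return {f: tf * math.log(total / context_df[f]) for f, tf in positive_tf.items()}


def extract(text, weights):
    """Return the candidate whose features carry the highest total weight."""
    scored = [
        (sum(weights.get(f, 0.0) for f in features), value)
        for value, features in candidates(text)
    ]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]


# Train on one document where the invoice date is the positive candidate.
training = [("Invoice Date: 01/02/2020\nDue Date: 02/03/2020", "01/02/2020")]
weights = train(training)

# Apply the learned weights to a new document with the lines in a different order.
result = extract("Due Date: 05/06/2021\nInvoice Date: 04/05/2021", weights)
print(result)  # 04/05/2021
```

In this sketch the word "date" appears next to both candidates, so it receives a weight of zero, while "invoice" appears only next to the positive candidate and ends up deciding the result. That is the behavior the TF-IDF weighting is meant to provide: common context words are discounted and distinguishing ones are emphasized.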