Data Extractors are Grooper objects or property configurations used to isolate and return information from text data on a page.
Data extractors (or simply "extractors") are used in a variety of ways, including (but not limited to):
- Classifying documents
- Finding data on a page that you wish to store outside of Grooper
- Separating documents
There are three types of Data Extractor objects:
- Data Type
- Data Format (These extractors are only created as child objects of a Data Type)
- Field Class
All three have configurable properties that leverage information on the page, control how data is extracted, collate extraction results, work around imperfect text data, and format the results that extraction returns.
Furthermore, certain objects contain properties whose configuration extracts data. These include:
Author's note: These two options are interchangeable. For some properties, this option is listed as "Internal". For others, it is listed as "Text Pattern". Both behave the same way: they use a Pattern Editor to write and configure a regular expression pattern that extracts data, similar to how a Data Format or the "Pattern" property of a Data Type does.
These properties also have a "Reference" option, which points to a data extractor object instead of an inline pattern.
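As a rough illustration of what pattern-based extraction means in general, the sketch below runs a regular expression over a block of page text and returns every match with its position. It is plain Python, not Grooper's API or Pattern Editor: the sample text, the `DATE_PATTERN` name, and the `extract` helper are all assumptions made up for this example.

```python
import re

# Hypothetical page text, invented for this example.
page_text = "Invoice No: 10482\nInvoice Date: 03/15/2021\nDue Date: 04/14/2021"

# Assumed pattern: dates written as MM/DD/YYYY.
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def extract(pattern, text):
    """Return (matched text, start offset) for every match of pattern in text."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


results = extract(DATE_PATTERN, page_text)
for value, offset in results:
    print(f"{value} found at character {offset}")
```

A single pattern returns every date on the page, not just the one you want. Narrowing or collating those results is the kind of work the configurable extractor properties are meant for.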