Data Instance

From Grooper Wiki

A data instance is an encapsulation of text data within a document.

The largest data instance is the document itself. Individual pages are smaller sub-instances of the document-level data instance. If you want to execute an extractor on a single page rather than the whole document, you effectively execute it on a page instance of the document instance. Data Sections allow Grooper users to define how the document is subdivided, so that an extractor can be executed on a section instance of the document instance. An extractor's result is itself a data instance: it is a returned portion of the document's text.
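The nesting described above can be pictured with a small Python sketch. This is only an illustrative model, not Grooper's actual object model or API: the `DataInstance` class, its `sub_instance` and `extract` methods, and the sample invoice text are all hypothetical names and data chosen for this example, and a regular expression stands in for an extractor.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInstance:
    """Hypothetical model of a data instance: a span of the document's text."""
    text: str   # the characters this instance covers
    start: int  # offset of the first character within the full document text

    def sub_instance(self, start: int, end: int) -> "DataInstance":
        """Return a smaller instance covering text[start:end] of this one."""
        return DataInstance(self.text[start:end], self.start + start)

    def extract(self, pattern: str) -> "list[DataInstance]":
        """Run a regex 'extractor' on this instance only.

        Each match comes back as a data instance in its own right.
        """
        return [self.sub_instance(m.start(), m.end())
                for m in re.finditer(pattern, self.text)]


# Made-up two-page document; the document instance spans all of its text.
pages = ["Invoice 1001\nTotal: 50.00\n", "Invoice 1002\nTotal: 75.00\n"]
document = DataInstance("".join(pages), 0)

# Each page is a sub-instance of the document instance.
page_instances = []
offset = 0
for page_text in pages:
    page_instances.append(document.sub_instance(offset, offset + len(page_text)))
    offset += len(page_text)

# The same extractor, run at two different scopes.
doc_hits = document.extract(r"Invoice \d+")             # searches every page
page2_hits = page_instances[1].extract(r"Invoice \d+")  # searches page 2 only
```

In this sketch, running the pattern on the document instance finds both invoice numbers, while running it on the second page instance finds only the one on that page. Because each result keeps its offset into the full document text, it can still be located in the document afterwards.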

Grooper uses data instances in a variety of ways. In the case of Input Filters and Data Sections, data instances are used to filter text data, limiting an extractor's result to a certain portion of the document. For the Row Match table extraction method, the table's structure is established by a Row Extractor, which returns a data instance for each row in the table.
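As a rough illustration of the row-per-instance idea, the Python sketch below uses a regular expression as a stand-in for a Row Extractor. It is not how Grooper implements Row Match: the `ROW_PATTERN` expression, the dictionary layout of each row, and the sample table are assumptions made for this example.

```python
import re

# Made-up table text: one header line, two item rows, one summary line.
table_text = (
    "Item      Qty   Price\n"
    "Widget    2     4.50\n"
    "Gadget    10    1.25\n"
    "Subtotal        21.50\n"
)

# Hypothetical "row extractor": a line matches only if it has an item name,
# an integer quantity, and a price with two decimal places.
ROW_PATTERN = re.compile(
    r"^(?P<item>[A-Za-z]+)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$",
    re.MULTILINE,
)

# Each match becomes one row instance: the row's text, its offset in the
# source text, and the column values captured by the named groups.
rows = [
    {"text": m.group(0), "start": m.start(), **m.groupdict()}
    for m in ROW_PATTERN.finditer(table_text)
]

for row in rows:
    print(row["item"], row["qty"], row["price"])
```

The header and subtotal lines do not fit the row pattern, so only the two item rows are returned. The set of row instances is what gives the table its structure in this sketch.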

In addition to the text's character data, a data instance includes the text's position on the document. This positional information is critical for certain Grooper operations, such as the Key-Value Pair Collation Provider, which relies on the spatial relationship between data instances to return data in a particular way.
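To show why position matters, here is a minimal Python sketch that pairs a key with the nearest value to its right on the same line. It is a simplified assumption about what a spatial pairing might look like, not the Key-Value Pair Collation Provider's actual logic. The `PositionedInstance` class, the `pair_right_of` function, the coordinate units, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionedInstance:
    """Hypothetical data instance: text plus its bounding box on the page."""
    text: str
    left: int    # x of the left edge (arbitrary units, e.g. pixels)
    top: int     # y of the top edge
    width: int
    height: int

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def center_y(self) -> float:
        return self.top + self.height / 2


def pair_right_of(key, candidates):
    """Return the nearest candidate to the right of `key` on the same line.

    A candidate is "on the same line" when its vertical center falls inside
    the key's vertical extent. Returns None if nothing qualifies.
    """
    same_line = [
        c for c in candidates
        if c.left >= key.right
        and key.top <= c.center_y <= key.top + key.height
    ]
    return min(same_line, key=lambda c: c.left - key.right, default=None)


# Made-up layout: two labels on the left, two values to their right.
keys = [
    PositionedInstance("Invoice No:", 100, 100, 120, 20),
    PositionedInstance("Date:", 100, 150, 60, 20),
]
values = [
    PositionedInstance("1001", 250, 100, 50, 20),
    PositionedInstance("2024-01-15", 250, 150, 100, 20),
]

pairs = {}
for k in keys:
    match = pair_right_of(k, values)
    pairs[k.text] = match.text if match is not None else None
```

With character data alone, nothing ties "Date:" to "2024-01-15" rather than to "1001". In this sketch it is the bounding boxes that make the pairing possible.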