Data Instance (Concept)

From Grooper Wiki

A Data Instance is an encapsulation of text data within a document, returned by Grooper's extractors. Taken together, data instances form the hierarchy of text data that Grooper's extractors create.

Data instances represent portions of a document.

  • The largest data instance is the document itself.
  • Individual pages are smaller sub-instances of the document-level data instance.
  • The smallest data instance is a single character on the document.


Data instances are created by Grooper's extraction process. Extractors execute against a larger text instance, starting with the document itself. Their results are sub-instances, which can then be fed into other extractors, Data Elements, or Activities.
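
The chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Grooper's API: the `Instance` class, the `run_extractor` function, and the sample text are all hypothetical, and a plain regular expression stands in for an extractor. The first pass runs against the document instance, and its results become the input for a second pass.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a data instance (not a Grooper type)."""
    value: str
    start: int  # character offset of this instance within the document text
    children: list["Instance"] = field(default_factory=list)


def run_extractor(pattern: str, parent: Instance) -> list[Instance]:
    """Run a regex against a parent instance; each match becomes a sub-instance."""
    results = []
    for match in re.finditer(pattern, parent.value):
        # Offsets are kept relative to the whole document, not the parent.
        child = Instance(match.group(0), parent.start + match.start())
        parent.children.append(child)
        results.append(child)
    return results


# The document itself is the root instance.
document = Instance("Invoice 1001\nTotal: 42.50\nInvoice 1002\nTotal: 17.00", 0)

# First extractor: find the "Total" lines inside the document instance.
total_lines = run_extractor(r"Total: [\d.]+", document)

# Second extractor: run against each sub-instance, not the whole document.
amounts = [a for line in total_lines for a in run_extractor(r"[\d.]+", line)]
```

Because the second pattern only sees the "Total" lines, it cannot pick up the invoice numbers, even though they would match the same pattern at the document level.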

Data instances are composed primarily of two things: (1) A value, the text's character data. (2) A location, the text data's position/coordinates on the document. The location is critical for human operators, who use it to locate results in the Document Viewer, and for certain Grooper operations, such as the Labeled Value extractor, where the spatial relationship between data instances determines which data is returned.
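
To make the value/location pairing concrete, the following Python sketch models an instance as a value plus a bounding box, then uses the boxes to pair a label with the value printed to its right. Everything here is an assumption made for illustration: the `Box` and `Instance` classes, the `value_right_of` function, and the coordinates are hypothetical, and this is not how the Labeled Value extractor is implemented. It only shows why location has to travel with the value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box; units are arbitrary (e.g. inches)."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class Instance:
    """Hypothetical data instance: a value plus where it sits on the page."""
    value: str
    page: int
    box: Box


def value_right_of(label: Instance, candidates: list[Instance]) -> Optional[Instance]:
    """Return the nearest candidate to the right of the label on the same line."""
    def same_line(a: Box, b: Box) -> bool:
        # Two boxes share a line when their vertical ranges overlap.
        return a.top < b.bottom and b.top < a.bottom

    hits = [
        c for c in candidates
        if c.page == label.page
        and c.box.left >= label.box.right
        and same_line(label.box, c.box)
    ]
    return min(hits, key=lambda c: c.box.left - label.box.right, default=None)


label = Instance("Invoice Date:", 1, Box(1.0, 2.0, 2.2, 2.2))
same_line_date = Instance("01/15/2024", 1, Box(2.4, 2.0, 3.3, 2.2))
other_date = Instance("03/01/2024", 1, Box(2.4, 5.0, 3.3, 5.2))

match = value_right_of(label, [other_date, same_line_date])
```

Both candidates have the same kind of value, so the text alone cannot tell them apart. Only the coordinates show that `same_line_date` belongs to the label.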


Grooper uses data instances in a variety of ways.

  • A Data Field returns the data instance returned by its Value Extractor. This collects the instance's value and also captures its location for Data Viewer users in Review.
  • Data Sections let Grooper users define how a document is subdivided, so that an extractor executes on a section instance rather than on the whole document instance.
  • Grooper's various Table Extract Methods break up a document into row instances (and sometimes column header instances) to ultimately return cell instances.
  • Input Filters similarly return a smaller instance from a document or larger instance. Data Types use Input Filters to limit results to a certain portion of the document.
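
As a rough illustration of section-scoped extraction, the Python sketch below splits a document into section instances and then runs a pattern inside each one. The function names, the sample text, and the use of regular expressions are assumptions made for this example and do not reflect how Data Sections or Input Filters are configured in Grooper.

```python
import re
from typing import Optional


def split_sections(text: str, header_pattern: str) -> list[tuple[int, str]]:
    """Split text into (offset, section_text) pairs, one per header match."""
    starts = [m.start() for m in re.finditer(header_pattern, text)]
    ends = starts[1:] + [len(text)]
    return [(start, text[start:end]) for start, end in zip(starts, ends)]


def extract_in_section(pattern: str, offset: int, section: str) -> Optional[tuple[str, int]]:
    """Run a pattern inside one section; return (value, document-level offset)."""
    match = re.search(pattern, section)
    if match is None:
        return None
    return match.group(1), offset + match.start(1)


document = (
    "Claim 1\nPatient: Ada Lovelace\nAmount: 120.00\n"
    "Claim 2\nPatient: Alan Turing\nAmount: 75.50\n"
)

# Each section is a smaller instance of the document instance.
sections = split_sections(document, r"Claim \d+")

# The same pattern runs once per section, so each result stays tied to its claim.
patients = [extract_in_section(r"Patient: (.+)", off, sec) for off, sec in sections]
```

Each result keeps an offset relative to the full document, so a value found inside a section can still be located on the original page.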

Derived Types

These are specialized implementations of a data instance.

General instances:

  • Document Instance - Represents the entire document and serves as the root of a data instance hierarchy.
  • Text Line Instance - Represents a line of text within a document. This is the instance returned by most extractors.

Specialized extractor instances:

  • Check Box Instance - Used by Labeled OMR and other OMR extractors to capture OMR values.
  • Field Class Instance - Represents data instances returned by Field Class extractors.

Data Model related instances:

  • Field Instance - Represents the data instance collected by a Data Field.
  • Section Instance Collection - Represents the collection of one or more Section Instances collected by a Data Section. Section Instance Collections may be single-instance (having only one Section Instance) or multi-instance (having one or more Section Instances).
  • Section Instance - Represents the extracted content of a Data Section. These are children of the Section Instance Collection.
  • Table Instance - The top level in the instance hierarchy created by a Data Table. Represents the entire table collected by a Data Table.
  • Table Row Instance - Represents a row instance formed by the Data Table's Table Extract Method.
  • Table Header Instance - Represents column header values and locations used in Table Extract Methods like Tabular Layout.
  • Table Cell Instance - Represents the bottom level in the instance hierarchy of a Data Table. This is the data instance collected for a Data Column in a row instance.
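
The table, row, and cell levels listed above can be pictured with a small Python sketch. It is a simplified assumption, not Grooper's object model or the Tabular Layout algorithm: the class names are hypothetical, the input is fixed-width text, and header positions alone decide where each cell begins and ends.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CellInstance:
    column: str  # name of the column header this cell falls under
    value: str
    start: int   # character offset of the column within the row text


@dataclass
class RowInstance:
    text: str
    cells: list[CellInstance] = field(default_factory=list)


@dataclass
class TableInstance:
    headers: list[str]
    rows: list[RowInstance] = field(default_factory=list)


def build_table(header_line: str, row_lines: list[str]) -> TableInstance:
    """Slice each row into cells using the header positions as column boundaries."""
    headers = [(m.group(0), m.start()) for m in re.finditer(r"\S+", header_line)]
    starts = [start for _, start in headers]
    table = TableInstance(headers=[name for name, _ in headers])
    for line in row_lines:
        row = RowInstance(text=line)
        # A column runs from its header's start to the next header's start.
        ends = starts[1:] + [len(line)]
        for (name, start), end in zip(headers, ends):
            row.cells.append(CellInstance(name, line[start:end].strip(), start))
        table.rows.append(row)
    return table


layout = "{:<10}{:<5}{}"  # fixed-width columns: 10 chars, 5 chars, remainder
header = layout.format("Item", "Qty", "Price")
rows = [
    layout.format("Widget", "2", "9.99"),
    layout.format("Gadget", "10", "24.50"),
]
table = build_table(header, rows)
```

Here `table.rows[1].cells[2]` is the bottom of the hierarchy: the "Price" cell of the second row.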