Extract (Activity)

From Grooper Wiki

The Extract activity extracts defined data elements from a document.

About

Data extraction is configured using Data Model objects in a Content Model. This is where you define the data elements you wish to extract from your documents. You define that data by adding Data Element objects to the Data Model. There are three main Data Elements:

  • Data Field
  • Data Section
  • Data Table
    • Data Tables are also configured with their own special child Data Element: The Data Column object.
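To make this structure concrete, a Data Model can be pictured as a small tree of objects. The sketch below is a hypothetical Python model, not Grooper's object model or API. The class names `DataModel`, `DataField`, `DataTable`, `DataColumn`, and `DataSection` and the helper `element_names` are illustrative assumptions that simply mirror the concepts listed above.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical classes that mirror the Data Element concepts above.
# They are NOT Grooper's object model; they only illustrate the shape of the tree.

@dataclass
class DataField:
    name: str

@dataclass
class DataColumn:
    name: str

@dataclass
class DataTable:
    name: str
    columns: List[DataColumn] = field(default_factory=list)  # child Data Columns

@dataclass
class DataSection:
    name: str
    children: List["DataElement"] = field(default_factory=list)  # may nest sections

DataElement = Union[DataField, DataTable, DataSection]

@dataclass
class DataModel:
    children: List[DataElement] = field(default_factory=list)

def element_names(elements: List[DataElement], depth: int = 0) -> List[str]:
    """Walk the tree and return one indented line per element."""
    lines = []
    indent = "  " * depth
    for el in elements:
        lines.append(f"{indent}{type(el).__name__}: {el.name}")
        if isinstance(el, DataTable):
            lines.extend(f"{indent}  DataColumn: {c.name}" for c in el.columns)
        elif isinstance(el, DataSection):
            lines.extend(element_names(el.children, depth + 1))
    return lines

model = DataModel(children=[
    DataField("Invoice Number"),
    DataTable("Line Items", columns=[DataColumn("Description"), DataColumn("Amount")]),
    DataSection("Remit To", children=[DataField("Address")]),
])
print("\n".join(element_names(model.children)))
```

Running it prints the top-level Data Field, Data Table, and Data Section, with the Data Columns and the nested Data Field indented beneath their parents.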

The Data Field object is the simplest Data Element. It allows you to extract a simple list of fields (such as "Invoice Date", "Invoice Number", "Invoice Amount", etc.).

The Data Table object allows you to extract tabular data. Tables are more complex than simple fields because they are a repeating series of fields organized into rows and columns. Describing this data structure requires a more robust Data Element; hence the Data Table object, along with its child Data Column objects.
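The idea of "a repeating series of fields organized into rows and columns" can be illustrated with a toy table extractor. This is a minimal sketch under stated assumptions: the function `extract_table`, the sample text, and the convention that a row pattern uses regex named groups matching the Data Column names are all hypothetical and are not how Grooper's table extraction is configured.

```python
import re
from typing import Dict, List

# Hypothetical helper: a Data Table is described by its Data Column names
# plus a row pattern whose named groups match those column names.
# This is an illustration only, not Grooper's table extraction.

def extract_table(text: str, columns: List[str], row_pattern: str) -> List[Dict[str, str]]:
    """Return one dict per matched row, keyed by Data Column name."""
    regex = re.compile(row_pattern, re.MULTILINE)
    missing = set(columns) - set(regex.groupindex)
    if missing:
        raise ValueError(f"row pattern lacks groups for columns: {sorted(missing)}")
    rows = []
    for match in regex.finditer(text):  # each match is one repeating row
        rows.append({col: match.group(col) for col in columns})
    return rows

invoice_text = """\
Widget A     2   10.00
Widget B     1   25.50
Total             45.50
"""

line_items = extract_table(
    invoice_text,
    columns=["Description", "Qty", "Amount"],
    row_pattern=r"^(?P<Description>Widget \w+)\s+(?P<Qty>\d+)\s+(?P<Amount>\d+\.\d{2})$",
)
print(line_items)
```

Each matched line becomes one row with a value per column; the "Total" line is skipped because it does not fit the row pattern.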

The Data Section object allows you to extract Data Fields and/or Data Tables from repeating sections of a document. Data Sections may even have their own child Data Sections. This allows you to divide your document into sections and sub-sections, giving your Data Model its own levels of data hierarchy.

When the Extract activity runs, it populates the Data Model with values extracted from the document's text data (obtained from the Recognize activity). How this text is located and returned is determined by the extraction configuration set on each Data Element.

Data Extractors

After defining which Data Elements you want to extract, you need to define how to populate those fields, tables, and sections with data. This is done with Data Extractors, often shortened to just "extractors".
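As a rough illustration of pairing Data Fields with extractors, the sketch below assumes each field is populated by a regular expression whose first capture group becomes the field's value, run against the recognized text. The names `FIELD_EXTRACTORS` and `extract_fields`, the patterns, and the sample text are hypothetical; real Grooper extractors are configured on the Data Elements themselves and offer more than plain regex matching.

```python
import re
from typing import Dict, Optional

# Hypothetical mapping of Data Field names to regex "extractors".
# The first capture group of each pattern becomes the field's value.
FIELD_EXTRACTORS: Dict[str, str] = {
    "Invoice Number": r"Invoice #:\s*(\S+)",
    "Invoice Date": r"Date:\s*(\d{2}/\d{2}/\d{4})",
    "Invoice Amount": r"Total:\s*\$([\d,]+\.\d{2})",
    "PO Number": r"PO #:\s*(\S+)",
}

def extract_fields(text: str, extractors: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Populate each field with its first match, or None if nothing matched."""
    values: Dict[str, Optional[str]] = {}
    for name, pattern in extractors.items():
        match = re.search(pattern, text)
        values[name] = match.group(1) if match else None
    return values

# Stand-in for the text data produced by recognition (OCR).
ocr_text = "ACME Corp\nInvoice #: INV-1042\nDate: 03/15/2024\nTotal: $1,250.00\n"
result = extract_fields(ocr_text, FIELD_EXTRACTORS)
print(result)
```

Fields whose extractor finds nothing (here, "PO Number") are left empty rather than guessed.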

Data Hierarchy

As discussed earlier, you can create hierarchical relationships within a single Data Model using Data Sections and Data Tables. As a direct child of a Data Model, a Data Field executes against the entire document. However, as a child of a Data Section, a Data Field executes only against the portion of the document described by that Data Section.
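The scoping rule can be sketched with the same field pattern run at two levels. In this hypothetical example, a Data Section is located by a regex (`find_sections` returns every match, since sections may repeat), and a child Data Field is searched only inside the matched section text. The helper names, patterns, and sample document are assumptions for illustration, not Grooper configuration.

```python
import re
from typing import List, Optional

# Hypothetical sketch: a Data Section is found by a regex, and a child
# Data Field is then searched only within the text that section matched.

def find_sections(text: str, section_pattern: str) -> List[str]:
    """Return the text of every section match (sections may repeat)."""
    return [m.group(0) for m in re.finditer(section_pattern, text, re.DOTALL)]

def extract_field(text: str, field_pattern: str) -> Optional[str]:
    """Return the first capture group of the field pattern, or None."""
    match = re.search(field_pattern, text)
    return match.group(1) if match else None

document = (
    "Bill To:\nName: Jane Doe\n"
    "Ship To:\nName: John Smith\n"
    "End\n"
)
NAME = r"Name:\s*(.+)"

# As a direct child of the Data Model: runs against the whole document.
whole_doc_name = extract_field(document, NAME)

# As a child of a "Ship To" Data Section: runs only inside that section.
ship_to_sections = find_sections(document, r"Ship To:.*?(?=End)")
section_names = [extract_field(s, NAME) for s in ship_to_sections]
print(whole_doc_name, section_names)
```

At the document level the field returns the first name on the page ("Jane Doe"); scoped to the "Ship To" section, the same pattern returns "John Smith".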

Data Models also benefit from a Content Model's inheritance structure. For example, the Content Model itself may have a Data Model, and a Document Type may also have its own Data Model. The Document Type, as a child of the Content Model, inherits all Data Elements from the parent Content Model's Data Model.
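Inheritance of Data Elements can be modeled as a walk up the parent chain. In this toy sketch, `ContentNode` is a hypothetical stand-in for either a Content Model or a Document Type, and its effective Data Model is its ancestors' elements followed by its own. The sketch simply appends lists and does not model what Grooper does when a parent and child define elements with the same name.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for a Content Model or Document Type. Each node owns
# some Data Element names; its effective set includes everything inherited
# from its ancestors. This is not Grooper's API.

@dataclass
class ContentNode:
    name: str
    data_elements: List[str] = field(default_factory=list)
    parent: Optional["ContentNode"] = None

    def effective_data_elements(self) -> List[str]:
        """Ancestors' elements first, then this node's own elements."""
        inherited = self.parent.effective_data_elements() if self.parent else []
        return inherited + self.data_elements

content_model = ContentNode("Invoices", ["Vendor Name", "Invoice Date"])
doc_type = ContentNode("Utility Invoice", ["Account Number"], parent=content_model)
print(doc_type.effective_data_elements())
```

Because the lookup is recursive, the same toy model would also cover a longer chain of parents.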