Extract (Activity)
The Extract activity extracts defined data elements from a document.
Data extraction is configured using Data Model objects in a Content Model. The Data Model is where you define the data elements you want to extract from your documents, which you do by adding Data Element objects to it. There are three main Data Elements:
- Data Field
- Data Section
- Data Table
Data Tables are also configured with their own special child Data Element: the Data Column object.
The Data Field object is the simplest Data Element. It allows you to extract a simple list of fields (such as "Invoice Date", "Invoice Number", "Invoice Amount", etc.).
The Data Table object allows you to extract tabular data. Tables are more complex than simple fields because they are a repeating series of fields organized into rows and columns. Describing this structure requires a more robust Data Element; hence the Data Table object and its child Data Column objects.
The Data Section object allows you to extract Data Fields and/or Data Tables in repeating sections of a document. Data Sections may even have their own child Data Sections. This allows you to divide your document into sections and sub-sections, giving your Data Model its own levels of data hierarchy.
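The hierarchy described above can be pictured as a small tree of objects. The following is only an illustrative sketch in Python, not the product's actual object model or API: the class names mirror the Data Element names, but the attributes (name, columns, children, elements), the outline helper, and the sample "Line Items" and "Shipment" elements are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataField:
    name: str  # a single value, e.g. "Invoice Number"


@dataclass
class DataColumn:
    name: str  # one column of a Data Table


@dataclass
class DataTable:
    name: str
    columns: List[DataColumn] = field(default_factory=list)


@dataclass
class DataSection:
    name: str
    # A section may hold fields, tables, and nested sections.
    children: List[Union[DataField, DataTable, "DataSection"]] = field(default_factory=list)


@dataclass
class DataModel:
    elements: List[Union[DataField, DataTable, DataSection]] = field(default_factory=list)


def outline(elements, depth=0):
    """Return one indented line per Data Element, walking children recursively."""
    lines = []
    for el in elements:
        lines.append("  " * depth + f"{type(el).__name__}: {el.name}")
        if isinstance(el, DataTable):
            lines.extend(outline(el.columns, depth + 1))
        elif isinstance(el, DataSection):
            lines.extend(outline(el.children, depth + 1))
    return lines


# Hypothetical invoice model used only for illustration.
invoice_model = DataModel(elements=[
    DataField("Invoice Date"),
    DataField("Invoice Number"),
    DataTable("Line Items", columns=[DataColumn("Quantity"), DataColumn("Price")]),
    DataSection("Shipment", children=[
        DataField("Tracking Number"),
        DataSection("Package", children=[DataField("Weight")]),
    ]),
])

print("\n".join(outline(invoice_model.elements)))
```

Printing the outline shows each Data Column indented under its Data Table and each nested Data Section indented under its parent section, which is the multi-level hierarchy the text describes.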
Data Models also benefit from a Content Model's inheritance structure. For example, the Content Model itself may have a Data Model but a Document Type may also have its own Data Model. The Document Type, as a child of the Content Model, will inherit all Data Elements from the parent Content Model's Data Model.
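The inheritance rule can be sketched as a lookup that walks up the parent chain. This is a simplified illustration under stated assumptions, not the product's implementation: the ModelNode class, its attribute names, the sample element names, and the choice to list inherited elements before a node's own elements are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelNode:
    """A Content Model or Document Type that carries its own Data Model (assumed shape)."""
    name: str
    own_elements: List[str] = field(default_factory=list)
    parent: Optional["ModelNode"] = None

    def effective_elements(self) -> List[str]:
        # Inherited elements come from every ancestor, then the node's own are appended.
        inherited = self.parent.effective_elements() if self.parent else []
        return inherited + self.own_elements


# Hypothetical Content Model with one child Document Type.
content_model = ModelNode("Accounting", own_elements=["Document Date"])
invoice_type = ModelNode(
    "Invoice",
    own_elements=["Invoice Number", "Invoice Amount"],
    parent=content_model,
)

print(invoice_type.effective_elements())
```

In this sketch the "Invoice" Document Type ends up with "Document Date" from its parent in addition to its own two elements, while the parent Content Model is unaffected by what its child defines.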
Data Extractors
After defining which Data Elements you want to extract, you need to define how to populate those fields, tables, and sections with data. This is done with Data Extractors, usually shortened to just "extractors".