2021:Data Export (Export Definition)
Data Export is one of the Export Types available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a SQL or ODBC-compliant database.
You may download and import the file below into your own Grooper environment (version 2021). This contains Batches with the example document(s) and a Content Model discussed in this article.
About
The most important goal of Grooper is to deliver accurate data to line of business systems, where that information can be integrated into impactful business decisioning. Tables in databases remain, to this day, one of the main vessels by which this information is stored. Data Export is one of the main ways to deliver data collected in Grooper.
There are three important things to understand when using and configuring Data Export to export data to a database:
- The Export activity
- Data Elements
- Data Connections
The Export Activity
Grooper's Export activity is the mechanism by which Grooper-processed document content is delivered to an external storage platform. Export configurations are defined by adding Export Type definitions to Export Behaviors. Data Export is the Export Type designed to export Batch Folder document data collected by the Extract activity to a Microsoft SQL Server or ODBC-compliant database server.
For more information on configuring Export Behaviors, please visit the full Export activity article.
Data Elements
Data Export is the chief delivery device for "collection" elements. Data is collected in Grooper by executing the Extract activity, extracting values from a Batch Folder according to its classified Document Type's Data Model.
A Data Model in Grooper is a digital representation of document data targeted for extraction, defining the data structure for a Content Type in a Content Model. Data Models are objects comprised of Data Element objects, including:
- Data Fields used to target single field values on a document.
- Data Tables and their child Data Columns used to target tabular data on a document.
- Data Sections used to divide a document into sections to simplify extraction logic and/or target repeating sections of extractable Data Elements on a single document.
With Data Models and their child Data Elements configured, Grooper collects values using the Extract activity.
Depending on the Content Type hierarchy in a Content Model and/or the Data Element hierarchy in a Data Model, there will be a collection, or "set", of values at varying scopes of a fully extracted Data Model's hierarchy. That may be the full data scope of the Data Model, including any Data Elements inherited from parent Data Models. It may be a narrower scope of Data Elements, like a child Data Section comprised of its own child Data Fields.
Understanding this will be important as Data Export has the ability to take full advantage of Grooper's hierarchical data modeling to flatten complex and inherited data structures. Understanding Data Element hierarchy and scope will also be critical when exporting data from a single document to multiple different database tables to ensure the right data exports to the right places.
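To make the idea of flattening concrete, below is a minimal sketch (plain Python, outside of Grooper) that takes a nested set of values, a few document-level fields plus a repeating line-item table, and flattens them into rows a single database table could hold. The field names and structure are hypothetical and only illustrate the concept; they are not Grooper's internal data format.

```python
# Hypothetical illustration of flattening hierarchical document data into flat
# rows. The field names and structure below are made up for the example; they
# are not Grooper's internal representation.

extracted = {
    "Invoice Number": "INV-1001",
    "Invoice Date": "2021-10-06",
    "Line Items": [  # a repeating (one-to-many) set of values, like a Data Table
        {"Description": "Widget", "Qty": 3, "Price": 9.99},
        {"Description": "Gadget", "Qty": 1, "Price": 24.50},
    ],
}

def flatten(doc):
    """Repeat the document-level values on every line-item row."""
    header = {key: value for key, value in doc.items() if key != "Line Items"}
    return [{**header, **item} for item in doc["Line Items"]]

for row in flatten(extracted):
    print(row)  # each row now carries both header and line-item values
```

Each output row carries the document-level values alongside one line item, which is one common way a one-to-many structure gets flattened into a single table.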
Data Connections
Data Export uses a configured Data Connection object to establish a link to tables in a SQL or ODBC-compliant database and intelligently populate those tables. Once this connection is established, collected Data Elements can be mapped to corresponding column locations in one or multiple database tables. Much of Data Export's configuration is assigning these data mappings. The Data Connection presents these mappable data endpoints to Grooper and allows data content to flow from Grooper to the database table when the Export activity processes each Batch Folder in a Batch.
Furthermore, not only can Grooper connect to existing databases using a Data Connection, but it can create whole new databases as well as database tables once a connection to the database server is established.
We discuss how to create Data Connections, add a new database from a Data Connection, and add a new database table from a Data Connection in the Configuring a Data Connection tutorial below.
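Grooper handles the connection itself once the Data Connection is configured, but it can be handy to confirm ahead of time that the database server is reachable with the credentials Grooper will use. A quick sanity check from outside Grooper might look like the sketch below; the server name is a placeholder, and it assumes the pyodbc package and a SQL Server ODBC driver are installed.

```python
# Hypothetical connectivity check, run outside Grooper, to confirm the SQL
# Server instance accepts the current Windows (Active Directory) login.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YOUR_SQL_SERVER;"    # placeholder server name
    "DATABASE=master;"           # master works before the export database exists
    "Trusted_Connection=yes;"    # integrated (Active Directory) authentication
)

conn = pyodbc.connect(conn_str, timeout=5)
print(conn.cursor().execute("SELECT @@VERSION;").fetchone()[0])
conn.close()
```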
How To
Understanding the Forms
Understanding the Content Model
The Content Model extracting the data for these documents is fairly straightforward. There are two Document Types, each with their own Data Model. The first Document Type's Data Model is the one representing the one-to-many relationship. Notice that the fields represented once on the document have Data Fields, while a Data Table was established for the tabular data. The second Document Type's Data Model uses one table extractor to collect all the data but reports it to two different tables. It should be noted that the documents in the accompanying Batches had their Document Type assigned manually; the Content Model is not performing any classification.
Verifying Index Data
Before the Database Export activity can send data, it must have data!
It's easy to get in the habit of testing extraction on a Data Field or a Data Model and feel good about the results, but it must be understood that the information displayed when doing so is in memory, or temporary. When testing a Data Export configuration, it's a good idea to ensure extracted data is actually present for document Batch Folders whose data you want to export.
When the Extract activity runs, it executes all extraction logic for the Data Model tied to a Batch Folder's classified Document Type. For each Batch Folder document, it creates "Index Data" and marries it to the Batch Folder via a JSON file called Grooper.DocumentData.json.
A couple of ways to verify its existence are as follows:
Option 1
Option 2
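Whichever option you use in the Grooper UI, the artifact you are looking for is the Grooper.DocumentData.json file itself. If your repository's file store exposes the Batch Folder's files on disk, a purely illustrative check from a script might look like the sketch below; the path is a placeholder, and the printed structure is simply whatever the file contains.

```python
# Illustrative only: look for a document's index data file and print it.
# The path below is a placeholder; where Batch Folder content physically
# lives depends on how your Grooper repository is configured.
import json
from pathlib import Path

batch_folder = Path(r"C:\path\to\the\batch\folder")       # placeholder path
index_file = batch_folder / "Grooper.DocumentData.json"

if index_file.exists():
    data = json.loads(index_file.read_text(encoding="utf-8"))
    print(json.dumps(data, indent=2))    # pretty-print the stored index data
else:
    print("No index data found. Has the Extract activity run for this document?")
```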
Configuring a Data Connection
In order for the Database Export activity to run, it needs an established connection to a database and subsequent table(s). Grooper can connect to an existing database and import references to its tables, or easily enough, you can have Grooper create the database FOR you AND create tables based on data structures present in Grooper!
The following setup will have Grooper create the database and its tables.
It is worth noting that this article cannot tell you specifics about permissions in your environment. The configuration for this article uses Microsoft SQL Server, not ODBC, and the active Active Directory user has been given full DB Admin privileges to the SQL environment.
With a connection to the SQL environment established, we can make Grooper create our database for us.
This Data Connection is not complete until table references have been imported (this particular Data Connection will end up with three table references). A database has been created and connected to, and Grooper did that work for us, so let's keep that up to get the tables made. Three different tables need to be created and their references imported, due to the two different Document Types and their Data Models (the second Document Type containing a Data Model with two Data Tables). The following process describes importing the first table and simply needs to be repeated for the second and third tables (which will not involve the same inheritance considerations discussed in step 5, because the scope considered for the Database Export 02 Document Type has Data Tables and no Data Fields).
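For a rough sense of what the end result looks like, the sketch below builds a database and a parent/child pair of tables by hand for the one-to-many Document Type; the second Document Type's two Data Table targets would follow the same pattern. Every server, database, table, and column name here is a placeholder, and this is not the DDL Grooper generates; Grooper derives the real table definitions from your Data Elements during the import step described above.

```python
# Illustrative only: hand-built stand-ins for the kind of tables the export
# will target. All names are placeholders, not Grooper-generated DDL.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YOUR_SQL_SERVER;DATABASE=master;Trusted_Connection=yes;"
)
conn.autocommit = True   # CREATE DATABASE cannot run inside a transaction
cursor = conn.cursor()

cursor.execute("IF DB_ID('GrooperExport') IS NULL CREATE DATABASE GrooperExport;")
cursor.execute("USE GrooperExport;")

# Parent table: one row per document (the Data Field values).
cursor.execute("""
    IF OBJECT_ID('dbo.Invoices') IS NULL
    CREATE TABLE dbo.Invoices (
        InvoiceNumber nvarchar(50) PRIMARY KEY,
        InvoiceDate   date,
        Vendor        nvarchar(100)
    );
""")

# Child table: many rows per document (the Data Table rows).
cursor.execute("""
    IF OBJECT_ID('dbo.InvoiceLineItems') IS NULL
    CREATE TABLE dbo.InvoiceLineItems (
        InvoiceNumber nvarchar(50),
        Description   nvarchar(200),
        Quantity      int,
        Price         decimal(10, 2)
    );
""")
conn.close()
```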
Configuring the Database Export Activity
Data collected. Connection to database established. Now it's time to configure the activity that will send the data to its final destination.
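Conceptually, what the activity does for each document is map collected values to columns and write one row to the parent table and one row per Data Table row to the child table. A loose, hand-written equivalent (reusing the placeholder names from the earlier sketches, with hypothetical values) looks like this:

```python
# Loose, hand-written equivalent of mapping extracted values to table columns.
# Table, column, and field names are the placeholders used earlier.
import pyodbc

extracted = {
    "Invoice Number": "INV-1001",
    "Invoice Date": "2021-10-06",
    "Vendor": "Acme Supply",
    "Line Items": [
        ("Widget", 3, 9.99),
        ("Gadget", 1, 24.50),
    ],
}

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YOUR_SQL_SERVER;DATABASE=GrooperExport;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# One row in the parent table per document...
cursor.execute(
    "INSERT INTO dbo.Invoices (InvoiceNumber, InvoiceDate, Vendor) VALUES (?, ?, ?);",
    extracted["Invoice Number"], extracted["Invoice Date"], extracted["Vendor"],
)

# ...and one row in the child table per Data Table row.
cursor.executemany(
    "INSERT INTO dbo.InvoiceLineItems (InvoiceNumber, Description, Quantity, Price) "
    "VALUES (?, ?, ?, ?);",
    [(extracted["Invoice Number"], d, q, p) for d, q, p in extracted["Line Items"]],
)

conn.commit()
conn.close()
```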
Processing and Viewing Output Data
With everything configured, you can now execute the activities and view the results.
With the activities successfully run, the output can be seen over in SQL Server Management Studio.
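If you would rather check from a script than from Management Studio, a quick query against the placeholder tables from the earlier sketches might look like this:

```python
# Quick check of exported rows, using the placeholder names from earlier.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=YOUR_SQL_SERVER;DATABASE=GrooperExport;Trusted_Connection=yes;"
)
rows = conn.cursor().execute(
    "SELECT i.InvoiceNumber, i.Vendor, li.Description, li.Quantity, li.Price "
    "FROM dbo.Invoices AS i "
    "JOIN dbo.InvoiceLineItems AS li ON li.InvoiceNumber = i.InvoiceNumber;"
)
for row in rows:
    print(row)
conn.close()
```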