Data Export (Export Definition)

Data Export is one of the Export Types available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a SQL or ODBC-compliant database.

You may download and import the file below into your own Grooper environment (version 2021). This contains Batches with the example document(s) and a Content Model discussed in this article.

  • [[]]

About

The most important goal of Grooper is to deliver accurate data to line-of-business systems, where it can inform impactful business decisions. Database tables remain, to this day, one of the main vessels by which this information is stored. Data Export is one of the main ways to deliver data collected in Grooper.


There are three important things to understand when using and configuring Data Export to export data to a database:

  • The Export activity
  • Data Elements
  • Data Connections

The Export Activity

Grooper's Export activity is the mechanism by which Grooper-processed document content is delivered to an external storage platform. Export configurations are defined by adding Export Type definitions to Export Behaviors. Data Export is the Export Type designed to export Batch Folder document data collected by the Extract activity to a Microsoft SQL Server or ODBC-compliant database server.

For more information on configuring Export Behaviors, please visit the full Export activity article.

Data Elements

Data Export is the chief delivery device for "collection" elements. Data is collected in Grooper by executing the Extract activity, extracting values from a Batch Folder according to its classified Document Type's Data Model.

A Data Model in Grooper is a digital representation of document data targeted for extraction, defining the data structure for a Content Type in a Content Model. Data Models are objects comprised of Data Element objects, including:

  • Data Fields used to target single field values on a document.
  • Data Tables and their child Data Columns used to target tabular data on a document.
  • Data Sections used to divide a document into sections to simplify extraction logic and/or target repeating sections of extractable Data Elements on a single document.

With Data Models and their child Data Elements configured, Grooper collects values using the Extract activity.

Depending on the Content Type hierarchy in a Content Model and/or the Data Element hierarchy in a Data Model, a fully extracted Data Model will yield a collection, or "set", of values at varying levels of data scope. That may be the full data scope of the Data Model, including any Data Elements inherited from parent Data Models. It may be a narrower scope of Data Elements, like a child Data Section comprised of its own child Data Fields.

Understanding this will be important as Data Export has the ability to take full advantage of Grooper's hierarchical data modeling to flatten complex and inherited data structures. Understanding Data Element hierarchy and scope will also be critical when exporting data from a single document to multiple different database tables to ensure the right data exports to the right places.
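As a hedged illustration of that flattening, the sketch below shows what a single flattened database table might look like when a Data Table's Data Columns are combined with inherited Data Fields. All table and column names here are invented for illustration; they are not taken from this article's example model.

  -- Hypothetical flattened table: inherited Data Fields (captured once per
  -- document) become columns alongside the Data Table's Data Columns
  -- (captured once per table row).
  CREATE TABLE Flattened_Export_Example (
      Employee_ID NVARCHAR(255),  -- inherited Data Field, repeated on every row
      Last_Name   NVARCHAR(255),  -- inherited Data Field, repeated on every row
      Pay_Period  NVARCHAR(255),  -- Data Column, varies per row
      Amount      NVARCHAR(255)   -- Data Column, varies per row
  );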

Data Connections

Data Export uses a configured Data Connection object to establish a link to a SQL or ODBC-compliant database and intelligently populate its tables. Once this connection is established, collected Data Elements can be mapped to corresponding column locations in one or multiple database tables. Much of Data Export's configuration is assigning these data mappings. The Data Connection presents these mappable data endpoints to Grooper, and allows data content to flow from Grooper to the database table when the Export activity processes each Batch Folder in a Batch.

Furthermore, not only can Grooper connect to existing databases using a Data Connection, but it can create whole new databases as well as database tables once a connection to the database server is established.

We discuss how to create Data Connections, add a new database from a Data Connection, and add a new database table from a Data Connection in the #Configuring a Data Connection tutorial below.

How To

In the following tutorials, we discuss an example of how to set up an Export Behavior using Data Export.

This example uses a couple of different document formats whose data will be collected by a single Content Model. This will help illustrate two key distinctions mentioned above.

  • The first document (the "Employee Report") will demonstrate the flattening of inherited data.
  • The second (the "Personnel Information Report") will give us an avenue to export to multiple tables at once.


The example documents have already been classified and their data extracted. Before we get into Data Export configuration specifics, there are a few things we need to understand about "the story so far":

  • The forms used in this example and how they are formatted.
  • The Content Model used in this example, its Document Types and how their Data Models represent the data we want to collect.
  • The extracted index data, which will be exported to a database.

Understanding the Forms

The thing to understand about the document displayed on the right is that it combines single-instance data (single fields of information) with multi-instance data (a table displaying a dynamic number of rows), i.e. a one-to-many relationship. The single-instance data (Last Name, First Name, Employee ID), as a result of only being listed once on the document, will only be collected once, but needs to be married to each row of information from the table. While the single-instance data is only collected once, it will be reported many times. This document is meant to show flattening inherited data structures.
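To make the one-to-many flattening concrete, here is a hedged sketch of the rows such an export might produce. The table name, column names, and values are all invented for illustration:

  -- Illustrative only: the single-instance values (Last Name, First Name,
  -- Employee ID) are collected once but repeat on every exported table row.
  INSERT INTO Employee_Report_Earnings
      (Last_Name, First_Name, Employee_ID, Pay_Period, Amount)
  VALUES
      ('Doe', 'Jane', '1001', 'January',  '2500.00'),
      ('Doe', 'Jane', '1001', 'February', '2500.00'),
      ('Doe', 'Jane', '1001', 'March',    '2500.00');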

The second document collects information only from its main table, but the table contains Personally Identifiable Information (PII) that should be protected. As a result, it will export its information to two tables (with the assumption that the second table is "protected"). This document is meant to demonstrate how to export to multiple tables via one step in a Batch Process.
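A hedged sketch of that split might look like the following, with the PII columns routed to a second, permission-restricted table. All names here are invented for illustration; the article's actual tables are created in the tutorial below.

  -- Hypothetical two-table split: one table extractor feeds both tables,
  -- but the PII columns land only in the "protected" table.
  CREATE TABLE Personnel_General (
      Employee_ID   NVARCHAR(255),
      Department    NVARCHAR(255),
      Job_Title     NVARCHAR(255)
  );
  CREATE TABLE Personnel_Protected (  -- assumed to have restricted permissions
      Employee_ID   NVARCHAR(255),
      SSN           NVARCHAR(255),    -- PII
      Date_Of_Birth NVARCHAR(255)     -- PII
  );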


Understanding the Content Model

The Content Model extracting the data for these documents is fairly straightforward. There are two Document Types, each with its own Data Model.

The first Document Type's Data Model is the one representing the one-to-many relationship. Notice that for the fields represented once on the document, there are Data Fields. For the tabular data, a Data Table was established.

The second Document Type's Data Model uses one table extractor to collect all the data, but reports it to two different tables.

It should be noted that the documents in the accompanying Batches had their Document Type assigned manually. The Content Model is not performing any classification.

Verifying Index Data

Before the Database Export activity can send data, it must have data!

It's easy to get in the habit of testing extraction on a Data Field or a Data Model and feel good about the results, but it must be understood that the information displayed when doing so is in memory, or temporary. When testing a Data Export configuration, it's a good idea to ensure extracted data is actually present for document Batch Folders whose data you want to export.

When the Extract activity runs, it executes all extraction logic for the Data Model tied to a Batch Folder's classified Document Type. For each Batch Folder document, it creates "Index Data" and marries it to the Batch Folder via a JSON file called Grooper.DocumentData.json.

A couple of ways to verify its existence are as follows:

Option 1

  1. First, navigate to a document Batch Folder object in the node tree.
    • Not the Batch, not the root Batch Folder, not a Page object, but specifically a document Batch Folder object.
    • This is where the extracted index data information lives.
  2. From there, click on the "Index Data" tab.
  3. After doing so you can see data displayed.
    • Were the document classified (the Batch Folder assigned a Document Type), but not successfully extracted (the Extract activity not applied), the data structure would be present, but the fields empty.

Option 2

  1. Another means of verifying is to again click on the document Batch Folder object in the node tree.
  2. Click the Advanced tab.
  3. Click the Files tab.
  4. In the List View you should see the Grooper.DocumentData.json file.
  5. When you click on that file, you should see the stored JSON information of the indexed data displayed in the viewer below.

Back to top to continue to next tab


Configuring a Data Connection

In order for the Database Export activity to run, it needs an established connection to a database and subsequent table(s). Grooper can connect to an existing database and import references to its tables, or, easily enough, you can have Grooper create the database FOR you AND create tables based on data structures present in Grooper!

The following setup will have Grooper create the database and its tables.

It is worth noting that this article cannot tell you specifics about permissions in your environment. The configuration for this article uses Microsoft SQL Server, not ODBC, and the logged-in Active Directory user has been given full DB admin privileges to the SQL environment.

  1. Expand the node tree to the Infrastructure folder and in the Data Connections folder Add > Data Connection...
  2. In the Add New Data Connection window give the object a name and click OK
  3. As mentioned above, the configuration settings listed here will be specific to this environment:
    • Set the Provider property to SqlServer.
    • The Server Name property should be the SQL instance. In this case, the hostname of the computer is enough because the SQL environment was established with the default SQL instance, therefore no named instance. Were there a named SQL instance, the value would be something like OKCSPRKI01\InstanceName.
    • Because Grooper will create the database, the Database Name property will be left blank for now.
    • The User Name and Password properties are left blank here, which passes the currently logged-in AD user's credentials to SQL.



With a connection to the SQL environment established, we can make Grooper create our database for us.

  1. Click the Create Database button
  2. In the Create Database window, give the database a name, and click the OK button.
  3. The Connection Settings and Database Name properties will be updated to reflect the new database information entered.
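Conceptually, the Create Database button is doing nothing more exotic than issuing a statement like the hedged sketch below. The database name is whatever you entered in the Create Database window; the one shown here is assumed for illustration.

  -- Roughly what the Create Database button accomplishes on the server.
  CREATE DATABASE [GrooperExampleDB];  -- assumed name, per the Create Database window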



This Data Connection is not complete until table references have been imported (this particular Data Connection will end up with three table references). A database has been created and connected to, and Grooper did the work for us there, so let's keep that up to get tables made. Three different tables need to be created and their references imported, due to the two different Document Types and their Data Models (the second Document Type contains a Data Model with two Data Tables). The following process describes importing the first table, and simply needs to be repeated for the second and third tables (which will not involve the inheritance notions discussed in step 5, because the scope considered for the Database Export 02 Document Type has Data Tables and no Data Fields).

  1. Click the Create Table button.
  2. Click the Content Type property and click the drop-down arrow to get a mini node tree view.
  3. Select the first Content Type. In the case of this example, it will be the Document Type named Database Export 01.
  1. Because the Database Export 01 Content Type was selected, the following Data Element Scope property will have a drop-down containing Data Elements relative to this particular Content Type.
  2. Selecting the Earnings Data Table will set the baseline structure of the table to be created, and it will inherit upward.
    • Think back to the notion of the one-to-many relationship. This table can have a dynamic number of rows, and the Data Columns are capable of capturing and reporting back unlimited instances of data, hence multiple rows. The Data Fields within this scope, however, are only capable of capturing and returning a single piece of data. But, given the nature of inheritance, the table that will be created will make columns not just for the Data Columns of the Data Table, but for each of the Data Fields contained within this scope.
    • Were there Data Fields further up the inheritance tree, say at the base Data Model of the main Content Model, the table would also attempt to inherit those as well.
  1. The Table Name property is arbitrary, but by default is a concatenation of the strings passed by the Content Type and Data Element Scope properties.
  2. Click the Execute button to close this window and continue the process forward...
  1. Here's where Grooper does the hard work for us. Notice in the Review/Edit SQL Statement window that the SQL statement required to create our table is already written for us (a hedged sketch of such a statement appears after these steps).
  2. Click the Run SQL Statement button to create the table.
  1. With the table made, the Database Tables List View will have an entry in it.
    • It will initially display with a red dot (as seen in the image), but will change to a green dot when you successfully complete the reference import.
  2. The Table Columns List View will display the data structure of the table, while the Data Preview List View will display data within the table.
    • Because this table was just created, it will not have any data to display.
  3. Clicking the Import Table Reference button will ...
    • Be warned, the Drop Table button is a powerful one in that it will delete the entire table from SQL. Use this button with EXTREME caution.
  4. ...create a Database Table object in the node tree.
    • You cannot use the Drop Table button unless you delete the Database Table object from the node tree.
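For reference, the generated statement shown in the Review/Edit SQL Statement window will resemble the sketch below. The actual column names and types are derived from the selected Data Element Scope; the ones here are illustrative.

  -- Hedged sketch of a generated CREATE TABLE statement for the flattened
  -- Database_Export_01_Earnings scope: inherited Data Fields plus the
  -- Earnings Data Table's Data Columns (column names are illustrative).
  CREATE TABLE [Database_Export_01_Earnings] (
      [Last_Name]   NVARCHAR(MAX) NULL,  -- inherited Data Field
      [First_Name]  NVARCHAR(MAX) NULL,  -- inherited Data Field
      [Employee_ID] NVARCHAR(MAX) NULL,  -- inherited Data Field
      [Pay_Period]  NVARCHAR(MAX) NULL,  -- Data Column
      [Amount]      NVARCHAR(MAX) NULL   -- Data Column
  );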


Configuring Database Export Activity

Data collected. Connection to database established. Now it's time to configure the activity that will send the data to its final destination.

  1. Select a Batch Process
  2. Click the Add Step... button.
  3. Set the Activity Type property to Database Export
  4. Click the drop-down menu of the Database Connection property and select the Data Connection object to be used.
  1. Click the drop-down menu of the Content Type property and select the Content Type that contains the scope of the information you want to deliver. In this configuration it is the Database Export 01 Document Type.
  1. Click the Export Mappings property then its ellipsis button to open the Export Mapping Collection Editor window.
  2. Click the Add button to add an entry to the List View below.
  3. Click the item added to the List View to reveal its properties.
  4. Select the Database Table property and in its drop-down menu select the Database Table object (the imported table reference that is a child of the previously established Data Connection object) that represents the scope of the data you wish to send. In this case it is Database_Export_01_Earnings.
  5. Select the Data Element Scope property and in its drop-down menu select the object that represents the scope you wish to map to the previously selected table. In this case it is the Earnings Data Table.
    • Selecting this Data Table as the scope will not only allow mapping to its Data Columns, but to all Data Fields up its logical hierarchical tree path.
  1. Select the Column Mappings property, then its ellipsis button, to open the Column Mappings window.
  2. The column on the left represents the Data Elements of the scope selected, in this case the Earnings Data Table and its inherited Data Fields. The column on the right represents the columns of information from the Database Table object created and imported previously. Because the table in the database was created by Grooper based on this schema, the column names should match. One by one (or using the Auto Map... object command), match these columns up.


The Database Export activity that was just established is specific to the table configured, because it is contained within the narrow bounds of that particular Content Type. Two other tables still need to be mapped, however. To do this, because they exist within the bounds of a different Content Type, another Database Export activity needs to be established.

The good news is that the remaining two tables are both relative to the same Content Type. Considering this, both tables can be mapped within one Database Export activity. The process for doing so is exactly like what was just done, with the only difference being you can add another entry to the Export Mapping Collection Editor window.

This concept is novel and extremely useful as you work with more and more complex data structures.


Processing and Viewing Output Data

With everything configured, you can now execute the activities and view the results.

  1. Make sure the Batch Process is expanded in the node tree, and select the Batch Process Step object.
  2. Selecting the Batch Process Step object will give access to the Unattended Activity Tester tab.
  3. Select a Batch.
  1. Click the Process All button to bring up the Process All window.
  2. In the Process All window, you can set the number of threads you want to apply to processing this activity.
    • Note that this setting is temporary for running this particular activity this one time.
  3. Click the Start button to run the activity.
    • Because two Batch Process Steps were made to accommodate the two Content Types, this process should be repeated for the other Database Export activity on the other Batch.



With the activities successfully run, the output can be seen over in SQL Server Management Studio.
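For example, a quick query against the table created earlier (name per this article's configuration) will show one row per exported Data Table row:

  -- Confirm the exported index data landed in the target table.
  SELECT * FROM [Database_Export_01_Earnings];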