2021:Data Export (Export Definition): Difference between revisions
Dgreenwood (talk | contribs) |
Dgreenwood (talk | contribs) |
||
| Line 539: | Line 539: | ||
#* We will start by configuring an '''''Export Behavior''''' for the "Employee Report" '''Document Type'''. | #* We will start by configuring an '''''Export Behavior''''' for the "Employee Report" '''Document Type'''. | ||
# To add an '''''Export Behavior''''', first select the '''''Behaviors''''' property. | # To add an '''''Export Behavior''''', first select the '''''Behaviors''''' property. | ||
# Then, press the ellipsis button at the end of the property. | # Then,m press the ellipsis button at the end of the property. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-01.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 551: | Line 551: | ||
#** For example, our two '''Document Types''' need different '''''Export Behavior''''' configurations. We would '''''not''''' want to configure their parent '''Content Model's''' '''''Export Behavior'''''. That would apply that single export configuration to ''all'' '''Document Types'''. That's not going to work for us. The "Employee Report" documents' data need to go to one location and the "Personnel Info Report" documents' data need to go somewhere entirely different. Instead, we will end up configuring the '''''Behaviors''''' property of both '''Document Types''' individually. Thus, we end up with two '''''Export Behavior''''' configurations for the '''Content Model'''. | #** For example, our two '''Document Types''' need different '''''Export Behavior''''' configurations. We would '''''not''''' want to configure their parent '''Content Model's''' '''''Export Behavior'''''. That would apply that single export configuration to ''all'' '''Document Types'''. That's not going to work for us. The "Employee Report" documents' data need to go to one location and the "Personnel Info Report" documents' data need to go somewhere entirely different. Instead, we will end up configuring the '''''Behaviors''''' property of both '''Document Types''' individually. Thus, we end up with two '''''Export Behavior''''' configurations for the '''Content Model'''. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-02.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 558: | Line 558: | ||
#* Next, we will add a '''''Data Export''''' definition. | #* Next, we will add a '''''Data Export''''' definition. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-03.png]] | |||
|} | |} | ||
| Line 573: | Line 573: | ||
# Then, press the ellipsis button at the end of the property. | # Then, press the ellipsis button at the end of the property. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-04.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 587: | Line 587: | ||
#** For example, since both of our '''Document Types''' need a unique '''''Export Behavior''''' configuration, we would add one '''''Export Behavior''''' to the list for each one. | #** For example, since both of our '''Document Types''' need a unique '''''Export Behavior''''' configuration, we would add one '''''Export Behavior''''' to the list for each one. | ||
|valign=top| | |valign=top| | ||
[[File:Data-export-how-to-export-behaviors-05.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
# Here, we have selected the "Employee Report" '''Document Type'''. This '''''Export Behavior''''' would then only apply to '''Batch Folders''' of this '''Document Type'''. | |||
# Once a '''Content Type''' is selected, you can add one or more '''''Export Definitions''''' with the '''''Export Definitions''''' property. | # Once a '''Content Type''' is selected, you can add one or more '''''Export Definitions''''' with the '''''Export Definitions''''' property. | ||
#* We will discuss adding a '''''Data Export''''' definition next. | |||
| | | | ||
[[File:Data-export-how-to-export-behaviors-06.png]] | |||
|} | |||
</tab> | </tab> | ||
<tab name="Add an Export Definition" style="margin:20px"> | <tab name="Add an Export Definition" style="margin:20px"> | ||
| Line 602: | Line 605: | ||
# We will add the '''''Export Behavior''''' to its set of '''''Behaviors''''' properties. | # We will add the '''''Export Behavior''''' to its set of '''''Behaviors''''' properties. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-07.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
| Line 610: | Line 613: | ||
# To add an '''''Export Definition''''', press the ellipsis button at the end of the property. | # To add an '''''Export Definition''''', press the ellipsis button at the end of the property. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-08.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
# This will bring up an '''''Export Definition''''' list editor to add one or more '''''Export Types'''''. | # This will bring up an '''''Export Definition''''' list editor to add one or more '''''Export Types'''''. | ||
#* Next, we will add a '''''Data Export''''' definition to the list. | |||
| | | | ||
[[File:Data-export-how-to-export-behaviors-09.png]] | |||
|} | |} | ||
</tab> | </tab> | ||
| Line 634: | Line 638: | ||
# Choose ''Data Export'' from the list. | # Choose ''Data Export'' from the list. | ||
| | | | ||
[[File:Data-export-how-to-export-behaviors-10.png]] | |||
|- | |- | ||
|valign=top| | |valign=top| | ||
# This will add an unconfigured '''''Data Export''''' to the '''''Export Definitions''''' list. | # This will add an unconfigured '''''Data Export''''' to the '''''Export Definitions''''' list. | ||
... | # For all '''''Data Export''''' configurations, the first step is configuring the '''''Connection''''' property. | ||
# Use the dropdown menu to select a '''Data Connection''' from the node tree. | |||
#* This will provide Grooper with the information required to connect to the external database upon export. | |||
| | |||
[[File:Data-export-how-to-export-behaviors-11.png]] | |||
|- | |||
|valign=top| | |||
# With the '''Data Connection''' established, the next step is to map data from Grooper to a table in the database. | |||
# Next, we will review using the '''''Table Mappings''''' property to map '''Data Elements''' in a '''Data Model''' to corresponding column locations in a database table. | |||
| | | | ||
[[File:Data-export-how-to-export-behaviors-12.png]] | |||
|} | |} | ||
</tab> | |||
<tab name="Table Mappings Example 1: Flattening a Data Model" style="margin:20px"> | |||
=== Table Mappings Example 1: Flattening a Data Model === | |||
</tab> | |||
<tab name="Table Mappings Example 2: Exporting to Multiple Database Tables" style="margin:20px"> | |||
=== Table Mappings Example 2: Exporting to Multiple Database Tables === | |||
</tab> | </tab> | ||
</tabs> | </tabs> | ||
Revision as of 10:37, 8 October 2021
| WIP | This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.
This tag will be removed upon draft completion. |
Data Export is one of the Export Types available when configuring an Export Behavior. It exports extracted document data over a Data Connection, allowing users to export data to a SQL or ODBC compliant database.
|
You may download and import the file below into your own Grooper environment (version 2021). This contains Batches with the example document(s) and a Content Model discussed in this article.
|
About
Grooper's most important goal is to deliver accurate data to line-of-business systems so that the information can inform business decisions. Database tables remain one of the main vessels for storing this information, and Data Export is one of the main ways to deliver data collected in Grooper.
There are three important things to understand when using and configuring Data Export to export data to a database:
- The Export activity
- Data Elements
- Data Connections
The Export Activity
Grooper's Export activity is the mechanism by which Grooper-processed document content is delivered to an external storage platform. Export configurations are defined by adding Export Type definitions to Export Behaviors. Data Export is the Export Type designed to export Batch Folder document data collected by the Extract activity to a Microsoft SQL Server or ODBC-compliant database server.
For more information on configuring Export Behaviors, please visit the full Export activity article.
Data Elements
Data Export is the chief delivery device for "collection" elements. Data is collected in Grooper by executing the Extract activity, extracting values from a Batch Folder according to its classified Document Type's Data Model.
A Data Model in Grooper is a digital representation of document data targeted for extraction, defining the data structure for a Content Type in a Content Model. Data Models are objects comprised of Data Element objects, including:
- Data Fields used to target single field values on a document.
- Data Tables and their child Data Columns used to target tabular data on a document.
- Data Sections used to divide a document into sections to simplify extraction logic and/or target repeating sections of extractable Data Elements on a single document.
With Data Models and their child Data Elements configured, Grooper collects values using the Extract activity.
Depending on the Content Type hierarchy in a Content Model and/or the Data Element hierarchy in a Data Model, a fully extracted Data Model yields a collection, or "set", of values at each level of data scope. That scope may be the full Data Model, including any Data Elements inherited from parent Data Models. It may also be narrower, such as a child Data Section comprised of its own child Data Fields.
Understanding this will be important as Data Export has the ability to take full advantage of Grooper's hierarchical data modeling to flatten complex and inherited data structures. Understanding Data Element hierarchy and scope will also be critical when exporting data from a single document to multiple different database tables to ensure the right data exports to the right places.
Data Connections
Data Export uses a configured Data Connection object to establish a link to tables in a SQL or ODBC-compliant database and populate those tables. Once this connection is established, collected Data Elements can be mapped to corresponding column locations in one or multiple database tables. Much of Data Export's configuration is assigning these data mappings. The Data Connection presents these mappable data endpoints to Grooper and allows data to flow from Grooper to the database table when the Export activity processes each Batch Folder in a Batch.
Furthermore, not only can Grooper connect to existing databases using a Data Connection, but it can create whole new databases as well as database tables once a connection to the database server is established.
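The mapping idea can be illustrated outside of Grooper with a minimal sketch. This is not Grooper's API. It uses Python's built-in sqlite3 module as a stand-in for a SQL or ODBC database, and the table name, column names, and extracted values are all hypothetical.

```python
import sqlite3

# Hypothetical mapping of Data Element names to database column names.
column_map = {"Employee ID": "EmployeeID", "Employee Last Name": "LastName"}

# Hypothetical extracted values for one document.
extracted = {"Employee ID": "1001", "Employee Last Name": "Doe"}

# In-memory SQLite database standing in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (EmployeeID TEXT, LastName TEXT)")

# Translate each Data Element name to its mapped column, then insert one row.
columns = [column_map[name] for name in extracted]
placeholders = ", ".join("?" for _ in columns)
conn.execute(
    f"INSERT INTO Employees ({', '.join(columns)}) VALUES ({placeholders})",
    list(extracted.values()),
)

stored = conn.execute("SELECT EmployeeID, LastName FROM Employees").fetchall()
```

Each extracted value lands in the column its Data Element is mapped to, which is the role the mappings play in a Data Export configuration.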
We discuss how to create Data Connections, add a new database from a Data Connection, and add a new database table from a Data Connection in the "Configuring a Data Connection" tutorial below.
How To
Understanding the Forms
Document 1: Employee Report
The thing to understand about this document is that some of its data share a "one-to-many" relationship.
Some of the data is "single instance" data. These are individual fields like "Employee Last Name", "Employee First Name" and "Employee ID". For each document, there is only one value for each of these fields. These values are only listed once, and hence only collected once during extraction.
Some of the data, however, is "multi-instance" data. The "Earnings" table has a dynamic number of rows, so the number of values collected for its columns ("Code Desc", "MTD", "QTD", "YTD") varies with the number of rows in the table. There are multiple instances of the "YTD" value for the whole table (and therefore the whole document).
The single instance data is only collected once, but it needs to be married to each row of information from the table. The "one" "Employee ID" value, for example, pertains to the "many" different table rows.
This document is meant to show how to flatten data structures. While the single instance data is only collected once, it will be reported many times upon exporting to a database table.
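The "one-to-many" flattening described for the Employee Report can be sketched in a few lines of Python. This only illustrates the resulting data shape, not how Grooper performs the export. The field and column names come from the document description, while the values are made up.

```python
from typing import Any


def flatten_report(
    fields: dict[str, Any], rows: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Repeat the single instance fields on every table row."""
    return [{**fields, **row} for row in rows]


# Hypothetical extracted values for one "Employee Report" document.
employee_fields = {
    "Employee ID": "1001",
    "Employee Last Name": "Doe",
    "Employee First Name": "Jane",
}
earnings_rows = [
    {"Code Desc": "Regular", "MTD": 100.0, "QTD": 300.0, "YTD": 1200.0},
    {"Code Desc": "Overtime", "MTD": 20.0, "QTD": 50.0, "YTD": 150.0},
]

# One flat record per "Earnings" row, each carrying the employee fields.
flat_rows = flatten_report(employee_fields, earnings_rows)
```

The "Employee ID" is collected once but appears in both flattened records, one per "Earnings" row.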
|||
Document 2: Personnel Information Report
The second document is essentially one big table of personnel information (name, address, email, phone number and the like). While we ultimately want to collect data from all rows in this table, there are potentially two sets of information here. Some of it is generic personnel information, but some of it is "personally identifiable information" or PII. This information should be protected for legal reasons. As a result, we will export collected data to two database tables, with the assumption that the second table is "protected".
This document is meant to demonstrate how to export to multiple tables via one Export Behavior.
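The idea of sending one extracted row to two destinations can be sketched as follows. This is an illustration only. Which columns count as PII, and the use of "Name" as a shared column that links the two records, are assumptions made for the example and are not taken from the tutorial's Content Model.

```python
from typing import Any


def split_row(
    row: dict[str, Any], pii_columns: set[str], key_column: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split one row into a general record and a PII record.

    The key column is kept in both records so they can be related later.
    """
    general = {k: v for k, v in row.items() if k not in pii_columns}
    pii = {k: v for k, v in row.items() if k == key_column or k in pii_columns}
    return general, pii


# Hypothetical choice of protected columns.
PII_COLUMNS = {"Address", "Phone Number"}

# Hypothetical extracted row from the personnel table.
row = {
    "Name": "Jane Doe",
    "Email": "jane@example.com",
    "Address": "1 Main St",
    "Phone Number": "555-0100",
}

general_row, pii_row = split_row(row, PII_COLUMNS, key_column="Name")
```

The general record would go to the unprotected table and the PII record to the protected one.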
|
Understanding the Content Model
The Content Model provided for this tutorial is named "Example Model - Data Export". This Content Model is designed to extract the data for these two different kinds of documents, each represented by its own Document Type.
The Employee Report Document Type
|
|
The Personnel Info Report Document Type
|
Verifying Index Data
Before the Export activity can send data, it must have data!
It's easy to get in the habit of testing extraction on a Data Field or a Data Model and feeling good about the results. However, the information displayed during testing is held in memory and is temporary. When testing a Data Export configuration, it's a good idea to ensure extracted data is actually present for document Batch Folders whose data you want to export.
When the Extract activity runs, it executes all extraction logic for the Data Model tied to a Batch Folder's classified Document Type. For each Batch Folder document, it creates "Index Data" and marries it to the Batch Folder via a JSON file called Grooper.DocumentData.json.
A couple of ways to verify its existence are as follows:
Option 1
|
|||
|
Option 2
Another means of verifying is to view the file created by the Extract activity and stored in the Grooper repository's file store location.
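If you have located a copy of the file, a quick check that it exists and contains parseable JSON can be scripted. The sketch below is a generic file check, not a Grooper tool. The file's location on disk and its internal structure are not described here, so the sample path and contents are purely hypothetical.

```python
import json
import tempfile
from pathlib import Path


def has_index_data(json_path: Path) -> bool:
    """Return True if the file exists and holds non-empty, valid JSON."""
    if not json_path.is_file():
        return False
    try:
        # "utf-8-sig" also accepts files that start with a byte order mark.
        data = json.loads(json_path.read_text(encoding="utf-8-sig"))
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return bool(data)


# Demonstration with a temporary file and made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Grooper.DocumentData.json"
    missing_result = has_index_data(sample)  # file not written yet
    sample.write_text('{"Employee ID": "1001"}', encoding="utf-8")
    present_result = has_index_data(sample)
```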
|
Configuring a Data Connection
In order for the Data Export to run, it first needs an established connection to a database and subsequent table(s).
Grooper can connect to an existing database and import references to its tables using a Data Connection object. Once connected to the database server, you can even have Grooper create a database AND create tables based on Data Model structures present in Grooper!
In the following tutorial we will cover how to:
- Create a new Data Connection object.
- Connect to a database server using the Data Connection.
- Create a new database from the Data Connection.
- Create a database table using the Data Models in our example Content Model.
It is worth noting that this article cannot tell you specifics about permissions in your own environment. The configuration for this article uses Microsoft SQL Server, and the active Active Directory user has been given full DB Admin privileges to the SQL environment.
Create a Data Connection
|
New Data Connections are added to the Global Resources folder of the Node Tree.
|
|
|
Configure Connection Settings
|
Regardless of whether you want to connect to an existing database or create a new one from Grooper, your first step is always the same. You must first connect to a database server. Grooper can connect to Microsoft SQL Server or any ODBC (Open Database Connectivity) compliant data source. For the purposes of this tutorial, we will connect to a SQL server.
|
|
|
Next, we need to define settings to access the database server. All you really need for this is the server's name and access rights.
|
|
|
That's it! You're officially connected to the database server now. We can now connect to existing databases, import references to their tables, create new databases, and create new database tables. Keep importing table references in the back of your mind. It will be important later.
|
At this point, you have two options:
- Connect to an existing database
- Create a new database
|
If you want to connect to an existing database, it's very easy.
|
However, now that we're connected to the database server, you can also create a brand new database!
Create a New Database from the Data Connection
|
|||
|
|||
|
Create Database Tables from the Data Connection
We will end up creating three database tables by the end of this section:
- A table for the "Employee Report" Document Type's extracted Data Elements with its "one-to-many" related data elements flattened to a single table structure.
- A table for the "Personnel Info Report" Document Type's extracted "non-PII" Data Table.
- A table for the "Personnel Info Report" Document Type's extracted "PII" Data Table.
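To picture the end result, the sketch below builds three tables from simple column definitions. It is not what Grooper runs internally. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and every table name, column name, and column type is a hypothetical placeholder rather than the output of the tutorial's Data Models.

```python
import sqlite3

# Hypothetical table layouts, one per table described in the list above.
TABLES = {
    "EmployeeReport": {
        "EmployeeID": "TEXT", "LastName": "TEXT", "FirstName": "TEXT",
        "CodeDesc": "TEXT", "MTD": "REAL", "QTD": "REAL", "YTD": "REAL",
    },
    "PersonnelInfo": {"Name": "TEXT", "Email": "TEXT"},
    "PersonnelInfoPII": {"Name": "TEXT", "Address": "TEXT", "Phone": "TEXT"},
}


def create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement from a name-to-type mapping."""
    column_sql = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    return f'CREATE TABLE "{table}" ({column_sql})'


# In-memory SQLite database standing in for the database server.
conn = sqlite3.connect(":memory:")
for table, columns in TABLES.items():
    conn.execute(create_table_sql(table, columns))

table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

The flattened "EmployeeReport" layout holds both the single instance fields and the "Earnings" columns in one table, while the personnel data is split across two tables.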
Table 1: Employee Report Data
|
|
|
|
|
|
|
|
|
Table 2: Personnel Info Report non-PII Data
For the other two tables, it's mostly a repeat of the same steps, just taking care to select the appropriate Content Type and Data Element scope.
|
|
|
|
|
Table 3: Personnel Info Report PII Data
|
This database table can be created with the exact same steps as described above with just one key difference:
|
|
|
Import Table References
Before we can export data to these newly created tables, we must import their table references. This part is critical in order for Grooper to interact with a database table, whether to export data using Data Export or perform a Database Lookup operation. Importing the table references will give Grooper an object it can reference when mapping data between Grooper and the database table, ultimately allowing for data to flow from extracted Batch Folders to the database table.
|
Importing a table reference is as simple as a click of a button.
|
|
|
|
|
Configuring Export Behaviors for Data Export
Data Export is one of the Export Type options when configuring an Export Behavior. Export Behaviors control what document content for a Batch Folder is exported where, according to its classified Document Type. As such, in order to configure a Data Export, you must first configure an Export Behavior for a Content Type (a Content Model or its child Content Categories or Document Types).
In our case, we want to perform two different kinds of export, depending on the document Batch Folder's classified Document Type.
- For the "Employee Report" Document Type, we want to export its collected Data Elements to our first database table.
- For the "Personnel Info Report" Document Type, we want to export its collected Data Elements (which are collected using an entirely different Data Model and have an entirely different data structure) to our second and third database tables.
The basic idea behind Export Behaviors is that, based on the kind of document you're looking at, you can tell Grooper how you want to export it.
Export Behaviors can be configured in one of two ways:
- Using the Behaviors property of a Content Type object
- A Content Model
- A Content Category
- Or, a Document Type
- As part of the Export activity's property configuration
When the Export activity processes each Batch Folder it will execute the Export Behaviors, according to their configuration settings.
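Conceptually, this works like a lookup keyed on the Batch Folder's Document Type. The sketch below is only an analogy and not Grooper's implementation. The Document Type names come from this tutorial, while the target table names are hypothetical.

```python
# Hypothetical mapping of Document Type to the tables its export targets.
EXPORT_BEHAVIORS: dict[str, list[str]] = {
    "Employee Report": ["EmployeeReport"],
    "Personnel Info Report": ["PersonnelInfo", "PersonnelInfoPII"],
}


def target_tables(document_type: str) -> list[str]:
    """Return the tables a document of this type would export to.

    A Document Type with no configured behavior exports nowhere.
    """
    return EXPORT_BEHAVIORS.get(document_type, [])


employee_targets = target_tables("Employee Report")
personnel_targets = target_tables("Personnel Info Report")
unknown_targets = target_tables("Unclassified")
```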
| FYI | In general, users will choose to configure Export Behaviors either on the Content Type object it applies to or local to the Export activity step in a Batch Process.
This may just boil down to personal preference. There is no functional difference between an Export Behavior configured on a Content Type and an Export Behavior configured on an Export step once their configuration is complete. In either case, they will accomplish the same goal. However, it is possible to configure Export Behaviors in both locations. If you do this, you will need to understand the Export activity's Shared Behavior Mode property options. This will affect if and how two Export Behaviors configured for the same Content Type will execute. Please visit the Export article for more information. |
Add an Export Behavior
Option 1: Content Type Export Behaviors
|
An Export Behavior configuration can be added to any Content Type object (i.e. Content Models, Content Categories, and Document Types) using its Behaviors property. Doing so will control how a Document Type "behaves" upon export.
|
|
|
|
|
Option 2: Export Activity Export Behaviors
|
Export Behaviors can also be configured as part of the Export activity's configuration. These are called "local" Export Behaviors. They are local to the Export activity step in the Batch Process.
|
|
|
|
|
Add an Export Definition
|
|
|
Regardless of whether you configure the Export Behavior on a Content Type object or local to the Export activity's configuration, your next step is adding an Export Definition.
|
|
|
Add a Data Export
Export Definitions functionally determine three things:
- Location - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
- Content - What document content is exported: image content, full text content, and/or extracted data content.
- Format - What format the exported content takes, such as a PDF file or XML data file.
|
Export Definitions do this by adding one or more Export Type configurations to the definition list. The Export Type you choose determines how you want to export content to which platform. In our case, we want to use a Data Connection to export extracted document data ("Content") to a database table ("Location" and "Format"). We will add a Data Export to the definition list.
|
|
|
|
|
Table Mappings Example 1: Flattening a Data Model
Table Mappings Example 2: Exporting to Multiple Database Tables










































