2021:Export (Activity): Difference between revisions

From Grooper Wiki

Revision as of 13:59, 27 September 2021

The Export activity exports processed document content to an external storage platform.

Export is an Unattended Activity, typically added as one of the last steps (if not the last step) of a Batch Process. It allows Grooper users to deliver processed Batch content to an external system. Whether exporting Batch Folders as PDF files to a Windows folder, exporting extracted Data Model fields to a SQL database, exporting to a content management system, or some combination of multiple exports to multiple systems, the Export activity handles how document Batch Folders in a Batch ultimately leave Grooper after they have been classified and had their data extracted.

How documents are exported (what gets exported, where it goes, and what format the exported content takes) is controlled by Export Behaviors. An Export Behavior is a set of properties that controls how Batch Folder content is exported based on its Document Type classification. Export Behaviors can be configured locally, as part of the Export activity's property configuration, or for a particular Content Type, by configuring the Behaviors property of a Content Model and/or its descendant Content Categories or Document Types.

About

So you've ingested some documents into a Batch. You've obtained their full text data with the Recognize activity, either through OCR or extracting their native embedded text. You've classified these documents, assigning the Batch Folders a Document Type from a Content Model during the Classify activity. You've collected the data you want from these documents during the Extract activity. Now what?

You need to get these documents and that data out of Grooper!

Enter the Export activity. Grooper is designed to be a document processing platform. It is a powerful tool to model document sets and their data (according to a Content Model) and put unprocessed pages or files through a step by step list of processing instructions (according to a Batch Process) to ultimately organize them and collect information from them. However, Grooper is not designed to be a content management system or a storage platform. Once your documents are organized and Grooper has extracted the data you want from them, you generally want to put those files and data in an external endpoint, such as a file system, a database, a content management system or some combination thereof.

The Export activity's job is to get document content out of Grooper, according to your specifications. Using one or more Export Behavior configurations, you can control how processed document content is exported, how it is indexed and in which storage location, what data goes where, what file format certain content should take, and more.

FYI: How you export documents in Grooper underwent some serious changes in version 2021. In previous versions, there were two separate export activities: Document Export and Database Export.

To simplify things, we combined these two Activities into the singular Export activity. Whether you're exporting document files or data to a database, you use the Export activity and Export Behavior configurations in either case.

Just What Is "Document Content"?

We're going to talk a lot about "document content" throughout this article. Ultimately, the Export activity controls what content is exported and how it is exported. So, what do we mean by "document content"?

In terms of its content, you can break up a document processed by Grooper into (at least) three meaningful components:

  1. The document's image
  2. The document's full text
  3. The document's extracted data

Each of these different kinds of content is another layer that makes up a whole document. Grooper's job is to take source material (scanned pages or imported files), derive the content you desire (such as extracting Data Elements from a Data Model), and, using the Export activity, recombine this content into deliverable files or data sent to one or more storage endpoints.

Image Content

The document's image is simply what the viewer physically sees when viewing the document. Whether scanned pages or a digital file, like a PDF, this content comprises the pixels on the screen you're looking at when reading a document. This content can be altered in a Batch Process by the Image Processing activity, which is a typical part of processing scanned documents to clean up the image before OCR. Upon Export, Grooper can build a new file from these images, or just export whatever image content was originally imported.

Full Text Content

A good deal of document processing automation requires machine readable text to parse words, phrases and other text data. Grooper obtains a document's full text data through the Recognize activity, OCRing images or extracting embedded digital text. These results can then be embedded into the exported file as another part of its content during Export.

Extracted Data Content

Last but not least, the Extract activity in a Batch Process will collect information from the document, according to its classified Document Type and Data Model. This may be simple indexing data, even just the Document Type assigned during the Classify activity. This may be every meaningful data point on the document, obtained from a Data Model with hundreds of extracted Data Elements. Regardless, this needs to be stored somewhere and somehow, such as in a SQL database, content management system, or as a separate data file, like an XML or CSV file.

How you merge this content into new files, define what storage platform it goes to, and how extracted data can drive indexing considerations is all controlled by the Export activity's Export Behavior configuration.

Export Behaviors

The Export activity exports documents according to an Export Behavior. This is a set of export property configurations based on the Content Type (i.e. Document Type of a Content Model) assigned to a document Batch Folder during document classification. Once a Batch Folder is assigned a Document Type, you have something you can point to that controls the flow of traffic out of Grooper.

For documents "A", build a PDF file and put them in folder "A" in a file system, for example. For documents "B", put them in folder "B" and export their data to a database while you're at it. For documents "C", you might do something entirely different. Or, you might perform essentially the same export for all Document Types in a Content Model. Export Behavior configurations are how you tell Grooper what to do for one Document Type or another upon export.

Export Behaviors can be configured for any Content Type object. This includes a parent Content Model or any of its descendant Document Types or Content Categories.

This allows you to use the Content Model's hierarchy to determine how you want to export documents of a certain Document Type.

  • If you want to perform the same, generic export for all Document Types in a Content Model, you can configure a single Export Behavior on the Content Model itself. It will apply to all of its child Document Types.
  • If a group of Document Types under a single Content Category all should be exported in the same manner, you can configure an Export Behavior for the Content Category. Those settings will apply to any of its child Document Types.
  • If every Document Type or certain Document Types have their own specific export configuration, you can configure individual Export Behaviors for one or more Document Types (or all of them!).
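The three cases above can be pictured as a lookup that walks up the Content Model's hierarchy. The following is only a minimal sketch of that idea, not Grooper's implementation: the `ContentType` class, the `resolve_export_behavior` function, and the example names are all hypothetical, and it assumes that the nearest configured Export Behavior wins when both a child and its parent have one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    # A plain label standing in for a real Export Behavior configuration.
    export_behavior: Optional[str] = None


def resolve_export_behavior(content_type: ContentType) -> Optional[str]:
    """Return the first Export Behavior found while walking up the hierarchy.

    Assumption: the nearest configured Content Type wins over its ancestors.
    """
    node: Optional[ContentType] = content_type
    while node is not None:
        if node.export_behavior is not None:
            return node.export_behavior
        node = node.parent
    return None


# Hypothetical hierarchy: one Content Model, one Content Category, three Document Types.
model = ContentType("Invoices Model", export_behavior="generic PDF export")
category = ContentType("Vendor Invoices", parent=model,
                       export_behavior="PDF + database export")
doc_a = ContentType("Vendor A Invoice", parent=category)  # inherits from the category
doc_b = ContentType("Misc Invoice", parent=model)          # inherits from the model
doc_c = ContentType("Credit Memo", parent=model,
                    export_behavior="XML export")          # has its own behavior

for doc in (doc_a, doc_b, doc_c):
    print(doc.name, "->", resolve_export_behavior(doc))
```

In this sketch, a Document Type with no Export Behavior of its own falls back to its Content Category, and then to its Content Model.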

Export Behaviors can be configured in one of two ways:

  1. Using the Behaviors property of a Content Type object
    • A Content Model
    • A Content Category
    • Or, a Document Type
  2. As part of the Export activity's property configuration

In either case, export settings are added as one or more Export Definitions of the Export Behavior. Once a document is classified and assigned a Document Type, its Export Behavior's configured Export Definition(s) will define how the document content is exported. The main difference is how you get to the Export Behavior property.

Content Type Export Behaviors

An Export Behavior configuration can be added to any Content Type object (i.e. Content Models, Content Categories, and Document Types) using its Behaviors property. Doing so will control how a Document Type "behaves" upon export.

  1. For example, here we have a Content Model selected in the Node Tree.
  2. To add an Export Behavior, first select the Behaviors property.
  3. Then, press the ellipsis button at the end of the property.

  1. This will bring up the Behaviors collection editor window.
  2. Press the "Add" button.
  3. Select Export Behavior.
    • You can only configure one Export Behavior per Content Type object.
    • Children Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
    • However, multiple Export Behaviors may be added by configuring the Behaviors property of multiple Content Types. For example, if every Document Type needed a unique Export Behavior configuration, you could configure the Behaviors property for each one, adding one Export Behavior to the Behaviors list for each one.

  1. You will see the Export Behavior added to the Behaviors list.
  2. Selecting it, you can now add one or more Export Definitions with the Export Definitions property.

When configured using the Behaviors property of a Content Type object, the Export activity will export Batch Folder content in a Batch according to the Export Definition settings configured for the Batch Folder's assigned Document Type, or for its parent Content Category or parent Content Model, depending on which Content Type's Behaviors property is configured in the Content Model's hierarchy.

Export Activity Export Behaviors

Export Behaviors can also be configured as part of the Export activity's configuration. These are called "local" Export Behaviors. They are local to the Export activity in the Batch Process.

  1. For example, here we have a working Batch Process selected in the Node Tree.
  2. And we have the Export step of the Batch Process selected.
  3. To add an Export Behavior, select the Export Behaviors property.
  4. Then, press the ellipsis button at the end of the property.

  1. This will bring up the Export Behaviors collection editor window.
  2. Press the "Add" button to add a new Export Behavior.
  3. An Export Behavior will be added to the list.
  4. With the Export Behavior selected you must define which Content Type the behavior applies to using the Content Type property.
    • Note in both cases, a Content Type is involved in configuring Export Behaviors. Whether local to the Export activity or as part of a Content Model's configuration, Grooper needs to know what to do upon export, given a certain Content Type (and its children Content Types if scoped to a Content Model or Content Category). Once Grooper knows what kind of document it's looking at, we can then inform it what to do in terms of exporting its document content.
  5. Using the dropdown menu, select which Content Type scope should utilize the Export Behavior by selecting either a top-level parent Content Model or one of its child Content Categories or Document Types.
    • Keep in mind you can only select a single Content Type here. You can only configure one Export Behavior per Content Type object.
    • Children Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
  6. However, multiple Export Behaviors may be added locally to the Export activity. For example, if every Document Type needed a unique Export Behavior configuration, you could add one Export Behavior to the list for each one.

  1. Once a Content Type is selected, you can add one or more Export Definitions with the Export Definitions property.

Export Definitions

Regardless of whether the Export Behavior is set up directly on the Content Type object or with the Export activity's local property grid, how document content is exported is defined using one or more Export Definitions.

Export Definitions functionally determine three things:

  1. Location - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
  2. Content - What document content is exported: image content, full text content, and/or extracted data content.
  3. Format - What format the exported content takes, such as a PDF file or XML data file.

Your primary consideration is Location. Where do you want these files and/or data to end up? Are you exporting files to a Windows file system? Are you exporting data to a database? Are you exporting content to a content management system, like Box.com?
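As a way to keep the three questions straight (Location, Content, Format), an Export Definition can be thought of as a small record. The sketch below is purely illustrative: `ExportDefinitionSketch`, `Content`, and `describe` are hypothetical names that do not correspond to Grooper objects or properties, and the folder path is made up.

```python
from dataclasses import dataclass
from enum import Enum


class Content(Enum):
    """The three kinds of document content discussed in this article."""
    IMAGE = "image"
    FULL_TEXT = "full text"
    EXTRACTED_DATA = "extracted data"


@dataclass(frozen=True)
class ExportDefinitionSketch:
    """Hypothetical record of what one Export Definition decides."""
    location: str        # where the content ends up
    content: frozenset   # which kinds of content are exported
    file_format: str     # what format the exported content takes


def describe(definition: ExportDefinitionSketch) -> str:
    """Summarize a definition in one sentence."""
    kinds = ", ".join(sorted(kind.value for kind in definition.content))
    return f"Export {kinds} as {definition.file_format} to {definition.location}"


# Example: a searchable PDF (image plus full text) sent to a made-up file share.
pdf_export = ExportDefinitionSketch(
    location=r"\\fileserver\exports",
    content=frozenset({Content.IMAGE, Content.FULL_TEXT}),
    file_format="PDF",
)
print(describe(pdf_export))
```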

When configuring an Export Definition the first thing you will add is an Export Type. This determines what export endpoint you're using to export document content. The Export activity will deliver document content to the storage platform determined by the Export Type.

  1. To add an Export Type, press the "Add" button in the Export Definitions collection editor.
  2. This can be one of the following options:
    • CMIS Export - To export content using a CMIS Connection
    • Data Export - To export data to a SQL database or ODBC compliant database
    • File Export - To export files to a Windows file system
    • FTP Export - To export files to an FTP server
    • IMAP Export - To export files to an IMAP email server
    • SFTP Export - To export files to an SFTP server

Each Export Type defines the connection to its endpoint storage location slightly differently.

CMIS Export

For CMIS Export, document content is exported over a CMIS Connection.

  1. The CMIS Connection defines the connection settings for one of several storage platforms available as CMIS Bindings. CMIS Repositories are storage locations imported into Grooper as children of the CMIS Connection object. The CMIS Repository property here establishes connection to a storage platform connected to Grooper via the CMIS Connection.
    • Depending on the CMIS Binding the CMIS Repository will represent a different storage location. This could be a Windows file system folder for the NTFS binding. This could be a SharePoint site for the SharePoint binding. This could be an email inbox for the Exchange or IMAP bindings.

For more information, please visit the CMIS Repository and CMIS Export articles.

Data Export

File Export

For File Export, document content is exported to a Windows file system folder.

  1. The Target Folder property defines what folder you want to export content to. It is always best practice to use a fully qualified UNC path, to disambiguate file and folder locations on one networked machine from another.


The File Export Export Type is a carry over from the older methods of exporting to a Windows file system in previous versions. File Export exists mostly for backwards compatibility, but it can still be utilized for simple file system exports.

In current versions, using the NTFS CMIS Binding and the CMIS Export Export Type is a preferable method to export document content to a Windows file system.
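The recommendation to use a fully qualified UNC path for Target Folder can be checked before a path is entered. The following is a minimal sketch using Python's standard `pathlib`; the `is_unc_path` helper and the example paths are hypothetical and are not part of Grooper.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if `path` looks like a UNC path of the form \\\\server\\share\\...

    PureWindowsPath parses Windows paths on any operating system. For a UNC
    path, its `drive` attribute holds the \\\\server\\share prefix.
    """
    drive = PureWindowsPath(path).drive
    if not drive.startswith("\\\\"):
        return False
    parts = [part for part in drive[2:].split("\\") if part]
    # Require both a server and a share, and skip device prefixes such as \\?\ or \\.\
    return len(parts) == 2 and parts[0] not in {"?", "."}


print(is_unc_path(r"\\fileserver\exports\invoices"))  # UNC path
print(is_unc_path(r"C:\exports\invoices"))            # local drive letter
print(is_unc_path(r"Z:\invoices"))                    # mapped drive letter
```

A drive-letter path such as `C:\exports` or a mapped drive such as `Z:\` can point to different locations on different machines, which is the ambiguity a UNC path avoids.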

IMAP Export

For IMAP Export, document content is exported to email servers using the IMAP protocol.

  1. The Mail Server property defines the host name (or IP address) of the email server you want to export to.
    • For example, the server used to connect to an Outlook 365 inbox is "outlook.office365.com"
  2. The User Name and Password properties define the logon information to connect to the mailbox.
  3. The Target Folder property defines what email folder you want to export content to.


The IMAP Export Export Type is a carry over from the older methods of exporting across the IMAP protocol. IMAP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to email boxes.

In current versions, using the IMAP CMIS Binding and the CMIS Export Export Type is a preferable method to export document content to an IMAP server.

Furthermore, when connecting to a Microsoft Outlook inbox, the Exchange CMIS Binding is preferable to the IMAP binding. The Exchange binding has increased functionality specifically designed for the Outlook messaging system.
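To show how the Mail Server, User Name, Password, and Target Folder settings relate to the IMAP protocol, here is a rough sketch using Python's standard `imaplib` and `email` modules. It is not how Grooper performs the export. The function names are hypothetical, and packaging the document as a PDF attachment on an email message is an assumption made only for illustration.

```python
import imaplib
import time
from email.message import EmailMessage


def build_export_message(subject: str, filename: str, payload: bytes) -> EmailMessage:
    """Wrap a document's bytes in an email message (assumed packaging)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content("Exported document attached.")
    msg.add_attachment(payload, maintype="application", subtype="pdf",
                       filename=filename)
    return msg


def append_to_folder(mail_server: str, user_name: str, password: str,
                     target_folder: str, msg: EmailMessage) -> None:
    """Store the message in an email folder with the IMAP APPEND command."""
    with imaplib.IMAP4_SSL(mail_server) as conn:
        conn.login(user_name, password)
        conn.append(target_folder, "",
                    imaplib.Time2Internaldate(time.time()), msg.as_bytes())


# Building the message needs no network access.
message = build_export_message("Invoice 1001", "invoice-1001.pdf", b"%PDF-1.4 sample")
print(message["Subject"])

# Sending requires a real mailbox, so this call is left commented out:
# append_to_folder("outlook.office365.com", "user@example.com", "secret",
#                  "Exports", message)
```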

FTP Export

For FTP Export, document content is exported to an FTP site using the FTP protocol.

  1. The FTP Server URL property defines what site you want to export to.
  2. The User Name and Password properties define the logon information to connect to the FTP site.


The FTP Export Export Type is a carry over from the older methods of exporting to FTP sites. FTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to FTP folders.

In current versions, using the FTP CMIS Binding and the CMIS Export Export Type is a preferable method to export document content to an FTP site.
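For orientation, the FTP Server URL, User Name, and Password settings correspond to the usual pieces of an FTP session. The sketch below uses Python's standard `ftplib` and `urllib.parse` and is not Grooper's implementation. It assumes the URL takes the form `ftp://host[:port]/folder`, which this article does not specify, and the helper names and example host are hypothetical.

```python
import io
from ftplib import FTP
from urllib.parse import urlparse


def parse_ftp_url(url: str):
    """Split an ftp:// URL into (host, port, folder). Port defaults to 21."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"Not an FTP URL: {url!r}")
    return parsed.hostname, parsed.port or 21, parsed.path or "/"


def upload(url: str, user_name: str, password: str,
           filename: str, payload: bytes) -> None:
    """Log on to the FTP site and store one file in the target folder."""
    host, port, folder = parse_ftp_url(url)
    with FTP() as ftp:
        ftp.connect(host, port)
        ftp.login(user_name, password)
        ftp.cwd(folder)
        ftp.storbinary(f"STOR {filename}", io.BytesIO(payload))


# Parsing the URL needs no network access.
print(parse_ftp_url("ftp://ftp.example.com/exports/invoices"))

# Uploading requires a real FTP site, so this call is left commented out:
# upload("ftp://ftp.example.com/exports/invoices", "user", "secret",
#        "invoice-1001.pdf", b"%PDF-1.4 sample")
```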

SFTP Export

For SFTP Export, document content is exported to an SFTP site using the SFTP protocol.

  1. The Host Name property defines what site you want to export to.
  2. The User Name and Password properties define the logon information to connect to the SFTP site.


The SFTP Export Export Type is a carry over from the older methods of exporting to SFTP sites. SFTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to SFTP folders.

In current versions, using the SFTP CMIS Binding and the CMIS Export Export Type is a preferable method to export document content to an SFTP site.


Shared Behavior Modes

Navigating Data Model Hierarchy Upon Export

Thread Pool Guidance