Export (Activity): Difference between revisions

From Grooper Wiki
 
(17 intermediate revisions by the same user not shown)
Line 11: Line 11:
[[File:Asset 22@4x.png]]
[[File:Asset 22@4x.png]]
|
|
You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more '''Batches''' of sample documents.  The second contains one or more '''Projects''' with resources used in examples throughout this article.
You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more '''Batches''' of sample documents.  The second contains one or more '''Projects''' with resources used in examples throughout this article.
* [[Media:2025_Wiki_Export_Batch.zip]]
* [[Media:2025 Wiki Export Batch.zip]]
* [[Media:2025_Wiki_Export_Project.zip]]
* [[Media:2025 Wiki Export Project.zip]]
|}
|}


Line 39: Line 39:
In terms of its content, you can break up a document processed by Grooper into (at least) three meaningful components:
In terms of its content, you can break up a document processed by Grooper into (at least) three meaningful components:


# The document's image
# The document's image (This may be made up of its child Batch Pages' images)
# The document's full text
# The document's full text (This may be made up of its child Batch Pages' full text)
# The document's extracted data
# The document's extracted data (Its Data Model values collected by the Extract activity)


Each of these different kinds of content is another layer that comprises a whole document (represented as a '''Batch Folder''' in a '''Batch''', its child '''Batch Pages''' and/or files attached to the '''Batch Folder'''). Grooper's job is to take source material (scanned pages or imported files), derive the content you desire (such as extracting '''Data Elements''' from a '''Data Model'''), and, using the '''Export''' activity, recombine this content into derivable files or data to one or more storage endpoints.
Each of these different kinds of content is another layer that comprises a whole document (represented as a '''Batch Folder''' in a '''Batch''', its child '''Batch Pages''' and/or files attached to the '''Batch Folder'''). Grooper's job is to take source material (scanned pages or imported files), derive the content you desire (such as extracting '''Data Elements''' from a '''Data Model'''), and, using the '''Export''' activity, recombine this content into derivable files or data to one or more storage endpoints.
Line 82: Line 82:
</blockquote>
</blockquote>


The '''Export''' activity exports documents according to an '''''Export Behavior'''''. This is a set of export property configurations based on the '''''Content Type''''' (i.e. '''Document Type''' of a '''Content Model''') assigned to a document '''Batch Folder''' during document classification. Once a '''Batch Folder''' is assigned a '''Document Type''', you have something you can point to that controls the flow of traffic out of Grooper.
The Export activity exports documents according to an "Export Behavior". This is a set of export property configurations based on the Content Type assigned to a '''Batch Folder''' during document classification (in other words, the document's '''Document Type'''). Once a Batch Folder is assigned a Document Type, the Export Behavior controls the flow of traffic out of Grooper when Export runs.


For documents "A", build a PDF file and put them in folder "A" in a file system, for example. For documents "B", put them in folder "B" and export their data to a database while you're at it. For document "C", you might do something entirely different. Or, you might perform essentially the same export for all '''Document Types''' in a '''Content Model'''. '''''Export Behavior''''' configurations are how you tell Grooper what to do for one '''Document Type''' or another upon export.
Export Behaviors accommodate simple and complex export logic.


'''''Export Behaviors''''' can be configured for any '''Content Type''' object. This includes a parent '''Content Model''' or any of its descendant '''Document Types''' or '''Content Categories'''.
Example simple logic:
* Build a PDF for all documents. Export them to a base folder location. Use the same naming convention for all of them.
 
Example complex logic:
* For documents "A", build a PDF file and put them in folder "A" in a file system.
* For documents "B", put them in folder "B" and export their data to a database while you're at it.
* For documents "C", build a PDF and a JSON file, put them in folder "C" and follow some special naming convention.
 
 
Export Behaviors can be configured for any Content Type. This includes a parent '''Content Model''' or any of its descendant '''Document Types''' or '''Content Categories'''.
{|cellpadding=10 cellspacing=5
{|cellpadding=10 cellspacing=5
|
|
Line 96: Line 105:
|}
|}


This allows you to use the '''Content Model's''' hierarchy to determine how you want to export documents of a certain '''Document Type'''.  
This allows you to use the Content Model's hierarchy to determine how you want to export documents of a certain Document Type. Content Types will inherit an Export Behavior from a parent if they do not have one of their own configured.  
* If you want to perform the same, generic export for all '''Document Types''' in a '''Content Model''', you can configure a single '''''Export Behavior''''' solely for the '''Content Model''' applying to all its child '''Document Types'''.
* If you want to perform the same, generic export for all Document Types in a Content Model, you can configure a single Export Behavior solely for the '''Content Model'''. It will apply to all its child Document Types.
* If a group of '''Document Types''' under a single '''Content Category''' all should be exported in the same manner, you can configure an '''''Export Behavior''''' for the '''Content Category'''. Those settings will apply to any of its child '''Document Types'''.
* If a group of Document Types under a single Content Category all should be exported in the same manner, you can configure an Export Behavior for the '''Content Category'''. Those settings will apply to any of its child Document Types.
* If every '''Document Type''' or certain '''Document Types''' have their own specific export configuration, you can configure individual '''''Export Behaviors''''' for one or more '''Document Types''' (or all of them!).
* If every '''Document Type''' has its own specific export configuration, you can configure individual Export Behaviors for all of them. Or, if one '''Document Type''' needs its own special export configuration, you can configure an Export Behavior just for it.
**<li class="fyi-bullet"> Content Types will inherit an Export Behavior from a parent if they do not have one of their own configured. If they do have their own configured, it will override the parent's Export Behavior.
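
The inheritance rule described above can be illustrated with a short Python sketch. This is not Grooper code: the <code>ContentType</code> class, its fields, and the behavior labels are hypothetical stand-ins used only to show how a Document Type without its own Export Behavior would fall back to the nearest ancestor that has one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    export_behavior: Optional[str] = None  # label for a configured Export Behavior, if any


def resolve_export_behavior(content_type: ContentType) -> Optional[str]:
    """Walk up the hierarchy and return the nearest configured Export Behavior."""
    node = content_type
    while node is not None:
        if node.export_behavior is not None:
            return node.export_behavior
        node = node.parent
    return None


# Hypothetical hierarchy: one Content Model, one Content Category, three Document Types.
model = ContentType("Model", export_behavior="generic PDF export")
category = ContentType("Invoices", parent=model, export_behavior="invoice export")
invoice_a = ContentType("Invoice A", parent=category)  # inherits from the category
invoice_b = ContentType("Invoice B", parent=category, export_behavior="special export")
memo = ContentType("Memo", parent=model)  # inherits from the model
```

In this sketch, "Invoice A" resolves to the category's behavior, "Invoice B" overrides it with its own, and "Memo" falls back to the Content Model's behavior.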


'''''Export Behaviors''''' can be configured in one of two ways:


# Using the '''''Behaviors''''' property of a '''Content Type''' object
=== Adding Export Behaviors to Content Types===
#* A '''Content Model'''
#* A '''Content Category'''
#* Or, a '''Document Type'''
# As part of the '''Export''' activity's property configuration


In either case, export settings are added as one or more '''''Export Definitions''''' of the '''''Export Behavior'''''. Once a document is classified and it is assigned a '''Document Type''' its '''''Export Behavior's''''' configured '''''Export Definition(s)''''' will define how the document content is exported. The main difference is how you get to the '''''Export Behavior''''' property.
An Export Behavior configuration can be added to any Content Type node (Content Model, Content Category, or Document Type) using its Behaviors property. Doing so will control how a '''Document Type''' "behaves" upon export.
 
=== Content Type Export Behaviors ===
An '''''Export Behavior''''' configuration can be added to any '''Content Type''' object (i.e. '''Content Models''', '''Content Categories''', and '''Document Types''') using its '''''Behaviors''''' property. Doing so will control how a '''Document Type''' "behaves" upon export.




Line 118: Line 120:


# For example, here we have a '''Content Model''' selected in the Node Tree.
# For example, here we have a '''Content Model''' selected in the Node Tree.
# To add an '''''Export Behavior''''', first select the '''''Behaviors''''' property.
# To add an Export Behavior, first select the Behaviors property.
# Then, press the ellipsis button at the end of the property.
# Then, press the ellipsis button at the end of the property.
# This will bring up the '''''Behaviors''''' collection editor window.
# This will bring up the Behaviors collection editor window.
# Press the "Add" button.
# Press the "Add" button.
# Select ''Export Behavior''.
# Select "Export Behavior".
#* You can only configure one '''''Export Behavior''''' per '''Content Type''' object.
#* You can only configure one Export Behavior per '''Content Type''' object.
#* Child '''Content Type''' objects will inherit export settings from their parent '''Content Type's''' '''''Export Behavior''''' configuration.
#* Child '''Content Type''' objects will inherit export settings from their parent '''Content Type's''' Export Behavior configuration.
#* However, multiple '''''Export Behaviors''''' may be added by configuring the '''''Behaviors''''' property of multiple '''Content Types'''. For example, if every '''Document Type''' needed a unique '''''Export Behavior''''' configuration, you could configure the '''''Behaviors''''' property for each one, adding one '''''Export Behavior''''' to the '''''Behaviors''''' list for each one.
#* However, multiple Export Behaviors may be added by configuring the Behaviors property of multiple '''Content Types'''. For example, if every '''Document Type''' needed a unique Export Behavior configuration, you could configure the Behaviors property for each one, adding one Export Behavior to the Behaviors list for each one.
# You will see the '''''Export Behavior''''' added to the '''''Behaviors''''' list.
# You will see the Export Behavior added to the Behaviors list.
# Selecting it, you can now add one or more '''''Export Definitions''''' with the '''''Export Definitions''''' property.
# Selecting it, you can now add one or more Export Definitions with the Export Definitions property.


{|class="fyi-box"
{|class="fyi-box"
Line 133: Line 135:
'''FYI'''
'''FYI'''
|
|
When configured using the '''''Behaviors''''' property of a '''Content Type''' object, the '''Export''' activity will export '''Batch Folder''' content in a '''Batch''' according to the '''''Export Definition''''' settings configured for the '''Batch Folder's''' assigned '''Document Type'''
When configured using the Behaviors property of a '''Content Type''' object, the '''Export''' activity will export '''Batch Folder''' content in a '''Batch''' according to the Export Definition settings configured for the '''Batch Folder's''' assigned '''Document Type'''
* Or its parent '''Content Category''' or parent '''Content Model''' depending on which '''Content Type's''' '''''Behavior''''' property is configured in the '''Content Model's''' hierarchy.
* Or its parent '''Content Category''' or parent '''Content Model''' depending on which '''Content Type's''' Behavior property is configured in the '''Content Model's''' hierarchy.
|}
|}


=== Export Activity Export Behaviors ===
== Export Definitions ==
'''''Export Behaviors''''' can also be configured as part of the '''Export''' activity's configuration. These are called "local" '''''Export Behaviors'''''. They are local to the '''Export''' activity in the '''Batch Process'''.
 
 
 
[https://app.supademo.com/demo/cm7p8ckao0s21hilgjlg35rr3 Click here for an interactive walkthrough]
# For example, here we have a working '''Batch Process''' selected in the Node Tree.
#* And we have the '''Export''' step of the '''Batch Process''' selected.
# To add an '''''Export Behavior''''', select the '''''Export Behaviors''''' property.
# Then, press the ellipsis button at the end of the property.
# This will bring up the '''''Export Behaviors''''' collection editor window.
# Press the "Add" button to add a new '''''Export Behavior'''''
# An '''''Export Behavior''''' will be added to the list.
# With the '''''Export Behavior''''' selected you ''must'' define which '''Content Type''' the behavior applies to using the '''''Content Type''''' property.
#* Note in both cases, a '''Content Type''' is involved in configuring '''''Export Behaviors'''''. Whether local to the '''Export''' activity or as part of a '''Content Model's''' configuration, Grooper needs to know what to do upon export, given a certain '''Content Type''' (and its children '''Content Types''' if scoped to a '''Content Model''' or '''Content Category'''). Once Grooper knows what kind of document it's looking at, we can then inform it what to do in terms of exporting its document content.
# Using the dropdown menu, select which '''Content Type''' scope should utilize the '''''Export Behavior''''' by selecting either a top-level parent '''Content Model''' or one of its child '''Content Categories''' or '''Document Types'''.
#* Keep in mind you can only select a single '''Content Type''' here. You can only configure one '''''Export Behavior''''' per '''Content Type''' object.
#* Child '''Content Type''' objects will inherit export settings from their parent '''Content Type's''' '''''Export Behavior''''' configuration.
#* However, multiple '''''Export Behaviors''''' may be added locally to the '''Export''' activity. For example, if every '''Document Type''' needed a unique '''''Export Behavior''''' configuration, you could add one '''''Export Behavior''''' to the list for each one.
# Once a '''Content Type''' is selected, you can add one or more '''''Export Definitions''''' with the '''''Export Definitions''''' property.
 
=== Export Definitions ===
<blockquote>
<blockquote>
{{#lst:Glossary|Export Definition}}
{{#lst:Glossary|Export Definition}}
</blockquote>
</blockquote>


How document content is exported is defined using one or more '''''Export Definitions'''''.  '''''Export Definitions''''' functionally determine three things:
How document content is exported is defined using one or more Export Definitions.  Export Definitions functionally determine three things:


# '''Location''' - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
# '''Destination''' - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
# '''Content''' - What document content is exported:  attached file content, image content, full text content, and/or extracted data content.
# '''Content''' - What document content is exported:  attached file content, image content, full text content, and/or extracted data content.
# '''Format''' - What format the exported content takes, such as a PDF file or XML data file.
# '''Format''' - What format the exported content takes, such as a PDF file or XML data file.




Your primary consideration is '''Location'''. Where do you want these files and/or data to end up?  Are you exporting files to a Windows file system?  Are you exporting data to a database?  Are you exporting content to a content management system, like Box.com?
Your primary consideration is "destination". Where do you want these files and/or data to end up?  Are you exporting files to a Windows file system?  Are you exporting data to a database?  Are you exporting content to a content management system, like Box.com?


When configuring an '''''Export Definition''''', the first thing you will add is an '''''Export Definition Type''''' (or "'''''Export Type'''''"). This determines what export endpoint you're using to export document content. The '''Export''' activity will deliver document content to the storage platform determined by the '''''Export Definition'''''.
When configuring an Export Definition, the first thing you will add is an Export Definition Type (or "Export Type"). This determines what export endpoint you're using to export document content. The '''Export''' activity will deliver document content to the storage platform determined by the Export Definition.




[https://app.supademo.com/demo/cm7tejl9c1yawhilghvxwjnz0 Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7tejl9c1yawhilghvxwjnz0 Click here for an interactive walkthrough]
# To add an '''''Export Definition''''', press the "Add" button in the '''''Export Definitions''''' collection editor.
# To add an Export Definition, press the "Add" button in the Export Definitions collection editor.
# This can be one of the following '''''Export Definition Types''''':
# This can be one of the following Export Definition Types:
#* '''''CMIS Export''''' - To export content using a '''[[CMIS Connection]]'''
#* CMIS Export - To export content using a '''[[CMIS Connection]]'''
#* '''''Data Export''''' - To export data to a SQL database or ODBC compliant database
#* Data Export - To export data to a SQL database or ODBC compliant database
#* '''''File Export''''' - To export files to a Windows file system
#* File Export - To export files to a Windows file system
#* '''''FTP Export''''' - To export files to an FTP server
#* FTP Export - To export files to an FTP server
#* '''''IMAP Export''''' - To export files to an IMAP email server
#* IMAP Export - To export files to an IMAP email server
#* '''''SFTP Export''''' - To export files to an SFTP server
#* SFTP Export - To export files to an SFTP server




==== Export Definition Types ====
==== Export Definition types ====
Each '''''Export Definition Type''''' (or "'''''Export Type'''''") defines the connection to the endpoint storage location slightly differently.
Each Export Definition defines the connection to the storage destination slightly differently.


===== CMIS Export =====
===== CMIS Export =====
For '''''CMIS Export''''', document content is exported over a '''CMIS Connection'''.
For CMIS Export, document content is exported over a '''CMIS Connection'''.




[https://app.supademo.com/demo/cm7tg5fzr1zayhilgy3zqpqot Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7tg5fzr1zayhilgy3zqpqot Click here for an interactive walkthrough]


# The '''''CMIS Repository''''' property here establishes connection to a storage platform connected to Grooper via the '''CMIS Connection'''.
# The CMIS Repository property here establishes connection to a storage platform connected to Grooper via the '''CMIS Connection'''.
#* The '''CMIS Connection''' defines the connection settings for one of several storage platforms available in Grooper. '''CMIS Repositories''' are storage locations imported into Grooper as children of the '''CMIS Connection''' object.  
#* The '''CMIS Connection''' defines the connection settings for one of several storage platforms available in Grooper. '''CMIS Repositories''' are storage locations imported into Grooper as children of the '''CMIS Connection''' object.  
#* Depending on the connection type (or "binding") the '''CMIS Repository''' will represent a different storage location. This could be a Windows file system folder for the '''''NTFS''''' binding. This could be a SharePoint site for the '''''SharePoint''''' binding. This could be an email inbox for the '''''Exchange''''' or '''''IMAP''''' bindings.
#* Depending on the connection type (or "binding") the '''CMIS Repository''' will represent a different storage location. This could be a Windows file system folder for the NTFS binding. This could be a SharePoint site for the SharePoint binding. This could be an email inbox for the Exchange or IMAP bindings.


For more information, please visit the '''[[CMIS Repository]]''' and '''''[[CMIS Export]]''''' articles.
For more information, please visit the '''[[CMIS Repository]]''' and [[CMIS Export]] articles.


===== Data Export =====
===== Data Export =====
For '''''Data Export''''', extracted data content is exported to a SQL database or ODBC compliant database, using a '''Data Connection''' object.
For Data Export, extracted data content is exported to a SQL database or ODBC compliant database, using a '''Data Connection''' object.




[https://app.supademo.com/demo/cm7tkuuvd259bhilggjc1819e Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7tkuuvd259bhilggjc1819e Click here for an interactive walkthrough]
# The '''''Connection''''' property points Grooper to the database defined by the referenced '''Data Connection''' object.
# The Connection property points Grooper to the database defined by the referenced '''Data Connection''' object.
#* The '''Data Connection''' object stores all the connection settings required for Grooper to transmit data to the database, including the database server's name and any required logon settings.
#* The '''Data Connection''' object stores all the connection settings required for Grooper to transmit data to the database, including the database server's name and any required logon settings.
# The '''''Table Mappings''''' property defines what '''Data Model''' elements should be exported to which table columns in the database.
# The Table Mappings property defines what '''Data Model''' elements should be exported to which table columns in the database.
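
To make the idea of a table mapping concrete, here is a minimal Python sketch using the standard <code>sqlite3</code> module. It is not how Grooper implements Data Export; the field names, column names, <code>invoices</code> table, and <code>build_insert</code> helper are all hypothetical. It only shows the general pattern of mapping extracted Data Model values to table columns and writing them with a parameterized statement.

```python
import sqlite3

# Hypothetical extracted values keyed by Data Model field name.
extracted = {"Invoice Number": "INV-1001", "Invoice Date": "2025-01-15", "Total": "250.00"}

# Hypothetical mapping of Data Model fields to database column names.
table_mapping = {"Invoice Number": "invoice_no", "Invoice Date": "invoice_date", "Total": "total"}


def build_insert(table, mapping, values):
    """Return a parameterized INSERT statement and its parameter tuple."""
    fields = [f for f in mapping if f in values]  # only map fields that were extracted
    columns = ", ".join(mapping[f] for f in fields)
    placeholders = ", ".join("?" for _ in fields)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(values[f] for f in fields)


# An in-memory database stands in for the real export target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_no TEXT, invoice_date TEXT, total TEXT)")
sql, params = build_insert("invoices", table_mapping, extracted)
conn.execute(sql, params)
rows = conn.execute("SELECT invoice_no, invoice_date, total FROM invoices").fetchall()
```

Values are passed as parameters rather than concatenated into the SQL text, which keeps extracted document text from being interpreted as SQL.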


{|class="fyi-box"
{|class="fyi-box"
Line 216: Line 197:
In previous versions of Grooper, collected document data were exported to a database using the '''Database Export''' activity. Files, on the other hand, were exported using the '''Document Export''' activity.
In previous versions of Grooper, collected document data were exported to a database using the '''Database Export''' activity. Files, on the other hand, were exported using the '''Document Export''' activity.


In version 2021, the two export activities were combined into a single activity, the '''Export''' activity. You will now add and configure a '''''Data Export''''' definition to export data to a SQL or ODBC compliant database.
In version 2021, the two export activities were combined into a single activity, the '''Export''' activity. You will now add and configure a Data Export definition to export data to a SQL or ODBC compliant database.
|}
|}


For more information, please visit the '''[[Data Connection]]''' and '''''[[Data Export]]''''' articles.
For more information, please visit the '''[[Data Connection]]''' and [[Data Export]] articles.


===== File Export =====
===== File Export =====
For '''''File Export''''', document content is exported to a Windows file system folder.
For File Export, document content is exported to a Windows file system folder.




[https://app.supademo.com/demo/cm7unqnok2ryrhilgs1shbh33 Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7unqnok2ryrhilgs1shbh33 Click here for an interactive walkthrough]
# The '''''Target Folder''''' property defines what folder you want to export content to. It is '''''always''''' best practice to use a fully qualified UNC path, to disambiguate file and folder locations on one networked machine from another.
# The Target Folder property defines what folder you want to export content to. It is always best practice to use a fully qualified UNC path, to disambiguate file and folder locations on one networked machine from another.
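
As a quick illustration of the UNC recommendation, the following Python sketch checks whether a folder path is written in UNC form before it is used as an export target. The <code>is_unc_path</code> helper and the server and share names are hypothetical; this is an assumption about how you might validate paths outside Grooper, not a Grooper feature.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if the path begins with a server-and-share (UNC) prefix."""
    # For UNC paths, PureWindowsPath reports the server and share as the drive.
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


# Hypothetical example paths.
unc_target = r"\\fileserver\exports\invoices"
local_target = r"C:\exports\invoices"
relative_target = r"exports\invoices"
```

A drive-letter path such as the second example refers to a different physical location on each machine that resolves it, which is the ambiguity a UNC path avoids.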


{|class="attn-box"
{|class="attn-box"
Line 232: Line 213:
&#9888;
&#9888;
|
|
The '''''File Export''''' definition is a carry over from older methods of exporting to a Windows file system in previous versions of Grooper. ''File Export'' exists mostly for backwards compatibility, but it can still be utilized for simple file system exports.
The File Export definition is a carry over from older methods of exporting to a Windows file system in previous versions of Grooper. File Export exists mostly for backwards compatibility, but it can still be utilized for simple file system exports.


In current versions, using the '''''[[CMIS Export]]''''' definition with an '''''[[NTFS]]''''' connection is a preferable method to export document content to a Windows file system.
In current versions, using the [[CMIS Export]] definition with an [[NTFS]] connection is a preferable method to export document content to a Windows file system.
|}
|}


===== IMAP Export =====
===== IMAP Export =====
For '''''IMAP Export''''', document content is exported to email servers using the IMAP protocol.
For IMAP Export, document content is exported to email servers using the IMAP protocol.




[https://app.supademo.com/demo/cm83gpp9r08ccvp25cy6g97h7 Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm83gpp9r08ccvp25cy6g97h7 Click here for an interactive walkthrough]
# The '''''Mail Server''''' property defines the host name (or IP address) of the email server you want to export to.
# The Mail Server property defines the host name (or IP address) of the email server you want to export to.
#* For example, the server used to connect to an Outlook 365 inbox is "outlook.office365.com"  
#* For example, the server used to connect to an Outlook 365 inbox is "outlook.office365.com"  
# The '''''User Name''''' and '''''Password''''' properties define the logon information to connect to the mailbox.
# The User Name and Password properties define the logon information to connect to the mailbox.
# The '''''Target Folder''''' property defines what email folder you want to export content to.
# The Target Folder property defines what email folder you want to export content to.


{|class="attn-box"
{|class="attn-box"
Line 251: Line 232:
&#9888;
&#9888;
|
|
The '''''IMAP Export''''' definition is a carry over from older methods of exporting across the IMAP protocol in previous versions of Grooper. '''''IMAP Export''''' exists mostly for backwards compatibility, but it can still be utilized for simple exports to email boxes.
The IMAP Export definition is a carry over from older methods of exporting across the IMAP protocol in previous versions of Grooper. IMAP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to email boxes.


In current versions, using the '''''[[CMIS Export]]''''' definition with an '''''[[IMAP]]''''' connection is a preferable method to export document content to an IMAP server.
In current versions, using the [[CMIS Export]] definition with an [[IMAP]] connection is a preferable method to export document content to an IMAP server.


Furthermore, when connecting to a Microsoft Outlook inbox, the '''''[[Exchange]]''''' binding is preferable to the '''''IMAP''''' binding. The '''''Exchange''''' binding has increased functionality specifically designed for the Outlook messaging system.
Furthermore, when connecting to a Microsoft Outlook inbox, the [[Exchange]] binding is preferable to the IMAP binding. The Exchange binding has increased functionality specifically designed for the Outlook messaging system.
|}
|}


===== FTP Export =====
===== FTP Export =====
For '''''FTP Export''''', document content is exported to an FTP site using the FTP protocol.
For FTP Export, document content is exported to an FTP site using the FTP protocol.




[https://app.supademo.com/demo/cm7utzdsi2w4ahilgezfulorx Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7utzdsi2w4ahilgezfulorx Click here for an interactive walkthrough]
# The '''''FTP Server URL''''' property defines what site you want to export to.
# The FTP Server URL property defines what site you want to export to.
# The '''''User Name''''' and '''''Password''''' properties define the logon information to connect to the FTP site.
# The User Name and Password properties define the logon information to connect to the FTP site.


{|class="attn-box"
{|class="attn-box"
Line 270: Line 251:
&#9888;
&#9888;
|
|
The '''''FTP Export''''' definition is a carry over from older methods of exporting to FTP sites in previous versions of Grooper. '''''FTP Export''''' exists mostly for backwards compatibility, but it can still be utilized for simple exports to FTP folders.
The FTP Export definition is a carry over from older methods of exporting to FTP sites in previous versions of Grooper. FTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to FTP folders.


In current versions, using the '''''[[CMIS Export]]''''' definition with an '''''[[FTP]]''''' connection is a preferable method to export document content to an FTP site.
In current versions, using the [[CMIS Export]] definition with an [[FTP]] connection is a preferable method to export document content to an FTP site.
|}
|}


===== SFTP Export =====
===== SFTP Export =====
For '''''SFTP Export''''', document content is exported to an SFTP site using the SFTP protocol.
For SFTP Export, document content is exported to an SFTP site using the SFTP protocol.




[https://app.supademo.com/demo/cm7uw5emu2xdlhilg9nwqydpf Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7uw5emu2xdlhilg9nwqydpf Click here for an interactive walkthrough]
# The '''''Host Name''''' property defines what site you want to export to.
# The Host Name property defines what site you want to export to.
# The '''''User Name''''' and '''''Password''''' properties define the logon information to connect to the SFTP site.
# The User Name and Password properties define the logon information to connect to the SFTP site.


{|class="attn-box"
{|class="attn-box"
Line 287: Line 268:
&#9888;
&#9888;
|
|
The '''''SFTP Export''''' definition is a carry over from older methods of exporting to SFTP sites in previous versions of Grooper. '''''SFTP Export''''' exists mostly for backwards compatibility, but it can still be utilized for simple exports to SFTP folders.
The SFTP Export definition is a carry over from older methods of exporting to SFTP sites in previous versions of Grooper. SFTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to SFTP folders.


In current versions, using the '''''[[CMIS Export]]''''' definition with an '''''[[SFTP]]''''' connection is a preferable method to export document content to an SFTP site.
In current versions, using the [[CMIS Export]] definition with an [[SFTP]] connection is a preferable method to export document content to an SFTP site.
|}
|}


===== Which Export Definition is right for me? =====
==== Which Export Definition is right for me? ====


When choosing an Export Definition, you should be asking yourself "Do I want to export files, data, or both?". What '''content''' you want to export will inform which '''location''' (or locations) you export to. Your answer to this question will impact which Export Definition you choose and how you configure it to export document '''Batch Folder''' content.
When choosing an Export Definition, you should be asking yourself "Do I want to export files, data, or both?". What '''content''' you want to export will inform which destination (or destinations) you export to. Your answer to this question will impact which Export Definition you choose and how you configure it to export document '''Batch Folder''' content.


{|cellpadding=10 cellspacing=5
{|cellpadding=10 cellspacing=5
|valign=top style="width:10%"|
|valign=top style="width:10%"|
'''''Data Only'''''
Data Only
|
|
If you're purely exporting document data content (values collected from the '''Extract''' activity) and nothing else, you're likely looking to export data to a database.  
If you're purely exporting document data content (values collected from the '''Extract''' activity) and nothing else, you're likely looking to export data to a database.  
* Use '''''Data Export''''' to do this.
* Use Data Export to do this.
|-
|-
|valign=top|
|valign=top|
'''''Files Only'''''
Files Only
|
|
If you're looking to export files, such as PDFs, TIFs, and text files, you have more options depending on the storage location you want your document files to wind up in.
If you're looking to export files, such as PDFs, TIFs, and text files, you have more options depending on the storage destination you want the files to wind up in.


Use any of the following Export Definitions, depending on where you want to export.
Use any of the following Export Definitions, depending on where you want to export.
* '''''CMIS Export'''''
* CMIS Export
* '''''File Export'''''
* File Export
* '''''FTP Export'''''
* FTP Export
* '''''SFTP Export'''''
* SFTP Export
* '''''Mail Export''''' (very uncommon)
* Mail Export (very uncommon)


All of these Export Definitions have a configurable '''''Export Format''''' property, which will allow you to build an export file of a given format out of '''Batch Folder''' content.
All of these Export Definitions have a configurable Export Format property, which will allow you to build an export file of a given format out of '''Batch Folder''' content.
:*<li class="fyi-bullet">For example, the '''''PDF Format''''' can be configured to build a PDF file from the '''Batch Folder's''' image content (using the images of its child '''Batch Pages''') or its file content (its attached PDF) and the full text data content obtained from OCR.
:*<li class="fyi-bullet">For example, the PDF Format can be configured to build a PDF file from the '''Batch Folder's''' image content (using the images of its child '''Batch Pages''') or its file content (its attached PDF) and the full text data content obtained from OCR.
|-
|-
|valign=top|
|valign=top|
'''''Both Data and Files'''''
Both Data and Files
|
|
When exporting document content, there are a variety of ways to export both data and files.
When exporting document content, there are a variety of ways to export both data and files.


# Commonly, if you want to export ''both'' data and files, you will simply add multiple '''''Export Definitions'''''.
# Commonly, if you want to export ''both'' data and files, you will simply add multiple Export Definitions.
#* A '''''Data Export''''' to export data to a database and one of the other '''''Export Types''''' to export files to a storage repository of your choice.
#* A Data Export to export data to a database and one of the other Export Types to export files to a storage repository of your choice.
# You can also export data as a file itself, such as an XML, JSON, or text data file.
# You can also export data as a file itself, such as an XML, JSON, or text data file.
#* These are '''''Export Format''''' options available when configuring '''''CMIS Export''''', '''''File Export''''', '''''FTP Export''''', or '''''SFTP Export'''''.
#* These are Export Format options available when configuring CMIS Export, File Export, FTP Export, or SFTP Export.
#* We will detail all '''''Export Formats''''' [[#Export Formats|below]].
#* We will detail all Export Formats [[#Export Formats|below]].
# Depending on the CMIS Repository you're exporting to, you can export ''both'' files and data using '''''CMIS Export''''' only.
# Depending on the CMIS Repository you're exporting to, you can export ''both'' files and data using CMIS Export only.
#* This will be highly dependent on the capabilities of the storage platform (and the CMIS Binding used to connect to it).
#* This will be highly dependent on the capabilities of the storage platform (and the CMIS Binding used to connect to it).
#* For more information, please visit the '''''[[CMIS Export]]''''' article.
#* For more information, please visit the [[CMIS Export]] article.
# Grooper's '''''PDF Data Mapping''''' behavior more fully leverages the capabilities of the PDF format when exporting PDF files. With this functionality, you can store classification and extraction data via bookmarking and PDF metadata.
# Grooper's PDF Data Mapping behavior more fully leverages the capabilities of the PDF format when exporting PDF files. With this functionality, you can store classification and extraction data via bookmarking and PDF metadata.
#* This allows you to store most, if not all, document content in the PDF file itself.
#* This allows you to store most, if not all, document content in the PDF file itself.
#* For more information, please visit the '''''[[PDF Data Mapping]]''''' article.
#* For more information, please visit the [[PDF Data Mapping]] article.
|}
|}
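
The decision laid out in the table above can be sketched as a small lookup. This is only an illustration in Python and is not part of Grooper: the function name <code>suggest_export_definitions</code> and the <code>"data"</code>/<code>"files"</code>/<code>"both"</code> keys are hypothetical. Only the Export Definition names come from this article.

```python
# Hypothetical helper (not a Grooper API): maps the kind of content you want
# to export to the Export Definitions this article suggests for it.
FILE_DEFINITIONS = [
    "CMIS Export",
    "File Export",
    "FTP Export",
    "SFTP Export",
    "Mail Export",  # very uncommon
]


def suggest_export_definitions(content):
    """Return candidate Export Definitions for 'data', 'files', or 'both'."""
    if content == "data":
        return ["Data Export"]
    if content == "files":
        return list(FILE_DEFINITIONS)
    if content == "both":
        # Most common approach: a Data Export plus one file-based definition.
        return ["Data Export"] + FILE_DEFINITIONS
    raise ValueError(f"unknown content type: {content!r}")


print(suggest_export_definitions("data"))
```

The "both" case only covers the first option in the list above (adding multiple Export Definitions). Exporting data as a file, CMIS-only exports, and PDF Data Mapping are configuration choices that a lookup like this does not capture.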


==== Export Formats ====
== Export Formats ==
When exporting content to an export location, you must determine what format that content takes. There are many different file types, and some are better suited to certain kinds of content than others. XML files are great for storing data, but not so much for image content. TIF files are great for image content, but not so much for full text data.
When exporting content to an export destination, you must determine what format that content takes. There are many different file types, and some are better suited to certain kinds of content than others. XML files are great for storing data, but not so much for image content. TIF files are great for image content, but not so much for full text data.
 
Export Formats define what file types are exported by an [[Export Behavior]]'s [[Export Definition]]. They are used to generate output files from document content (including files attached to the Batch Folder, child pages, and extracted data).  


Export Formats determine what file is generated by an Export Definition. There are currently ten (10) Export Format options available in Grooper.
Export Formats can be used to generate a variety of file types including:
* '''PDF Format''' - This will output a PDF file from the Batch Folder content.  This includes capabilities to embed full text data obtained from the Recognize activity.
* PDFs and TIFFs from document images
* '''TIF Format''' - This will output a multipage TIF file using the Batch Folder's image content (as in its child Batch Pages' images)
* JSON, XML, CSV and TXT metadata files from extracted Data Model data
* '''XML Metadata''' - This will output extracted Data Model values to an XML file.
* ZIP archives of file attachments/images from descendant Batch Folder/Batch Pages
** All Data Model values are recorded, including each cell value from Data Tables and Data Field values in multi-instance Data Sections.
 
** The XML includes additional information collected for each "data instance", including a value's page location data.
=== Export Formats summarized ===
** This XML file uses a schema developed by Grooper. You can reformat this schema by applying an XSLT transform with the XML Transform activity. Or, if you have an XSD schema file, you can use XML Format to conform the Data Model values to the XML schema.
 
* '''XML Format''' - This will output extracted Data Model values to an XML file and format it according to an XSD schema file.
There are currently ten (10) Export Formats. They can be divided into 3 categories:
* '''JSON Metadata''' - This will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full"
* Merge Formats
** Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
* Metadata Formats
** Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
* Other Formats
** In both cases, all Data Model values are recorded, including each cell value from Data Tables and Data Field values in multi-instance Data Sections.
 
* '''Simple Metadata''' - This will output extracted Data Model values to a text file.
==== Merge Formats ====
** This file formats Data Fields and their values as simple "key-value pairs".
 
** Only single instance data is recorded (No Data Table data. No multi-instance Data Section data)
These formats generate an output file by merging content from a Batch Folder's children (and sometimes content stored on the Batch Folder itself) into a single file. Merge Formats combine multiple Batch Pages or other children of a Batch Folder into a single file, such as a multipage PDF or TIF.
* '''Delimited Metadata''' - This will output extracted Data Model values to a value-delimited text file.
:*<li class="fyi-bullet">Merge Formats are used by both the [[Export]] activity and the [[Merge]] activity.
** You can choose the file extension and delimiter in its configuration. You can configure this to make comma separated value (CSV) files.
:** When added to an Export Definition in an Export Behavior, Export will build the file and export it to the folder destination configured in the Export Definition.
** Only single instance data is recorded (No Data Table data. No multi-instance Data Section data)
:** When configured for the Merge activity, Merge will build the file and attach it to the Batch Folder.
* '''Text Format''' - This will output full text content only, generated from OCR data, as a text file.
:*<li class="attn-bullet"> ''XML Format is an outlier in this category''. It does not generate the file by merging content from the Batch Folder's children. Instead, it generates an XML file from extracted Data Model data stored on the Batch Folder. XML Format is in the Merge Format category simply because the Merge activity can use it.
* '''ZIP Format''' - This will output a ZIP file containing the file attachment for all descendent nodes.
 
* '''Attached File''' - This will output a Batch Folder's "main attachment file" or an attached file by name.
;'''PDF Format'''
** For files that were imported from a digital source, the attachment file is the file attached to the Batch Folder when it was created on import.
: This will output a PDF file from the Batch Folder content (its child Batch Pages and in some configurations other content).  This includes capabilities to embed full text data obtained from the Recognize activity.
** This option can also output any file attached to a Batch Folder by referencing a filename.  This is how Grooper exports files generated by activities such as XML Transform, Text Transform, Merge or custom scripted activities.
;'''TIF Format'''
** If the Batch Folder has no attachment, this option will generate an image version of the document from all child Batch Pages in the folder.
: This will output a multipage TIF file using the Batch Folder's child Batch Page images content.
;'''XML Format'''
: This will output extracted Data Model values to an XML file and format it according to an XSD schema file.
:* XML Format is designed to be used with Data Models generated by the XML Schema Importer using an XSD schema file.
:* XML Format differs from the XML Metadata format. XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.
; '''ZIP Format'''
: This will output a ZIP file containing the file attachment for all descendent nodes.
 
==== Metadata Formats ====
 
These formats generate an output file containing metadata extracted from a Grooper document. The Metadata Formats build various text-based files from the Batch Folder's extracted Data Model and its fields.
:*<li class="attn-bullet"> When noted below, certain formats can only output single-instance Data Fields and their values (not Data Table values or values in multi-instance Data Sections).
 
; '''Delimited Metadata'''
: This will output extracted Data Model values to a value-delimited text file. You can choose the file extension and delimiter in its configuration. You can configure this to make comma separated value (CSV) files.
:*<li class="attn-bullet"> Only single instance Data Fields are output.
; '''Simple Metadata'''
: This will output extracted Data Model values to a text file. This file formats Data Fields and their values as simple "key-value pairs".
:*<li class="attn-bullet"> Only single instance Data Fields are output.
;'''JSON Metadata'''
: This will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full".
:* Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
:* Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
;'''XML Metadata'''
: This will output extracted Data Model values to an XML file.
:* The XML includes additional information collected for each "data instance", including a value's page location data.
:* The XML Metadata format differs from the "XML Format". XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.
 
==== Other Formats ====
 
There are 2 Export Formats that do not fit into the other categories:
* Attached File
* Text Format
 
; '''Attached File'''
:This will output a Batch Folder's main "attachment file" or an attached file by name.
:* For files that were imported from a digital source, the attachment file is the file attached to the Batch Folder when it was created on import.
:* This option can also output any file attached to a Batch Folder by referencing a filename.  This is how Grooper exports files generated by activities such as [[XML Transform]], [[Text Transform]], [[Merge]] or custom scripted activities.
:* If the Batch Folder has no attachment, this option will generate an image version of the document from all child Batch Pages in the folder.
; '''Text Format'''
: This will output full text content only, generated from OCR data, as a text file.
 
=== How to add Export Formats ===
 
Export Formats can be configured for any of the Export Definitions that export files (all Export Definitions except Data Export).
 
# From an Export Behavior's Export Definitions editor, select the Export Definition you wish to configure.
# Find the "Export Formats" property.
#:*<li class="fyi-bullet"> For some Export Definitions (like CMIS Export), you will need to configure some required properties before the Export Formats property appears.
# Open the Export Formats editor (Press the "..." button).
# To add an Export Format, press the "Add" button.
# Select the Export Format you wish to use from the dropdown list.
# If necessary, you can add multiple Export Formats by pressing the Add button again.
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.7777777777777777; padding: 10px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cm7uyr32q2zizhilgdkkipiwg?embed_v=2&utm_source=embed" loading="lazy" title="2025 Wiki - Export - Export Formats" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
 
=== Export Formats detailed ===


Below we will briefly describe each Export Format to give you a better idea of the files they create.


Below we will briefly describe each Export Format to give you a better idea of what content you can export with each format.
==== Merge Formats ====


===== PDF Format =====
===== PDF Format =====
Line 392: Line 431:
<br clear=all>
<br clear=all>


===== XML Metadata =====
===== TIF Format =====


The XML Metadata format will output extracted Data Model values to an XML file. The XML uses an XML schema developed by Grooper to detail information about extracted "data instances", including values, page location information, confidence scores, and more.
The TIF Format will output image content only as a TIF (Tagged Image Format) file.
* Document level information (including the Batch Folder's classified Document Type) is found in the <code><Document></code> tag.
* TIF is a format used to store high quality raster graphics for graphic design or publishing.  
* Data Field values and information are found in the <code><Field></code> tags.
* Keep in mind this is an image only format. If you want text-behind embedded in your files, you must use the PDF Format.
* Extracted table values are found in <code><TableCell></code> tags as children of <code><TableRow></code> and <code>&lt;Table&gt;</code> parent tags.
[https://app.supademo.com/demo/cm7xo352u0ch5gvwcxzivc143 Click here for an interactive walkthrough]


[https://app.supademo.com/demo/cm7xjr4cw09wwgvwcsig5iyba Click here for an interactive walkthrough]
[[File:Export-export-formats-15.png|right|502px]]
[[File:Export-export-formats-04.png|right|502px]]
<big>Example Output</big>
<big>Example Output</big>
# Document level information (including the Batch Folder's classified Document Type) is found in the <code><Document></code> tag.
# Data Field values and information are found in the <code><Field></code> tags.
# Extracted table values are found in <code><TableCell></code> tags as children of <code><TableRow></code>
{|class="fyi-box"
|
FYI
|
XML data can be reformatted using XSLT style sheets using the [[XML Transform]] activity.
|}
<br clear=all>
<br clear=all>


===== XML Format =====
===== XML Format =====


Much like the XML Metadata format, "XML Format" will output extracted Data Model values to an XML file. The difference is in how the XML's schema is formatted.
"XML Format" will output extracted Data Model values to an XML file.
 
XML Format and XML Metadata both generate an XML file from a document's Data Model. The difference is in how the XML's schema is formatted.
* XML Metadata uses a set schema developed by Grooper.
* XML Metadata uses a set schema developed by Grooper.
* XML Format allows users to select an XSD schema file. This way, they can format the data Grooper collects to whatever schema they want.
* XML Format allows users to select an XSD schema file.  
 
XML Format allows users to format the data Grooper collects to whatever schema they want without transforming the Grooper XML schema using [[XML Transform]] or XML transformations outside of Grooper.


[https://app.supademo.com/demo/cm7xutc260flkgvwcsnnwob4r Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7xutc260flkgvwcsnnwob4r Click here for an interactive walkthrough]


===== JSON Metadata =====
===== ZIP Format =====
The ZIP Export Format enables you to export multiple documents as a ZIP file. A single ZIP file will be generated containing the file attachments for all descendent Batch Folders.
[https://app.supademo.com/demo/cm7xqoxi20dwfgvwc5oul8whb Click here for an interactive walkthrough]


The "JSON Metadata" format will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full".
==== Metadata Formats ====
** Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
** Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
 
[https://app.supademo.com/demo/cm7w8vyht001n336t0pcutfni Click here for an interactive walkthrough]


===== Simple Metadata =====
===== Simple Metadata =====
Line 438: Line 468:
* The key-value pairs are separated by a delimiter, which is "=" by default.
* The key-value pairs are separated by a delimiter, which is "=" by default.
*: Ex: <code>fieldName=fieldValue</code>
*: Ex: <code>fieldName=fieldValue</code>
*<li class="attn-bullet"> Only single instance Data Fields are output.
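
To make the key-value layout concrete, here is a minimal Python sketch that formats field values the way the bullets above describe. It is not Grooper code: the function name <code>to_simple_metadata</code> and the sample field names are hypothetical, and writing one pair per line is an assumption about the file layout.

```python
def to_simple_metadata(fields, delimiter="="):
    """Format single-instance field values as key-value lines.

    Assumption: one "name<delimiter>value" pair per line. The file Grooper
    actually writes may be laid out differently.
    """
    return "\n".join(f"{name}{delimiter}{value}" for name, value in fields.items())


# Hypothetical Data Field names and values, for illustration only.
sample = {"InvoiceNumber": "12345", "InvoiceDate": "2025-01-31"}
print(to_simple_metadata(sample))
```

Table cells and multi-instance Data Section values are left out of the sketch on purpose, since this format only outputs single instance Data Fields.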


[https://app.supademo.com/demo/cm7wfcw1m02pa336tajev6bu1 Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7wfcw1m02pa336tajev6bu1 Click here for an interactive walkthrough]
Line 449: Line 480:
# Extracted values are on the right.
# Extracted values are on the right.
<br clear=all>
<br clear=all>


===== Delimited Metadata =====
===== Delimited Metadata =====


The Delimited Metadata format outputs extracted Data Field values to a character delimited text file.
* This formats Data Field values as a delimiter-separated value array (e.g., <code>value1,value2,value3</code>).
* Use the "Text Extension" property to choose a file extension. TXT is the default.
* Use the "Delimiter" property to define the character. This is a comma (<code>,</code>) by default.
* Use the "Delimiter Escape" property to replace a delimiter in the Data Field's value with a different character (Ex: swap a comma for a semicolon).
* Use the "Include Header" property to include a header row in the file. The header row is populated with the Data Field's names.
*<li class="attn-bullet"> Only single instance Data Fields are output.
[https://app.supademo.com/demo/cm7xw9rjz0gowgvwct8wxifah Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7xw9rjz0gowgvwct8wxifah Click here for an interactive walkthrough]
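
The Delimiter, Delimiter Escape, and Include Header properties can be pictured with a short Python sketch. This is an illustration under stated assumptions, not Grooper's implementation: the function <code>to_delimited_metadata</code>, its parameter names, and the sample fields are hypothetical. The semicolon escape mirrors the example in this section and is not a documented default.

```python
def to_delimited_metadata(fields, delimiter=",", delimiter_escape=";", include_header=True):
    """Build a delimited text block from single-instance field values.

    Hypothetical parameters modeled on the Delimiter, Delimiter Escape and
    Include Header properties described above.
    """
    def escape(value):
        # Swap any delimiter found inside a value so it cannot split the row.
        return str(value).replace(delimiter, delimiter_escape)

    rows = []
    if include_header:
        # The header row is populated with the Data Field names.
        rows.append(delimiter.join(fields.keys()))
    rows.append(delimiter.join(escape(v) for v in fields.values()))
    return "\n".join(rows)


# Hypothetical field names and values, for illustration only.
sample = {"Vendor": "Acme, Inc.", "Total": "100.00"}
print(to_delimited_metadata(sample))
```

The comma inside the vendor name is replaced by the escape character, so the row still splits into exactly two values.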
# The Delimited Metadata format also outputs extracted Data Field values to a text file.
[[File:Export-export-formats-12.png|right|502px]]
# This formats Data Field values as a delimiter-separated value array, delimited by the character entered for the Delimiter property (a comma by default).
# You must enter a value for the Delimiter Escape property.
#* This property will replace a delimiter character occurring in a Data Field's extracted value.
#* For example, we've entered a semicolon character. So, a comma (our Delimiter character) would be swapped for a semicolon.
# Using a header row is useful for this kind of format. The header row is populated with the Data Field's names. Turn Include Header to True to include a header row in the text file.
 
[[File:Export-export-formats-12.png|left|502px|class=left-with-bullets]]
 
<big>Example Output</big>
<big>Example Output</big>


Line 470: Line 499:
<br clear=all>
<br clear=all>


===== TIF Format =====
===== JSON Metadata =====


The TIF Format will output image content only as a TIF (Tagged Image Format) file.
The "JSON Metadata" format will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full".
* TIF is a format used to store high quality raster graphics for graphic design or publishing.  
** Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
* Keep in mind this is an image only format. If you want text-behind embedded in your files, you must use the PDF Format.
** Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
[https://app.supademo.com/demo/cm7xo352u0ch5gvwcxzivc143 Click here for an interactive walkthrough]


[[File:Export-export-formats-15.png|right|502px]]
[https://app.supademo.com/demo/cm7w8vyht001n336t0pcutfni Click here for an interactive walkthrough]
<big>Example Output</big>
<br clear=all>
===== Text Format =====


The Text Format will output full text content only, generated from OCR data, as a text (TXT) file.
===== XML Metadata =====
* This is the same data you would see in a Batch Folder's "Text Rendition" in the Document Viewer.
* Text data generated by the Recognize activity is used to create the file. The "Grooper.Characters.txt" file is used to build the file.
*:*<li class="fyi-bullet"> Technically, a text file would be generated from either OCR data or native text data.


The XML Metadata format will output extracted Data Model values to an XML file. The XML uses an XML schema developed by Grooper to detail information about extracted "data instances", including values, page location information, confidence scores, and more.
* Document level information (including the Batch Folder's classified Document Type) is found in the <code><Document></code> tag.
* Data Field values and information are found in the <code><Field></code> tags.
* Extracted table values are found in <code><TableCell></code> tags as children of <code><TableRow></code> and <code>&lt;Table&gt;</code> parent tags.
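
For readers who consume this XML downstream, the following Python sketch shows one way to walk the tags named above. The sample XML is invented for illustration: only the tag names (<code>Document</code>, <code>Field</code>, <code>Table</code>, <code>TableRow</code>, <code>TableCell</code>) come from this article. The <code>type</code> and <code>name</code> attributes are assumptions, and the real Grooper schema carries more information, such as page location data.

```python
import xml.etree.ElementTree as ET

# Invented sample. Only the tag names come from the article; the "type" and
# "name" attributes are assumptions and may not match Grooper's real schema.
SAMPLE = """
<Document type="Invoice">
  <Field name="InvoiceNumber">12345</Field>
  <Table name="LineItems">
    <TableRow>
      <TableCell name="Description">Widget</TableCell>
      <TableCell name="Amount">9.99</TableCell>
    </TableRow>
  </Table>
</Document>
"""

root = ET.fromstring(SAMPLE.strip())

# Data Field values live in <Field> tags.
fields = {f.get("name"): f.text for f in root.iter("Field")}

# Table values live in <TableCell> tags nested under <TableRow> tags.
rows = [
    {cell.get("name"): cell.text for cell in row.iter("TableCell")}
    for row in root.iter("TableRow")
]

print(fields)
print(rows)
```

Inspect a real exported file before relying on any attribute names. The sketch only demonstrates the parent and child relationship between the tags.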


[https://app.supademo.com/demo/cm7xoqd2x0cnegvwcffhhnlk3 Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm7xjr4cw09wwgvwcsig5iyba Click here for an interactive walkthrough]
[[File:Export-export-formats-04.png|right|502px]]
<big>Example Output</big>


[[File:Export-export-formats-17.png|right|502px]]
# Document level information (including the Batch Folder's classified Document Type) is found in the <code><Document></code> tag.
<big>Example Output</big>
# Data Field values and information are found in the <code><Field></code> tags.
# Extracted table values are found in <code><TableCell></code> tags as children of <code><TableRow></code>


Upon export, this will generate a text file from the Batch Folder's raw OCR text data, generated from the Recognize activity.
{|class="fyi-box"
|
FYI
|
XML data can be reformatted using XSLT style sheets using the [[XML Transform]] activity.
|}
<br clear=all>
<br clear=all>
==== Other Formats ====
===== Attached File =====
===== Attached File =====
[[File:2023_Export_Activity_02_Export_Behaviors_02_1_Export_Definitions_02_1_2_Export_Formats_11.png|right]]
[[File:2023_Export_Activity_02_Export_Behaviors_02_1_Export_Definitions_02_1_2_Export_Formats_11.png|right]]
Line 521: Line 558:
:* This is not ''always'' ideal.
:* This is not ''always'' ideal.
:* When an Import Job imports a file into Grooper, a Batch Folder is created and the file is attached to it. This is the Batch Folder's main attachment file.
:* When an Import Job imports a file into Grooper, a Batch Folder is created and the file is attached to it. This is the Batch Folder's main attachment file.
:** If you simply want to export the same file you imported to the export location, no further Export Format configuration is required.
:** If you simply want to export the same file you imported to the export destination, no further Export Format configuration is required.
:** However, if you want to export a ''new file generated by Grooper'' you will need to (1) delete the Attached File format and (2) add one of your choosing (most typically a PDF Format).
:** However, if you want to export a ''new file generated by Grooper'' you will need to (1) delete the Attached File format and (2) add one of your choosing (most typically a PDF Format).


===== ZIP Format =====
===== Text Format =====
The ZIP Export Format enables you to export multiple documents as a ZIP file. A single ZIP file will be generated containing the file attachments for all descendent Batch Folders.
 
The Text Format will output full text content only, generated from OCR data, as a text (TXT) file.
[https://app.supademo.com/demo/cm7xqoxi20dwfgvwc5oul8whb Click here for an interactive walkthrough]
* This is the same data you would see in a Batch Folder's "Text Rendition" in the Document Viewer.
* Text data generated by the Recognize activity is used to create the file. The "Grooper.Characters.txt" file is used to build the file.
*:*<li class="fyi-bullet"> Technically, a text file would be generated from either OCR data or native text data.
 


===== How to add Export Formats =====
[https://app.supademo.com/demo/cm7xoqd2x0cnegvwcffhhnlk3 Click here for an interactive walkthrough]


Export Formats can be configured for any of the Export Definitions that export files (all of them except Data Export).
[[File:Export-export-formats-17.png|right|502px]]
<big>Example Output</big>


[https://app.supademo.com/demo/cm7uyr32q2zizhilgdkkipiwg Click here for an interactive walkthrough]
Upon export, this will generate a text file from the Batch Folder's raw OCR text data, generated from the Recognize activity.
# From an Export Behavior's Export Definitions editor, select the Export Definition you wish to configure.
<br clear=all>
# Find the "Export Formats" property.
#:*<li class="fyi-bullet"> For some Export Definitions (like CMIS Export), you will need to configure some required properties before the Export Formats property appears.
# Open the Export Formats editor (Press the "..." button).
# To add an Export Format, press the "Add" button.
# Select the Export Format you wish to use from the dropdown list.
# If necessary, you can add multiple Export Formats by pressing the Add button again.


== Local and Shared Export Behaviors ==
== Local and Shared Export Behaviors ==

Latest revision as of 14:17, 3 September 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Export is a "Code Activity" set on a Batch Process Step, typically added as one of the last steps (if not the last step) of a Batch Process. It allows Grooper users to deliver processed Batch content to an external system. Whether exporting Batch Folders as PDF files to a Windows directory, exporting extracted Data Model fields to a SQL database, exporting to a content management system, or some combination of multiple exports to multiple systems, the Export activity handles how document Batch Folders in a Batch ultimately leave Grooper after they have been classified and had their data extracted.

How documents are exported (what gets exported, where they go, and what format the exported content takes) is all controlled by Export Behaviors. This is a set of properties configured to control how Batch Folder content is exported based on its Document Type classification. Export Behaviors can be configured locally, configured as part of the Export activity's property configuration, or can be configured for a particular Content Type, by configuring the Behaviors property of a Content Model and/or its descendant Content Categories or Document Types.

You may download the ZIP(s) below and upload it into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

So you've ingested some documents into a Batch. You've obtained their full text data with the Recognize activity, either through OCR or extracting their native embedded text. You've classified these documents, assigning the Batch Folders a Document Type from a Content Model during the Classify activity. You've collected the data you want from these documents during the Extract activity. Now what?

You need to get these documents and that data out of Grooper!

Enter the Export activity. Grooper is designed to be a document processing platform. It is a powerful tool to model document sets and their data (according to a Content Model) and put unprocessed pages or files through a step by step list of processing instructions (according to a Batch Process) to ultimately organize them and collect information from them. However, Grooper is not designed to be a content management system or a storage platform. Once your documents are organized and Grooper has extracted the data you want from them, you generally want to put those files and data in an external endpoint, such as a file system, a database, a content management system or some combination thereof.

The Export activity's job is to get document content out of Grooper, according to your specifications. Using one or more Export Behavior configurations, you can control how processed document content is exported, how it's indexed in which storage location, what data goes where, what file format certain content should take, and more.

FYI

How you export documents in Grooper underwent some serious changes in version 2023. In previous versions, there were two separate export activities: Document Export and Database Export.

To simplify things, we combined these two Activities into the singular Export activity. Whether you're exporting document files or data to a database, you use the Export activity and Export Behavior configurations in either case.

Just What Is "Document Content"?

We're going to talk a lot about "document content" throughout this article. Ultimately, the Export activity controls what content is exported and how it is exported. So, what do we mean by "document content"?

In terms of its content, you can break up a document processed by Grooper into (at least) three meaningful components:

  1. The document's image (This may be made up of its child Batch Pages' images)
  2. The document's full text (This may be made up of its child Batch Pages' full text)
  3. The document's extracted data (Its Data Model values collected by the Extract activity)

Each of these different kinds of content is another layer of the whole document (represented as a Batch Folder in a Batch, its child Batch Pages and/or files attached to the Batch Folder). Grooper's job is to take source material (scanned pages or imported files), derive the content you desire (such as extracting Data Elements from a Data Model), and, using the Export activity, recombine this content into files or data delivered to one or more storage endpoints.

Image Content

The document's image is simply what the viewer physically sees when viewing the document. Whether scanned pages or a digital file, like a PDF, this content comprises the pixels on the screen you're looking at when reading a document. This content can be altered in a Batch Process by the Image Processing activity, which is a typical part of processing scanned documents to clean up the image before OCR. Upon Export, Grooper can build a new file from these images, or just export whatever image content was originally imported.

Full Text Content

A good deal of document processing automation requires machine readable text to parse words, phrases and other text data. Grooper obtains a document's full text data through the Recognize activity, OCRing images or extracting embedded digital text.

  • In the case of OCR text data, these results can then be embedded into exported PDF files as another part of its content during Export.
  • Text data can ONLY be embedded behind images (either image formats like JPEG or TIFF or single-image PDF pages).
  • Text data obtained from native-text PDF pages CANNOT be embedded into a native-text PDF page.
    • You would have to convert the native-text PDF page into an image first to embed the text data, which is extraordinarily atypical.

Extracted Data Content

Last but not least, the Extract activity in a Batch Process will collect information from the document, according to its classified Document Type and Data Model. This may be simple indexing data, even just the Document Type assigned during the Classify activity. This may be every meaningful data point on the document, obtained from a Data Model with hundreds of extracted Data Elements. Regardless, this needs to be stored somewhere and somehow, such as in a SQL database, content management system, or as a separate data file, like an XML or CSV file.

How you merge this content into new files, define what storage platform it goes to, and how extracted data can drive indexing considerations is all controlled by the Export activity's Export Behavior configuration.

Export Behaviors

An Export Behavior defines the parameters for exporting classified Batch Folder content from Grooper to other systems. This includes where the content is exported to (what content management system, file system, database, etc.), what content is exported (attached files, images, and/or data), how it is formatted (PDF, CSV, XML, etc.), folder pathing, file naming and data mappings (for Data Export and CMIS Export).

The Export activity exports documents according to an "Export Behavior". This is a set of export property configurations based on the Content Type assigned to a Batch Folder during document classification (in other words, the document's Document Type). Once a Batch Folder is assigned a Document Type, the Export Behavior controls the flow of traffic out of Grooper when Export runs.

Export Behaviors accommodate simple and complex export logic.

Example simple logic:

  • Build a PDF for all documents. Export them to a base folder location. Use the same naming convention for all of them.

Example complex logic:

  • For documents "A", build a PDF file and put them in folder "A" in a file system.
  • For documents "B", put them in folder "B" and export their data to a database while you're at it.
  • For documents "C", build a PDF and a JSON file, put them in folder "C" and follow some special naming convention.


Export Behaviors can be configured for any Content Type. This includes a parent Content Model or any of its descendant Document Types or Content Categories.

This allows you to use the Content Model's hierarchy to determine how you want to export documents of a certain Document Type. Content Types will inherit an Export Behavior from a parent if they do not have one of their own configured.

  • If you want to perform the same, generic export for all Document Types in a Content Model, you can configure a single Export Behavior solely for the Content Model. It will apply to all its child Document Types.
  • If a group of Document Types under a single Content Category all should be exported in the same manner, you can configure an Export Behavior for the Content Category. Those settings will apply to any of its child Document Types.
  • If every Document Type has its own specific export configuration, you can configure individual Export Behaviors for all of them. Or if one Document Type needs its own special export configuration, you can configure an Export Behavior just for it.
    • Content Types will inherit an Export Behavior from a parent if they do not have one of their own configured. If they do have their own configured, it will override the parent's Export Behavior.
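
The inheritance rule described above can be sketched in a few lines of Python. This is only an illustration of the lookup logic, not Grooper's implementation: the `ContentType` class, its `export_behavior` attribute, and the sample names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    export_behavior: Optional[str] = None  # label for a locally configured Export Behavior

    def effective_export_behavior(self) -> Optional[str]:
        """Return the nearest configured Export Behavior, walking up the hierarchy."""
        node: Optional[ContentType] = self
        while node is not None:
            if node.export_behavior is not None:
                return node.export_behavior  # a local configuration overrides any parent
            node = node.parent
        return None  # nothing configured anywhere in the hierarchy


model = ContentType("Invoices Model", export_behavior="generic PDF export")
category = ContentType("Vendor Invoices", parent=model)
doc_a = ContentType("Invoice A", parent=category)
doc_b = ContentType("Invoice B", parent=category, export_behavior="PDF plus database export")

print(doc_a.effective_export_behavior())  # inherited from the Content Model
print(doc_b.effective_export_behavior())  # its own configuration wins
```

In this sketch, "Invoice A" has no configuration of its own, so the lookup climbs through the Content Category to the Content Model, while "Invoice B" stops at its own setting.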


Adding Export Behaviors to Content Types

An Export Behavior configuration can be added to any Content Type node (Content Model, Content Category, or Document Type) using its Behaviors property. Doing so will control how a Document Type "behaves" upon export.


  1. For example, here we have a Content Model selected in the Node Tree.
  2. To add an Export Behavior, first select the Behaviors property.
  3. Then, press the ellipsis button at the end of the property.
  4. This will bring up the Behaviors collection editor window.
  5. Press the "Add" button.
  6. Select "Export Behavior".
    • You can only configure one Export Behavior per Content Type object.
    • Child Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
    • However, multiple Export Behaviors may be added by configuring the Behaviors property of multiple Content Types. For example, if every Document Type needed a unique Export Behavior configuration, you could configure the Behaviors property for each one, adding one Export Behavior to the Behaviors list for each one.
  7. You will see the Export Behavior added to the Behaviors list.
  8. Selecting it, you can now add one or more Export Definitions with the Export Definitions property.

FYI

When configured using the Behaviors property of a Content Type object, the Export activity will export Batch Folder content in a Batch according to the Export Definition settings configured for the Batch Folder's assigned Document Type, or for its parent Content Category or parent Content Model, depending on which Content Type's Behaviors property is configured in the Content Model's hierarchy.

Export Definitions

Export Behaviors are defined by adding and configuring one or more Export Definitions (See Export Definition Types or the Export Definitions section of the Export article). An Export Definition defines export parameters to external systems, such as file systems, content management repositories, databases, or mail servers.

How document content is exported is defined using one or more Export Definitions. Export Definitions functionally determine three things:

  1. Destination - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
  2. Content - What document content is exported: attached file content, image content, full text content, and/or extracted data content.
  3. Format - What format the exported content takes, such as a PDF file or XML data file.


Your primary consideration is "destination". Where do you want these files and/or data to end up? Are you exporting files to a Windows file system? Are you exporting data to a database? Are you exporting content to a content management system, like Box.com?

When configuring an Export Definition the first thing you will add is an Export Definition Type (or "Export Type"). This determines what export endpoint you're using to export document content. The Export activity will deliver document content to the storage platform determined by the Export Definition.


  1. To add an Export Definition, press the "Add" button in the Export Definitions collection editor.
  2. This can be one of the following Export Definition Types:
    • CMIS Export - To export content using a CMIS Connection
    • Data Export - To export data to a SQL database or ODBC compliant database
    • File Export - To export files to a Windows file system
    • FTP Export - To export files to an FTP server
    • IMAP Export - To export files to an IMAP email server
    • SFTP Export - To export files to an SFTP server


Export Definition types

Each Export Definition defines its connection to the storage destination slightly differently.

CMIS Export

For CMIS Export, document content is exported over a CMIS Connection.


  1. The CMIS Repository property here establishes connection to a storage platform connected to Grooper via the CMIS Connection.
    • The CMIS Connection defines the connection settings for one of several storage platforms available in Grooper. CMIS Repositories are storage locations imported into Grooper as children of the CMIS Connection object.
    • Depending on the connection type (or "binding") the CMIS Repository will represent a different storage location. This could be a Windows file system folder for the NTFS binding. This could be a SharePoint site for the SharePoint binding. This could be an email inbox for the Exchange or IMAP bindings.

For more information, please visit the CMIS Repository and CMIS Export articles.

Data Export

For Data Export, extracted data content is exported to a SQL database or ODBC compliant database, using a Data Connection object.


  1. The Connection property points to the database that Grooper connects to via the referenced Data Connection object.
    • The Data Connection object stores all the connection settings required for Grooper to transmit data to the database, including the database server's name and any required logon settings.
  2. The Table Mappings property defines what Data Model elements should be exported to which table columns in the database.
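
As a rough illustration of what a table mapping does, the Python sketch below maps extracted field names to table columns and inserts one row. It uses an in-memory SQLite database purely as a stand-in for a SQL or ODBC target. The field names, the `invoices` table, and the `TABLE_MAPPING` dictionary are hypothetical and are not Grooper settings.

```python
import sqlite3

# Hypothetical extracted Data Field values for one document.
extracted = {"Invoice Number": "INV-1001", "Invoice Date": "2025-01-15", "Total": "249.50"}

# Hypothetical mapping: Data Field name -> database column name.
TABLE_MAPPING = {"Invoice Number": "invoice_no", "Invoice Date": "invoice_date", "Total": "total"}

conn = sqlite3.connect(":memory:")  # stand-in for the real database connection
conn.execute("CREATE TABLE invoices (invoice_no TEXT, invoice_date TEXT, total TEXT)")

columns = [TABLE_MAPPING[field] for field in extracted]
placeholders = ", ".join("?" for _ in columns)
sql = f"INSERT INTO invoices ({', '.join(columns)}) VALUES ({placeholders})"
conn.execute(sql, list(extracted.values()))  # values are bound as parameters, not concatenated
conn.commit()

rows = conn.execute("SELECT invoice_no, invoice_date, total FROM invoices").fetchall()
print(rows)
```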

FYI

In previous versions of Grooper, collected document data were exported to a database using the Database Export activity. Files, on the other hand, were exported using the Document Export activity.

In version 2021, the two export activities were combined into a single activity, the Export activity. You will now add and configure a Data Export definition to export data to a SQL or ODBC compliant database.

For more information, please visit the Data Connection and Data Export articles.

File Export

For File Export, document content is exported to a Windows file system folder.


  1. The Target Folder property defines what folder you want to export content to. It is always best practice to use a fully qualified UNC path, to disambiguate file and folder locations on one networked machine from another.

The File Export definition is a carry over from older methods of exporting to a Windows file system in previous versions of Grooper. File Export exists mostly for backwards compatibility, but it can still be utilized for simple file system exports.

In current versions, using the CMIS Export definition with an NTFS connection is a preferable method to export document content to a Windows file system.

IMAP Export

For IMAP Export, document content is exported to email servers using the IMAP protocol.


  1. The Mail Server property defines the host name (or IP address) of the email server you want to export to.
    • For example, the server used to connect to an Outlook 365 inbox is "outlook.office365.com"
  2. The User Name and Password properties define the logon information to connect to the mailbox.
  3. The Target Folder property defines what email folder you want to export content to.

The IMAP Export definition is a carry over from older methods of exporting across the IMAP protocol in previous versions of Grooper. IMAP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to email boxes.

In current versions, using the CMIS Export definition with an IMAP connection is a preferable method to export document content to an IMAP server.

Furthermore, when connecting to a Microsoft Outlook inbox, the Exchange binding is preferable to the IMAP binding. The Exchange binding has increased functionality specifically designed for the Outlook messaging system.

FTP Export

For FTP Export, document content is exported to an FTP site using the FTP protocol.


  1. The FTP Server URL property defines what site you want to export to.
  2. The User Name and Password properties define the logon information to connect to the FTP site.

The FTP Export definition is a carry over from older methods of exporting to FTP sites in previous versions of Grooper. FTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to FTP folders.

In current versions, using the CMIS Export definition with an FTP connection is a preferable method to export document content to an FTP site.

SFTP Export

For SFTP Export, document content is exported to an SFTP site using the SFTP protocol.


  1. The Host Name property defines what site you want to export to.
  2. The User Name and Password properties define the logon information to connect to the SFTP site.

The SFTP Export definition is a carry over from older methods of exporting to SFTP sites in previous versions of Grooper. SFTP Export exists mostly for backwards compatibility, but it can still be utilized for simple exports to SFTP folders.

In current versions, using the CMIS Export definition with an SFTP connection is a preferable method to export document content to an SFTP site.

Which Export Definition is right for me?

When choosing an Export Definition, you should be asking yourself "Do I want to export files, data, or both?". What content you want to export will inform which destination (or destinations) you export to. Your answer to this question will impact which Export Definition you choose and how you configure it to export Batch Folder content.

Data Only

If you're purely exporting document data content (values collected from the Extract activity) and nothing else, you're likely looking to export data to a database.

  • Use Data Export to do this.

Files Only

If you're looking to export files, such as PDFs, TIFs, and text files, you have more options depending on the storage destination you want the files to wind up in.

Use any of the following Export Definitions, depending on where you want to export.

  • CMIS Export
  • File Export
  • FTP Export
  • SFTP Export
  • IMAP Export (very uncommon)

All of these Export Definitions have a configurable Export Format property, which will allow you to build an export file of a given format out of Batch Folder content.

  • For example, the PDF Format can be configured to build a PDF file from the Batch Folder's image content (using the images of its child Batch Pages) or its file content (its attached PDF) and the full text data content obtained from OCR.

Both Data and Files

When exporting document content, there are a variety of ways to export both data and files.

  1. Commonly, if you want to export both data and files, you will simply add multiple Export Definitions.
    • For example, add a Data Export to export data to a database and one of the other Export Types to export files to a storage repository of your choice.
  2. You can also export data as a file itself, such as an XML, JSON, or text data file.
    • These are Export Format options available when configuring CMIS Export, File Export, FTP Export, or SFTP Export.
    • We will detail all Export Formats below.
  3. Depending on the CMIS Repository you're exporting to, you can use CMIS Export alone to export both files and data.
    • This will be highly dependent on the capabilities of the storage platform (and the CMIS Binding used to connect to it).
    • For more information, please visit the CMIS Export article.
  4. Grooper's PDF Data Mapping behavior more fully leverages the capabilities of the PDF format when exporting PDF files. With this functionality, you can store classification and extraction data via bookmarking and PDF metadata.
    • This allows you to store most, if not all, document content in the PDF file itself.
    • For more information, please visit the PDF Data Mapping article.

Export Formats

When exporting content to an export destination, you must determine what format that content takes. There are many different types of files out there. Some of them are better suited to house certain types of content than others. XML files are great for storing data, but not so much for image content. TIF files are great for image content, but not so much for full text data.

Export Formats define what file types are exported by an Export Behavior's Export Definition. They are used to generate output files from document content (including files attached to the Batch Folder, child pages, and extracted data).

Export Formats can be used to generate a variety of file types including:

  • PDFs and TIFFs from document images
  • JSON, XML, CSV and TXT metadata files from extracted Data Model data
  • ZIP archives of file attachments/images from descendant Batch Folder/Batch Pages

Export Formats summarized

There are currently ten (10) Export Formats. They can be divided into 3 categories:

  • Merge Formats
  • Metadata Formats
  • Other Formats

Merge Formats

These formats generate an output file by merging content from a Batch Folder's children (and sometimes content stored on the Batch Folder itself) into a single file. Merge Formats combine multiple Batch Pages or other children of a Batch Folder into a single file, such as a multipage PDF or TIF.

  • Merge Formats are used by both the Export activity and the Merge activity.
    • When added to an Export Definition in an Export Behavior, Export will build the file and export it to the folder destination configured in the Export Definition.
    • When configured for the Merge activity, Merge will build the file and attach it to the Batch Folder.
  • XML Format is an outlier in this category. It does not generate the file by merging content from the Batch Folder's children. Instead, it generates an XML file from extracted Data Model data stored on the Batch Folder. XML Format is in the Merge Format category simply because the Merge activity can use it.

PDF Format
This will output a PDF file from the Batch Folder content (its child Batch Pages and in some configurations other content). This includes capabilities to embed full text data obtained from the Recognize activity.

TIF Format
This will output a multipage TIF file using the Batch Folder's child Batch Page image content.

XML Format
This will output extracted Data Model values to an XML file and format it according to an XSD schema file.
  • XML Format is designed to be used with Data Models generated by the XML Schema Importer using an XSD schema file.
  • XML Format differs from the XML Metadata format. XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.

ZIP Format
This will output a ZIP file containing the file attachments for all descendant nodes.

Metadata Formats

These formats generate an output file containing metadata extracted from a Grooper document. The Metadata Formats build various text-based files from the Batch Folder's extracted Data Model and its fields.

  • When noted below, certain formats can only output single-instance Data Fields and their values (not Data Table values or values in multi-instance Data Sections).

Delimited Metadata
This will output extracted Data Model values to a value-delimited text file. You can choose the file extension and delimiter in its configuration. You can configure this to make comma separated value (CSV) files.
  • Only single instance Data Fields are output.

Simple Metadata
This will output extracted Data Model values to a text file. This file formats Data Fields and their values as simple "key-value pairs".
  • Only single instance Data Fields are output.

JSON Metadata
This will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full".
  • Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
  • Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.

XML Metadata
This will output extracted Data Model values to an XML file.
  • The XML includes additional information collected for each "data instance", including a value's page location data.
  • The XML Metadata format differs from the "XML Format". XML Format creates an XML file that conforms to an XSD schema file selected by the user. XML Metadata creates an XML file that conforms to a schema defined by Grooper.

Other Formats

There are 2 Export Formats that do not fit into the other categories:

  • Attached File
  • Text Format

Attached File
This will output a Batch Folder's main "attachment file" or an attached file by name.
  • For files that were imported from a digital source, the attachment file is the file attached to the Batch Folder when it was created on import.
  • This option can also output any file attached to a Batch Folder by referencing a filename. This is how Grooper exports files generated by activities such as XML Transform, Text Transform, Merge or custom scripted activities.
  • If the Batch Folder has no attachment, this option will generate an image version of the document from all child Batch Pages in the folder.

Text Format
This will output full text content only, generated from OCR data, as a text file.

How to add Export Formats

Export Formats can be configured for any of the Export Definitions that export files (all Export Definitions except Data Export).

  1. From an Export Behavior's Export Definitions editor, select the Export Definition you wish to configure.
  2. Find the "Export Formats" property.
    • For some Export Definitions (like CMIS Export), you will need to configure some required properties before the Export Formats property appears.
  3. Open the Export Formats editor (press the "..." button).
  4. To add an Export Format, press the "Add" button.
  5. Select the Export Format you wish to use from the dropdown list.
  6. If necessary, you can add multiple Export Formats by pressing the Add button again.

Export Formats detailed

Below we will briefly describe each Export Format to give you a better idea of the files they create.

Merge Formats

PDF Format

The PDF Format will output a PDF file from the Batch Folder content. This will be either:

  • From an imported PDF file attached to the Batch Folder on import.
  • Or more commonly, the Batch Folder's child Batch Pages. The PDF's pages are generated from each child Batch Page.

The PDF Format can also embed text data into image-based pages. This is how Grooper creates text searchable PDFs using OCR text obtained from the Recognize activity.

  • To do this, under PDF Format's "Build Options", turn "Searchable" to "True".

Another important property to note is the "Always Build" property.

  • This property will force the PDF file to be generated, even if there is a PDF file attached to the Batch Folder.
  • Enabling this property is important when using the Split Pages activity on imported PDF files. Turning "Always Build" to "True" will stitch together a new PDF file from the processed Batch Pages, rather than exporting the PDF file attached to the Batch Folder.
  • Enabling this property is important when using the PDF Data Mapping behavior. This will ensure PDF Data Mapping will always build a new PDF file and export it, rather than an imported PDF file that is attached to the Batch Folder.
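
The choice between exporting the attached PDF and building a new one can be summarized as a small decision function. This Python sketch only mirrors the behavior described above. `choose_pdf_source` and its parameters are hypothetical names, not Grooper properties or API calls.

```python
def choose_pdf_source(has_attached_pdf: bool, always_build: bool) -> str:
    """Return which content the exported PDF would come from."""
    if has_attached_pdf and not always_build:
        return "attached PDF"  # the imported file is exported as-is
    return "built from Batch Pages"  # a new PDF is generated from the child pages


# An imported PDF that was split into pages and processed: force a rebuild.
print(choose_pdf_source(has_attached_pdf=True, always_build=True))
# Scanned pages with no attached file: a PDF is always built.
print(choose_pdf_source(has_attached_pdf=False, always_build=False))
```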

Example Output

This will export a PDF file, according to the PDF Format's property grid settings.

You can see here, we have text-behind generated from the Searchable property. We can select this text with our cursor.

  • BE AWARE!! Text data can ONLY be embedded behind images (either image formats like JPEG or TIF or single-image PDF pages). Enabling the Searchable property will only embed searchable text behind image-based pages.
    • Text data obtained from native-text PDF pages CANNOT be embedded into a native-text PDF page.
    • You would have to convert the native-text PDF page into an image first to embed the text data. This is atypical, but necessary when dealing with poorly formed or corrupt PDFs.


TIF Format

The TIF Format will output image content only as a TIF (Tagged Image File Format) file.

  • TIF is a format used to store high quality raster graphics for graphic design or publishing.
  • Keep in mind this is an image only format. If you want text-behind embedded in your files, you must use the PDF Format.

XML Format

"XML Format" will output extracted Data Model values to an XML file.

XML Format and XML Metadata both generate an XML file from a document's Data Model. The difference is in how the XML's schema is formatted.

  • XML Metadata uses a set schema developed by Grooper.
  • XML Format allows users to select an XSD schema file.

XML Format allows users to format the data Grooper collects to whatever schema they want without transforming the Grooper XML schema using XML Transform or XML transformations outside of Grooper.

ZIP Format

The ZIP Export Format enables you to export multiple documents as a ZIP file. A single ZIP file will be generated containing the file attachments for all descendant Batch Folders.
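
To make the idea concrete, here is a minimal Python sketch that bundles several attachments into one archive with the standard `zipfile` module. It is not how Grooper builds the file. The folder names, file names, and placeholder bytes are hypothetical, and the archive is written to memory rather than to an export destination.

```python
import io
import zipfile

# Hypothetical attachments: path of each descendant folder's file -> file bytes.
attachments = {
    "Folder 1/invoice.pdf": b"%PDF-1.7 placeholder",
    "Folder 2/receipt.pdf": b"%PDF-1.7 placeholder",
}

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    for path, data in attachments.items():
        archive.writestr(path, data)  # one archive entry per attached file

# Re-open the archive to confirm what it contains.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()
print(names)
```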

Metadata Formats

Simple Metadata

The Simple Metadata format will output extracted Data Field values to a text file.

  • This file formats Data Fields and their values as simple "key-value pairs".
  • The Data Field's name is the key. Its value extracted from the document is the value.
  • The key and value in each pair are separated by a delimiter, which is "=" by default.
    Ex: fieldName=fieldValue
  • Only single instance Data Fields are output.

Example Output

As you can see here, Data Fields are exported to a text file as a simple list of key-value pairs.

  1. Data Field names are on the left.
  2. The Delimiter character (an equals sign by default) is in the middle.
  3. Extracted values are on the right.
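
The layout described above is simple enough to reproduce in a few lines of Python. This is a sketch under assumptions: the function name and sample field values are hypothetical, and writing one pair per line is an assumption about the file layout rather than something confirmed by Grooper's documentation.

```python
def to_simple_metadata(fields: dict, delimiter: str = "=") -> str:
    """Format single-instance field values as one key-value pair per line."""
    return "\n".join(f"{name}{delimiter}{value}" for name, value in fields.items())


fields = {"Invoice Number": "INV-1001", "Total": "249.50"}  # hypothetical values
text = to_simple_metadata(fields)
print(text)
```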


Delimited Metadata

The Delimited Metadata format outputs extracted Data Field values to a character delimited text file.

  • This formats Data Field values as a delimiter-separated value array (e.g., value1,value2,value3).
  • Use the "Text Extension" property to choose a file extension. TXT is the default.
  • Use the "Delimiter" property to define the character. This is a comma (,) by default.
  • Use the "Delimiter Escape" property to replace a delimiter in the Data Field's value with a different character (Ex: swap a comma for a semicolon).
  • Use the "Include Header" property to include a header row in the file. The header row is populated with the Data Fields' names.
  • Only single instance Data Fields are output.

Example Output

  1. Extracted Data Field values are exported as a comma-delimited list of values in a text file.
  2. Note since we enabled Include Header, we have a header row of our Data Field names output as well.
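
The interplay of the delimiter, the delimiter escape, and the header row can be sketched in Python as follows. The function and parameter names loosely echo the properties above but are hypothetical. The sketch escapes delimiters in values only, which is an assumption, since the article does not say how header names are treated.

```python
def to_delimited_metadata(fields, delimiter=",", delimiter_escape=";", include_header=True):
    """Build delimiter-separated text from single-instance field values."""
    def clean(value: str) -> str:
        # Replace any delimiter inside a value so it cannot split the row.
        return value.replace(delimiter, delimiter_escape)

    lines = []
    if include_header:
        lines.append(delimiter.join(fields))  # header row of field names
    lines.append(delimiter.join(clean(value) for value in fields.values()))
    return "\n".join(lines)


fields = {"Vendor": "Acme, Inc.", "Total": "249.50"}  # hypothetical values
text = to_delimited_metadata(fields)
print(text)
```

Without the escape step, the comma inside "Acme, Inc." would be read as a column break and the row would appear to have three values instead of two.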


JSON Metadata

The "JSON Metadata" format will output extracted Data Model values to a JSON file. The JSON layout can be "Simple" or "Full".

    • Full - This is a detailed JSON file that includes values, location data, confidence scores and more. This is the entire "Grooper.DocumentData.json" file generated for each document when Extract runs.
    • Simple - This is a compact JSON file that includes values only. This is preferable for users who just want a simple JSON file with the values Grooper collected from a document.
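
The difference between the two layouts can be pictured with a short Python sketch. The structure of the "full" record below is invented for illustration and does not reproduce the actual "Grooper.DocumentData.json" schema. The point is only that a simple layout keeps the values and drops the surrounding detail.

```python
import json

# Hypothetical "Full"-style record: each field carries a value plus extra detail.
full = {
    "Invoice Number": {"value": "INV-1001", "confidence": 0.98, "page": 1},
    "Total": {"value": "249.50", "confidence": 0.91, "page": 2},
}

# A "Simple"-style layout keeps only the values.
simple = {name: detail["value"] for name, detail in full.items()}
simple_json = json.dumps(simple, indent=2)
print(simple_json)
```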

XML Metadata

The XML Metadata format will output extracted Data Model values to an XML file. The XML uses an XML schema developed by Grooper to detail information about extracted "data instances", including values, page location information, confidence scores, and more.

  • Document level information (including the Batch Folder's classified Document Type) is found in the <Document> tag.
  • Data Field values and information are found in the <Field> tags.
  • Extracted table values are found in <TableCell> tags as children of <TableRow> and <Table> parent tags.

Example Output

  1. Document level information (including the Batch Folder's classified Document Type) is found in the <Document> tag.
  2. Data Field values and information are found in the <Field> tags.
  3. Extracted table values are found in <TableCell> tags as children of <TableRow> and <Table> parent tags.
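
A downstream system could read such a file with any XML parser. The Python sketch below uses the standard `xml.etree.ElementTree` module on a made-up sample. Only the tag names (`Document`, `Field`, `Table`, `TableRow`, `TableCell`) come from this article. The `type` and `name` attributes and all values are hypothetical, so a real Grooper export will need to be inspected before adapting this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: tag names follow the article, attributes are invented.
sample = """
<Document type="Invoice">
  <Field name="Invoice Number">INV-1001</Field>
  <Field name="Total">249.50</Field>
  <Table name="Line Items">
    <TableRow>
      <TableCell name="Description">Widget</TableCell>
      <TableCell name="Amount">249.50</TableCell>
    </TableRow>
  </Table>
</Document>
"""

root = ET.fromstring(sample.strip())

# Collect single-instance field values by name.
fields = {field.get("name"): field.text for field in root.iter("Field")}

# Collect each table row as a dictionary of cell name -> cell text.
rows = [
    {cell.get("name"): cell.text for cell in row.iter("TableCell")}
    for row in root.iter("TableRow")
]
print(root.tag, fields, rows)
```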

FYI

XML data can be reformatted with XSLT style sheets using the XML Transform activity.


Other Formats

Attached File

Several files are attached to a Batch Folder throughout its lifecycle in Grooper.

  • When a file is imported by an Import Provider, the file is attached to the Batch Folder created in Grooper. This is the Batch Folder's "attachment file".
  • When Extract runs, a "Grooper.DocumentData.json" file is attached to the Batch Folder.
  • When layout data is collected a "Grooper.LayoutData.json" file is attached to the Batch Folder.
  • When activities like XML Transform, Text Transform, or Merge generate files, they are attached to the Batch Folder.
    • Be aware, Merge can be configured to replace a Batch Folder's attachment file or attach an additional file.

When configuring the Attached File format, there is only one configurable property: Filename

  • Leaving this property blank will export the Batch Folder's attachment file.
    • If the Batch Folder has no attachment, Grooper will generate an image version of the document from all child Batch Pages in the folder.
  • Use the Filename property to reference a file attached to a Batch Folder by name. The referenced file will be exported instead of the attachment file.
    • This is how Grooper exports custom generated files from activities such as XML Transform, Text Transform, Merge, or custom scripted activities.
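
The selection rules above can be summarized in a short Python sketch. This is a model of the documented behavior, not Grooper code: the resolve_export_file function, its parameters, and the file names are hypothetical. What happens when the Filename property names a file that is not attached is not covered in this article, so raising an error in that case is purely an assumption.

```python
def resolve_export_file(attachment, attached_files, filename=""):
    """Return a label for what the Attached File format would export.

    attachment     -- name of the folder's main attachment file, or None
    attached_files -- names of the other files attached to the folder
    filename       -- value of the Filename property ("" means blank)
    """
    if not filename:
        if attachment is not None:
            # Blank Filename: export the main attachment file.
            return attachment
        # No attachment: an image version is built from the child pages.
        return "<image generated from child Batch Pages>"
    if filename in attached_files:
        # Named file: exported instead of the main attachment file.
        return filename
    # Assumption: the article does not say what happens in this case.
    raise FileNotFoundError(filename)


attached = ["Grooper.DocumentData.json", "transformed.xml"]
print(resolve_export_file("invoice.pdf", attached))
print(resolve_export_file(None, attached))
print(resolve_export_file("invoice.pdf", attached, "transformed.xml"))
```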


More on the Attached File format

When using the "Attached File" format, be aware of two common scenarios involving a Batch Folder's main "attachment file".

Common scenario 1: In Batch Processes where a Merge step runs before Export, this will export the file generated by Merge.
  • This is typically ideal.
  • In typical configurations, the Merge activity replaces the Batch Folder's main attachment file (if present) with the PDF or TIF it generates. So, no further Export Format configuration is required to export a Grooper generated PDF or TIF document.
Common scenario 2: In Batch Processes where a Batch is created by an Import Job and a Merge step is not present, this will export whatever file was imported at the start of the Batch Process.
  • This is not always ideal.
  • When an Import Job imports a file into Grooper, a Batch Folder is created and the file is attached to it. This is the Batch Folder's main attachment file.
    • If you simply want to export the same file you imported to the export destination, no further Export Format configuration is required.
    • However, if you want to export a new file generated by Grooper you will need to (1) delete the Attached File format and (2) add one of your choosing (most typically a PDF Format).
Text Format

The Text Format will output full text content only, generated from OCR data, as a text (TXT) file.

  • This is the same data you would see in a Batch Folder's "Text Rendition" in the Document Viewer.
  • Text data generated by the Recognize activity is used to create the file. Specifically, it is built from the "Grooper.Characters.txt" file.
    • Technically, a text file would be generated from either OCR data or native text data.



Example Output

Upon export, this will generate a text file from the Batch Folder's raw OCR text data, generated from the Recognize activity.

Local and Shared Export Behaviors

THIS SECTION IS ONLY DOCUMENTED FOR COMPLETENESS. WE DO NOT ADVISE ADDING EXPORT BEHAVIORS LOCAL TO AN EXPORT ACTIVITY.

While it is technically possible to add Export Behaviors local to the Export activity itself, it is not our advice to do so.

It is our best practice advice to add Export Behaviors and manage them from Content Types.

  • It is easier to manage Export Behaviors from the Content Type they are applied to.
  • Even when adding Export Behaviors local to the Export activity itself, you still have to reference a Content Type the behavior applies to.
  • The only reason why an option exists to configure Export Behaviors local to the Export activity is to allow older versions of Grooper to upgrade to newer versions appropriately.
  • As you will see, it gets very confusing once you start adding Export Behaviors to both Content Types and local to an Export step.

Remember, Export Behaviors can be configured in one of two ways:

  1. Using the Behaviors property of a Content Type object
  2. As part of the Export activity's property configuration

In general, most users will choose to do one or the other. This may just be as simple as what your preference is.

  • BE AWARE: It is our best practice advice to configure new Export Behaviors on Content Types (i.e. the Content Model and/or its child Content Categories and Document Types).
  • BE AWARE: If you have upgraded from a version of Grooper before Export Behaviors, your Export Behavior configurations will be on the Export activity.


IN EXCEPTIONALLY RARE CIRCUMSTANCES you may configure Export Behaviors on BOTH a Content Type AND local to the Export activity.

Grooper needs to understand which one should take priority, or whether both should execute in one order or another. This can accommodate more complex exports, but there are different ways to define how Export Behaviors are shared between Content Types and local Export activity configurations.

This is what the Shared Behavior Mode property of the Export activity is for. It defines how "local" and "shared" Export Behaviors are executed when the Export activity exports Batch Folders in a Batch.

  • BE AWARE: The Shared Behavior Mode is only required for the most complex of export operations. In the vast majority of cases, you will not need to bother with this setting. You only need to concern yourself with it if you must configure both "local" and "shared" Export Behaviors.

Local Behaviors vs Shared Behaviors

Local Behaviors

"Local behaviors" are Export Behaviors configured in an Export activity's local property grid.

Shared Behaviors

"Shared behaviors" are Export Behaviors configured for a Content Type object (Content Models, Content Categories, and Document Types), using its Behaviors property.

Shared Behavior Mode

The Shared Behavior Mode property is only exposed if an Export Behavior local to the Export activity is configured. It is configured in the Export activity's property grid.

It can be set to one of the following values:

  • None - Only the Export step's locally configured Export Behaviors will execute.
    • NO shared Export Behaviors configured on Content Types will execute.
  • LocalOrShared - The Export activity will execute all local Export Behaviors first.
    • Shared Export Behaviors only execute if none are specified for that Content Type in the local Export Behaviors.
  • SharedOrLocal - The Export activity will execute all shared Export Behaviors first.
    • Local Export Behaviors only execute if none are specified for a Content Type's shared Export Behaviors.
  • LocalAndShared - Both sets of Export Behaviors will execute, but local Export Behaviors are executed first.
  • SharedAndLocal - Both sets of Export Behaviors will execute, but shared Export Behaviors are executed first.



Imagine you have both "shared" and "local" Export Behaviors for two Document Types: "Orange" and "Red"

Shared Behaviors (Configured with the "Orange" and "Red" Document Types' Behaviors properties)

  • The "Orange" Document Type's Export Behavior will export a PDF file to a folder named "Folder A".
  • The "Red" Document Type's Export Behavior will export a PDF file to a folder named "Folder A".


Local Behaviors (Configured with the Export activity's local property grid)

  • The "Orange" Document Type's Export Behavior will export an XML file to a folder named "Folder B"
  • There is no Export Behavior configured for the "Red" Document Type.

Depending on how you configure the Shared Behavior Mode property, you're going to end up with different results for your export.

"Local Export Behavior" unconfigured

With the Export Behaviors property unconfigured the Shared Behavior Mode will remain hidden. Only the Document Type's shared behaviors will execute.

  • The "Orange" documents would export as a PDF file and be placed in "Folder A".
  • The "Red" documents would export as a PDF file and be placed in "Folder A".

"Local Export Behavior" configured / "Shared Behavior Mode" set to 'None'

With local Export Behaviors and the Shared Behavior Mode set to None, only the local Export Behaviors will execute.

  • The "Orange" documents would export as an XML file and be placed in "Folder B".
  • No files would export for the "Red" documents. There is no local Export Behavior for this Document Type.

"Local Export Behavior" configured / "Shared Behavior Mode" set to 'SharedOrLocal'

With the Shared Behavior Mode set to SharedOrLocal, shared behaviors will execute first. Local behaviors will only execute if no shared behavior is present.

Only our shared Export Behaviors execute in this instance. Grooper doesn't even bother to look at the local Export Behavior for the "Orange" Document Type because shared Export Behaviors are present for both Document Types.

  • The "Orange" documents would export as a PDF file and be placed in "Folder A" (and not an XML file in "Folder B").
  • The "Red" documents would export as a PDF file and be placed in "Folder A".

"Local Export Behavior" configured / "Shared Behavior Mode" set to 'LocalOrShared'

With the Shared Behavior Mode set to LocalOrShared, local behaviors will execute first. Shared behaviors will only execute if no local behavior is present.

In our case, no local Export Behavior is present for "Red" documents, but there is a shared Export Behavior configured on the "Red" Document Type.

  • So, the "Orange" documents export an XML file to "Folder B", using the local behavior.
  • However, the "Red" documents export a PDF file to "Folder A", using the shared behavior since no local behavior was found.

"Local Export Behavior" configured / "Shared Behavior Mode" set to 'LocalAndShared' or 'SharedAndLocal'

For the And modes (LocalAndShared and SharedAndLocal) both local and shared Export Behaviors execute. The only difference is which one executes first.

  • For LocalAndShared local behaviors execute first.
  • For SharedAndLocal shared behaviors execute first.

In our case, we end up with the same result either way.

  • The "Orange" documents export a PDF file to "Folder A", using the shared behavior, and they export an XML file to "Folder B", using the local behavior.
  • The "Red" documents export a PDF file to "Folder A", using the shared behavior. There is no local behavior for "Red" documents.
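
The scenarios above can be condensed into a small Python model. This is only a sketch of the documented outcomes for a single Content Type, not Grooper's implementation. The behaviors_to_run function and the string labels for each behavior are hypothetical names chosen for illustration.

```python
def behaviors_to_run(mode, local, shared):
    """Return the Export Behaviors to execute, in order, for one Content Type.

    local / shared are the behaviors configured for that Content Type,
    or None when nothing is configured at that level.
    """
    if mode == "None":
        ordered = [local]
    elif mode == "LocalOrShared":
        ordered = [local if local is not None else shared]
    elif mode == "SharedOrLocal":
        ordered = [shared if shared is not None else local]
    elif mode == "LocalAndShared":
        ordered = [local, shared]
    elif mode == "SharedAndLocal":
        ordered = [shared, local]
    else:
        raise ValueError(f"unknown Shared Behavior Mode: {mode}")
    # Drop levels where no behavior is configured.
    return [b for b in ordered if b is not None]


# The example configuration used in this section.
SHARED = {"Orange": "PDF -> Folder A", "Red": "PDF -> Folder A"}
LOCAL = {"Orange": "XML -> Folder B"}  # no local behavior for "Red"

for mode in ("None", "LocalOrShared", "SharedOrLocal",
             "LocalAndShared", "SharedAndLocal"):
    for doc_type in ("Orange", "Red"):
        result = behaviors_to_run(mode, LOCAL.get(doc_type), SHARED.get(doc_type))
        print(mode, doc_type, result)
```

An empty list means nothing is exported for that Document Type, as happens for "Red" documents when the mode is None.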



'LocalAndShared' vs 'SharedAndLocal'

Now, you should be asking yourself "If the result was the same for both LocalAndShared and SharedAndLocal, why even have two different options?"

That's not always going to be the case. And this is where things can get tricky with Shared Behavior Modes.

Issues can occur if you are exporting the same file type, with the same name, to the same folder with both shared and local Export Behaviors. If both Export Behaviors are configured to export a PDF Format file, for example, but with different File Format configurations, you could end up with a situation where one behavior overwrites the other's export. This may be what you want to do. It may not. Just be aware since both Export Behaviors execute, either the local or shared behavior can potentially overwrite whichever one exported a file first.

Imagine our shared and local behavior configurations were a little bit different.


Shared Behaviors

  • The "Orange" Document Type's Export Behavior will export a PDF file to a folder named "Folder A".
  • The "Red" Document Type's Export Behavior will export a PDF file to a folder named "Folder A".


Local Behaviors

  • The "Orange" Document Type's Export Behavior will export an XML file to a folder named "Folder B"
  • The "Red" Document Type's Export Behavior will also export a PDF file to "Folder A".
    • However, it has a different Export Format configuration, indicated by "v2" in the diagram.


With these configurations and Export's Shared Behavior Mode set to SharedAndLocal, we would end up overwriting a file. First, the shared behaviors would execute, then the local behaviors would execute. For the "Red" documents, the shared behavior would export its version of a PDF, then the local behavior would export its version of a PDF. If the two files, both the same PDF file type, share the same name, the default configurations will overwrite existing files in a folder location.

  • So, the "Orange" documents export a PDF file to "Folder A", using the shared behavior, and they export an XML file to "Folder B", using the local behavior.
    • This is the same result as demonstrated previously.
  • The difference is for the "Red" documents.
    • The shared behaviors would execute first, exporting a PDF file to "Folder A". Then, the local behaviors would execute, exporting the local Export Behavior's configuration of the PDF ("v2"), overwriting the first export and leaving you with the local behavior's PDF format.

Be aware of this possibility when configuring SharedAndLocal or LocalAndShared exports. If the file names, types and export folder locations are the same, you may end up overwriting a file. If this is your intention, great! If not, you will need to ensure the file names for the files generated by shared and local behaviors are unique to avoid one file being overwritten.
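
To make the ordering effect concrete, here is a minimal Python sketch that treats the export folder as a dictionary keyed by file path, so a later write to the same path replaces the earlier one. The export_all function and the file names are hypothetical. The sketch assumes the default overwrite behavior described above.

```python
def export_all(behaviors):
    """Apply behaviors in order; a later write to the same path wins."""
    folder = {}
    for path, content in behaviors:
        folder[path] = content
    return folder


# Hypothetical file names: both behaviors target the same path
# for a "Red" document.
shared = ("Folder A/red-0001.pdf", "shared PDF")
local = ("Folder A/red-0001.pdf", "local PDF (v2)")

shared_and_local = export_all([shared, local])  # local runs last
local_and_shared = export_all([local, shared])  # shared runs last

# Giving the local export a distinct name keeps both files.
local_unique = ("Folder A/red-0001-v2.pdf", "local PDF (v2)")
both_kept = export_all([shared, local_unique])

print(shared_and_local)
print(local_and_shared)
print(both_kept)
```

Whichever behavior runs last determines the surviving file, which is why the two And modes can produce different results here even though they did not in the earlier example.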

Processing Queue Guidance

"Code Activities" in a Batch Process can be automated using an Activity Processing Grooper service. The Activity Processing service will act like a Windows service and automatically start tasks in a Batch, as processing threads in your system's resources become available. This is one of the ways Grooper leverages your system resources for parallel processing.

Imagine you're running Grooper on a machine with eight (8) processing threads. If you have a Batch with five (5) Batch Folders, and each one is on the Recognize step of the Batch Process, there's no need for your system to process each Batch Folder sequentially (with each Batch Folder waiting to be processed until the one before it is finished).

  • You have 8 threads and 5 Batch Folders in this scenario.
  • Each one of those threads can process one Batch Folder as a single task.
  • With 8 available threads, all 5 Batch Folders could be processed concurrently by 5 individual threads.
  • This is multi-threaded Activity processing.


When automating Export steps in a Batch Process, you may need to execute the activity single threaded.

Depending on which external storage system you're exporting to, you may run into errors if you attempt to run the Export activity multi-threaded.

  • This may be due to a storage platform limiting the number of concurrent connections to the repository.
    • For example, licensing limitations for the platform may restrict how many connections can be made to the repository at a time (as is the case for ApplicationXtender).
  • This may be a self-imposed throttle to avoid network/latency related errors when uploading to cloud based platforms (such as Box.com or Microsoft SharePoint).
  • This may otherwise be required or preferable for platforms whose file transfer protocol expects users to upload files one at a time.
    • If you have 5 threads all attempting to upload 5 different Batch Folders from the same machine, 4 of those Batch Folders are going to kick back to Grooper in an error state in this scenario.


For scenarios like these, it is preferable to run the Export activity single-threaded, ensuring only one Batch Folder is processed at a time. As well as automating Batch Processing activities, Activity Processing services allow you to control thread resources by assigning activities a Processing Queue and limiting the maximum number of threads available for that Processing Queue.
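
The effect of limiting a queue's thread count can be illustrated outside Grooper with a generic Python thread pool. This is an analogy only: the peak_concurrency function is hypothetical, the sleep stands in for an upload, and nothing here reflects how the Activity Processing service is actually implemented.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(max_threads, folder_count=5):
    """Run one simulated export per folder; return the most tasks seen running at once."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def export(folder):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)  # stand-in for uploading one Batch Folder
        with lock:
            running -= 1

    # max_workers plays the role of the queue's thread limit.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(export, range(folder_count)))
    return peak


print("8 threads:", peak_concurrency(8))
print("1 thread:", peak_concurrency(1))
```

With a limit of one thread, the peak can never exceed 1, so the simulated uploads run strictly one at a time. With eight threads and five folders, up to five can overlap.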

Next, we will show you how to create a single threaded Processing Queue for an Export activity, and set up an Activity Processing service that utilizes it. This will effectively throttle your export, so Batch Folders are indeed only exported one at a time, avoiding any issues with external platforms that cannot handle multi-threaded exports.

1. Add a Processing Queue

The first thing you'll need to do is add a Processing Queue object. A Processing Queue defines the "bucket" of threads available to one step or another in a Batch Process. In our case, this will allow us to limit the number of threads the Export step uses to a single thread.


To add a Processing Queue:

  1. Right-click the Queues folder in the Node Tree.
  2. Select "Add".
  3. Select "Processing Queue"
  4. This will bring up a new window to name the Processing Queue. Enter a name.
    • We named ours "Export Throttle"
  5. Select "Execute."
  6. This will add a new Processing Queue object to the Node Tree.
  7. FYI: No further object configuration is technically required at this point.
    • However, if you want the safest implementation of a single-threaded Processing Queue, totally ensuring only a single Export task is processed per repository environment, you can change the Concurrency Mode property from Multiple to Single. With the Single mode, only a single task will run per Grooper repository.

2. Assign the Processing Queue

Next, we need to tell our Batch Process which step should use our new Processing Queue.


  1. By default, all Batch Process steps use the "Default" Processing Queue.
  2. In the Batch Step property grid, Processing Queues are assigned with the Processing Queue property.



We want to tell the Export step of this Batch Process to use a different Processing Queue, the new one we just created.

  1. Select the Export step in the Batch Process.
  2. Select the Queue Name property.
  3. Using the dropdown menu, select the Processing Queue you wish to use.
    • In our case, the "Export Throttle" Processing Queue.


3. Configure an Activity Processing Service

On to Grooper Command Console! Grooper services are installed and edited using Grooper Command Console. Open Grooper Command Console to install a new Activity Processing service.

Grooper Command Console must be run as an administrator to install and edit services.

  1. In Grooper Command Console enter the following command:
services install <connectionNo> <typeName> <userName> <password> [threadCount] [queueName]
  • <connectionNo> (required):
    • Replace this with the integer representing the appropriate connection. Use the connections list command to get a list of your connections.
  • <typeName> (required):
    • Since we are installing an Activity Processing service, replace this with ActivityProcessing.
  • <userName> and <password> (required):
    • Replace these with the appropriate Active Directory credentials of your Grooper Service User.
  • [threadCount] and [queueName] (optional):
    • To set a specific thread count, replace [threadCount] with an appropriate integer. Setting this to "1" restricts the service to a single processing thread.
    • To set a queue name, replace [queueName] with the name of an appropriate Processing Queue object. Enter the name of the Processing Queue that was created in the Queues folder on the "Design" page.
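
As a worked example of filling in the placeholders, the Python sketch below assembles the command text for a single-threaded service bound to the "Export Throttle" queue. The build_install_command helper, the connection number, and the account name are hypothetical. Two assumptions are also made: that a queue name can only be given together with a thread count (because the arguments are positional), and that a name containing spaces is wrapped in double quotes. Confirm both in Grooper Command Console before relying on them.

```python
def build_install_command(connection_no, user_name, password,
                          thread_count=None, queue_name=None,
                          type_name="ActivityProcessing"):
    """Assemble a 'services install' command line from its arguments."""
    if queue_name is not None and thread_count is None:
        # Assumption: [queueName] follows [threadCount] positionally,
        # so a queue name requires a thread count.
        raise ValueError("queue_name requires thread_count")
    parts = ["services", "install", str(connection_no),
             type_name, user_name, password]
    if thread_count is not None:
        parts.append(str(thread_count))
    if queue_name is not None:
        # Assumption: names containing spaces are double-quoted.
        parts.append(f'"{queue_name}"' if " " in queue_name else queue_name)
    return " ".join(parts)


# Placeholder values: substitute your own connection number and credentials.
command = build_install_command(1, r"DOMAIN\svc_grooper", "<password>",
                                thread_count=1, queue_name="Export Throttle")
print(command)
```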