CMIS Import (Import Provider): Difference between revisions

From Grooper Wiki
Line 3: Line 3:
<blockquote>{{#lst:Glossary|CMIS Import}}</blockquote>
<blockquote>{{#lst:Glossary|CMIS Import}}</blockquote>


Documents are imported from '''CMIS Connections''' using either the '''''Import Descendants''''' or '''''Import Query Results''''' providers.  These can be used in two ways:
Documents are imported from '''CMIS Connections''' using either the Import Descendants or Import Query Results providers.  These can be used in two ways:


* To perform manual "ad-hoc" imports when creating a new '''[[Batch]]''' on the "Imports" page.
* To perform manual "ad-hoc" imports when creating a new '''[[Batch]]''' on the "Imports" page.
* To perform automated, scheduled imports using one or more '''Import Watcher''' Grooper services.
* To perform automated, scheduled imports using one or more '''Import Watcher''' Grooper services.


'''''Import Descendants''''' will import all documents within a designated folder location of a '''[[CMIS Repository]]'''.  '''''Import Query Results''''' allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.
Import Descendants will import all documents within a designated folder location of a '''[[CMIS Repository]]'''.  Import Query Results allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.
 
<u><big>'''SharePoint'''</big></u>: {{#lst:Glossary|SharePoint}}
 
{{#lst:CMIS+ (Concept)|cmisplus}}
{{#lst:CMIS+ (Concept)|cmisplus}}


== About CMIS Import ==
== About CMIS Import ==
The '''''CMIS Import''''' provider is split into two different '''''Import Providers''''':
The CMIS Import provider is split into two different Import Providers:


* '''''Import Descendants'''''
* Import Descendants
* '''''Import Query Results'''''
* Import Query Results


These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform.  This is the primary method of '''Batch''' creation when importing digital documents into Grooper to process them with a '''Batch Process'''.   
These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform.  This is the primary method of '''Batch''' creation when importing digital documents into Grooper to process them with a '''Batch Process'''.   
Line 25: Line 22:


# A '''CMIS Connection''' object must be made and configured.  This will connect Grooper to the document storage platform.
# A '''CMIS Connection''' object must be made and configured.  This will connect Grooper to the document storage platform.
#* This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms.  What the '''CMIS Connection''' connects to is determined by the '''''CMIS Binding''''' selected when configuring the '''''Connection Type''''' property of the '''CMIS Connection''' object.
#* This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms.  What the '''CMIS Connection''' connects to is determined by the CMIS Binding selected when configuring the Connection Type property of the '''CMIS Connection''' object.
# A '''CMIS Repository''' must be imported.  This will create an object Grooper can use to import documents from the folders in the document storage platform.
# A '''CMIS Repository''' must be imported.  This will create an object Grooper can use to import documents from the folders in the document storage platform.
#* This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.
#* This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.
Line 31: Line 28:
For more information on adding a '''CMIS Connection''' and importing a '''CMIS Repository''', visit the '''[[CMIS Connection]]''' article.
For more information on adding a '''CMIS Connection''' and importing a '''CMIS Repository''', visit the '''[[CMIS Connection]]''' article.


As for the difference between the '''''Import Descendants''''' and '''''Import Query Results''''' providers, you can think of '''''Import Query Results''''' as a more specialized version of '''''Import Descendants'''''.
As for the difference between the Import Descendants and Import Query Results providers, you can think of Import Query Results as a more specialized version of Import Descendants.


* '''''Import Descendants''''' is intended to import the full contents of a folder location.  It imports the "descendant" files of a parent folder.
* Import Descendants is intended to import the full contents of a folder location.  It imports the "descendant" files of a parent folder.
* '''''Import Query Results''''' allows you to selectively import files using a SQL-like query (called a CMISQL query).  Only files returned by the query will be imported.  For example, using an '''''Exchange''''' or '''''IMAP''''' '''CMIS Connection''', you could query an inbox for emails from a specific sender and only import those emails.
* Import Query Results allows you to selectively import files using a SQL-like query (called a CMISQL query).  Only files returned by the query will be imported.  For example, using an Exchange or IMAP '''CMIS Connection''', you could query an inbox for emails from a specific sender and only import those emails.
** Note:  There are some import filtering capabilities available to '''''Import Descendants''''' as well using a SQL-like query.  However, the CMISQL querying capabilities of '''''Import Query Results''''' are much more robust.   
** Note:  There are some import filtering capabilities available to Import Descendants as well using a SQL-like query.  However, the CMISQL querying capabilities of Import Query Results are much more robust.   
** That said, only certain '''''CMIS Bindings''''' can take advantage of this increased CMISQL query functionality.  The following '''''CMIS Bindings''''' are '''''not''''' currently suitable for the '''''Import Query Results''''' provider.
** That said, only certain CMIS Bindings can take advantage of this increased CMISQL query functionality.  The following CMIS Bindings are not currently suitable for the Import Query Results provider.
*** '''''FTP'''''
*** FTP
*** '''''SFTP'''''
*** SFTP
*** '''''NTFS''''' (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
*** NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)


=== Import Descendants ===
=== Import Descendants ===
Line 50: Line 47:
[https://app.supademo.com/demo/cm8ddjcr91rb92ugqp7k9phqo Click here for an interactive walkthrough]
[https://app.supademo.com/demo/cm8ddjcr91rb92ugqp7k9phqo Click here for an interactive walkthrough]


Back on the '''''Import Descendants''''' configuration screen, the '''CMIS Repository''' object is used to point Grooper to this folder location for import.
Back on the Import Descendants configuration screen, the '''CMIS Repository''' object is used to point Grooper to this folder location for import.


*The '''''Repository''''' property is configured to assign the '''CMIS Repository''' where the documents are located.
*The Repository property is configured to assign the '''CMIS Repository''' where the documents are located.
** Here, the '''CMIS Repository''' named "Import and Export" connects to the "Import and Export" folder of the local drive.
** Here, the '''CMIS Repository''' named "Import and Export" connects to the "Import and Export" folder of the local drive.


*The '''''Base Folder''''' property is configured to traverse the folder structure of the '''CMIS Repository'''.
*The Base Folder property is configured to traverse the folder structure of the '''CMIS Repository'''.
** Here, we don't want to import ''all'' documents from ''every'' folder in the "Import and Export" folder.  We just want to import from the "Grooper Import Folder".
** Here, we don't want to import ''all'' documents from ''every'' folder in the "Import and Export" folder.  We just want to import from the "Grooper Import Folder".


*The '''''Import Filter''''' property allows you to perform some basic import filtering to selectively choose which documents you want to import.
*The Import Filter property allows you to perform some basic import filtering to selectively choose which documents you want to import.
** <code>SELECT * FROM File</code> is the default filter.  It will import all files from the selected folder location.
** <code>SELECT * FROM File</code> is the default filter.  It will import all files from the selected folder location.
** This is a SQL-like query to specify conditions for document import.   
** This is a SQL-like query to specify conditions for document import.   
**<li class="attn-bullet"> BE AWARE: '''''Import Descendants''''' has limited filtering compared to '''''Import Query Results'''''.
**<li class="attn-bullet"> BE AWARE: Import Descendants has limited filtering compared to Import Query Results.
*** '''''Import Query Results''''' was created to expand on this functionality. It provides more filtering options for the '''''CMIS Connection Types''''' supported by '''''Import Query Results'''''.
*** Import Query Results was created to expand on this functionality. It provides more filtering options for the CMIS Connection Types supported by Import Query Results.
*** '''''Import Descendants''''' DOES NOT support the "IN_FOLDER" or "IN_TREE" predicates. '''''Import Descendants''''' will ''always'' import ''all'' documents in ''all'' subfolders from the base folder.
*** Import Descendants DOES NOT support the "IN_FOLDER" or "IN_TREE" predicates. Import Descendants will ''always'' import ''all'' documents in ''all'' subfolders from the base folder.
*The '''''Content Type''''' property allows you to optionally assign the incoming documents with a '''Document Type'''.
*The Content Type property allows you to optionally assign the incoming documents with a '''Document Type'''.
** You can use this property to assign a default classification for all incoming documents.
** You can use this property to assign a default classification for all incoming documents.
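The "descendants" behavior described above (every file in every subfolder under the base folder is picked up) can be pictured with a short Python sketch. This is only an illustration of the traversal idea on a local folder tree, not Grooper's implementation; the function name <code>list_descendant_files</code> is hypothetical.

```python
from pathlib import Path


def list_descendant_files(base_folder: Path) -> list[Path]:
    """Return every file under base_folder, at any depth, in sorted order.

    Hypothetical helper: it mirrors the idea that an Import Descendants
    style import always walks all subfolders of the base folder.
    """
    # rglob("*") visits the whole tree; keep files and skip the folders themselves.
    return sorted(p for p in base_folder.rglob("*") if p.is_file())
```

Calling <code>list_descendant_files(Path("Grooper Import Folder"))</code> on a local copy of the example folder would return the files in that folder and in any folders nested beneath it.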


[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_General_Settings_02(1).png]]
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_General_Settings_02(1).png]]


==== Processing Options Settings ====
=== Import Query Results ===
The most important part of the '''''Processing Options''''' property section is the '''''Import Mode''''' property.
==== The Same, But Different ====
The Import Query Results provider's configuration panel is almost identical to the Import Descendants provider's configuration panel.  ''Both'' providers share the same Processing Options, Disposition, and Batch Creation property settings.  See the [[#Import Descendants|Import Descendants]] section for brief descriptions of these property sections.


The '''''Import Mode''''' property allows control over the connections Grooper makes and/or retains to the imported documents.


For importing, documents contain two important sets of information:
The big difference between the two providers is the highlighted CMIS Query property.  This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties.  Only files returned by the query will be imported. 
* For example, you may want to only import documents of a certain file type(s).  You could include the file extension(s) as the query condition (or one of many conditions). 
* For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox.  If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line, and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
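
To make the email filtering example concrete, here is a minimal Python sketch of the same idea: compare each message's metadata to a set of conditions and keep only the matches. It does not use CMISQL syntax or any Grooper API. The <code>MessageInfo</code> fields and the <code>select_for_import</code> function are hypothetical names chosen for illustration; the real queryable properties depend on the CMIS Binding.

```python
from dataclasses import dataclass


@dataclass
class MessageInfo:
    # Hypothetical metadata fields; real property names depend on the CMIS Binding.
    sender: str
    subject: str
    folder: str
    is_read: bool


def select_for_import(messages, sender, folder, subject_contains):
    """Keep only unread messages that match the sender, folder, and subject text."""
    wanted = subject_contains.lower()
    return [
        m for m in messages
        if m.sender == sender
        and m.folder == folder
        and wanted in m.subject.lower()
        and not m.is_read
    ]
```

Only the messages returned by <code>select_for_import</code> would go on to be imported, which is the same role the query plays for Import Query Results.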


* Content - Images and native text data
{|class="attn-box"
* Properties - Metadata associated with the file. Digital information, such as the document's filename, file type, creation date, and more.
|
'''&#9888;'''
|Only certain external storage platforms are currently queryable with the CMIS Query property. The following CMIS Binding sources cannot be queried currently.  As such, they are not suitable for Import Query Results.  You should instead use Import Descendants for the following CMIS Bindings.


Depending on the '''''Import Mode''''' selected, all, some, or none of this information will be copied to your Grooper Repository's file store (in the case of the document's content) and database (in the case of the document's properties).  See below for a more in-depth explanation of each of the '''''Import Mode''''' options.
:&bull; FTP
:&bull; SFTP
:&bull; NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
|}


[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Processing_Options_Settings_01(2).png]]
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_3_Import_Query_Results_The_Same_But_Different_01.png]]




<big>Copy</big>
Just like with Import Descendants, there are some minimum requirements before configuring Import Query Results.  A '''CMIS Connection''' object must be created and a '''CMIS Repository''' must be imported.
* Both properties and content will be loaded. This is a total duplication of the document from its source to your Grooper Repository's local file store.  This is the slowest import mode, because the full content of each document is copied during a ''single-threaded'' import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.


*For example, ''Copy'' mode allows items to be deleted immediately on import. Also, ''Copy'' mode avoids the need for any follow-up content loading operations in the '''Batch Process'''.
[https://app.supademo.com/demo/cm8ergggh03ij12zdakzyt3jp Click here for an interactive walkthrough]


*This mode was called ''Full'' in older versions of Grooper.


<big>Sparse</big>
*Properties will be loaded, but content will not. This mode is much faster than a ''Full'' import, because no content files are copied into your local Grooper file store. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the '''CMIS Repository'''. This type of document is often referred to as a "sparse" document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced.  Grooper has to traverse the document link in order to display or process the document's image.


* However, after a ''Sparse'' import, document content can be loaded multi-threaded using the '''Execute''' activity in a '''Batch Process'''.  Overall, this can lead to importing a document's content faster than a ''Full'' import.
==== CMIS Query Configuration ====
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''
Upon pressing the ellipsis button at the end of the CMIS Query property, the CMIS Query Editor window will appear.


<big>Link Only</big>
This interface allows you to configure the CMISQL query based on available metadata from the CMIS Binding.  For example, the Exchange binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.
* No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.


* However, this does not produce a usable document in Grooper. After a ''LinkOnly'' import, document content ''must'' be loaded using the '''Execute''' activity in a '''Batch Process'''.
[https://app.supademo.com/demo/cm8g1l4us17x112zdo9ti0ae3 Click here for an interactive walkthrough]
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''


* You can think of the ''Link Only'' option as an even sparser sparse import.
For an in-depth explanation of the CMIS Query Editor and how to use it to craft a CMISQL query, please visit the [[CMIS Query]] article.


=== Settings shared between Import Descendants and Import Query Results ===


See the table below for a summary of the '''''Import Mode''''' options.
The following properties/settings are part of configuring either provider.
 
{|cellpadding=10 cellspacing=5
|'''''Import Mode'''''||'''Speed'''||'''Comments'''
|-valign=top
|''Full''||Slow||Full import of content and their properties.
* Required if deleting content from the source on import.
|-valign=top
||''Sparse''||Fast||Imports a link to the document's source and its properties but not their content.
* This produces a usable document in Grooper without copying the full content into Grooper, saving time upon import.
* This mode is the same as enabling the old '''''Sparse Import''''' property in previous versions.
|-valign=top
||''Link Only''||Fastest||Only imports a link to the document's source.
*Does not produce a usable document. The document's properties must be loaded in a step in a Batch Process.
|}


==== Disposition Settings ====
==== Import Mode: Sparse ====
The '''''Disposition''''' property settings allow you to do something with the source documents after importing them into Grooper, namely delete them, move them, or do nothing and just leave them alone where they came from. This is often leveraged with the '''Import Watcher''' Grooper service to prevent repeatedly importing the same document.
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Processing_Options_Settings_01(2).png|right]]
Sparse imports are a way of speeding up the time it takes to import files.


In our example here, the '''''Move to Folder''''' property is configured to move the PDF documents to a folder named "Imported Documents".
Files can be imported using one of three "Import Modes":
* The folder location you're moving documents to ''must'' be accessible via the connected '''CMIS Repository'''.
* '''Copy''' - The files are fully copied to the Grooper Repository on import.
** The file's properties are stored in the Grooper Database entry for the Batch Folder.
** A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
* '''Sparse''' - Only the file properties are copied to the Grooper Repository.
** The file's properties are stored in the Grooper Database entry for the Batch Folder.
** The file content is ''not'' copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder.
* '''Link Only''' (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.


If using the ''Full'' '''''Import Mode''''', you can enable the '''''Delete Item''''' property to delete each document after it is imported into the Grooper '''Batch'''.
* This property is ONLY available when choosing the ''Full'' '''''Import Mode.'''''  A sparsely imported document needs to call to the import storage location in order to load the document's image for display or processing.  If you deleted the document upon import, you wouldn't be able to view it or do anything with it.


The '''''Update Properties''''' property allows you to alter the document's property values upon import.  Property values are updated using a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property. You can type one entry per line in the format <code>key=value</code>.
<big>How does using "Sparse" speed up import?</big>
* Examples:
* <code>Archive=true</code> Sets the archive attribute on a file
* <code>Status=PENDING</code> Sets the "Status" field on ApplicationXtender documents.
* <code>Imported=true</code> Sets the "Imported" field on SharePoint documents.
* <code>IsRead=true</code> Sets the "IsRead" flag on an Exchange message.


[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Disposition_Settings_01.png]]
It increases the parallelism of the overall import operation.


Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.
* When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.
* By only importing a link to the file content, Sparse mode ''dramatically'' speeds up the time it takes to get a usable document into Grooper.
* Then, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
* The end result is the overall import operation will be as if you had used the "Copy" mode. But it will be done in a way that runs multi-threaded.
<br clear=all>
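
The pattern described above (a fast single-threaded pass that records links only, followed by a multi-threaded content load) can be sketched in Python. This is an illustration of the concurrency idea under stated assumptions, not Grooper's implementation: the functions <code>sparse_import</code> and <code>load_all</code>, the dictionary layout, and the <code>fetch</code> callback are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sparse_import(source_paths):
    """Single-threaded step: record a link per document, copy no content."""
    return [{"link": path, "content": None} for path in source_paths]


def load_content(doc, fetch):
    """Follow one document's link and attach the retrieved content."""
    doc["content"] = fetch(doc["link"])
    return doc


def load_all(docs, fetch, workers=4):
    """Multi-threaded step: load content for many documents in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input documents.
        return list(pool.map(lambda d: load_content(d, fetch), docs))
```

The slow part (retrieving content) moves out of the sequential import and into the parallel step, which is why the end result matches a full copy but can finish sooner.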
==== Batch Creation Settings ====
==== Batch Creation Settings ====
It's likely you're importing documents because you want to run them through a '''Batch Process'''.  The '''''Batch Creation''''' property settings allow you to define which '''Batch Process''' you wish to use to process the imported documents.


This is done using the '''''Starting Step''''' property, selecting a '''Batch Process Step''' in a '''Batch Process''' from the published '''Batch Processes''' in the Grooper Repository.  Upon import, a new '''Batch''' is created with each document as a '''Batch Folder''', and the selected '''Batch Process''' assigned to the '''Batch'''.
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Batch_Creation_Settings_01.png|right]]


There are also further properties to control '''Batch''' creation.  You can limit the number of documents imported per '''Batch''' using the '''''Maximum Items per Batch''''' property.  By default, new '''Batches''' are named with a date/time stamp.  However, the '''''Batch Name Prefix''''' allows you to tack on a prefix to the '''Batch's''' name for easier identification.  The '''''Start Paused''''' property will automatically trigger the '''Batch Process''' if set to ''False''.
The Batch Creation settings allow you to define which '''Batch Process''' you wish to use to process the imported files.


[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Batch_Creation_Settings_01.png]]
You '''''must''''' configure the "Starting Step" property to assign the Batch Process.
* Use the property's dropdown editor to select a Batch Process Step from a list of Batch Processes.
* Only published Batch Processes will appear in this list.


=== Import Query Results ===
==== The Same, But Different ====
The '''''Import Query Results''''' provider's configuration panel is almost identical to the '''''Import Descendants''''' provider's configuration panel.  ''Both'' providers share the same '''''Processing Options''''', '''''Disposition''''', and '''''Batch Creation''''' property settings.  See the [[#Import Descendants|Import Descendants]] section for brief descriptions of these property sections.


Other notable Batch Creation properties:
* "Start Paused" - This determines if the Batch starts in a paused state or not. If "False", the first step's tasks will be automatically submitted to Activity Processing services. If "True", you will have to manually start the Batch.
* "Max Items Per Batch" - The default is 2500, meaning each Batch will have a maximum of 2500 Batch Folders before creating a new Batch on import. For users who want more Batches with fewer documents, lower this number.
* "Organize By Date" - This will organize Batches into subfolders in the Production branch in Grooper according to the year / month / day the Batch was created.
* "Priority" and "Increment Priority" - Controls the task processing priority for the Batch. "Increment Priority" is useful when submitting large user-directed imports from the Imports Page to ensure the first Batch created is the first that is fully processed by Activity Processing services.
<br clear=all>
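
As a rough illustration of how a per-Batch item limit splits a large import, the Python sketch below chunks a list of imported items using the default of 2500 mentioned above. The function name <code>split_into_batches</code> is hypothetical, and the sketch says nothing about how Grooper names or schedules the resulting Batches.

```python
def split_into_batches(items, max_items_per_batch=2500):
    """Split items into consecutive groups of at most max_items_per_batch."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    return [
        items[start:start + max_items_per_batch]
        for start in range(0, len(items), max_items_per_batch)
    ]
```

With the default limit, 6000 imported files would be split into three groups of 2500, 2500, and 1000 items. Lowering the limit produces more groups with fewer items each.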


The big difference between the two providers is the highlighted '''''CMIS Query''''' property.  This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties.  Only files returned by the query will be imported. 
==== Disposition Settings ====
* For example, you may want to only import documents of a certain file type(s).  You could include the file extension(s) as the query condition (or one of many conditions). 
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Disposition_Settings_01.png|right]]
* For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox.  If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line, and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
The "Disposition" settings allow you to do something with the source files after importing them into Grooper. This is important when using an Import Watcher service to schedule imports. If you do not configure a Disposition property, the imported file will remain in the same state after the Import Job completes. This can cause the Import Watcher to import the same file repeatedly.
 
{|class="attn-box"
|
'''&#9888;'''
|Only certain external storage platforms are currently queryable with the '''''CMIS Query''''' property.  The following '''''CMIS Binding''''' sources '''''cannot''''' be queried currently.  As such, they are '''''not''''' suitable for '''''Import Query Results'''''.  You should instead use '''''Import Descendants''''' for the following '''''CMIS Bindings'''''.
 
:&bull; '''''FTP'''''
:&bull; '''''SFTP'''''
:&bull; '''''NTFS''''' (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
|}
 
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_3_Import_Query_Results_The_Same_But_Different_01.png]]
 
 
Just like with '''''Import Descendants''''', there are some minimum requirements before configuring '''''Import Query Results'''''. A '''CMIS Connection''' object must be created and a '''CMIS Repository''' must be imported.
 
[https://app.supademo.com/demo/cm8ergggh03ij12zdakzyt3jp Click here for an interactive walkthrough]
 
 
 
==== CMIS Query Configuration ====
Upon pressing the ellipsis button at the end of the '''''CMIS Query''''' property, the '''''CMIS Query Editor''''' window will appear.
 
This interface allows you to configure the CMISQL query based on available metadata from the '''''CMIS Binding'''''. For example, the '''''Exchange''''' binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.
 
[https://app.supademo.com/demo/cm8g1l4us17x112zdo9ti0ae3 Click here for an interactive walkthrough]


For an in-depth explanation of the '''''CMIS Query Editor''''' and how to use it to craft a CMISQL query, please visit the [[CMIS Query]] article.
There are three "Disposition" options:
* "Delete Item" - Setting this to "True" will delete the source file after the Import Job completes.
**<li class="attn-bullet"> You may only configure this option if the "Import Mode" is set to "Copy".
* "Move To Folder" - This will move the files to a different folder in the CMIS Repository after the Import Job completes.
* "Update Properties" - This will update one or more file properties after the Import Job completes. Properties are updated by listing them as "key-value pairs" in the "Update Properties" list editor where <code>key=value</code>.
** Examples:
**<code>Archive=false</code> Sets the archive attribute on each imported file to "false".
** <code>IsRead=true</code> Marks each imported email message as read.
** <code>Status=PENDING</code> Sets a "Status" field on a document in an AppEnhancer application (assuming there is a "Status" field in the application).
<br clear=all>
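
The <code>key=value</code> list format described above is simple enough to sketch in Python. The parser below is only an illustration of the one-entry-per-line format, not Grooper's own parsing code; the function name <code>parse_update_properties</code> is hypothetical.

```python
def parse_update_properties(text: str) -> dict[str, str]:
    """Parse one key=value entry per line into a dictionary of property updates."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between entries
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got: {line!r}")
        props[key.strip()] = value.strip()
    return props
```

For instance, a list containing <code>Archive=false</code> and <code>IsRead=true</code> on separate lines would yield two property updates, one per line.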

Revision as of 17:13, 23 May 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.


CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

Documents are imported from CMIS Connections using either the Import Descendants or Import Query Results providers. These can be used in two ways:

  • To perform manual "ad-hoc" imports when creating a new Batch on the "Imports" page.
  • To perform automated, scheduled imports using one or more Import Watcher Grooper services.

Import Descendants will import all documents within a designated folder location of a CMIS Repository. Import Query Results allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.

About CMIS+

"CMIS" stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to inter-operate over the Internet. Grooper expanded on this idea in version 2.72 to create our "CMIS+" architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.


Now, Grooper connects to all available external storage platforms by creating and configuring a CMIS Connection.

  • Once a CMIS Connection is created, Grooper can "interoperate" with these platforms.
  • "Interoperability" means Grooper has the same access to control the system as a human being does.
  • Grooper has a "one-to-one" connection to the platform, allowing full and total control.
  • Because we standardize connection to non-CMIS systems, this includes platforms like NTFS file systems (Windows) that are not CMIS servers.


Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.

  • You now use CMIS Import providers and CMIS Export for any storage platform you can connect to with a CMIS Connection.
  • This also speeds up development for adding new connection types for import/export operations.

Anatomy of a CMIS Connection

When connecting Grooper to external storage platforms, you'll start by creating a CMIS Connection. There are three important parts to understanding a CMIS Connection:

  1. The CMIS Connection itself
  2. The platform it's connecting to. This is defined by the "CMIS Binding" (aka "connection type") selected for the CMIS Connection's "Connection Settings".
  3. Its child CMIS Repositories
    • "Repository" is just a general term for a location where data lives. Different systems refer to "repositories" in different ways.
      • A folder in Windows could be a repository. An email inbox could be a repository. A document library in SharePoint could be a repository. An application in ApplicationEnhancer (formerly ApplicationXtender) could be a repository.
      • "Repository" is a normalized way of referring to various terms used by various storage platforms.


For newer users, the difference between a CMIS Connection and a CMIS Repository can be confusing. The key distinction is as follows:

  • CMIS Connections connect to storage platforms.
    • It's the phone number you dial.
    • The specific platform you're connecting to is defined in its "Connection Settings".
  • CMIS Repositories represent a location within the connected platform.
    • It's the person on the other end of that phone number you want to talk to.
    • CMIS Repositories represent storage locations (typically folders) in the storage platform. They are added as children to a parent CMIS Connection.
    • The CMIS Repository nodes are what Grooper actually uses when configuring import/export operations.
      • You don't talk to a phone number. You talk to a person.
      • You don't reference the parent CMIS Connection when configuring CMIS Import or CMIS Export. Instead you reference a CMIS Repository.

Basic creation steps

There are three basic steps involved to connect Grooper to external storage platforms:

  1. Create a CMIS Connection
  2. Configure the "Connection Settings".
    • Choose what platform you want to connect to (the CMIS Binding).
    • Enter the connection settings required to connect to the platform. (These will differ from platform to platform.)
  3. Add child CMIS Repositories by importing the storage locations.
    • Importing a CMIS Repository is not the same as importing documents to a new Batch.
      • "Importing" here is more like importing a reference (or bringing the repository into a framework Grooper can use).
      • Upon importing the CMIS Repository, Grooper has full file access to that location in the storage platform.
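
The relationship between a CMIS Connection, its CMIS Binding, and its child CMIS Repositories can be sketched as a small data model. This is only an illustration, not Grooper's object model or API: the class names, field names, and the `import_repository` method are hypothetical, chosen to mirror the three creation steps above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CmisRepository:
    """A storage location (typically a folder) inside the connected platform."""
    name: str


@dataclass
class CmisConnection:
    """The link to a storage platform; the binding names the platform type."""
    name: str
    binding: str
    repositories: List[CmisRepository] = field(default_factory=list)

    def import_repository(self, name: str) -> CmisRepository:
        # Adds a child reference only; no documents are pulled into a Batch here.
        repo = CmisRepository(name)
        self.repositories.append(repo)
        return repo


@dataclass
class ImportSettings:
    """Import/export settings point at a repository, never at the connection."""
    repository: CmisRepository


# Step 1 and 2: create the connection and choose its binding.
connection = CmisConnection(name="File Share", binding="NTFS")
# Step 3: import a storage location as a child repository.
repo = connection.import_repository("Import and Export")
# Import configuration then references the repository (the "person", not the "phone number").
settings = ImportSettings(repository=repo)
```

The point of the sketch is the last line: import and export configuration holds a reference to a repository, while the connection only appears as the repository's parent.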

CMIS Bindings (aka "connection types")

How you configure a CMIS Connection differs only based on what platform you're connecting to. Connection settings may include folder paths, URL addresses, usernames, and passwords.

  • Example: Connecting to a Windows folder requires a networked folder's UNC path.
  • Example: Connecting to a SharePoint site requires a URL address.
  • Example: Connecting to an email inbox requires a server host name.
  • Example: Connecting to ApplicationXtender, Box, SharePoint, OneDrive, Exchange (Outlook) and more requires a username and password.


Each platform has its own connection requirements. These connection settings and the logic required to interoperate between Grooper and a specific platform are defined by the different "CMIS Bindings".

Each CMIS Binding provides the settings and logic to connect Grooper to CMS platforms and file systems for import and export operations.

  • Example: The "Exchange" binding contains all the information Grooper uses to connect to Microsoft Exchange email servers (i.e. Outlook inboxes).
  • Example: The "AppXtender" binding contains all the information Grooper uses to connect to the ApplicationEnhancer (formerly AppXtender) content management system.
  • Example: The "NTFS" binding contains all the information Grooper uses to connect to a Windows file system.
  • And so on.
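
One way to picture "each binding defines its own connection settings" is a lookup from binding name to the settings it needs. The sketch below is an assumption-laden illustration, not Grooper's configuration schema: the setting keys (`unc_path`, `url`, `server_host`, `username`, `password`) and the `missing_settings` helper are hypothetical, and the required sets are only inferred from the examples above.

```python
# Hypothetical required settings per binding, inferred from the examples above.
REQUIRED_SETTINGS = {
    "NTFS": {"unc_path"},
    "SharePoint": {"url", "username", "password"},
    "Exchange": {"server_host", "username", "password"},
}


def missing_settings(binding, settings):
    """Return the required setting names that were not supplied for a binding."""
    if binding not in REQUIRED_SETTINGS:
        raise ValueError(f"Unknown binding: {binding}")
    return REQUIRED_SETTINGS[binding] - set(settings)


# A SharePoint connection with only a URL still needs credentials.
print(missing_settings("SharePoint", {"url": "https://example.invalid/sites/docs"}))
```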


The first step in configuring a CMIS Connection is choosing what platform you want to connect to. You do this by selecting a "CMIS Binding".

  • You will commonly hear "CMIS Binding" referred to as a "CMIS connection type" or "connection type".
  • Or just "connection", as in an "Exchange connection".

Current CMIS Bindings (aka "connection types")

Grooper can connect to the following storage platforms using CMIS Bindings:

Most Commonly Used

Somewhat Commonly Used

Less Commonly Used

  • FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) servers.
  • IMAP mail servers

Least Used

  • Content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The FileBound document management platform.
  • The IBM FileNet platform.


About CMIS Import

The CMIS Import provider is split into two different Import Providers:

  • Import Descendants
  • Import Query Results

These providers are designed to import files from a folder structure of an on-premises or cloud-based document storage platform. This is the primary method of Batch creation when importing digital documents into Grooper to process them with a Batch Process.

In order to do this, a few requirements must be met first.

  1. A CMIS Connection object must be created and configured. This will connect Grooper to the document storage platform.
    • This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms. What the CMIS Connection connects to is determined by the CMIS Binding selected when configuring the Connection Type property of the CMIS Connection object.
  2. A CMIS Repository must be imported. This will create an object Grooper can use to import documents from the folders in the document storage platform.
    • This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.

For more information on adding a CMIS Connection and importing a CMIS Repository, visit the CMIS Connection article.

As for the difference between the Import Descendants and Import Query Results providers, you can think of Import Query Results as a more specialized version of Import Descendants.

  • Import Descendants is intended to import the full contents of a folder location. It imports the "descendant" files of a parent folder.
  • Import Query Results allows you to selectively import files using a SQL-like query (called a CMISQL query). Only files returned by the query will be imported. For example, using an Exchange or IMAP CMIS Connection, you could query an inbox for emails from a specific sender and only import those emails.
    • Note: There are some import filtering capabilities available to Import Descendants as well using a SQL-like query. However, the CMISQL querying capabilities of Import Query Results are much more robust.
    • That said, only certain CMIS Bindings can take advantage of this increased CMISQL query functionality. The following CMIS Bindings are not currently suitable for the Import Query Results provider.
      • FTP
      • SFTP
      • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
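
The suitability rule in the list above can be written as a small check. This is a sketch only: the function name and the `windows_search_indexed` flag are hypothetical, and treating every binding not listed above as queryable is an assumption based solely on this article.

```python
def supports_query_import(binding, windows_search_indexed=False):
    """Return True if Import Query Results is a reasonable fit for the binding."""
    if binding in {"FTP", "SFTP"}:
        return False
    if binding == "NTFS":
        # NTFS is only queryable when the folder is indexed by Windows Search
        # and the Windows Search service is running on the storage server.
        return windows_search_indexed
    return True


for name in ("FTP", "NTFS", "Exchange"):
    provider = "Import Query Results" if supports_query_import(name) else "Import Descendants"
    print(name, "->", provider)
```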

Import Descendants

Configuration Panel


General Settings


Back on the Import Descendants configuration screen, the CMIS Repository object is used to point Grooper to this folder location for import.

  • The Repository property is configured to assign the CMIS Repository where the documents are located.
    • Here, the CMIS Repository named "Import and Export" connects to the "Import and Export" folder of the local drive.
  • The Base Folder property is configured to traverse the folder structure of the CMIS Repository.
    • Here, we don't want to import all documents from every folder in the "Import and Export" folder. We just want to import from the "Grooper Import Folder".
  • The Import Filter property allows you to perform some basic import filtering to selectively choose which documents you want to import.
    • SELECT * FROM File is the default filter. It will import all files from the selected folder location.
    • This is a SQL-like query to specify conditions for document import.
    • BE AWARE: Import Descendants has limited filtering compared to Import Query Results.
      • Import Query Results was created to expand on this functionality. It provides more filtering options for the CMIS Connection Types supported by Import Query Results.
      • Import Descendants DOES NOT support the "IN_FOLDER" or "IN_TREE" predicates. Import Descendants will always import all documents in all subfolders from the base folder.
  • The Content Type property allows you to optionally assign a Document Type to the incoming documents.
    • You can use this property to assign a default classification for all incoming documents.

Import Query Results

The Same, But Different

The Import Query Results provider's configuration panel is almost identical to the Import Descendants provider's configuration panel. Both providers share the same Processing Options, Disposition, and Batch Creation property settings. See the Import Descendants section for brief descriptions of these property sections.


The big difference between the two providers is the highlighted CMIS Query property. This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties. Only files returned by the query will be imported.

  • For example, you may want to only import documents of a certain file type(s). You could include the file extension(s) as the query condition (or one of many conditions).
  • For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox. If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
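
To make the shape of such a query concrete, here is a minimal sketch that assembles a SQL-like query string from equality conditions. It rests on several assumptions: the `Message` type name and the sender address are made up for illustration, the `Sender` and `IsRead` property names are borrowed from examples elsewhere in this article, and the exact syntax a given CMIS Binding accepts may differ. The helper itself is hypothetical and not part of Grooper.

```python
def build_query(type_name, conditions):
    """Build 'SELECT * FROM <type> WHERE a = x AND b = y' from a dict of conditions."""
    clauses = []
    for prop, value in conditions.items():
        if isinstance(value, bool):
            literal = "true" if value else "false"
        else:
            text = str(value)
            # Quote-escaping rules are not covered here, so reject rather than guess.
            if "'" in text:
                raise ValueError("Values containing single quotes are not supported")
            literal = "'" + text + "'"
        clauses.append(f"{prop} = {literal}")
    query = f"SELECT * FROM {type_name}"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Unread messages from one sender (type and property names are assumptions).
print(build_query("Message", {"Sender": "billing@example.com", "IsRead": False}))
```

With no conditions, the helper produces the same form as the default Import Descendants filter, `SELECT * FROM File`. In practice the CMIS Query Editor described below is the place to build and verify real queries.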

Only certain external storage platforms are currently queryable with the CMIS Query property. The following CMIS Binding sources cannot be queried currently. As such, they are not suitable for Import Query Results. You should instead use Import Descendants for the following CMIS Bindings.
  • FTP
  • SFTP
  • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)


Just like with Import Descendants, there are some minimum requirements before configuring Import Query Results. A CMIS Connection object must be created and a CMIS Repository must be imported.



CMIS Query Configuration

Upon pressing the ellipsis button at the end of the CMIS Query property, the CMIS Query Editor window will appear.

This interface allows you to configure the CMISQL query based on available metadata from the CMIS Binding. For example, the Exchange binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.


For an in-depth explanation of the CMIS Query Editor and how to use it to craft a CMISQL query, please visit the CMIS Query article.

Settings shared between Import Descendants and Import Query Results

The following properties/settings are part of configuring either provider.

Import Mode: Sparse

Sparse imports are a way of speeding up the time it takes to import files.

Files can be imported using one of three "Import Modes":

  • Copy - The files are fully copied to the Grooper Repository on import.
    • The file's properties are stored in the Grooper Database entry for the Batch Folder.
    • A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
  • Sparse - Only the file properties are copied to the Grooper Repository.
    • The file's properties are stored in the Grooper Database entry for the Batch Folder.
    • The file content is not copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder.
  • Link Only (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.


How does using "Sparse" speed up import?

It increases the parallelism of the overall import operation.

Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.

  • When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.
  • By only importing a link to the file content, Sparse mode dramatically speeds up the time it takes to get a usable document into Grooper.
  • Then, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
  • The end result is the overall import operation will be as if you had used the "Copy" mode. But it will be done in a way that runs multi-threaded.
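
The two-phase idea (a fast single-threaded import of links, then a multi-threaded load of file content) can be sketched as follows. This is a conceptual model only, not Grooper code: the dictionary layout, the function names, and the `read_file` callback are all hypothetical stand-ins for the Import Job and the "CMIS Document Link > Load" command.

```python
from concurrent.futures import ThreadPoolExecutor


def import_sparse(paths):
    """Phase 1 (single thread): record a link per file, copy no content."""
    return [{"link": path, "content": None} for path in paths]


def load_content(folder, read_file):
    """Fetch one file's content through its stored link."""
    folder["content"] = read_file(folder["link"])
    return folder


def load_all(folders, read_file, workers=4):
    """Phase 2 (multiple threads): load content for every folder in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda f: load_content(f, read_file), folders))


# Stand-in for reading from the storage platform.
def fake_read(path):
    return path.upper().encode()


folders = import_sparse(["a.pdf", "b.pdf", "c.pdf"])
loaded = load_all(folders, fake_read)
```

Phase 1 stays cheap because it touches no file content, and the expensive reads in phase 2 are spread across worker threads.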


Batch Creation Settings

The Batch Creation settings allow you to define which Batch Process you wish to use to process the imported files.

You must configure the "Starting Step" property to assign the Batch Process.

  • Use the property's dropdown editor to select a Batch Process Step from a list of Batch Processes.
  • Only published Batch Processes will appear in this list.


Other notable Batch Creation properties:

  • "Start Paused" - This determines if the Batch starts in a paused state or not. If "False", the first step's tasks will be automatically submitted to Activity Processing services. If "True", you will have to manually start the Batch.
  • "Max Items Per Batch" - The default is 2500, meaning each Batch will have a maximum of 2500 Batch Folders before creating a new Batch on import. For users who want more Batches with fewer documents, lower this number.
  • "Organize By Date" - This will organize Batches into subfolders in the Production branch in Grooper according to the year / month / day the Batch was created.
  • "Priority" and "Increment Priority" - Controls the task processing priority for the Batch. "Increment Priority" is useful when submitting large user-directed imports from the Imports Page to ensure the first Batch created is the first that is fully processed by Activity Processing services.
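
As a quick worked example of "Max Items Per Batch", the sketch below estimates how an import would be split. It assumes each Batch is filled to the maximum before a new one is created, which is an interpretation of the description above rather than documented behavior. The function name is hypothetical.

```python
def batch_sizes(document_count, max_items_per_batch=2500):
    """Return the expected number of Batch Folders in each Batch created on import."""
    full_batches, remainder = divmod(document_count, max_items_per_batch)
    sizes = [max_items_per_batch] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


# 6000 documents at the default of 2500: two full Batches and one partial Batch.
print(batch_sizes(6000))
# Lowering the maximum to 500 yields more, smaller Batches.
print(len(batch_sizes(6000, max_items_per_batch=500)))
```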


Disposition Settings

The "Disposition" settings allow you to do something with the source files after importing them into Grooper. This is important when using an Import Watcher service to schedule imports. If you do not configure a Disposition property, the imported file will remain in the same state after the Import Job completes. This can cause the Import Watcher to import the same file again on every scheduled run.

There are three "Disposition" options:

  • "Delete Item" - Turning this to "True" will simply delete the source file after the Import Job completes.
    • You may only configure this option if the "Import Mode" is set to "Copy".
  • "Move To Folder" - This will move the files to a different folder in the CMIS Repository after the Import Job completes.
  • "Update Properties" - This will update one or more file properties after the Import Job completes. Properties are updated by listing them as "key-value pairs" in the "Update Properties" list editor where key=value.
    • Examples:
      • Archive=false - Sets the archive attribute on each imported file to "false".
      • IsRead=true - Marks each imported email message as read.
      • Status=PENDING - Sets a "Status" field on a document in an AppEnhancer application (assuming there is a "Status" field in the application).
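
The key=value format used by the "Update Properties" list can be illustrated with a small parser. This is a sketch of the format only, assuming one pair per list entry. The function is hypothetical and says nothing about how Grooper parses or validates the list internally.

```python
def parse_update_properties(entries):
    """Turn entries like 'IsRead=true' into a {key: value} dictionary."""
    properties = {}
    for entry in entries:
        key, separator, value = entry.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"Expected key=value, got: {entry!r}")
        properties[key.strip()] = value.strip()
    return properties


print(parse_update_properties(["Archive=false", "IsRead=true", "Status=PENDING"]))
```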