CMIS Import (Import Provider): Difference between revisions

From Grooper Wiki
(41 intermediate revisions by 3 users not shown)
<onlyinclude>
{{AutoVersion}}
<blockquote style="font-size:14pt">
''CMIS Import'' is an '''''[[Import Provider]]''''' used to import content over a '''CMIS Connection''', allowing users to import from various on-premise and cloud-based storage platforms.
</blockquote>


Documents are imported from '''CMIS Connections''' using either the '''Import Descendants''' or '''Import Query Results''' providers.  These can be used in two ways:
<blockquote>{{#lst:Glossary|CMIS Import}}</blockquote>


* To perform manual "ad-hoc" imports when creating a new '''[[Batch]]''' in Grooper Dashboard or Grooper Design Studio.
Documents are imported from '''CMIS Connections''' using either the '''''Import Descendants''''' or '''''Import Query Results''''' providers.  These can be used in two ways:
 
* To perform manual "ad-hoc" imports when creating a new '''[[Batch]]''' on the "Imports" page.
* To perform automated, scheduled imports using one or more '''Import Watcher''' Grooper services.


'''Import Descendants''' will import all documents within a designated folder location of a '''[[CMIS Repository]]'''.  '''Import Query Results''' allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on a document's name, file type, creation date, archive status, or other variables.
'''''Import Descendants''''' will import all documents within a designated folder location of a '''[[CMIS Repository]]'''.  '''''Import Query Results''''' allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.
</onlyinclude>


== About CMIS ==
<u><big>'''SharePoint'''</big></u>: {{#lst:Glossary|SharePoint}}


[[CMIS]] stands for "Content Management Interoperability Services".  It is an open standard that allows different content management systems to inter-operate over the Internet.  This standard protocol allows Grooper to use many different platforms for importing and exporting documents and their contents.  Once a '''[[CMIS Connection]]''' object is created, Grooper can exchange documents with these platforms.  "Interoperability" means Grooper has the same access to control the system as a human being does.  It is a "one-to-one" connection to the platform, allowing full and total control.
{{#lst:CMIS+ (Concept)|cmisplus}}


Upon connecting to an external content management system, Grooper will be able to see the "repositories" associated with it.  A repository, in computer science, is a general term for a location where data lives.  Different systems refer to "repositories" in different ways.  An email inbox could be a repository.  A folder in Windows could be a repository.  A cabinet in ApplicationXtender could be a repository.  It's a place to put things.  We standardize the various terms used by various storage platforms to simply "repository". 
== About CMIS Import ==
The '''''CMIS Import''''' provider is split into two different '''''Import Providers''''':


These repositories are "imported" into Grooper as a '''[[CMIS Repository]]''' object, as a child of the '''CMIS Connection''' object.  This doesn't import data into Grooper in the traditional sense of importing documents into a batch.  "Importing" here is more like bringing the repository into a framework Grooper can use (creating the '''CMIS Connection''' object).  Upon importing the repository Grooper has full file access to that location in the storage platform.
* '''''Import Descendants'''''
* '''''Import Query Results'''''


For our purposes, repositories are like filing cabinets full of documents.  Once a connection is established, it's like giving Grooper a key to that cabinet.  You can open the various drawers of that cabinet.  You can pull out files and put files into it.  The storage platform or content management system is like the cabinet.  The '''CMIS Connection''' object is like the key.  The '''CMIS Repository''' object is like a drawer in the cabinet.  You "connect" to the cabinet by turning the key.  You "import" the repository by opening the drawer.  Now you can see there are documents in there!  You can take them out.  You can read them and put them back in.  You can put new ones in.  You can use this "open" connection to the "drawer" however you need.
These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform.  This is the primary method of '''Batch''' creation when importing digital documents into Grooper to process them with a '''Batch Process'''.


== CMIS+ Architecture ==
In order to do this, a few requirements must be met first.


Grooper expanded on this idea in version 2.72 to create our CMIS+ architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.  Prior to version 2.72, there was only one type of '''CMIS Connection''', a true CMIS connection using CMIS 1.0 or CMIS 1.1 servers.  Now, connections to additional non-CMIS document storage platforms can be made via "''CMIS Bindings''".  This provides standardized access to document content and metadata across a variety of external storage platforms.
# A '''CMIS Connection''' object must be made and configured.  This will connect Grooper to the document storage platform.
#* This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms.  What the '''CMIS Connection''' connects to is determined by the '''''CMIS Binding''''' selected when configuring the '''''Connection Type''''' property of the '''CMIS Connection''' object.
# A '''CMIS Repository''' must be imported.  This will create an object Grooper can use to import documents from the folders in the document storage platform.
#* This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.


Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.  You now use [[CMIS Import]] and [[CMIS Export]] providers, regardless of the storage platform.  They connect to a '''CMIS Repository''' imported from a '''CMIS Connection''' and use that as Grooper's import or export path.
For more information on adding a '''CMIS Connection''' and importing a '''CMIS Repository''', visit the '''[[CMIS Connection]]''' article.


How you create a '''CMIS Connection''' only differs from ''CMIS Binding'' to ''CMIS Binding'', as each binding has a different way of connecting to it.  You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example.
As for the difference between the '''''Import Descendants''''' and '''''Import Query Results''''' providers, you can think of '''''Import Query Results''''' as a more specialized version of '''''Import Descendants'''''.


=== CMIS Bindings ===
* '''''Import Descendants''''' is intended to import the full contents of a folder location.  It imports the "descendant" files of a parent folder.
* '''''Import Query Results''''' allows you to selectively import files using a SQL-like query (called a CMISQL query).  Only files returned by the query will be imported.  For example, using an '''''Exchange''''' or '''''IMAP''''' '''CMIS Connection''', you could query an inbox for emails from a specific sender and only import those emails.
** Note:  There are some import filtering capabilities available to '''''Import Descendants''''' as well using a SQL-like query.  However, the CMISQL querying capabilities of '''''Import Query Results''''' are much more robust. 
** That said, only certain '''''CMIS Bindings''''' can take advantage of this increased CMISQL query functionality.  The following '''''CMIS Bindings''''' are '''''not''''' currently suitable for the '''''Import Query Results''''' provider.
*** '''''FTP'''''
*** '''''SFTP'''''
*** '''''NTFS''''' (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)


A ''CMIS Binding'' provides connectivity logic for external storage platforms, allowing '''CMIS Connection''' objects to import and export content.  Grooper's CMIS+ architecture expands connectivity from traditional CMIS servers to a variety of on-premise and cloud-based storage platforms by exposing connections to these platforms as ''CMIS Bindings''.  Each individual ''CMIS Binding'' contains the settings and logic required to exchange documents between Grooper and each distinct platform.  For example, the ''AppXtender Binding'' contains all the information Grooper uses to connect to the ApplicationXtender content management system.
=== Import Descendants ===
==== Configuration Panel ====


''CMIS Bindings'' are used when creating a '''CMIS Connection''' object. The first step to creating a '''CMIS Connection''' is to configure the '''''Connection Type''''' property.   Which binding you use (and therefore which platform you connect to) is set here.  First, the user selects which ''CMIS Binding'' they want to use, selecting which storage platform they want to connect to.  The second step is to enter the connection settings for that binding, such as login information for many bindings.
[https://app.supademo.com/demo/cm8d4b2wt1f812ugqwtbi17bu Click here for an interactive walkthrough]


=== Current CMIS Bindings ===
==== General Settings ====


Grooper can connect to the following storage platforms using ''CMIS Bindings'':
[https://app.supademo.com/demo/cm8ddjcr91rb92ugqp7k9phqo Click here for an interactive walkthrough]


* The [[AppXtender (CMIS Binding)|ApplicationXtender]] document management platform.
Back on the '''''Import Descendants''''' configuration screen, the '''CMIS Repository''' object is used to point Grooper to this folder location for import.
* The [[Box (CMIS Binding)|Box]] cloud storage platform.
* The [[FileBound (CMIS Binding)|FileBound]] document management platform.
* [[CMIS (CMIS Binding)|Content management systems]] using CMIS 1.0 or CMIS 1.1 servers.
* The following Microsoft content platforms
** The [[Exchange (CMIS Binding)|Microsoft Exchange]] mail server platform.
** The [[OneDrive (CMIS Binding)|Microsoft OneDrive]] cloud storage platform.
** [[SharePoint (CMIS Binding)|Microsoft SharePoint]] sites.
* [[FTP (CMIS Binding)|FTP]] (File Transfer Protocol) and [[SFTP (CMIS Binding)|SFTP]] (SSH File Transfer Protocol) servers.
* [[IMAP (CMIS Binding)|IMAP]] mail servers
* The Microsoft Windows [[NTFS (CMIS Binding)|NTFS]] file system.


== Import Descendants ==
*The '''''Repository''''' property is configured to assign the '''CMIS Repository''' where the documents are located.
** Here, the '''CMIS Repository''' named "Import and Export" connects to the "Import and Export" folder of the local drive.


{|style="background-color:#ed2330; color:white"
*The '''''Base Folder''''' property is configured to traverse the folder structure of the '''CMIS Repository'''.
|This section needs expanding.  Some info is located at the [[Import Descendants]] article.
** Here, we don't want to import ''all'' documents from ''every'' folder in the "Import and Export" folder.  We just want to import from the "Grooper Import Folder".
|}


== Import Query Results ==
*The '''''Import Filter''''' property allows you to perform some basic import filtering to selectively choose which documents you want to import.
** <code>SELECT * FROM File</code> is the default filter.  It will import all files from the selected folder location.
** This is a SQL-like query to specify conditions for document import. 
**<li class="attn-bullet"> BE AWARE: '''''Import Descendants''''' has limited filtering compared to '''''Import Query Results'''''.
*** '''''Import Query Results''''' was created to expand on this functionality. It provides more filtering options for the '''''CMIS Connection Types''''' supported by '''''Import Query Results'''''.
*** '''''Import Descendants''''' DOES NOT support the "IN_FOLDER" or "IN_TREE" predicates. '''''Import Descendants''''' will ''always'' import ''all'' documents in ''all'' subfolders from the base folder.
*The '''''Content Type''''' property allows you to optionally assign the incoming documents with a '''Document Type'''.
** You can use this property to assign a default classification for all incoming documents.


{|style="background-color:#ed2330; color:white"
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_General_Settings_02(1).png]]
|This section needs expanding. Some info is located at the [[Import Query Results]] article.
|}


==== Processing Options Settings ====
The most important part of the '''''Processing Options''''' property section is the '''''Import Mode''''' property.


The '''''Import Mode''''' property allows control over the connections Grooper makes and/or retains to the imported documents.


== Version Differences ==
For importing, documents contain two important sets of information:


=== CMIS+ Infrastructure (2.72) ===
* Content - Images and native text data
* Properties - Metadata associated with the file. Digital information, such as the document's filename, file type, creation date, and more.


As of 2.72, Grooper utilizes what we call the [[CMIS+]] infrastructure.  This unifies all content platforms under a single framework as CMIS endpoints.  Prior to version 2.72, there was only one type of CMIS Connection, a CMIS connection using CMIS 1.0 or CMIS 1.1 servers.  Now, connections to additional non-CMIS document storage platforms can be made via "[[CMIS Binding]]s".
Depending on the '''''Import Mode''''' selected, all, some, or none of this information will be copied to your Grooper Repository's file store (in the case of the document's content) and database (in the case of the document's properties).  See below for a more in-depth explanation of each of the '''''Import Mode''''' options.


The settings and logic used to connect Grooper to an individual storage platform are contained in its corresponding [[CMIS Binding]].  For example, the [[AppXtender (CMIS Binding)|AppXtender Binding]] contains all the information Grooper uses to connect to the ApplicationXtender content management system. Which binding you use (and therefore which platform you connect to) is set by creating a [[CMIS Connection]] and choosing the appropriate binding as the "Connection Type".
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Processing_Options_Settings_01(2).png]]


Using this architecture, Grooper is able to create a more standardized import and export workflow.  You now use [[CMIS Import]] and CMIS Export providers regardless of the storage platform.  They connect to a [[CMIS Repository]] imported from a [[CMIS Connection]] and use that as Grooper's import or export path.


Only how you create a [[CMIS Connection]] differs from [[CMIS Binding]] to CMIS Binding, as each binding has a different way of connecting to it (You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example).
<big>Copy</big>
* Both properties and content will be loaded. This is a total duplication of the document from its source to your Grooper Repository's local file store.  This is the slowest import mode, because the full content of each document is copied during a ''single-threaded'' import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.


=== Legacy Providers (2.72) ===
*For example, ''Copy'' mode allows items to be deleted immediately on import. Also, ''Copy'' mode avoids the need for any follow-up content loading operations in the '''Batch Process'''.


Old import and export providers should be replaced with this new functionality.  While Grooper's older import and export providers are available as "Legacy Import" and "Legacy Export" providers, these components are deprecated.  They will still function but will no longer be upgraded in future versions of Grooper.
*This mode was called ''Full'' in older versions of Grooper.


Grooper can import documents using [[CMIS Connection]]s via "Import Descendants" and "Import Query Results".  Grooper can export via the CMIS Export providers, [[CMIS Export#Mapped Export|Mapped Export]] and [[CMIS Export#Unmapped Export|Unmapped Export]].
<big>Sparse</big>
*Properties will be loaded, but content will not. This mode is much faster than a ''Full'' import, because no content files are copied into your local Grooper file store. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the '''CMIS Repository'''. This type of document is often referred to as a "sparse" document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced.  Grooper has to traverse the document link in order to display or process the document's image.


=== New Connection Types (2.72) ===
* However, after a ''Sparse'' import, document content can be loaded multi-threaded using the '''Execute''' activity in a '''Batch Process'''. Overall, this can load a document's content faster than a ''Full'' import.
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''


By creating the [[CMIS+]] architecture, we have been able to create new connections between Grooper and content management systems.  Grooper can now connect to Microsoft OneDrive, SharePoint, and Exchange via new [[CMIS Binding]]s.  Since these were created as [[CMIS Binding]]s, they can be used by the [[CMIS Import]] and CMIS Export providers.  Instead of having to create three new import providers and three new export providers for a total of six brand new components, we can use the already established CMIS import and export providers in the CMIS+ framework. A user can create a [[CMIS Connection]] using the [[OneDrive (CMIS Binding)|OneDrive]], [[SharePoint (CMIS Binding)|SharePoint]] or [[Exchange (CMIS Binding)|Exchange]] Bindings, and use the same import and export providers for them as any of the other [[CMIS Bindings]].
<big>Link Only</big>
* No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.


This will also allow Grooper to create [[CMIS Binding]]s for currently unavailable content management systems more quickly and easily in the future.
* However, this does not produce a usable document in Grooper.  After a ''Link Only'' import, document content ''must'' be loaded using the '''Execute''' activity in a '''Batch Process'''.
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''


=== Import Mode (2.72) ===
* You can think of the ''Link Only'' option as an even sparser sparse import.


In version 2.72 the "Import Mode" property replaces previous versions' "Sparse Import" property.


Import Mode allows control over the connections Grooper makes and/or retains to the imported documents.
See the table below for a summary of the '''''Import Mode''''' options.


{|cellpadding=10 cellspacing=5
|-
|'''''Import Mode'''''||'''Speed'''||'''Comments'''
|-valign="top"
|style="width:15%"|''Full''||style="width:15%"|Slow||
* Full import of content and their properties.
* Required if deleting content from the source on import.
|-valign="top"
|''Sparse''||Fast||
* Imports a link to the document's source and its properties but not their content.
* This produces a usable document in Grooper without copying the full content into Grooper, saving time upon import.
* This mode is the same as enabling the old '''''Sparse Import''''' property in previous versions.
|-valign="top"
|''Link Only''||Fastest||
* Only imports a link to the document's source.
* Does not produce a usable document. The document's properties must be loaded in a step in a Batch Process.
|}
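
The mode summary above can also be read as a small lookup of what each mode copies at import time. The following Python sketch is only an illustration of that summary and not Grooper code: the names <code>ModeBehavior</code>, <code>ImportMode</code> and <code>can_delete_on_import</code> are hypothetical, and the flag values are taken from the table (''Full'' is the mode the newer text calls ''Copy'').

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ModeBehavior:
    """Hypothetical summary of one Import Mode row from the table."""
    copies_content: bool       # content is copied into the Grooper file store on import
    copies_properties: bool    # properties are copied into the database on import
    usable_after_import: bool  # document is usable without a follow-up load step


class ImportMode(Enum):
    FULL = ModeBehavior(True, True, True)            # slowest: everything is copied
    SPARSE = ModeBehavior(False, True, True)         # link plus properties
    LINK_ONLY = ModeBehavior(False, False, False)    # link only, fastest


def can_delete_on_import(mode: ImportMode) -> bool:
    # The source item may only be deleted when its content was fully copied,
    # otherwise the saved link would point at nothing.
    return mode.value.copies_content
```

Under these assumptions, <code>can_delete_on_import(ImportMode.SPARSE)</code> is false, which mirrors the rule that deleting on import requires the ''Full'' mode.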


==== Full ====
==== Disposition Settings ====
The '''''Disposition''''' property settings allow you to do something with the source documents after importing them into Grooper, namely delete them, move them, or do nothing and just leave them alone where they came from.  This is often leveraged with the '''Import Watcher''' Grooper service to prevent repeatedly importing the same document.
 
In our example here, the '''''Move to Folder''''' property is configured to move the PDF documents to a folder named "Imported Documents".
* The folder location you're moving documents to ''must'' be accessible via the connected '''CMIS Repository'''.
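
To see why moving items after import prevents repeat imports, here is a rough Python sketch that uses a plain local folder as a stand-in for a '''CMIS Repository'''. It is not how Grooper or the '''Import Watcher''' is implemented. The function name <code>import_and_move</code> is hypothetical, and "importing" is reduced to recording the file name.

```python
import shutil
from pathlib import Path


def import_and_move(source: Path, imported: Path) -> list[str]:
    """Import every file in `source`, then move it into `imported`."""
    imported.mkdir(exist_ok=True)
    names = []
    for item in sorted(source.iterdir()):
        if not item.is_file():
            continue  # skip subfolders, including the destination folder itself
        names.append(item.name)  # stand-in for the real import step
        shutil.move(str(item), str(imported / item.name))
    return names
```

Running the function twice against the same folder returns the file names the first time and an empty list the second time, because the items are no longer in the watched location.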
 
If using the ''Full'' '''''Import Mode''''', you can enable the '''''Delete Item''''' property to delete each document after it is imported into the Grooper '''Batch'''.
* This property is ONLY available when choosing the ''Full'' '''''Import Mode.'''''  A sparsely imported document needs to call to the import storage location in order to load the document's image for display or processing.  If you deleted the document upon import, you wouldn't be able to view it or do anything with it.


Both properties and content will be loaded. This is the slowest import mode, because the full content of each document is copied during a single-threaded import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.
The '''''Update Properties''''' property allows you to alter the document's property values upon import. Property values are updated using a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property. You can type one entry per line in the format <code>key=value</code>.
* Examples:
* <code>Archive=true</code> Sets the archive attribute on a file
* <code>Status=PENDING</code> Sets the "Status" field on ApplicationXtender documents.
* <code>Imported=true</code> Sets the "Imported" field on SharePoint documents.
* <code>IsRead=true</code> Sets the "IsRead" flag on an Exchange message.
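
The one-entry-per-line <code>key=value</code> format can be illustrated with a short Python sketch. This is an assumption about how such a list could be read and not Grooper's actual parser. The function name <code>parse_update_properties</code> is hypothetical.

```python
def parse_update_properties(text: str) -> dict[str, str]:
    """Turn 'key=value' lines into a dictionary of property updates."""
    updates = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between entries
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got: {line!r}")
        updates[key.strip()] = value.strip()
    return updates
```

For example, the two lines <code>Archive=true</code> and <code>Status=PENDING</code> would produce a dictionary with the keys "Archive" and "Status".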


For example, Normal mode allows items to be deleted immediately on import, which can be important when using the Import Watcher service. Also, Normal mode avoids the need for any follow-up load operations in the Batch Process.
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Disposition_Settings_01.png]]


==== Sparse ====
==== Batch Creation Settings ====
It's likely you're importing documents because you want to run them through a '''Batch Process'''.  The '''''Batch Creation''''' property settings allow you to define which '''Batch Process''' you wish to use to process the imported documents.


Properties will be loaded, but content will not. This mode is much faster than a Full import, because no content files are copied into Grooper. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the CMIS Repository. This type of document is called a sparse document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced.
This is done using the '''''Starting Step''''' property, selecting a '''Batch Process Step''' in a '''Batch Process''' from the published '''Batch Processes''' in the Grooper Repository. Upon import, a new '''Batch''' is created with each document as a '''Batch Folder''', and the selected '''Batch Process''' assigned to the '''Batch'''.


After a Sparse import, document content can be loaded in parallel using the "Execute" activity in a Batch Process.  Choose ''CMIS Document Link'' as the "Object Type" and ''Load Content'' as the "Command".
There are also further properties to control '''Batch''' creation.  You can limit the number of documents imported per '''Batch''' using the '''''Maximum Items per Batch''''' property.  By default, new '''Batches''' are named with a date/time stamp.  However, the '''''Batch Name Prefix''''' allows you to tack on a prefix to the '''Batch's''' name for easier identification.  The '''''Start Paused''''' property will automatically trigger the '''Batch Process''' if set to ''False''.
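
As a rough illustration of how a per-batch item limit and a name prefix interact, the Python sketch below splits a list of items into groups. It is a sketch under assumptions and not Grooper's behavior: the function name <code>plan_batches</code>, the "prefix plus timestamp" name layout and the timestamp format are all hypothetical.

```python
from datetime import datetime
from itertools import islice


def plan_batches(items, max_items, prefix="", now=None):
    """Group items into (batch_name, items) pairs of at most max_items each."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")  # assumed format
    name = f"{prefix} {stamp}".strip()  # the prefix is optional
    iterator = iter(items)
    batches = []
    # Take up to max_items at a time until the iterator is exhausted.
    while chunk := list(islice(iterator, max_items)):
        batches.append((name, chunk))
    return batches
```

With three items and a limit of two, the sketch yields two groups, the first holding two items and the second holding one.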


==== Link Only ====
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Batch_Creation_Settings_01.png]]


No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.
=== Import Query Results ===
==== The Same, But Different ====
The '''''Import Query Results''''' provider's configuration panel is almost identical to the '''''Import Descendants''''' provider's configuration panel. ''Both'' providers share the same '''''Processing Options''''', '''''Disposition''''', and '''''Batch Creation''''' property settings.  See the [[#Import Descendants|Import Descendants]] section for brief descriptions of these property sections.


After a Link Only import, document properties and/or content can be loaded using the "Execute" activity in a Batch Process.  Choose ''CMIS Document Link'' as the "Object Type" and ''Load Content'' as the "Command".


=== Import Disposition (2.72) ===
The big difference between the two providers is the highlighted '''''CMIS Query''''' property.  This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties.  Only files returned by the query will be imported. 
* For example, you may want to only import documents of a certain file type(s).  You could include the file extension(s) as the query condition (or one of many conditions).
* For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox.  If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line, and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
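
To make the shape of such a query concrete, the Python sketch below assembles a SQL-like string from a set of equality conditions. It is a sketch under assumptions only: the helper name <code>build_query</code> is hypothetical, the type name <code>Message</code> in the usage note is a guess (only <code>File</code> appears in this article), and the property names actually available depend on the '''''CMIS Binding'''''. Values containing a single quote are rejected instead of escaped, because the escaping rules are not covered here.

```python
def build_query(type_name: str, equals: dict[str, str]) -> str:
    """Build 'SELECT * FROM <type> WHERE a = 'x' AND b = 'y''."""
    query = f"SELECT * FROM {type_name}"
    conditions = []
    for name, value in equals.items():
        if "'" in value:
            # Escaping rules are out of scope for this sketch, so refuse the value.
            raise ValueError(f"value for {name} contains a single quote")
        conditions.append(f"{name} = '{value}'")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query
```

Calling <code>build_query("File", {})</code> returns the default <code>SELECT * FROM File</code>, and passing conditions for "Sender" and "Subject" appends them as a <code>WHERE</code> clause joined with <code>AND</code>.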


2.72 also adds an Import Disposition property to CMIS Import.  This allows you to change your documents' disposition upon importing them into Grooper.  You can delete them, move them to a folder, or update one or more properties on the document itself.  This can be leveraged with Import Watcher to prevent repeatedly importing the same document.
{|
|-
|'''Disposition'''||'''Comments'''
|-valign="top"
|style="width:20%"|Delete Item||
* Enables or disables deletion of the input item immediately after import.
* Can only be enabled if "Import Mode" is set to "Full".
|-valign="top"
|Move to Folder||
* Specifies an optional folder in your CMIS Repository to which items are moved after import.
|-valign="top"
|Update Properties||
* Defines one or more property values to be updated on import.
* Allows you to type a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property.  You can type one entry per line in the format <code>key=value</code>
** Examples:
** <code>Archive=true</code> Sets the archive attribute on a file
** <code>Status=PENDING</code> Sets the "Status" field on ApplicationXtender documents.
** <code>Imported=true</code> Sets the "Imported" field on SharePoint documents.
** <code>IsRead=true</code> Sets the "IsRead" flag on an Exchange message.
|}

{|class="attn-box"
|
'''&#9888;'''
|Only certain external storage platforms are currently queryable with the '''''CMIS Query''''' property.  The following '''''CMIS Binding''''' sources '''''cannot''''' be queried currently.  As such, they are '''''not''''' suitable for '''''Import Query Results'''''.  You should instead use '''''Import Descendants''''' for the following '''''CMIS Bindings'''''.

:&bull; '''''FTP'''''
:&bull; '''''SFTP'''''
:&bull; '''''NTFS''''' (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
|}
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_3_Import_Query_Results_The_Same_But_Different_01.png]]
Just like with '''''Import Descendants''''', there are some minimum requirements before configuring '''''Import Query Results'''''.  A '''CMIS Connection''' object must be created and a '''CMIS Repository''' must be imported.
[https://app.supademo.com/demo/cm8ergggh03ij12zdakzyt3jp Click here for an interactive walkthrough]
==== CMIS Query Configuration ====
Upon pressing the ellipsis button at the end of the '''''CMIS Query''''' property, the '''''CMIS Query Editor''''' window will appear.
This interface allows you to configure the CMISQL query based on available metadata from the '''''CMIS Binding'''''.  For example, the '''''Exchange''''' binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.
[https://app.supademo.com/demo/cm8g1l4us17x112zdo9ti0ae3 Click here for an interactive walkthrough]
For an in-depth explanation of the '''''CMIS Query Editor''''' and how to use it to craft a CMISQL query, please visit the [[CMIS Query]] article.

Revision as of 12:58, 19 March 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.


CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

Documents are imported from CMIS Connections using either the Import Descendants or Import Query Results providers. These can be used in two ways:

  • To perform manual "ad-hoc" imports when creating a new Batch on the "Imports" page.
  • To perform automated, scheduled imports using one or more Import Watcher Grooper services.

Import Descendants will import all documents within a designated folder location of a CMIS Repository. Import Query Results allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.

SharePoint: SharePoint is a connection option for cloud CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.


About CMIS+

"CMIS" stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to inter-operate over the Internet. Grooper expanded on this idea in version 2.72 to create our "CMIS+" architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.


Now, Grooper connects to all available external storage platforms by creating and configuring a CMIS Connection.

  • Once a CMIS Connection is created, Grooper can "interoperate" with these platforms.
  • "Interoperability" means Grooper has the same access to control the system as a human being does.
  • Grooper has a "one-to-one" connection to the platform, allowing full and total control.
  • Because we standardize connection to non-CMIS systems, this includes platforms like NTFS file systems (Windows) that are not CMIS servers.


Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.

  • You now use CMIS Import providers and CMIS Export for any storage platform you can connect to with a CMIS Connection.
  • This also speeds up development for adding new connection types for import/export operations.

Anatomy of a CMIS Connection

When connecting Grooper to external storage platforms, you'll start by creating a CMIS Connection. There are three important parts to understanding a CMIS Connection:

  1. The CMIS Connection itself
  2. The platform it's connecting to. This is defined by the "CMIS Binding" (aka "connection type") selected for the CMIS Connection's "Connection Settings".
  3. Its child CMIS Repositories
    • "Repository" is just a general term for a location where data lives. Different systems refer to "repositories" in different ways.
      • A folder in Windows could be a repository. An email inbox could be a repository. A document library in SharePoint could be a repository. An application in ApplicationEnhancer (formerly ApplicationXtender) could be a repository.
      • "Repository" is a normalized way of referring to various terms used by various storage platforms.


For newer users, the difference between a CMIS Connection and a CMIS Repository can be confusing. The key distinction is as follows:

  • CMIS Connections connect to storage platforms.
    • It's the phone number you dial.
    • The specific platform you're connecting to is defined in its "Connection Settings".
  • CMIS Repositories represent a location within the connected platform.
    • It's the person on the other end of that phone number you want to talk to.
    • CMIS Repositories represent storage locations (typically folders) in the storage platform. They are added as children to a parent CMIS Connection.
    • The CMIS Repository nodes are what Grooper actually uses when configuring import/export operations.
      • You don't talk to a phone number. You talk to a person.
      • You don't reference the parent CMIS Connection when configuring CMIS Import or CMIS Export. Instead you reference a CMIS Repository.
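
The parent/child relationship described above can be pictured with a small Python sketch. This is only a mental model: the class names, fields and the `import_repository` method are hypothetical and are not part of any Grooper API.

```python
from dataclasses import dataclass, field


@dataclass
class CmisRepository:
    """A storage location (typically a folder) inside a connected platform."""
    name: str


@dataclass
class CmisConnection:
    """A connection to one storage platform, identified by its CMIS Binding."""
    name: str
    binding: str  # e.g. "NTFS", "Exchange", "AppXtender"
    repositories: list[CmisRepository] = field(default_factory=list)

    def import_repository(self, name: str) -> CmisRepository:
        """Add a child repository. This imports a reference, not documents."""
        repo = CmisRepository(name)
        self.repositories.append(repo)
        return repo


# The connection is the "phone number"; the repository is the "person".
connection = CmisConnection(name="File Share", binding="NTFS")
repo = connection.import_repository("Import and Export")
```

In this model, an import or export configuration would hold a reference to `repo`, never to `connection` directly.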

Basic creation steps

There are three basic steps involved to connect Grooper to external storage platforms:

  1. Create a CMIS Connection
  2. Configure the "Connection Settings".
    • Choose what platform you want to connect to (the CMIS Binding).
    • Enter the connection settings required to connect to the platform (This will differ from platform to platform)
  3. Add child CMIS Repositories by importing the storage locations.
    • Importing a CMIS Repository is not the same as importing documents to a new Batch.
      • "Importing" here is more like importing a reference (or bringing the repository into a framework Grooper can use).
      • Upon importing the CMIS Repository, Grooper has full file access to that location in the storage platform.

CMIS Bindings (aka "connection types")

How you configure a CMIS Connection differs only based on what platform you're connecting to. Connection settings include folder paths, URL addresses, usernames and passwords.

  • Example: Connecting to a Windows folder requires a networked folder's UNC path.
  • Example: Connecting to a SharePoint site requires a URL address.
  • Example: Connecting to an email inbox requires a server host name.
  • Example: Connecting to ApplicationXtender, Box, SharePoint, OneDrive, Exchange (Outlook) and more requires a username and password.


Each platform has its own connection requirements. These connection settings and the logic required to interoperate between Grooper and a specific platform are defined by the different "CMIS Bindings".

Each CMIS Binding provides the settings and logic to connect Grooper to CMS platforms and file systems for import and export operations.

  • Example: The "Exchange" binding contains all the information Grooper uses to connect to Microsoft Exchange email servers (i.e. Outlook inboxes).
  • Example: The "AppXtender" binding contains all the information Grooper uses to connect to the ApplicationEnhancer (formerly AppXtender) content management system.
  • Example: The "NTFS" binding contains all the information Grooper uses to connect to a Windows file system.
  • And so on.


The first step in configuring a CMIS Connection is choosing what platform you want to connect to. You do this by selecting a "CMIS Binding".

  • You will commonly hear "CMIS Binding" referred to as a "CMIS connection type" or "connection type".
  • Or just "connection", as in an "Exchange connection".

Current CMIS Bindings (aka "connection types")

Grooper can connect to the following storage platforms using CMIS Bindings:

Most Commonly Used

Somewhat Commonly Used

Less Commonly Used

  • FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) servers.
  • IMAP mail servers

Least Used

  • Content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The FileBound document management platform.
  • The IBM FileNet platform.


About CMIS Import

The CMIS Import provider is split into two different Import Providers:

  • Import Descendants
  • Import Query Results

These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform. This is the primary method of Batch creation when importing digital documents into Grooper to process them with a Batch Process.

In order to do this, a few requirements must be met first.

  1. A CMIS Connection object must be made and configured. This will connect Grooper to the document storage platform.
    • This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms. What the CMIS Connection connects to is determined by the CMIS Binding selected when configuring the Connection Type property of the CMIS Connection object.
  2. A CMIS Repository must be imported. This will create an object Grooper can use to import documents from the folders in the document storage platform.
    • This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.

For more information on adding a CMIS Connection and importing a CMIS Repository, visit the CMIS Connection article.

As for the difference between the Import Descendants and Import Query Results providers, you can think of Import Query Results as a more specialized version of Import Descendants.

  • Import Descendants is intended to import the full contents of a folder location. It imports the "descendant" files of a parent folder.
  • Import Query Results allows you to selectively import files using a SQL-like query (called a CMISQL query). Only files returned by the query will be imported. For example, using an Exchange or IMAP CMIS Connection, you could query an inbox for emails from a specific sender and only import those emails.
    • Note: There are some import filtering capabilities available to Import Descendants as well using a SQL-like query. However, the CMISQL querying capabilities of Import Query Results are much more robust.
    • That said, only certain CMIS Bindings can take advantage of this increased CMISQL query functionality. The following CMIS Bindings are not currently suitable for the Import Query Results provider.
      • FTP
      • SFTP
      • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)

Import Descendants

Configuration Panel

Click here for an interactive walkthrough

General Settings

Click here for an interactive walkthrough

Back on the Import Descendants configuration screen, the CMIS Repository object is used to point Grooper to this folder location for import.

  • The Repository property is configured to assign the CMIS Repository where the documents are located.
    • Here, the CMIS Repository named "Import and Export" connects to the "Import and Export" folder of the local drive.
  • The Base Folder property is configured to traverse the folder structure of the CMIS Repository.
    • Here, we don't want to import all documents from every folder in the "Import and Export" folder. We just want to import from the "Grooper Import Folder".
  • The Import Filter property allows you to perform some basic import filtering to selectively choose which documents you want to import.
    • SELECT * FROM File is the default filter. It will import all files from the selected folder location.
    • This is a SQL-like query to specify conditions for document import.
    • BE AWARE: Import Descendants has limited filtering compared to Import Query Results.
      • Import Query Results was created to expand on this functionality. It provides more filtering options for the CMIS Connection Types supported by Import Query Results.
      • Import Descendants DOES NOT support the "IN_FOLDER" or "IN_TREE" predicates. Import Descendants will always import all documents in all subfolders from the base folder.
  • The Content Type property allows you to optionally assign the incoming documents with a Document Type.
    • You can use this property to assign a default classification for all incoming documents.

Processing Options Settings

The most important part of the Processing Options property section is the Import Mode property.

The Import Mode property allows control over the connections Grooper makes and/or retains to the imported documents.

For importing, documents contain two important sets of information:

  • Content - Images and native text data
  • Properties - Metadata associated with the file. Digital information, such as the document's filename, file type, creation date, and more.

Depending on the Import Mode selected, all, some, or none of this information will be copied to your Grooper Repository's file store (in the case of the document's content) and database (in the case of the document's properties). See below for a more in-depth explanation of each of the Import Mode options.


Copy

  • Both properties and content will be loaded. This is a total duplication of the document from its source to your Grooper Repository's local file store. This is the slowest import mode, because the full content of each document is copied during a single-threaded import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.
  • For example, Copy mode allows items to be deleted immediately on import. Also, Copy mode avoids the need for any follow-up content loading operations in the Batch Process.
  • This mode was called Full in older versions of Grooper.

Sparse

  • Properties will be loaded, but content will not. This mode is much faster than a Copy import, because no content files are copied into your local Grooper file store. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the CMIS Repository. This type of document is often referred to as a "sparse" document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced. Grooper has to traverse the document link in order to display or process the document's image.
  • However, after a Sparse import, document content can be loaded multi-threaded using the Execute activity in a Batch Process. Overall, this can load a document's content faster than a Copy import.
    • Choose CMIS Document Link as the Object Type and Load Content as the Command

Link Only

  • No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.
  • However, this does not produce a usable document in Grooper. After a Link Only import, document content must be loaded using the Execute activity in a Batch Process.
    • Choose CMIS Document Link as the Object Type and Load Content as the Command
  • You can think of the Link Only option as an even sparser sparse import.


See the table below for a summary of the Import Mode options.

Import Mode | Speed | Comments
Copy (formerly Full) | Slow | Full import of content and their properties. Required if deleting content from the source on import.
Sparse | Fast | Imports a link to the document's source and its properties but not their content. This produces a usable document in Grooper without copying the full content into Grooper, saving time upon import. This mode is the same as enabling the old Sparse Import property in previous versions.
Link Only | Fastest | Only imports a link to the document's source. Does not produce a usable document. The document's properties must be loaded in a step in a Batch Process.
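
The trade-offs between the three Import Modes can also be written down as a small lookup. The following Python sketch only restates the behavior described above; the `ImportMode` enum, `ModeBehavior` class and helper functions are hypothetical names for illustration, not Grooper types.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ModeBehavior:
    loads_properties: bool  # metadata is written to the Grooper database
    copies_content: bool    # file content is copied to the Grooper file store


class ImportMode(Enum):
    COPY = ModeBehavior(loads_properties=True, copies_content=True)
    SPARSE = ModeBehavior(loads_properties=True, copies_content=False)
    LINK_ONLY = ModeBehavior(loads_properties=False, copies_content=False)


def usable_after_import(mode: ImportMode) -> bool:
    """Copy and Sparse documents are usable right away; Link Only is not."""
    return mode.value.loads_properties


def can_delete_source_on_import(mode: ImportMode) -> bool:
    """Only a full copy leaves nothing at the source that Grooper still needs."""
    return mode.value.copies_content
```

Reading the sketch top to bottom gives the same ordering as the table: the less a mode loads at import time, the faster the import, and the more work is left for the Batch Process.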

Disposition Settings

The Disposition property settings allow you to do something with the source documents after importing them into Grooper, namely delete them, move them, or do nothing and just leave them alone where they came from. This is often leveraged with the Import Watcher Grooper service to prevent repeatedly importing the same document.

In our example here, the Move to Folder property is configured to move the PDF documents to a folder named "Imported Documents".

  • The folder location you're moving documents to must be accessible via the connected CMIS Repository.

If using the Copy Import Mode (called Full in older versions), you can enable the Delete Item property to delete each document after it is imported into the Grooper Batch.

  • This property is ONLY available when choosing the Copy Import Mode. A sparsely imported document needs to call to the import storage location in order to load the document's image for display or processing. If you deleted the document upon import, you wouldn't be able to view it or do anything with it.

The Update Properties property allows you to alter the document's property values upon import. Property values are updated using a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property. You can type one entry per line in the format key=value.

  • Examples:
  • Archive=true Sets the archive attribute on a file
  • Status=PENDING Sets the "Status" field on ApplicationXtender documents.
  • Imported=true Sets the "Imported" field on SharePoint documents.
  • IsRead=true Sets the "IsRead" flag on an Exchange message.
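
The one-entry-per-line key=value format is simple enough to illustrate with a short parser. This is a minimal Python sketch of how such a list could be read; `parse_update_properties` is a hypothetical helper, and how Grooper itself parses or validates the entries is not documented here.

```python
def parse_update_properties(text: str) -> dict[str, str]:
    """Parse 'key=value' lines into a dictionary of property updates.

    Blank lines are skipped. A line without '=' or without a key is
    rejected, since it would not name a property to update.
    """
    updates: dict[str, str] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_number}: expected key=value, got {raw!r}")
        updates[key.strip()] = value.strip()
    return updates


example = parse_update_properties("Archive=true\nStatus=PENDING")
```

Here `example` holds one entry for "Archive" and one for "Status", matching the first two examples in the list above.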

Batch Creation Settings

It's likely you're importing documents because you want to run them through a Batch Process. The Batch Creation property settings allow you to define which Batch Process you wish to use to process the imported documents.

This is done using the Starting Step property, selecting a Batch Process Step in a Batch Process from the published Batch Processes in the Grooper Repository. Upon import, a new Batch is created with each document as a Batch Folder, and the selected Batch Process assigned to the Batch.

There are also further properties to control Batch creation. You can limit the number of documents imported per Batch using the Maximum Items per Batch property. By default, new Batches are named with a date/time stamp. However, the Batch Name Prefix allows you to tack on a prefix to the Batch's name for easier identification. The Start Paused property will automatically trigger the Batch Process if set to False.
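
As a rough illustration of how Maximum Items per Batch and Batch Name Prefix interact, the Python sketch below splits a list of imported items into batches and builds a prefixed, time-stamped name. The function names and the exact timestamp format are assumptions made for the example; Grooper's actual naming format may differ.

```python
from datetime import datetime
from typing import Sequence


def split_into_batches(items: Sequence[str], max_items: int) -> list[list[str]]:
    """Group items into consecutive batches of at most max_items each."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    return [list(items[i:i + max_items]) for i in range(0, len(items), max_items)]


def batch_name(prefix: str, created: datetime) -> str:
    """Build a batch name from an optional prefix and a date/time stamp.

    The timestamp layout used here is an assumption for illustration.
    """
    return f"{prefix}{created:%Y-%m-%d %H:%M:%S}"
```

For example, five documents with a maximum of two items per batch would produce three batches, the last one holding a single document.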

Import Query Results

The Same, But Different

The Import Query Results provider's configuration panel is almost identical to the Import Descendants provider's configuration panel. Both providers share the same Processing Options, Disposition, and Batch Creation property settings. See the Import Descendants section for brief descriptions of these property sections.


The big difference between the two providers is the highlighted CMIS Query property. This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties. Only files returned by the query will be imported.

  • For example, you may want to only import documents of a certain file type(s). You could include the file extension(s) as the query condition (or one of many conditions).
  • For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox. If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
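
To show the general shape of such a query, here is a small Python sketch that assembles a CMISQL string from a type name and a list of conditions. The `build_query` helper is hypothetical, and the "Message" type name and the exact property names and literal syntax in the example are assumptions; the CMIS Query Editor shows the types and properties actually available for a given CMIS Binding.

```python
def build_query(type_name: str, conditions: list[str]) -> str:
    """Join conditions with AND into a SELECT * query for the given type."""
    query = f"SELECT * FROM {type_name}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# With no conditions, the result matches the default Import Filter.
default_query = build_query("File", [])

# Hypothetical email filter: type and property names are assumed.
email_query = build_query(
    "Message",
    ["Sender = 'billing@example.com'", "Subject LIKE '%Invoice%'"],
)
```

Each additional condition narrows the result set, so only items that satisfy every condition would be returned and imported.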

Only certain external storage platforms are currently queryable with the CMIS Query property. The following CMIS Binding sources cannot be queried currently. As such, they are not suitable for Import Query Results. You should instead use Import Descendants for the following CMIS Bindings.

  • FTP
  • SFTP
  • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
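
The binding restrictions listed above amount to a simple decision rule, sketched here in Python. The function names and the `ntfs_indexed` flag are hypothetical, and treating every other binding as queryable is an assumption drawn from the list; check the specific binding before relying on it.

```python
def supports_query_import(binding: str, ntfs_indexed: bool = False) -> bool:
    """Return True if the binding is suitable for Import Query Results.

    ntfs_indexed stands for "the folder is indexed by Windows Search and
    the service is running on the storage server"; it only matters for NTFS.
    """
    name = binding.strip().upper()
    if name in {"FTP", "SFTP"}:
        return False
    if name == "NTFS":
        return ntfs_indexed
    return True  # assumption: bindings not listed above are queryable


def choose_provider(binding: str, ntfs_indexed: bool = False) -> str:
    """Pick the Import Provider to use for a given binding."""
    if supports_query_import(binding, ntfs_indexed):
        return "Import Query Results"
    return "Import Descendants"
```

Falling back to Import Descendants does not remove filtering entirely, since its Import Filter property still offers basic conditions.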


Just like with Import Descendants, there are some minimum requirements before configuring Import Query Results. A CMIS Connection object must be created and a CMIS Repository must be imported.

Click here for an interactive walkthrough


CMIS Query Configuration

Upon pressing the ellipsis button at the end of the CMIS Query property, the CMIS Query Editor window will appear.

This interface allows you to configure the CMISQL query based on available metadata from the CMIS Binding. For example, the Exchange binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.

Click here for an interactive walkthrough

For an in-depth explanation of the CMIS Query Editor and how to use it to craft a CMISQL query, please visit the CMIS Query article.