CMIS Import (Import Provider)

From Grooper Wiki
 
<onlyinclude>
{{AutoVersion}}
<blockquote style="font-size:14pt">
''CMIS Import'' is an '''''[[Import Provider]]''''' used to import content over a '''[[CMIS Connection]]''', allowing users to import from various on-premise and cloud based storage platforms.
</blockquote>


<blockquote>{{#lst:Glossary|CMIS Import}}</blockquote>


Documents are imported from '''CMIS Connections''' using either the '''Import Descendants''' or '''Import Query Results''' provider.  These providers can be used in two ways:

* To perform manual "ad-hoc" imports when creating a new '''[[Batch]]''' on the "Imports" page.
* To perform automated, scheduled imports using one or more '''Import Watcher''' Grooper services.


'''Import Descendants''' will import all documents within a designated folder location of a '''[[CMIS Repository]]'''.  '''Import Query Results''' allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set import conditions based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.
</onlyinclude>
{{#lst:CMIS+ (Concept)|cmisplus}}
 
== About CMIS ==
 
[[CMIS]] stands for "Content Management Interoperability Services".  It is an open standard that allows different content management systems to interoperate over the Internet.  This standard protocol allows Grooper to use many different platforms for importing and exporting documents and their contents.  Once a '''CMIS Connection''' object is created, Grooper can exchange documents with these platforms.  "Interoperability" means Grooper has the same access to control the system as a human being does.  It is a "one-to-one" connection to the platform, allowing full control.
 
Upon connecting to an external content management system, Grooper will be able to see the "repositories" associated with it.  A repository, in computer science, is a general term for a location where data lives.  Different systems refer to "repositories" in different ways.  An email inbox could be a repository.  A folder in Windows could be a repository.  A cabinet in ApplicationXtender could be a repository.  It's a place to put things.  Grooper standardizes these platform-specific terms to simply "repository".
 
These repositories are "imported" into Grooper as a '''CMIS Repository''' object, as a child of the '''CMIS Connection''' object.  This doesn't import data into Grooper in the traditional sense of importing documents into a '''Batch'''.  "Importing" here is more like bringing the repository into a framework Grooper can use (creating the '''CMIS Repository''' object).  Once the repository is imported, Grooper has full file access to that location in the storage platform.
 
For our purposes, repositories are like filing cabinets full of documents.  Once a connection is established, it's like giving Grooper a key to that cabinet.  You can open the various drawers of that cabinet.  You can pull files out and put files in.  The storage platform or content management system is like the cabinet.  The '''CMIS Connection''' object is like the key.  The '''CMIS Repository''' object is like a drawer in the cabinet.  You "connect" to the cabinet by turning the key.  You "import" the repository by opening the drawer.  Now you can see there are documents in there!  You can take them out.  You can read them and put them back in.  You can put new ones in.  You can use this "open" connection to the "drawer" however you need.
 
== CMIS+ Architecture ==
 
Grooper expanded on this idea in version 2.72 to create our [[CMIS+]] architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.  Prior to version 2.72, there was only one type of '''CMIS Connection''', a true CMIS connection using CMIS 1.0 or CMIS 1.1 servers.  Now, connections to additional non-CMIS document storage platforms can be made via "''CMIS Bindings''".  This provides standardized access to document content and metadata across a variety of external storage platforms.
 
Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.  You now use the ''CMIS Import'' and ''CMIS Export'' providers, regardless of the storage platform.  They connect to a '''CMIS Repository''' imported from a '''CMIS Connection''' and use that as Grooper's import or export path.
 
Creating a '''CMIS Connection''' only differs from one ''CMIS Binding'' to another in how you connect, since each binding has its own connection settings.  You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example.
 
=== CMIS Bindings ===
 
A ''CMIS Binding'' provides connectivity logic for external storage platforms, allowing '''CMIS Connection''' objects to import and export content.  Grooper's CMIS+ architecture expands connectivity from traditional CMIS servers to a variety of on-premise and cloud-based storage platforms by exposing connections to these platforms as ''CMIS Bindings''.  Each individual ''CMIS Binding'' contains the settings and logic required to exchange documents between Grooper and each distinct platform.  For example, the ''AppXtender Binding'' contains all the information Grooper uses to connect to the ApplicationXtender content management system.
 
''CMIS Bindings'' are used when creating a '''CMIS Connection''' object.  Both steps of creating a '''CMIS Connection''' are done through its '''''Connection Type''''' property.  First, select the ''CMIS Binding'' you want to use, which determines the storage platform you connect to.  Second, enter the connection settings for that binding (for many bindings, this includes login information).
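The two-step pattern (pick a binding, then supply that binding's own settings) is a familiar plug-in design. The Python sketch below illustrates the idea only; it is not how Grooper is implemented. <code>CmisBinding</code>, <code>InMemoryBinding</code>, <code>CmisConnection</code>, and the <code>BINDINGS</code> registry are hypothetical names, and <code>InMemoryBinding</code> stands in for a real platform such as NTFS or Exchange.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type


class CmisBinding(ABC):
    """Hypothetical interface shared by every binding (not a Grooper API)."""

    @abstractmethod
    def list_children(self, folder: str) -> List[str]:
        """Return the names of the items directly inside `folder`."""


class InMemoryBinding(CmisBinding):
    """Stand-in for a real platform: folder paths map to item names."""

    def __init__(self, tree: Dict[str, List[str]]) -> None:
        self.tree = tree  # plays the role of binding-specific connection settings

    def list_children(self, folder: str) -> List[str]:
        return list(self.tree.get(folder, []))


# Choosing a Connection Type is modeled as picking an entry from this registry.
BINDINGS: Dict[str, Type[CmisBinding]] = {"InMemory": InMemoryBinding}


class CmisConnection:
    def __init__(self, connection_type: str, **settings) -> None:
        # Step 1: select the binding. Step 2: hand it that binding's own settings.
        self.binding = BINDINGS[connection_type](**settings)

    def list_children(self, folder: str) -> List[str]:
        # Callers never need to know which platform sits behind the binding.
        return self.binding.list_children(folder)


conn = CmisConnection("InMemory", tree={"/Import": ["a.pdf", "b.pdf"]})
print(conn.list_children("/Import"))  # ['a.pdf', 'b.pdf']
```

Because callers only talk to <code>CmisConnection</code>, adding a platform in this sketch means adding one binding class and one registry entry, which mirrors why a single set of import and export providers can work across platforms.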
 
=== Current CMIS Bindings ===
 
Grooper can connect to the following storage platforms using ''CMIS Bindings'':
 
* The [[AppXtender (CMIS Binding)|ApplicationXtender]] document management platform.
* The [[Box (CMIS Binding)|Box]] cloud storage platform.
* The [[FileBound (CMIS Binding)|FileBound]] document management platform.
* [[CMIS (CMIS Binding)|Content management systems]] using CMIS 1.0 or CMIS 1.1 servers.
* The following Microsoft content platforms:
** The [[Exchange (CMIS Binding)|Microsoft Exchange]] mail server platform.
** The [[OneDrive (CMIS Binding)|Microsoft OneDrive]] cloud storage platform.
** [[SharePoint (CMIS Binding)|Microsoft SharePoint]] sites.
* [[FTP (CMIS Binding)|FTP]] (File Transfer Protocol) and [[SFTP (CMIS Binding)|SFTP]] (SSH File Transfer Protocol) servers.
* [[IMAP (CMIS Binding)|IMAP]] mail servers.
* The Microsoft Windows [[NTFS (CMIS Binding)|NTFS]] file system.


== About CMIS Import ==


There are two CMIS Import providers in Grooper:
 
* ''Import Descendants''
* ''Import Query Results''
 
These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform.  This is the primary method of '''Batch''' creation when importing digital documents into Grooper to process them with a '''Batch Process'''. 
 
In order to do this, a few requirements must be met first.
 
# A '''CMIS Connection''' object must be made and configured.  This will connect Grooper to the document storage platform.
#* This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms.  What the '''CMIS Connection''' connects to is determined by the ''CMIS Binding'' selected when configuring the '''''Connection Type''''' property of the '''CMIS Connection''' object.
# A '''CMIS Repository''' must be imported.  This will create an object Grooper can use to import documents from the folders in the document storage platform.
#* This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.
 
For more information on adding a '''CMIS Connection''' and importing a '''CMIS Repository''', visit the '''[[CMIS Connection]]''' article.
 
As for the difference between the ''Import Descendants'' and ''Import Query Results'' providers, you can think of ''Import Query Results'' as a more specialized version of ''Import Descendants''.
 
* ''Import Descendants'' is intended to import the full contents of a folder location.  It imports the "descendant" files of a parent folder.
* ''Import Query Results'' allows you to selectively import files using a SQL-like query (called a CMISQL query).  Only files returned by the query will be imported.  For example, using an ''Exchange'' or ''IMAP'' '''CMIS Connection''', you could query an inbox for emails from a specific sender and only import those emails.
** Note:  There are some import filtering capabilities available to ''Import Descendants'' as well using a SQL-like query.  However, the CMISQL query ''Import Query Results'' uses is much more robust.  That said, only certain ''CMIS Bindings'' can take advantage of this increased CMISQL query functionality.
** The following ''CMIS Bindings'' are '''''not''''' currently suitable for the ''Import Query Results'' provider.
*** ''NTFS''
*** ''FTP''
*** ''SFTP''
*** ''OneDrive''
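To make "descendant" concrete: Import Descendants treats its starting folder as the top of a tree and imports every file beneath it, including files in subfolders, while a CMIS Query can be restricted to a single folder's direct contents. The Python sketch below contrasts the two traversals over a hypothetical in-memory folder tree; the tree data and the function names are invented for illustration and are not Grooper APIs.

```python
from typing import Dict, List

# Hypothetical folder tree: folder path -> files in it and its subfolder paths.
TREE: Dict[str, Dict[str, List[str]]] = {
    "/Import": {"files": ["a.pdf", "b.pdf"], "folders": ["/Import/2024"]},
    "/Import/2024": {"files": ["c.pdf"], "folders": ["/Import/2024/Jan"]},
    "/Import/2024/Jan": {"files": ["d.pdf"], "folders": []},
}


def descendant_files(folder: str) -> List[str]:
    """All files below `folder`, at any depth (depth-first order)."""
    node = TREE[folder]
    found = list(node["files"])
    for sub in node["folders"]:
        found.extend(descendant_files(sub))
    return found


def child_files(folder: str) -> List[str]:
    """Only the files stored directly in `folder`; subfolders are ignored."""
    return list(TREE[folder]["files"])


print(descendant_files("/Import"))  # ['a.pdf', 'b.pdf', 'c.pdf', 'd.pdf']
print(child_files("/Import"))       # ['a.pdf', 'b.pdf']
```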
 
=== Import Descendants ===
 
<tabs style="margin:20px">
<tab name="Configuration Panel" style="margin:20px">
=== Configuration Panel ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
This is the configuration screen for the ''Import Descendants'' provider.  This example uses a simple configuration to import a few PDFs from a local Windows folder.  Configuration is divided into four sections:
 
* '''''General'''''
* '''''Processing Options'''''
* '''''Disposition'''''
* '''''Batch Creation'''''
|
[[File:Cmis-import-import-decendants-1.png]]
|}
</tab>
<tab name="General Settings" style="margin:20px">
=== General Settings ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
At bare minimum, you will need to tell Grooper where to look for the imported files.  That is mostly what the '''''General''''' property settings are for.
 
In this case, we want to import the PDF files in this folder on the local hard drive.
|
[[File:Cmis-import-import-decendants-2.png]]
|-
|valign=top|
As discussed earlier, there are some minimum requirements before configuring ''Import Descendants''.
 
# Here, a '''CMIS Connection''' has been made and a child '''CMIS Repository''' has been imported.
#* In this case, the ''NTFS'' binding was used for the '''CMIS Connection's''' '''''Connection Type'''''.  The folder named "Import and Export" was imported as the '''CMIS Repository'''.
# The folder named "Grooper Import Folder" is where we want to import from.
|
[[File:Cmis-import-import-decendants-3.png]]
|-
|valign=top|
Back to the ''Import Descendants'' configuration screen, the '''CMIS Repository''' object is used to point Grooper to this folder location for import.
 
*The '''''Repository''''' property is configured to assign the '''CMIS Repository''' where the documents are located.
** Here, the '''CMIS Repository''' named "Import and Export" connects to the "Import and Export" folder of the local drive.
 
*The '''''Base Folder''''' property is configured to traverse the folder structure of the '''CMIS Repository'''.
** Here, we don't want to import ''all'' documents from ''every'' folder in the "Import and Export" folder.  We just want to import from the "Grooper Import Folder".
 
*The '''''Import Filter''''' property allows you to perform some basic import filtering to selectively choose which documents you want to import.
** <code>SELECT * FROM File</code> is the default filter.  It will import all files from the selected folder location.
** This is a SQL-like query to specify conditions for document import.  However, the ''Import Query Results'' provider was created to expand on this functionality and provides more filtering options as well as a simpler interface to perform the query (for the ''CMIS Bindings'' capable of utilizing this functionality).
 
*The '''''Content Type''''' property allows you to optionally assign a '''Document Type''' to the incoming documents.
** You can use this property to assign a default classification for all incoming documents.
|
[[File:Cmis-import-import-decendants-4.png]]
|}
</tab>
<tab name="Processing Options Settings" style="margin:20px">
=== Processing Options Settings ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
The most important part of the '''''Processing Options''''' property section is the '''''Import Mode''''' property.
 
The '''''Import Mode''''' property allows control over the connections Grooper makes and/or retains to the imported documents.
 
For importing, documents contain two important sets of information:
 
* Content - Images and native text data
* Properties - Metadata associated with the file. Digital information, such as the document's filename, file type, creation date, and more.
 
Depending on the '''''Import Mode''''' selected, all, some, or none of this information will be copied to your Grooper Repository's file store (in the case of the document's content) and database (in the case of the document's properties).  See below for a more in-depth explanation of each of the '''''Import Mode''''' options.
 
 
|valign=top|
[[File:Cmis-import-import-decendants-5.png]]
|}
 
''Full''
 
* Both properties and content will be loaded. This is a total duplication of the document from its source to your Grooper Repository's local file store.  This is the slowest import mode, because the full content of each document is copied during a ''single-threaded'' import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.
 
*For example, ''Full'' mode allows items to be deleted immediately on import. Also, ''Full'' mode avoids the need for any follow-up content loading operations in the '''Batch Process'''.
 
''Sparse''
 
*Properties will be loaded, but content will not. This mode is much faster than a ''Full'' import, because no content files are copied into your local Grooper file store. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the '''CMIS Repository'''. This type of document is often referred to as a "sparse" document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced.  Grooper has to traverse the document link in order to display or process the document's image.
 
* However, after a ''Sparse'' import, document content can be loaded multi-threaded using the '''Execute''' activity in a '''Batch Process'''.  Overall, this can get a document's content into Grooper faster than a ''Full'' import.
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''
 
''Link Only''
 
* No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.
 
* However, this does not produce a usable document in Grooper.  After a ''Link Only'' import, document content ''must'' be loaded using the '''Execute''' activity in a '''Batch Process'''.
** Choose ''CMIS Document Link'' as the '''''Object Type''''' and ''Load Content'' as the '''''Command'''''
 
* You can think of the ''Link Only'' option as an even sparser sparse import.
 
 
See the table below for a summary of the '''''Import Mode''''' options.
 
{|cellpadding=10 cellspacing=5
|'''''Import Mode'''''||'''Speed'''||'''Comments'''
|-valign=top
|''Full''||Slow||Full import of the documents' content and properties.
* Required if deleting content from the source on import.
|-valign=top
|''Sparse''||Fast||Imports a link to the document's source and its properties, but not its content.
* This produces a usable document in Grooper without copying the full content into Grooper, saving time upon import.
* This mode is the same as enabling the old '''''Sparse Import''''' property in previous versions.
|-valign=top
|''Link Only''||Fastest||Only imports a link to the document's source.
* Does not produce a usable document. The document's properties and content must be loaded by a step in a '''Batch Process'''.
|}
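Read another way, the table describes what each mode copies at import time and what is left to be fetched from the source later. The Python enum below is a hypothetical encoding of those rules (including that '''''Delete Item''''' is only offered with ''Full''); <code>ImportMode</code> and its members and methods are invented names, not part of Grooper.

```python
from enum import Enum
from typing import List


class ImportMode(Enum):
    """Hypothetical model of the Import Mode table (not a Grooper API)."""

    # value = (properties copied at import, content copied at import)
    FULL = (True, True)
    SPARSE = (True, False)
    LINK_ONLY = (False, False)

    @property
    def usable_after_import(self) -> bool:
        # Full and Sparse documents can be used right away; Link Only ones cannot.
        return self.value[0]

    @property
    def allows_delete_item(self) -> bool:
        # Sparse and Link Only documents still read content from the source,
        # so deleting the source on import is only offered for Full.
        return self is ImportMode.FULL

    def deferred_loading(self) -> List[str]:
        """What is not copied at import time and must come from the source later."""
        properties_loaded, content_loaded = self.value
        deferred = []
        if not properties_loaded:
            deferred.append("properties")
        if not content_loaded:
            deferred.append("content")
        return deferred


for mode in ImportMode:
    print(mode.name, mode.usable_after_import, mode.deferred_loading())
```

For ''Sparse'', the deferred content can be fetched on demand or loaded up front by an '''Execute''' step; for ''Link Only'', that later loading is mandatory before the document is usable.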
</tab>
<tab name="Disposition Settings" style="margin:20px">
=== Disposition Settings ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
The '''''Disposition''''' property settings allow you to do something with the source documents after importing them into Grooper, namely delete them, move them, or do nothing and just leave them alone where they came from.  This is often leveraged with the '''Import Watcher''' Grooper service to prevent repeatedly importing the same document.
 
In our example here, the '''''Move to Folder''''' property is configured to move the PDF documents to a folder named "Imported Documents".
* The folder location you're moving documents to ''must'' be accessible via the connected '''CMIS Repository'''.
 
If using the ''Full'' '''''Import Mode''''', you can enable the '''''Delete Item''''' property to delete each document after it is imported into the Grooper '''Batch'''.
* This property is ONLY available when choosing the ''Full'' '''''Import Mode.'''''  A sparsely imported document needs to call back to the import storage location in order to load the document's image for display or processing.  If you deleted the document upon import, you wouldn't be able to view it or do anything with it.
 
The '''''Update Properties''''' property allows you to alter the document's property values upon import.  Property values are updated using a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property. You can type one entry per line in the format <code>key=value</code>.
* Examples:
** <code>Archive=true</code> sets the archive attribute on a file.
** <code>Status=PENDING</code> sets the "Status" field on ApplicationXtender documents.
** <code>Imported=true</code> sets the "Imported" field on SharePoint documents.
** <code>IsRead=true</code> sets the "IsRead" flag on an Exchange message.
|
[[File:Cmis-import-import-decendants-6.png]]
|}
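The '''''Update Properties''''' format (one <code>key=value</code> entry per line) is simple enough to sketch. The Python function below is a hypothetical parser showing one reasonable reading of that format; <code>parse_update_properties</code> is an invented name, and how Grooper itself treats blank lines, surrounding whitespace, or an <code>=</code> inside a value is an assumption here, not documented behavior.

```python
from typing import Dict


def parse_update_properties(text: str) -> Dict[str, str]:
    """Parse one key=value entry per line into a dict (hypothetical rules).

    Assumptions: blank lines are skipped, whitespace around keys and values
    is trimmed, and only the first '=' splits, so a value may contain '='.
    """
    updates: Dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {line_no}: expected key=value, got {raw!r}")
        updates[key.strip()] = value.strip()
    return updates


print(parse_update_properties("Archive=true\nStatus=PENDING"))
# {'Archive': 'true', 'Status': 'PENDING'}
```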
</tab>
<tab name="Batch Creation Settings" style="margin:20px">
=== Batch Creation Settings ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
It's likely you're importing documents because you want to run them through a '''Batch Process'''.  The '''''Batch Creation''''' property settings allow you to define which '''Batch Process''' you wish to use to process the imported documents.
 
This is done using the '''''Starting Step''''' property, selecting a '''Batch Process Step''' in a '''Batch Process''' from the published '''Batch Processes''' in the Grooper Repository.  Upon import, a new '''Batch''' is created with each document as a '''Batch Folder''', and the selected '''Batch Process''' assigned to the '''Batch'''.
 
There are also further properties to control '''Batch''' creation.  You can limit the number of documents imported per '''Batch''' using the '''''Maximum Items per Batch''''' property.  By default, new '''Batches''' are named with a date/time stamp; the '''''Batch Name Prefix''''' property lets you add a prefix to the '''Batch's''' name for easier identification.  If the '''''Start Paused''''' property is set to ''False'', the '''Batch Process''' starts automatically.
|
[[File:Cmis-import-import-decendants-7.png]]
|}
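The interaction between '''''Maximum Items per Batch''''' and '''''Batch Name Prefix''''' can be sketched as simple chunking. In this hypothetical Python example, each imported item becomes one '''Batch Folder''' and items are grouped into '''Batches''' of at most the configured size. The function name, the date/time stamp format, and the "(n)" counter used to keep names unique are assumptions for illustration; Grooper's actual naming may differ.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def plan_batches(items: Sequence[str], max_items: int, prefix: str,
                 created: datetime) -> List[Tuple[str, List[str]]]:
    """Split imported items into batches of at most `max_items` each.

    Names are `prefix` + a date/time stamp; the stamp format and the "(n)"
    counter that keeps names unique are assumptions for this sketch.
    """
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    stamp = created.strftime("%Y-%m-%d %H:%M:%S")
    batches = []
    for n, start in enumerate(range(0, len(items), max_items), start=1):
        # One (name, items) pair per batch; each item becomes a Batch Folder.
        batches.append((f"{prefix}{stamp} ({n})", list(items[start:start + max_items])))
    return batches


for name, folders in plan_batches(["a.pdf", "b.pdf", "c.pdf"], 2, "Invoices ",
                                  datetime(2024, 1, 2, 3, 4, 5)):
    print(name, folders)
```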
</tab>
</tabs>
 
=== Import Query Results ===
 
<tabs style="margin:20px">
<tab name="Configuration Panel" style="margin:20px">
=== Configuration Panel ===
 
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
The ''Import Query Results'' provider's configuration panel is almost identical to the ''Import Descendants'' provider's configuration panel.  ''Both'' providers share the same '''''Processing Options''''', '''''Disposition''''', and '''''Batch Creation''''' property settings.  See the [[#Import Descendants|Import Descendants]] section for brief descriptions of these property sections.


Both these providers are used to import files from '''CMIS Repositories''' for Batch processing in Grooper. They import files from a folder structure of an on-premise or cloud-based document storage platform.
:*<li class="fyi-bullet"> While less common, Import Descendants and Import Query Results can also import ''folders'' from CMIS Repositories. However, since importing files is most common, we focus on importing ''files'' in this article.

Just like any other Import Provider, Import Descendants and Import Query Results are used to submit "'''Import Jobs'''". Import Jobs are how Grooper brings in files from a storage location for processing. For example, it's how PDFs from a Windows folder or messages from an email inbox get into Grooper. When an Import Job runs, Grooper first creates a Batch and then creates a Batch Folder for each imported file. A copy of the file is attached to the Batch Folder. This becomes the Batch Folder's "attachment" and is used when applying activities like "Split Pages".
:*<li class="fyi-bullet"> When files are imported into Grooper, a link to that file is stored on the Batch Folder. This link maintains a connection between the file's source location and the document in Grooper. This link also makes "Sparse" imports possible. [[#Import Mode (and "Sparse" imports)|See below for more.]]

Import Jobs are submitted in one of two ways:
* '''By a user from the Imports page''': Ad-hoc or "user directed" Import Jobs are submitted from the [[Imports Page]], using the "Submit Import Job" button.
* '''From an Import Watcher service''': Automated or "scheduled" Import Jobs are submitted by an '''[[Import Watcher]]''' service according to its Polling Loop or Specific Times specification.

In both cases, an Import Descendants or Import Query Results provider can be selected and configured using the "Provider" property.


The big difference between the two providers is the highlighted '''''CMIS Query''''' property. This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties. Only files returned by the query will be imported.  For example, you may want to only import documents of a certain file type. You could include the file extension as the query condition.  You can use CMISQL queries to easily filter email messages when importing from an inbox.  If you only wanted to import messages from a certain sender, you could include the sender's email address as the query condition.


{|cellpadding="10" cellspacing="5"
|-style="background-color:#f89420; color:white"
|style="font-size:22pt"|'''&#9888;'''||Only certain external storage platforms are currently queryable with the '''''CMIS Query''''' property. The following ''CMIS Binding'' sources '''''cannot''''' be queried currently. As such, they are '''''not''''' suitable for ''Import Query Results''. You should instead use ''Import Descendants'' for the following ''CMIS Bindings''.


* ''NTFS''
* ''FTP''
* ''SFTP''
* ''OneDrive''
|}
|
[[File:Cmis-import-import-query-results-8.png]]
|-
|valign=top|
Just like with ''Import Descendants'', there are some minimum requirements before configuring ''Import Query Results''.  A '''CMIS Connection''' object must be created and a '''CMIS Repository''' must be imported.


# Here, a '''CMIS Connection''' has been made and a child '''CMIS Repository''' has been imported.
#* In this case, the ''Exchange'' binding was used for the '''CMIS Connection's''' '''''Connection Type'''''. This binding is used to connect Grooper to Microsoft Exchange email servers.
# As you can see, all the folders in this email inbox are accessible to Grooper.
|
[[File:Cmis-import-import-query-results-9.png]]
|-
|valign=top|
# To enter the CMISQL query, first use the '''''Repository''''' property to select the '''CMIS Repository''' you are importing from.
# Select the '''''CMIS Query''''' property.
#* ''Note: If you select a '''CMIS Repository''' that is '''not''' queryable (such as NTFS repositories), this property will '''not''' be displayed.''
# Press the ellipsis button at the end to bring up the CMIS Query editor window.
|
[[File:Cmis-import-import-query-results-10.png]]
|}
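The '''''CMIS Query''''' property ultimately holds a CMISQL string of the general form <code>SELECT ... FROM ... WHERE ... ORDER BY ...</code>. To make that structure concrete, here is a hypothetical Python helper that assembles such a string from a Content Type (FROM), a list of selected properties (SELECT), and property conditions (WHERE). The function names are invented for illustration; Grooper builds the query for you in its editor. The sketch only handles string-valued conditions such as the LIKE searches on "Subject" and "Sender"; dates, booleans (for example, a read flag), and folder restrictions use other CMISQL syntax that is not shown here. Quote escaping follows the CMIS specification's backslash rule as we understand it, and Grooper's parser may differ.

```python
from typing import Optional, Sequence, Tuple

# (property, operator, value) for string-valued conditions only.
Condition = Tuple[str, str, str]


def quote(value: str) -> str:
    """Wrap a string literal in single quotes, escaping backslashes and quotes with a backslash."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_cmisql(content_type: str,
                 select: Sequence[str] = ("*",),
                 conditions: Sequence[Condition] = (),
                 order_by: Optional[str] = None) -> str:
    """Assemble SELECT ... FROM ... [WHERE ...] [ORDER BY ...]."""
    query = f"SELECT {', '.join(select)} FROM {content_type}"
    if conditions:
        # Every condition must hold, so they are joined with AND.
        query += " WHERE " + " AND ".join(
            f"{prop} {op} {quote(val)}" for prop, op, val in conditions)
    if order_by:
        query += f" ORDER BY {order_by}"
    return query


print(build_cmisql("File"))
# SELECT * FROM File
print(build_cmisql("Message", conditions=[
    ("Subject", "LIKE", "%Wiki Vitals%"),
    ("Sender", "LIKE", "%cdearner@bisok.com%"),
]))
# SELECT * FROM Message WHERE Subject LIKE '%Wiki Vitals%' AND Sender LIKE '%cdearner@bisok.com%'
```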
</tab>
<tab name="CMIS Query Configuration" style="margin:20px">
=== CMIS Query Configuration ===


{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
Upon pressing the ellipsis button at the end of the '''''CMIS Query''''' property, the following window will appear.


This interface allows you to configure the CMISQL query based on available metadata from the ''CMIS Binding''. For example, the ''Exchange'' binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.


The query example here selectively filters an email inbox based on the following conditions:


# Only email messages are to be imported.
#* This is controlled by the '''''Content Type''''' property, here set to ''Message''.  Different ''CMIS Bindings'' have different '''''Content Types''''', depending on the storage platform.  Some platforms are simpler and only have ''File'' and ''Folder'', corresponding to files and folders in the storage platform.  Some, such as ''Exchange'', have additional '''''Content Types''''' for different types of content.  The ''Message'' '''''Content Type''''' corresponds to email messages.  By limiting the '''''Content Type''''' to ''Message'', we aren't going to import other content available to the ''Exchange'' binding, such as appointments or contacts.
# All properties are going to be searched.
#* This is filtered by the '''''Select Elements''''' property.  You can choose to limit which metadata properties are queried using this property.  The <code>*</code> character indicates all properties are queried.
# Only messages in the "Wiki" folder in the inbox will be imported.
#* The '''''Search Scope''''' property allows you to control where in the storage platform's folder hierarchy you wish to search.  If you leave this property blank, the ''entire'' repository's folder structure will be queried.
# Only messages with certain properties are to be imported.  Only emails sent by "cdearner@bisok.com", with "Wiki Vitals" in the subject, sent after 12/01/2020 that have ''not'' been read should be imported.
#* These properties are filtered by the property search grid.  Here, all queryable metadata for the ''CMIS Binding'' and selected '''''Content Type''''' are displayed.  For each property, you can use the operator column and search value column to indicate what conditions must be met for import.
#* Note:  For text searching (as we did to query the "Subject" and "Sender" properties), use the "LIKE" operator and place percentage symbols (<code>%</code>) before and after your search string.
|valign=top|
[[File:Cmis-import-import-query-results-11.png]]
|-
|valign=top|
This configuration editor writes the CMISQL query for you.  You can verify the query using the "CMISQL" tab.


# Switch to the "CMISQL" tab.
# In the text editor here, you can see the full CMISQL query. You can also use this editor to manually type out full queries yourself.
|
[[File:Cmis-import-import-query-results-12.png]]
|-
|valign=top|
Whether in the "Basic Search" tab or the "CMISQL" tab, you can verify the results of the query using the "Execute Query" button.  This will display a list of items that will be imported by the ''Import Query Results'' provider.


# Press the "Execute Query" button.
# A list of items satisfying the query conditions will populate in the list below.
# Press the "OK" button when finished configuring your query.
|
[[File:Cmis-import-import-query-results-13.png]]
|}
<section begin="import_query_results_and_descendants_similarities"/>
=== Similarities and differences between Import Query Results and Import Descendants ===
Overall, "Import Descendants" is a "simpler" version of "Import Query Results".
:*<li class="fyi-bullet"> We advise using Import Query Results over Import Descendants, when possible.
:** Import Query Results can do everything Import Descendants can do and more.
:** Import Query Results has more robust file filtering capabilities. This allows for more targeted, selective imports.
:** Import Query Results is newer (and better maintained) than Import Descendants.
:** There are only a handful of scenarios where Import Descendants must be used over Import Query Results.


<big>Similarities</big>
* Both providers import files from a CMIS Repository.
* Both providers have the same Batch Creation settings.
* Both providers are capable of "Sparse" imports by changing the "Import Mode" to "Sparse".
* Both providers can dispose of files on import (using the "Delete Item", "Move Item", or "Update Properties" settings).


<big>Differences</big>


The biggest difference is in how the providers determine which files are imported (import criteria).
* Import Descendants will import all files from a target location. ''This includes all files in all subfolders if present''. You can, however, set a "Base Folder" within the CMIS Repository.
* Import Query Results will import files that match a [[CMIS Query]]. This is a specialized query language based on SQL syntax. This gives you many more options for import conditions, using a "WHERE" clause in the query. CMIS Queries also give you the capability to restrict imports to a folder location without importing files in subfolders (this is something Import Descendants ''cannot'' do).
:*<li class="fyi-bullet"> Import Descendants does have an "Import Filter" it can use to set import conditions. It also uses a SQL-like syntax. However, it is not as advanced as the CMIS Queries that Import Query Results uses.


<big>CMIS Repositories that can only use Import Descendants</big>


Certain CMIS Bindings are '''not queryable''' using CMIS Queries. Because of this, certain CMIS Repositories '''''cannot''''' utilize Import Query Results. The following CMIS Repositories must use Import Descendants to import file content:
* FTP
* SFTP
* NTFS (only if the directory has ''not'' been indexed by the Windows Search service or the Windows Search service is not running)
<section end="import_query_results_and_descendants_similarities"/>
</tab>
<tab name="Example Queries" style="margin:20px">
=== Example Queries ===


These are samples of a query string.  They take the following general form.


<code>SELECT * FROM <ContentType> WHERE <Criteria> ORDER BY <Sort></code>


Let's break down each component, or "clause", to get a better idea of how this works.
=== SELECT ===


This specifies which properties are to be returned with query results.


If you are querying all properties, the asterisk (<code>*</code>) indicates that all properties should be returned.


For example: <code>SELECT *</code>


Otherwise, list the properties separated by commas.


For example, let's say you are querying an Exchange repository and you only want the sender and recipient properties returned. You'd type <code>SELECT Sender, ToRecipients, CcRecipients, BccRecipients</code> to limit your query to only those four properties.

== Prereqs: CMIS Repository ==


A CMIS Repository allows Grooper access to files and folders within a storage platform.


Because Import Descendants imports from a CMIS Repository, you can import from numerous storage platforms determined by the "CMIS Binding" used. These CMIS Bindings include:
* [[NTFS]] to connect to Windows folders
* [[FTP]] to connect to FTP directories
* [[SFTP]] to connect to SFTP directories
* [[Exchange]] to connect to Outlook inboxes
* [[SharePoint]] to connect to SharePoint sites (and document libraries)
* [[OneDrive]] to connect to OneDrive drives
* [[Box]] to connect to Box accounts
* [[AppXtender]] to connect to AppEnhancer applications


Before you can import files from these platforms using Import Descendants or Import Query Results, there's some setup required in the Grooper Design page. You must:
# Create and configure a CMIS Connection.
# Import a folder location as a CMIS Repository.


This will allow you to import files from folders accessed by the CMIS Repository. For information on CMIS Connections and CMIS Repositories, including how to create them in Grooper, visit the [[CMIS Connection]] page.


[https://app.supademo.com/demo/cm8ddjcr91rb92ugqp7k9phqo Click here for an interactive walkthrough.]
<section begin="shared_cmis_import_settings"/>
== Settings shared between Import Descendants and Import Query Results ==


The following properties/settings are part of configuring either Import Descendants or Import Query Results.
* '''Import Mode''' - Configuring this property allows you to perform "Sparse" imports to speed up the time it takes to ingest large numbers of files.
* '''Batch Creation settings''' - These settings control how Batches are created on import, including which Batch Process is used.
* '''File disposition options''' - Optionally, you can "dispose" of the source file after it has been imported into Grooper. These settings allow you to delete the file, move it or update one of its properties.


=== FROM ===
=== Import Mode (and "Sparse" imports) ===
<section begin="import_modes" />
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Processing_Options_Settings_01(2).png|right]]
<big>What is an "Import Mode"?</big>


This clause indicates the type of object to search for.  This will be a content type defined in the CMIS Repository. If the content type is document based, the query result will be a CMIS Document.  If it is folder based, it will be a CMIS Folder.
The Import Provider's "Import Mode" controls how file content and data associated with that file (its properties and any metadata field values in a content management system) are imported into Grooper. Practically speaking, this has an effect on the overall speed of the import process.


The content type specified in the <code>FROM</code> clause has two jobs. One, it defines what properties are available to the other clauses. Two, it limits the scope of the search to only objects of the type specified in the clause.
Files can be imported using one of three "Import Modes":
* '''Copy''' - The files are fully copied to the Grooper Repository on import.
** Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
** A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
** If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
* '''Sparse''' - Only the file properties and mapped metadata are copied to the Grooper Repository.
** Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
** If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
** The file content is ''not'' copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder. The file can be copied to the Grooper File Store with the "CMIS Document Link > Load" command.
* '''Link Only''' (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.
** Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
** File content and mapped properties/metadata must be brought into Grooper with the "CMIS Document Link > Load" command.


For example:  Let's say you are querying an Exchange repository and want to search email messages and not contacts or tasks or appointments.  You'd type <code>FROM Message</code> to limit your query to just the Message content type.


=== WHERE ===
Sparse imports serve two functions:
* ''Primarily'', they are used to speed up the overall import operation. This is actually a two-step process.
*# Enable the "Sparse" Import Mode when configuring the Import Provider.
*# Have the first step in the Batch Process fully copy the files into the Grooper Repository (using the Execute activity and the "CMIS Document Link > Load" command).
* They can also be used to avoid file duplication between the import source location and the Grooper File Store.
** A fully copied import creates a copy of the file in the Grooper File Store. A sparsely imported document is fully usable in Grooper, but no such copy exists in the File Store.
** Instead, Grooper travels the link every time it needs to access the file (Example: When the file's image is pulled up in the Document Viewer).
** While this does save on storage between two systems (Grooper and the import source), it ''does not'' save on processing time. Every time Grooper needs to access the document to view it, execute a command, or run an activity, it will take some time to travel the document link and fetch the document. Depending on latency, it may be preferable to load the file into Grooper even if it does duplicate the file (The file can always be removed from Grooper with a Dispose step at the end of a Batch Process too).


This is how you define the search conditions an item must meet to be included in your results. Multiple conditions can be joined with the AND, OR, and NOT operators, and you can change the order of operations by using nested parentheses. Each condition is expressed with a predicate. The predicates are listed below. Note that not every property type can use every predicate. For example, the Subject property on the Exchange binding cannot use the "=" operator.


{|cellspacing="5" cellpadding="10"
<big>How does using "Sparse" speed up import?</big>
|'''Predicate'''||'''Description'''||'''Example'''
|-style="background-color:#ddf5f5"
|Comparison Predicate||Specifies a condition for an individual property using comparisons, such as "equal to" or "less than". The <code>LIKE</code> and <code>IS</code> operators are also comparison predicates.||<code>invoice_date<'12/31/2007'</code>
|-style="background-color:#ddf5f5"
|<code>IN</code> Predicate||Specifies a list of allowed values for a property.  This list is separated by commas.||<code>FileExtension IN ('.pdf', '.docx', '.xlsx')</code>
|-style="background-color:#ddf5f5"
|<code>CONTAINS</code> Predicate||Specifies a full-text query.  You can use AND, OR and NOT operators.||<code>CONTAINS('mortgage AND payment AND NOT vehicle')</code>
|-style="background-color:#ddf5f5"
|Scope Predicate||Restricts the search scope to the children (<code>IN_FOLDER</code>) or descendants (<code>IN_TREE</code>) of a folder.||<code>IN_FOLDER(/Inbox)</code>
|}


Note: The NOT operator cannot be used with the <code>IN_FOLDER</code> or <code>IN_TREE</code> predicates.
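To show how these predicates combine into a WHERE clause, here is a minimal Python sketch. It assumes the string-literal escaping rule from the OASIS CMIS specification, where single quotes and backslashes inside a literal are escaped with a backslash. Check this against your CMIS Binding, and note that text inside <code>CONTAINS</code> has extra escaping rules that the sketch does not handle. All function names are hypothetical.

```python
from typing import Iterable


def quote_literal(value: str) -> str:
    """Hypothetical helper: quote a value, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def in_predicate(prop: str, values: Iterable[str]) -> str:
    """IN predicate: the property must equal one of the listed values."""
    return f"{prop} IN ({', '.join(quote_literal(v) for v in values)})"


def contains_predicate(expression: str) -> str:
    """CONTAINS predicate: a full-text search expression."""
    return f"CONTAINS({quote_literal(expression)})"


def all_of(*conditions: str) -> str:
    """Join conditions with AND; parentheses keep each condition's precedence explicit."""
    return " AND ".join(f"({c})" for c in conditions)


where = all_of(
    in_predicate("FileExtension", [".pdf", ".docx", ".xlsx"]),
    contains_predicate("mortgage AND payment AND NOT vehicle"),
)
print(where)
```

Wrapping each condition in parentheses is a defensive choice. It keeps an <code>OR</code> inside one condition from changing how the surrounding <code>AND</code>s are grouped.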
It increases the parallelism of the overall import operation.


For example: Let's say you are querying an Exchange repository and want to find an email in the Inbox folder that contains the words "cake" and "free" but not "birthday", was not received last Christmas Day, and has attachments. That would look something like <code>WHERE IN_FOLDER(/Inbox) AND CONTAINS('cake AND free AND NOT birthday') AND (DateTimeReceived<>'12/25/2018') AND (HasAttachments=True)</code>
Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.


=== ORDER BY ===
When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.
* By only importing a link to the file content, Sparse mode ''dramatically'' speeds up the time it takes to get a usable document into Grooper.
* To take full advantage of your system's resources, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
**<li class="attn-bullet"> Be Aware: The "Load" command has three modes (1) Content (2) Properties and (3) Full. For "Properties" and "Full" to work appropriately, the Batch Folders must be classified on import and use an Import Behavior to map the properties.
* The end result is the overall import operation will be as if you had used the "Copy" mode. But it will be done in a way that runs multi-threaded.
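The Python sketch below models the two-step pattern outside Grooper. The import step only records a link for each file, which is fast even on a single thread. A separate step then fetches the content on a thread pool. `BatchFolder`, `sparse_import`, and `load_all` are hypothetical stand-ins for a Batch Folder, a Sparse import, and an Execute step running "CMIS Document Link > Load". They are not Grooper APIs, and a dict stands in for the source storage location.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BatchFolder:
    source_path: str                 # hypothetical stand-in for the CMIS Document Link
    content: Optional[bytes] = None  # stays None until the content is loaded


def sparse_import(paths: List[str]) -> List[BatchFolder]:
    """Import step (single-threaded): record only a link to each file."""
    return [BatchFolder(source_path=p) for p in paths]


def load_all(folders: List[BatchFolder], fetch: Callable[[str], bytes], workers: int = 8) -> None:
    """Follow-up step (multi-threaded): fetch each file's content through its link."""
    def load(folder: BatchFolder) -> None:
        folder.content = fetch(folder.source_path)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so an exception raised in a worker is re-raised here.
        list(pool.map(load, folders))


store = {f"/Inbox/doc{i}.pdf": f"content {i}".encode() for i in range(5)}
folders = sparse_import(sorted(store))
print(sum(f.content is None for f in folders))  # 5: nothing loaded yet
load_all(folders, fetch=store.__getitem__)
print(sum(f.content is None for f in folders))  # 0: everything loaded
```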
<br clear=all><section end="import_modes" />


This is an optional clause which allows you to specify the order in which results are returned.  You can sort by multiple properties using a comma separated list.  Optionally, each property name may be followed by <code>ASC</code> or <code>DESC</code> to indicate ascending or descending sort direction.  The default sort direction is ascending.
=== Batch Creation settings ===


For example:  If you wanted to sort a query of an Exchange repository by both whether they have attachments and by size in descending order you would type <code>ORDER BY HasAttachments, Size DESC</code>
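The Python sketch below reproduces what <code>ORDER BY HasAttachments, Size DESC</code> means on a few hypothetical message records: sort by attachments first (ascending), then by size (descending) within each group. It assumes <code>False</code> sorts before <code>True</code>, as it does in Python. A repository may order boolean values differently.

```python
messages = [
    {"Subject": "a", "HasAttachments": True, "Size": 1200},
    {"Subject": "b", "HasAttachments": False, "Size": 5000},
    {"Subject": "c", "HasAttachments": False, "Size": 300},
    {"Subject": "d", "HasAttachments": True, "Size": 9000},
]


def order_by(rows, keys):
    """keys: (property, descending) pairs, most significant first."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first and the
    # most significant key last gives a correct multi-key ordering.
    for prop, descending in reversed(keys):
        result.sort(key=lambda row: row[prop], reverse=descending)
    return result


ordered = order_by(messages, [("HasAttachments", False), ("Size", True)])
print([m["Subject"] for m in ordered])  # ['b', 'c', 'd', 'a']
```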
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Descendants_Batch_Creation_Settings_01.png|right]]


=== Putting It All Together ===
The Batch Creation settings allow you to define which '''Batch Process''' you wish to use to process the imported files.


Let's mash all our examples together and search for email messages in the Inbox that have the words "cake" and "free" but not "birthday" in the body and were received on any day besides Christmas Day. I'm going to go ahead and search all the properties available to me, and I want to sort the results by whether the message has attachments and by size in descending order. This would be the resulting query:
You '''''must''''' configure the "Starting Step" property to assign the Batch Process.
* Use the property's dropdown editor to select a Batch Process Step from a list of Batch Processes.
* Only published Batch Processes will appear in this list.


<code>
SELECT * FROM Message
WHERE IN_FOLDER(/Inbox) AND CONTAINS('cake AND free AND NOT birthday') AND (DateTimeReceived<>'12/25/2018')
ORDER BY HasAttachments, Size DESC
</code>


=== More Examples ===
Other notable Batch Creation properties:
* "Start Paused" - This determines if the Batch starts in a paused state or not. If "False", the first step's tasks will be automatically submitted to Activity Processing services. If "True", you will have to manually start the Batch.
* "Max Items Per Batch" - The default is 2500, meaning each Batch will have a maximum of 2500 Batch Folders before creating a new Batch on import. For users who want more Batches with fewer documents, lower this number.
* "Organize By Date" - This will organize Batches into subfolders in the Production branch in Grooper according to the year / month / day the Batch was created.
* "Priority" and "Increment Priority" - Controls the task processing priority for the Batch. "Increment Priority" is useful when submitting large user-directed imports from the Imports Page to ensure the first Batch created is the first that is fully processed by Activity Processing services.
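The "Max Items Per Batch" limit works like fixed-size chunking. The Python sketch below assumes that each imported file becomes one Batch Folder and that Batches are filled in order. The function name is hypothetical and only illustrates how many Batches a large import might produce.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_batches(items: Sequence[T], max_items_per_batch: int = 2500) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive groups of at most max_items_per_batch items."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    for start in range(0, len(items), max_items_per_batch):
        yield list(items[start:start + max_items_per_batch])


files = [f"file{i}.pdf" for i in range(6000)]
sizes = [len(batch) for batch in split_into_batches(files)]
print(sizes)  # [2500, 2500, 1000]
```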
<br clear=all>


{|cellspacing="5" cellpadding="10"
=== File disposition settings ===
|'''Filter'''||'''Description'''
[[File:2023_CMIS_Import_02_About_CMIS_Import_02_1_Import_Disposition_Settings_01.png|right]]
|-style="background-color:#ddf5f5"
The "Disposition" settings allow you to do something with the source files after importing them into Grooper. This is important when using an Import Watcher service to schedule imports. If you do not configure a Disposition property, the imported file will remain in the same state after the Import Job completes. This can cause the Import Watcher to repeatedly attempt to import the same file over and over again.
|<code>SELECT * FROM File</code>||Import all descendant files.  This will import all files in the repository without any foldering.
|-style="background-color:#ddf5f5"
|<code>SELECT * FROM File WHERE AT_LEVEL(1)</code>||Import files which are immediate children.  This will only import files at that level, not from subsequent levels.
|-style="background-color:#ddf5f5"
|<code>SELECT * FROM Folder</code>||Import folders which are immediate children.  This will import both files and their foldering.
|-style="background-color:#ddf5f5"
|<code>SELECT * FROM File WHERE cmis:name MATCHES '^\d{4}-\d{2}-\d{2}'</code>||Import files with a specific naming pattern, using regular expression.
|-style="background-color:#ddf5f5"
|<code>SELECT * FROM File WHERE cmis:name LIKE 'ca%'</code>||Import files with a name starting with ca.
|-style="background-color:#ddf5f5"
|<code>SELECT * FROM File WHERE cmis:contentStreamLength > 10000</code>||Import files larger than 10,000 bytes.
|}
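To make the filter examples concrete, the Python sketch below applies them to a few hypothetical file records. It converts a <code>LIKE</code> pattern to a regular expression, where <code>%</code> matches any run of characters and <code>_</code> matches exactly one. It does not handle escaped wildcards, and case sensitivity is an assumption because it varies by repository. The regex flavor Grooper uses for <code>MATCHES</code> may also differ from Python's.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: translate a LIKE pattern into an equivalent regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")       # any run of characters, including none
        elif ch == "_":
            parts.append(".")        # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


files = [
    {"cmis:name": "2024-01-15 invoice.pdf", "cmis:contentStreamLength": 52000},
    {"cmis:name": "capture.tif", "cmis:contentStreamLength": 8000},
    {"cmis:name": "notes.txt", "cmis:contentStreamLength": 120},
]

starts_with_ca = [f["cmis:name"] for f in files if like_to_regex("ca%").fullmatch(f["cmis:name"])]
date_named = [f["cmis:name"] for f in files if re.search(r"^\d{4}-\d{2}-\d{2}", f["cmis:name"])]
larger_than_10k = [f["cmis:name"] for f in files if f["cmis:contentStreamLength"] > 10000]
print(starts_with_ca, date_named, larger_than_10k)
```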
</tab>
</tabs>


== Version Differences ==
There are three "Disposition" options:
* "Delete Item" - Turning this to "True" will simply delete the source file after the Import Job completes.
**<li class="attn-bullet"> You may only configure this option if the "Import Mode" is set to "Copy".
* "Move To Folder" - This will move the files to a different folder in the CMIS Repository after the Import Job completes.
* "Update Properties" - This will update one or more file properties after the Import Job completes. Properties are updated by listing them as "key-value pairs" in the "Update Properties" list editor where <code>key=value</code>.
** Examples:
** <code>Archive=false</code> Sets the archive attribute on each imported file to "false".
** <code>IsRead=true</code> Marks each imported email message as read.
** <code>Status=PENDING</code> Sets a "Status" field on a document in an AppEnhancer application (assuming there is a "Status" field in the application).
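The Python sketch below shows one way to read the "Update Properties" list as key-value pairs and to check the documented rule that "Delete Item" requires the "Copy" Import Mode. That rule is presumably there because Sparse and Link Only imports still depend on the source file. Both function names are hypothetical. The sketch splits each entry on the first <code>=</code> only and keeps values as strings. How Grooper converts those strings to typed property values is not covered here.

```python
from typing import Dict, Iterable


def parse_update_properties(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: turn key=value entries into a dict (split on the first '=')."""
    updates: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {line!r}")
        updates[key.strip()] = value.strip()
    return updates


def check_disposition(import_mode: str, delete_item: bool) -> None:
    """Hypothetical check mirroring the rule that Delete Item requires Copy mode."""
    if delete_item and import_mode != "Copy":
        raise ValueError("Delete Item may only be used when Import Mode is 'Copy'")


print(parse_update_properties(["Archive=false", "IsRead=true", "Status=PENDING"]))
check_disposition("Copy", delete_item=True)  # allowed; no exception
```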
<br clear=all>
<section end="shared_cmis_import_settings"/>


=== Legacy Providers (2.72) ===
== Example Import Descendants configuration ==


Old import and export providers should be replaced with this new functionality. While Grooper's older import and export providers are available as "Legacy Import" and "Legacy Export" providers, these components are deprecated. They will still function but will no longer be upgraded in future versions of Grooper.
Regardless of the platform you're accessing with a CMIS Repository, you configure Import Descendants largely the same. Just pick a CMIS Repository and configure the rest of Import Descendants as needed.
* Import Descendants will import all files in the base folder, ''including all descendant files in any subfolders if present.''
* Unless you configure the "Base Folder" property, Import Descendants will start at the root of the CMIS Repository and continue down the folder structure.
* When using Import Descendants, setting the "Base Folder" to a terminal branch in the folder structure (a folder with no subfolders) is the only way to import files from a folder without importing descendant files (because there are none in this case).
**<li class="fyi-bullet"> A more technical way of saying this is that Import Descendants always behaves like the <code>IN_TREE</code> CMIS search predicate (all descendants). It cannot restrict the scope to immediate children the way the <code>IN_FOLDER</code> predicate does.
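In standard CMIS, <code>IN_FOLDER</code> matches only a folder's immediate children, while <code>IN_TREE</code> matches descendants at any depth. The Python sketch below reproduces the two scopes over a hypothetical list of repository paths. An Import Descendants job with a "Base Folder" of <code>/Invoices</code> would import the <code>in_tree</code> set. Both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def in_folder(paths: List[str], folder: str) -> List[str]:
    """IN_FOLDER-style scope: immediate children only."""
    base = PurePosixPath(folder)
    return [p for p in paths if PurePosixPath(p).parent == base]


def in_tree(paths: List[str], folder: str) -> List[str]:
    """IN_TREE-style scope: descendants at any depth."""
    base = PurePosixPath(folder)
    return [p for p in paths if base in PurePosixPath(p).parents]


paths = ["/Invoices/a.pdf", "/Invoices/2024/b.pdf", "/Invoices/2024/Q1/c.pdf", "/Receipts/d.pdf"]
print(in_folder(paths, "/Invoices"))  # ['/Invoices/a.pdf']
print(in_tree(paths, "/Invoices"))    # every /Invoices file, including subfolders
```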


Grooper can import documents using '''CMIS Connections''' via ''Import Descendants'' and ''Import Query Results''. Grooper can export via the ''CMIS Export'' providers, ''[[CMIS Export#Mapped Export|Mapped Export]]'' and ''[[CMIS Export#Unmapped Export|Unmapped Export]]''.
<big>Example: Submitting Import Descendants from the Imports Page</big>


=== New Connection Types (2.72) ===
[https://app.supademo.com/demo/cm8d4b2wt1f812ugqwtbi17bu Click here for a step by step walkthrough.]


By creating the [[CMIS+]] architecture, we have been able to create new connections between Grooper and content management systems. Grooper can now connect to Microsoft OneDrive, SharePoint, and Exchange via new ''CMIS Bindings''. Since these were created as ''CMIS Bindings'', they can be used by the ''CMIS Import'' and ''CMIS Export'' providers.  Instead of having to create three new import providers and three new export providers for a total of six brand new components, we can use the already established CMIS import and export providers in the CMIS+ framework. A user can create a '''CMIS Connection''' using the ''OneDrive'', ''SharePoint'' or ''Exchange'' bindings, and use the same import and export providers for them as any of the other ''CMIS Bindings''.
# Go to the Imports Page.
# Press the "New Import Job" button.
# This brings up the "Submit Import Job" editor.
# Enter a description in the Description property (This is required).
# Open the "Provider" dropdown (Press the "☰" button).
# Select "Import Descendants" from the dropdown list.
# Expand the Provider settings to configure it.
# Open the "Repository" node selector (Press the "☰" button).
# Select the CMIS Repository you wish to import from.
# Select a Base Folder, as needed.
# Configure the Import Mode property, as needed.
# Configure the Batch Creation settings, as needed.
# Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
# Configure any remaining Import Descendants properties, as needed.
# Press the "Submit" button when finished.
# Your Import Watcher service will pick up and execute the Import Job.


This will also allow Grooper to create ''CMIS Bindings'' to connect to currently unavailable content management systems in the future much quicker and easier.
== Example Import Query Results configuration ==


=== Import Mode (2.72) ===
Import Query Results relies on a "CMIS Query" to import files from a CMIS Repository. The CMIS Query (aka CMISQL Query) uses a syntax structure similar to a SQL query. Instead of querying rows in a database based on column values, you're querying documents in a storage location based on file property and metadata values.
* The general CMIS Query format is: <code>'''SELECT * FROM''' ''<a type of document in the CMIS Repository>'' '''WHERE''' ''<according to certain search conditions>''</code>
* What "type of document" you can search for in the FROM clause is determined by the storage platform and the CMIS Binding.
** Example: For the NTFS binding you select "<code>FROM File</code>" for files in a Windows folder.
** Example: For the Exchange binding you select "<code>FROM Message</code>" to search for email messages.
* In the <code>WHERE</code> clause, you can set search parameters based on file properties and metadata values called "CMIS properties". Which CMIS properties are "queryable" also depends on the CMIS Repository and its CMIS Binding.
** Example: The Exchange binding has a queryable "Subject" property.
** Example: Fields in a Box metadata template are queryable for the Box binding.
**<li class="fyi-bullet">Which CMIS properties are queryable can be determined by (1) navigating to the CMIS Repository in the Grooper node tree (2) going to the "Types" tab (3) selecting the CMIS document type whose properties you want to inspect and (4) reviewing the "Queryable" column for each CMIS property.
* The <code>WHERE</code> clause is also used to set the folder scope, using the <code>IN_FOLDER</code> and <code>IN_TREE</code> predicates (where supported).
* More information on CMIS Queries (including unsupported query configurations for various CMIS Bindings) can be found in the [[CMIS Query]] article.
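The Python sketch below mirrors two checks described above. It looks up the FROM type for a binding, using only the two examples this article names, and flags WHERE properties that are not marked "Queryable". The queryable set for the Exchange "Message" type is a hypothetical stand-in for what you would read from the repository's "Types" tab. Neither name is a Grooper API.

```python
from typing import Dict, Iterable, List, Set

# FROM-clause content types named in this article, keyed by CMIS Binding.
FROM_TYPE_BY_BINDING: Dict[str, str] = {"NTFS": "File", "Exchange": "Message"}


def non_queryable(where_properties: Iterable[str], queryable: Set[str]) -> List[str]:
    """Hypothetical check: WHERE properties not marked Queryable on the Types tab."""
    return sorted({p for p in where_properties if p not in queryable})


# Hypothetical "Queryable" column for the Exchange Message type.
exchange_queryable = {"Subject", "DateTimeReceived", "HasAttachments"}
print(FROM_TYPE_BY_BINDING["Exchange"])                              # Message
print(non_queryable(["Subject", "Importance"], exchange_queryable))  # ['Importance']
```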


In version 2.72 the '''''Import Mode''''' property replaces previous versions' '''''Sparse Import''''' property.
<big>Example: Submitting Import Query Results from the Imports Page</big>


=== Import Disposition (2.72) ===
[https://app.supademo.com/demo/cm8ergggh03ij12zdakzyt3jp Click here for a step by step walkthrough.]


2.72 adds the '''''Import Disposition''''' property to CMIS Import. This allows you to change the disposition of your documents upon importing them into Grooper: you can delete them, move them to a folder, or update one or more properties on the document itself. This can be leveraged with '''Import Watcher''' to prevent repeatedly importing the same document.
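The Python sketch below simulates why a disposition matters for scheduled imports. It treats an Import Watcher as a simple loop that imports whatever is currently in a watched location. This is a simplifying assumption: the real service submits Import Jobs on a polling loop or at specific times. Without a disposition, each cycle imports the same files again. A "Move To Folder"-style disposition removes each file from the watched location, so it is imported once. All names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Disposer = Callable[[Dict[str, bytes], str], None]


def poll_once(watched: Dict[str, bytes], imported: List[str], dispose: Optional[Disposer] = None) -> None:
    """One polling cycle: import every file currently in the watched location."""
    for name in sorted(watched):  # sorted() copies the keys, so disposing inside the loop is safe
        imported.append(name)
        if dispose is not None:
            dispose(watched, name)


def move_to_folder(archive: Dict[str, bytes]) -> Disposer:
    """Disposition that moves each imported file out of the watched location."""
    def dispose(watched: Dict[str, bytes], name: str) -> None:
        archive[name] = watched.pop(name)
    return dispose


# No disposition: every cycle imports the same two files again.
watched = {"a.pdf": b"%PDF", "b.pdf": b"%PDF"}
no_disposition: List[str] = []
for _ in range(3):
    poll_once(watched, no_disposition)
print(len(no_disposition))  # 6

# Move To Folder-style disposition: each file is imported exactly once.
watched = {"a.pdf": b"%PDF", "b.pdf": b"%PDF"}
archive: Dict[str, bytes] = {}
with_disposition: List[str] = []
for _ in range(3):
    poll_once(watched, with_disposition, dispose=move_to_folder(archive))
print(with_disposition, sorted(archive))  # ['a.pdf', 'b.pdf'] ['a.pdf', 'b.pdf']
```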
# Go to the Imports Page.
# Press the "New Import Job" button.
# This brings up the "Submit Import Job" editor.
# Enter a description in the Description property (This is required).
# Open the "Provider" dropdown (Press the "☰" button).
# Select "Import Query Results" from the dropdown list.
# Expand the Provider settings to configure it.
# Open the "Repository" node selector (Press the "☰" button).
# Select the CMIS Repository you wish to import from.
# Open the "CMIS Query" editor (Press the "..." button).
# Enter the CMIS Query by either:
#* Typing it into the Query Editor.
#* Or, using the Query Editor's property grid to construct the query (The text will populate the Query Editor as you configure these properties).
#**<li class="fyi-bullet"> A more in depth explanation of the Query Editor and CMIS Queries can be found in the [[CMIS Query]] article.
# Configure the Import Mode property, as needed.
# Configure the Batch Creation settings, as needed.
# Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
# Configure any remaining Import Query Results properties, as needed.
# Press the "Submit" button when finished.
# Your Import Watcher service will pick up and execute the Import Job.

Latest revision as of 15:59, 28 May 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.


CMIS Import refers to two Import Providers used to import content from CMIS Repositories: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud based storage platforms (including Windows folders, Outlook inboxes, Box accounts, AppEnhancer applications and more).

Documents are imported from CMIS Connections using either the Import Descendants or Import Query Results providers. These can be used in two ways:

  • To perform manual "ad-hoc" imports when creating a new Batch on the "Imports" page.
  • To perform automated, scheduled imports using one or more Import Watcher Grooper services.

Import Descendants will import all documents within a designated folder location of a CMIS Repository. Import Query Results allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.

About CMIS+

"CMIS" stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to interoperate over the Internet. Grooper expanded on this idea in version 2.72 to create our "CMIS+" architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.


Now, Grooper connects to all available external storage platforms by creating and configuring a CMIS Connection.

  • Once a CMIS Connection is created, Grooper can "interoperate" with these platforms.
  • "Interoperability" means Grooper has the same access to control the system as a human being does.
  • Grooper has a "one-to-one" connection to the platform, allowing full and total control.
  • Because Grooper standardizes connections to non-CMIS systems, this includes platforms like NTFS file systems (Windows) that are not CMIS servers.


Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.

  • You now use CMIS Import providers and CMIS Export for any storage platform you can connect to with a CMIS Connection.
  • This also speeds up development for adding new connection types for import/export operations.

Anatomy of a CMIS Connection

When connecting Grooper to external storage platforms, you'll start by creating a CMIS Connection. There are three important parts to understanding a CMIS Connection:

  1. The CMIS Connection itself
  2. The platform it's connecting to. This is defined by the "CMIS Binding" (aka "connection type") selected for the CMIS Connection's "Connection Settings".
  3. Its child CMIS Repositories
    • "Repository" is just a general term for a location where data lives. Different systems refer to "repositories" in different ways.
      • A folder in Windows could be a repository. An email inbox could be a repository. A document library in SharePoint could be a repository. An application in ApplicationEnhancer (formerly ApplicationXtender) could be a repository.
      • "Repository" is a normalized way of referring to various terms used by various storage platforms.


For newer users, the difference between a CMIS Connection and a CMIS Repository can be confusing. The key distinction is as follows:

  • CMIS Connections connect to storage platforms.
    • It's the phone number you dial.
    • The specific platform you're connecting to is defined in its "Connection Settings".
  • CMIS Repositories represent a location within the connected platform.
    • It's the person on the other end of that phone number you want to talk to.
    • CMIS Repositories represent storage locations (typically folders) in the storage platform. They are added as children to a parent CMIS Connection.
    • The CMIS Repository nodes are what Grooper actually uses when configuring import/export operations.
      • You don't talk to a phone number. You talk to a person.
      • You don't reference the parent CMIS Connection when configuring CMIS Import or CMIS Export. Instead you reference a CMIS Repository.

Basic creation steps

There are three basic steps involved to connect Grooper to external storage platforms:

  1. Create a CMIS Connection
  2. Configure the "Connection Settings".
    • Choose what platform you want to connect to (the CMIS Binding).
    • Enter the connection settings required to connect to the platform (This will differ from platform to platform)
  3. Add child CMIS Repositories by importing the storage locations.
    • Importing a CMIS Repository is not the same as importing documents to a new Batch.
      • "Importing" here is more like importing a reference (or bringing the repository into a framework Grooper can use).
      • Upon importing the CMIS Repository, Grooper has full file access to that location in the storage platform.

CMIS Bindings (aka "connection types")

How you configure a CMIS Connection depends on the platform you're connecting to. Connection settings can include folder paths, URL addresses, usernames, and passwords.

  • Example: Connecting to a Windows folder requires a networked folder's UNC path.
  • Example: Connecting to a SharePoint site requires a URL address.
  • Example: Connecting to an email inbox requires a server host name.
  • Example: Connecting to ApplicationXtender, Box, SharePoint, OneDrive, Exchange (Outlook) and more requires a username and password.


Each platform has its own connection requirements. These connection settings, and the logic Grooper needs to interoperate with a specific platform, are defined by the different CMIS Bindings.

Each CMIS Binding provides the settings and logic to connect Grooper to CMS platforms and file systems for import and export operations.

  • Example: The "Exchange" binding contains all the information Grooper uses to connect to Microsoft Exchange email servers (i.e. Outlook inboxes).
  • Example: The "AppXtender" binding contains all the information Grooper uses to connect to the ApplicationEnhancer (formerly AppXtender) content management system.
  • Example: The "NTFS" binding contains all the information Grooper uses to connect to a Windows file system.
  • And so on.


The first step in configuring a CMIS Connection is choosing what platform you want to connect to. You do this by selecting a "CMIS Binding".

  • You will commonly hear "CMIS Binding" referred to as a "CMIS connection type" or "connection type".
  • Or just "connection", as in an "Exchange connection".

Current CMIS Bindings (aka "connection types")

Grooper can connect to the following storage platforms using CMIS Bindings:

Most Commonly Used

Somewhat Commonly Used

Less Commonly Used

  • FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) servers.
  • IMAP mail servers

Least Used

  • Content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The FileBound document management platform.
  • The IBM FileNet platform.


About CMIS Import

There are two CMIS Import providers in Grooper.

Both of these providers are used to import files from CMIS Repositories for Batch processing in Grooper. They import files from the folder structure of an on-premise or cloud-based document storage platform.

  • While less common, Import Descendants and Import Query Results can also import folders from CMIS Repositories. However, since importing files is most common, we focus on importing files in this article.

Just like any other Import Provider, Import Descendants and Import Query Results are used to submit "Import Jobs". Import Jobs are how Grooper brings in files from a storage location for processing. For example, it's how PDFs from a Windows folder get into Grooper or messages from an email inbox get into Grooper. When an Import Job runs, Grooper first creates a Batch and then creates a Batch Folder for each imported file. A copy of the file is attached to the Batch Folder. This becomes the Batch Folder's "attachment" and is used when applying activities like "Split Pages".

  • When files are imported into Grooper, a link to that file is stored on the Batch Folder. This link maintains a connection between the file's source location and the document in Grooper. This link also makes "Sparse" imports possible. See below for more.


Import Jobs are submitted in one of two ways:

  • By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
  • From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification.

In both cases, Import Descendants or Import Query Results can be selected and configured using the "Provider" property.


Similarities and differences between Import Query Results and Import Descendants

Overall, "Import Descendants" is a "simpler" version of "Import Query Results".

  • We advise using Import Query Results over Import Descendants when possible.
    • Import Query Results can do everything Import Descendants can do and more.
    • Import Query Results has more robust file filtering capabilities. This allows for more targeted, selective imports.
    • Import Query Results is newer (and better maintained) than Import Descendants.
    • There are only a handful of scenarios where Import Descendants must be used over Import Query Results.


Similarities

  • Both providers import files from a CMIS Repository.
  • Both providers have the same Batch Creation settings.
  • Both providers are capable of "Sparse" imports by changing the "Import Mode" to "Sparse".
  • Both providers can dispose of files after import (using the "Delete Item", "Move To Folder", or "Update Properties" options).

Differences

The biggest difference is in how the providers determine which files are imported (import criteria).

  • Import Descendants will import all files from a target location. This includes all files in all subfolders if present. You can, however, set a "Base Folder" within the CMIS Repository.
  • Import Query Results will import files that match a CMIS Query. This is a specialized query language based on SQL syntax. This gives you many more options for import conditions, using a "WHERE" clause in the query. CMIS Queries also give you the capability to restrict imports to a folder location without importing files in subfolders (This is something Import Descendants cannot do).
  • Import Descendants does have an "Import Filter" it can use to set import conditions. It also uses a SQL-like syntax. However, it is not as advanced as the CMIS Queries that Import Query Results uses.


CMIS Repositories that can only use Import Descendants

Certain CMIS Bindings are not queryable using CMIS Queries. Because of this, certain CMIS Repositories cannot utilize Import Query Results. The following CMIS Repositories must use Import Descendants to import file content:

  • FTP
  • SFTP
  • NTFS (only if the directory has not been indexed by the Windows Search service or the Windows Search service is not running)
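The rule above can be expressed as a small decision function. This is a hypothetical Python sketch for illustration only: `choose_provider` and its `windows_search_available` parameter are not part of Grooper, and the binding names are simply the ones listed in this article.

```python
def choose_provider(binding, windows_search_available=True):
    """Pick an import provider for a CMIS Binding (illustrative model only).

    windows_search_available: True if the NTFS folder is indexed by Windows Search
    and the Windows Search service is running.
    """
    # FTP and SFTP cannot be queried with CMIS Queries at all.
    if binding in {"FTP", "SFTP"}:
        return "Import Descendants"
    # NTFS is only queryable when Windows Search has indexed the folder.
    if binding == "NTFS" and not windows_search_available:
        return "Import Descendants"
    # Everything else: prefer Import Query Results, as recommended above.
    return "Import Query Results"


print(choose_provider("SFTP"))                                  # Import Descendants
print(choose_provider("NTFS", windows_search_available=False))  # Import Descendants
print(choose_provider("Exchange"))                              # Import Query Results
```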

Prereqs: CMIS Repository

A CMIS Repository allows Grooper access to files and folders within a storage platform.

Because Import Descendants and Import Query Results import from a CMIS Repository, you can import from numerous storage platforms, as determined by the "CMIS Binding" used. These CMIS Bindings include:

  • NTFS to connect to Windows folders
  • FTP to connect to FTP directories
  • SFTP to connect to SFTP directories
  • Exchange to connect to Outlook inboxes
  • SharePoint to connect to SharePoint sites (and document libraries)
  • OneDrive to connect to OneDrive drives
  • Box to connect to Box accounts
  • AppXtender to connect to AppEnhancer applications

Before you can import files from these platforms using Import Descendants or Import Query Results, there's some setup required in the Grooper Design page. You must:

  1. Create and configure a CMIS Connection.
  2. Import a folder location as a CMIS Repository.

This will allow you to import files from folders accessed by the CMIS Repository. For information on CMIS Connections and CMIS Repositories, including how to create them in Grooper, visit the CMIS Connection page.

Click here for an interactive walkthrough.

Settings shared between Import Descendants and Import Query Results

The following properties/settings are part of configuring either Import Descendants or Import Query Results.

  • Import Mode - Configuring this property allows you to perform "Sparse" imports, which reduce the time it takes to ingest large numbers of files.
  • Batch Creation settings - These settings control how Batches are created on import, including which Batch Process is used.
  • File disposition options - Optionally, you can "dispose" of the source file after it has been imported into Grooper. These settings allow you to delete the file, move it, or update one or more of its properties.

Import Mode (and "Sparse" imports)

What is an "Import Mode"?

The Import Provider's "Import Mode" controls how file content and data associated with that file (its properties and any metadata field values in a content management system) are imported into Grooper. In practice, this affects the overall speed of the import process.

Files can be imported using one of three "Import Modes":

  • Copy - The files are fully copied to the Grooper Repository on import.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • A copy of the file is attached to the Batch Folder and stored in the Grooper File Store.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
  • Sparse - Only the file properties and mapped metadata are copied to the Grooper Repository.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • If using an Import Behavior, properties/metadata are mapped to the Batch Folder's Data Fields.
    • The file content is not copied over to the Grooper File Store. Instead, it is accessed by the link attached to the Batch Folder. The file can be copied to the Grooper File Store with the "CMIS Document Link > Load" command.
  • Link Only (seldom used) - Nothing is copied to the Grooper Repository. Only a link to the source file is attached to the Batch Folder.
    • Certain file properties (file name, size, MIME type, attributes, the source import file's location, etc.) are stored in the CMIS Document Link attached to the Batch Folder.
    • File content and mapped properties/metadata must be brought into Grooper with the "CMIS Document Link > Load" command.
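The differences between the three modes can be summarized as a small lookup table. This is an illustrative Python model only (the `ImportMode` enum, the item names, and `still_to_load` are hypothetical, not Grooper APIs); "mapped_metadata" only applies when an Import Behavior is used.

```python
from enum import Enum


class ImportMode(Enum):
    COPY = "Copy"
    SPARSE = "Sparse"
    LINK_ONLY = "Link Only"


# What each Import Mode brings into Grooper at import time (hypothetical model).
# "document_link": the CMIS Document Link (file name, size, MIME type, source location, etc.)
# "file_content": a copy of the file in the Grooper File Store
# "mapped_metadata": properties/metadata mapped to Data Fields by an Import Behavior
STORED_ON_IMPORT = {
    ImportMode.COPY: {"document_link", "file_content", "mapped_metadata"},
    ImportMode.SPARSE: {"document_link", "mapped_metadata"},
    ImportMode.LINK_ONLY: {"document_link"},
}


def still_to_load(mode):
    """Return what is not yet stored in Grooper and could be brought in later
    with the "CMIS Document Link > Load" command."""
    return STORED_ON_IMPORT[ImportMode.COPY] - STORED_ON_IMPORT[mode]


print(sorted(still_to_load(ImportMode.SPARSE)))     # ['file_content']
print(sorted(still_to_load(ImportMode.LINK_ONLY)))  # ['file_content', 'mapped_metadata']
```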


Sparse imports serve two functions:

  • Primarily, they are used to speed up the overall import operation. This is a two-step process:
    1. Enable the "Sparse" Import Mode when configuring the Import Provider.
    2. Have the first step in the Batch Process fully copy the files into the Grooper Repository (using the Execute activity and the "CMIS Document Link > Load" command).
  • They can also be used to avoid file duplication between the import source location and the Grooper File Store.
    • A fully copied import creates a copy of the file in the Grooper File Store. A sparsely imported document is fully usable in Grooper, but no such copy exists in the File Store.
    • Instead, Grooper travels the link every time it needs to access the file (Example: When the file's image is pulled up in the Document Viewer).
    • While this does save on storage between two systems (Grooper and the import source), it does not save on processing time. Every time Grooper needs to access the document to view it, execute a command, or run an activity, it will take some time to travel the document link and fetch the document. Depending on latency, it may be preferable to load the file into Grooper even if it does duplicate the file (The file can always be removed from Grooper with a Dispose step at the end of a Batch Process too).


How does using "Sparse" speed up import?

It increases the parallelism of the overall import operation.

Import operations must run "single threaded" in Grooper. That means regardless of how much compute your server has, it's only ever going to use a single processing thread to import files.

When you're importing hundreds or thousands of documents by copying them from a source location to the Grooper File Store, it takes a long time for the Import Job to complete.

  • By only importing a link to the file content, Sparse mode dramatically speeds up the time it takes to get a usable document into Grooper.
  • To take full advantage of your system's resources, the first step in your Batch Process should be "Execute" using the "CMIS Document Link > Load" command. This will allow you to load the files into the Grooper File Store using multiple threads.
    • Be Aware: The "Load" command has three modes: (1) Content, (2) Properties, and (3) Full. For "Properties" and "Full" to work correctly, the Batch Folders must be classified on import and use an Import Behavior to map the properties.
  • The end result is the same as if you had used the "Copy" mode, except that the file content is loaded using multiple threads instead of one.
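The following Python sketch models why the two-step approach is faster: a quick single-threaded pass that only records links, followed by a multi-threaded content load. It is a conceptual illustration under stated assumptions: `import_links`, `load_content`, and `sparse_then_load` are hypothetical stand-ins (not Grooper APIs), `time.sleep` simulates per-file transfer time, and the thread count is arbitrary. In Grooper itself, the multi-threaded load comes from Activity Processing services running the Execute step.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def import_links(source_paths):
    """Step 1 (Sparse import, single thread): record only a link per file."""
    return [{"link": path, "content": None} for path in source_paths]


def load_content(folder):
    """Step 2 (hypothetical stand-in for "CMIS Document Link > Load"):
    follow the link and copy the content. time.sleep simulates transfer latency."""
    time.sleep(0.01)
    folder["content"] = f"<bytes of {folder['link']}>"
    return folder


def sparse_then_load(source_paths, threads=8):
    folders = import_links(source_paths)  # fast: no file content is copied yet
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # The slow per-file copy now runs on several threads at once.
        return list(pool.map(load_content, folders))


batch = sparse_then_load([f"/inbox/doc{i}.pdf" for i in range(40)])
print(sum(f["content"] is not None for f in batch))  # 40
```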


Batch Creation settings

The Batch Creation settings allow you to define which Batch Process you wish to use to process the imported files.

You must configure the "Starting Step" property to assign the Batch Process.

  • Use the property's dropdown editor to select a Batch Process Step from a list of Batch Processes.
  • Only published Batch Processes will appear in this list.


Other notable Batch Creation properties:

  • "Start Paused" - This determines if the Batch starts in a paused state or not. If "False", the first step's tasks will be automatically submitted to Activity Processing services. If "True", you will have to manually start the Batch.
  • "Max Items Per Batch" - The default is 2500, meaning each Batch will have a maximum of 2500 Batch Folders before creating a new Batch on import. For users who want more Batches with fewer documents, lower this number.
  • "Organize By Date" - This will organize Batches into subfolders in the Production branch in Grooper according to the year / month / day the Batch was created.
  • "Priority" and "Increment Priority" - Controls the task processing priority for the Batch. "Increment Priority" is useful when submitting large user-directed imports from the Imports Page to ensure the first Batch created is the first that is fully processed by Activity Processing services.
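As a worked example of "Max Items Per Batch": importing 6,000 files with the default of 2500 produces three Batches of 2500, 2500, and 1000 Batch Folders. The hypothetical Python helper below (`split_into_batches` is illustrative, not a Grooper API) shows that arithmetic, assuming files fill each Batch in order before a new one is created.

```python
def split_into_batches(files, max_items_per_batch=2500):
    """Group imported files into Batches of at most max_items_per_batch Batch Folders."""
    if max_items_per_batch < 1:
        raise ValueError("max_items_per_batch must be at least 1")
    # Take consecutive slices; the last Batch holds whatever is left over.
    return [files[i:i + max_items_per_batch]
            for i in range(0, len(files), max_items_per_batch)]


batches = split_into_batches(list(range(6000)))
print([len(b) for b in batches])  # [2500, 2500, 1000]
```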


File disposition settings

The "Disposition" settings allow you to do something with the source files after importing them into Grooper. This is important when using an Import Watcher service to schedule imports. If you do not configure a Disposition property, the source file remains unchanged after the Import Job completes. This can cause the Import Watcher to repeatedly attempt to import the same file.

There are three "Disposition" options:

  • "Delete Item" - Turning this to "True" will simply delete the source file after the Import Job completes.
    • You may only configure this option if the "Import Mode" is set to "Copy".
  • "Move To Folder" - This will move the files to a different folder in the CMIS Repository after the Import Job completes.
  • "Update Properties" - This will update one or more file properties after the Import Job completes. Properties are updated by listing them as "key-value pairs" in the form key=value in the "Update Properties" list editor.
    • Examples:
      • Archive=false - Sets the archive attribute on each imported file to "false".
      • IsRead=true - Marks each imported email message as read.
      • Status=PENDING - Sets a "Status" field on a document in an AppEnhancer application (assuming the application has a "Status" field).
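A minimal Python sketch of how the key=value entries in the "Update Properties" list could be read, plus the Delete Item restriction described above. Both functions are hypothetical illustrations, not Grooper code: this sketch splits on the first "=" and trims surrounding whitespace, which may differ from how Grooper treats unusual entries.

```python
def parse_update_properties(entries):
    """Turn "key=value" entries into a {property: new_value} dict."""
    updates = {}
    for entry in entries:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {entry!r}")
        updates[key.strip()] = value.strip()
    return updates


def check_disposition(import_mode, delete_item):
    """Reject Delete Item unless the Import Mode is Copy (rule stated above)."""
    if delete_item and import_mode != "Copy":
        raise ValueError('"Delete Item" requires the "Copy" Import Mode')


print(parse_update_properties(["IsRead=true", "Status=PENDING"]))
# {'IsRead': 'true', 'Status': 'PENDING'}
check_disposition("Copy", delete_item=True)  # no error raised
```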



Example Import Descendants configuration

Regardless of the platform you're accessing with a CMIS Repository, you configure Import Descendants largely the same way. Just pick a CMIS Repository and configure the rest of Import Descendants as needed.

  • Import Descendants will import all files in the base folder, including all descendant files in any subfolders if present.
  • Unless you configure the "Base Folder" property, Import Descendants will start at the root of the CMIS Repository and continue down the folder structure.
  • When using Import Descendants, setting the "Base Folder" to a terminal branch in the folder structure (a folder with no subfolders) is the only way to import files from a folder without importing descendant files (because there are none in this case).
    • A more technical way of saying this is that Import Descendants always behaves like the IN_TREE CMIS search predicate (all descendants of a folder). It has no equivalent of the IN_FOLDER predicate (direct children of a folder only).
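Per the CMIS specification, IN_FOLDER matches objects that are direct children of a folder, while IN_TREE matches objects anywhere below it. The Python sketch below models that difference on plain paths. It is an illustration only: real CMIS predicates take a folder's object ID rather than a path, and `in_folder` and `in_tree` are hypothetical helpers. Importing everything beneath the Base Folder, as Import Descendants does, corresponds to the `in_tree` filter.

```python
from pathlib import PurePosixPath


def in_folder(path, folder):
    """Like the CMIS IN_FOLDER predicate: direct children of folder only."""
    return PurePosixPath(path).parent == PurePosixPath(folder)


def in_tree(path, folder):
    """Like the CMIS IN_TREE predicate: anything below folder, at any depth."""
    return PurePosixPath(folder) in PurePosixPath(path).parents


files = ["/Invoices/a.pdf", "/Invoices/2024/b.pdf", "/Invoices/2024/Q1/c.pdf"]
print([f for f in files if in_folder(f, "/Invoices")])  # ['/Invoices/a.pdf']
print([f for f in files if in_tree(f, "/Invoices")])    # all three files
```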

Example: Submitting Import Descendants from the Imports Page

Click here for a step by step walkthrough.

  1. Go to the Imports Page.
  2. Press the "New Import Job" button.
  3. This brings up the "Submit Import Job" editor.
  4. Enter a description in the Description property (This is required).
  5. Open the "Provider" dropdown (Press the "☰" button).
  6. Select "Import Descendants" from the dropdown list.
  7. Expand the Provider settings to configure it.
  8. Open the "Repository" node selector (Press the "☰" button).
  9. Select the CMIS Repository you wish to import from.
  10. Select a Base Folder, as needed.
  11. Configure the Import Mode property, as needed.
  12. Configure the Batch Creation settings, as needed.
  13. Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
  14. Configure any remaining Import Descendants properties, as needed.
  15. Press the "Submit" button when finished.
  16. Your Import Watcher service will pick up and execute the Import Job.

Example Import Query Results configuration

Import Query Results relies on a "CMIS Query" to import files from a CMIS Repository. The CMIS Query (aka CMISQL Query) uses a syntax structure similar to a SQL query. Instead of querying rows in a database based on column values, you're querying documents in a storage location based on file property and metadata values.

  • The general CMIS Query format is: SELECT * FROM <a type of document in the CMIS Repository> WHERE <according to certain search conditions>
  • What "type of document" you can search for in the FROM clause is determined by the storage platform and the CMIS Binding.
    • Example: For the NTFS binding, use "FROM File" to search for files in a Windows folder.
    • Example: For the Exchange binding, use "FROM Message" to search for email messages.
  • In the WHERE clause, you can set search parameters based on file properties and metadata values called "CMIS properties". Which CMIS properties are "queryable" also depends on the CMIS Repository and its CMIS Binding.
    • Example: The Exchange binding has a queryable "Subject" property.
    • Example: Fields in a Box metadata template are queryable for the Box binding.
    • Which CMIS properties are queryable can be determined by (1) navigating to the CMIS Repository in the Grooper node tree (2) going to the "Types" tab (3) selecting the CMIS document type whose properties you want to inspect and (4) reviewing the "Queryable" column for each CMIS property.
  • The WHERE clause is also used to set the folder scope, using the IN_FOLDER and IN_TREE predicates (where supported).
  • More information on CMIS Queries (including unsupported query configurations for various CMIS Bindings) can be found in the CMIS Query article.
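To make the SELECT/FROM/WHERE structure concrete, here is a hypothetical Python helper (`build_cmis_query` is not a Grooper API) that assembles a query string from the parts described above. The property condition and the folder ID are placeholders: whether a binding supports LIKE on a given property, and what its folder IDs look like, depend on the CMIS Binding (see the CMIS Query article). The sketch does no quote escaping, so it is only suitable for simple, trusted values.

```python
def build_cmis_query(object_type, conditions=(), folder_id=None, include_subfolders=False):
    """Assemble "SELECT * FROM <type> WHERE <conditions>" from its parts."""
    clauses = list(conditions)
    if folder_id is not None:
        # IN_TREE searches all descendants; IN_FOLDER searches direct children only.
        predicate = "IN_TREE" if include_subfolders else "IN_FOLDER"
        clauses.append(f"{predicate}('{folder_id}')")
    query = f"SELECT * FROM {object_type}"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query


# Email messages whose Subject mentions "invoice" (Exchange binding, hypothetical condition).
print(build_cmis_query("Message", ["Subject LIKE '%invoice%'"]))
# SELECT * FROM Message WHERE Subject LIKE '%invoice%'

# Files directly inside one folder, excluding subfolders ("FOLDER-ID" is a placeholder).
print(build_cmis_query("File", folder_id="FOLDER-ID"))
# SELECT * FROM File WHERE IN_FOLDER('FOLDER-ID')
```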

Example: Submitting Import Query Results from the Imports Page

Click here for a step by step walkthrough.

  1. Go to the Imports Page.
  2. Press the "New Import Job" button.
  3. This brings up the "Submit Import Job" editor.
  4. Enter a description in the Description property (This is required).
  5. Open the "Provider" dropdown (Press the "☰" button).
  6. Select "Import Query Results" from the dropdown list.
  7. Expand the Provider settings to configure it.
  8. Open the "Repository" node selector (Press the "☰" button).
  9. Select the CMIS Repository you wish to import from.
  10. Open the "CMIS Query" editor (Press the "..." button).
  11. Enter the CMIS Query by either:
    • Typing it into the Query Editor.
    • Or, using the Query Editor's property grid to construct the query (The text will populate the Query Editor as you configure these properties).
      • A more in depth explanation of the Query Editor and CMIS Queries can be found in the CMIS Query article.
  12. Configure the Import Mode property, as needed.
  13. Configure the Batch Creation settings, as needed.
  14. Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
  15. Configure any remaining Import Query Results properties, as needed.
  16. Press the "Submit" button when finished.
  17. Your Import Watcher service will pick up and execute the Import Job.