2023:CMIS Import (Import Provider)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


CMIS Import refers to two Import Providers used to import content over a CMIS Connection: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms.

Documents are imported from CMIS Connections using either the Import Descendants or Import Query Results providers. These can be used in two ways:

  • To perform manual "ad-hoc" imports when creating a new Batch in Grooper Dashboard or Grooper Design Studio.
  • To perform automated, scheduled imports using one or more Import Watcher Grooper services.

Import Descendants will import all documents within a designated folder location of a CMIS Repository. Import Query Results allows you to use a query syntax similar to a SQL query (called a CMISQL query) to set conditions for import based on the item's available metadata, such as a document's name, file type, creation date, archive status, or other variables.

SharePoint: SharePoint is a CMIS Connection Type that connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.

About CMIS+

"CMIS" stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to inter-operate over the Internet. Grooper expanded on this idea in version 2.72 to create our "CMIS+" architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints.


Now, Grooper connects to any external storage platform by creating and configuring a CMIS Connection (not just CMIS 1.0 or CMIS 1.1 servers).

  • Once a CMIS Connection is created, Grooper can "interoperate" with these platforms.
  • "Interoperability " means Grooper has the same access to control the system as a human being does.
  • Grooper has a "one-to-one" connection to the platform, allowing full and total control.


Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms.

  • You now use CMIS Import providers and CMIS Export, regardless of the storage platform.

Anatomy of a CMIS Connection

When connecting Grooper to external storage platforms, you'll start by creating a CMIS Connection. There are three important parts to understanding a CMIS Connection:

  • The CMIS Connection itself.
  • Its Connection Type (and the "CMIS Binding" you select).
  • Its child CMIS Repositories.

FYI

A "repository", in computer science, is a general term for a location where data lives. Different systems refer to "repositories" in different ways.

  • An email inbox could be a repository. A folder in Windows could be a repository. A folder in a Box account could be a repository. A cabinet in ApplicationXtender could be a repository.
  • We standardize the various terms used by various storage platforms to simply "repository".
  • Put simply, it's a place to put stuff.

For newer users, the difference between a CMIS Connection and a CMIS Repository can be confusing (and it doesn't help that some people use the terms interchangeably!). The key distinction is as follows:

  • The CMIS Connection is the object in Grooper that Grooper uses to establish a connection to some external system.
    • The Connection Type determines which specific platform you're connecting to, and any settings required to connect to it.
  • CMIS Repositories represent a location within the connected platform.
    • These are created after creating the CMIS Connection.
    • Typically, these represent a folder location in the storage platform.

For example, imagine you want to use Grooper to connect to a Windows file system folder on some networked server.

  • First, you would create a new CMIS Connection.
  • Then you would choose NTFS for its Connection Type.
  • Then you would import the folder location as a CMIS Repository.
    • It is then this CMIS Repository Grooper will point to when importing from or exporting to this folder location.
    • The CMIS Connection is just the thing that allows Grooper to connect to Windows in this case. It is the CMIS Repository that acts as the Windows file system folder in Grooper.


To reiterate, there are three basic steps involved to connect Grooper to external storage platforms:

  1. Create a CMIS Connection
  2. Configure its Connection Type to select which platform you want to connect to (and enter any settings to connect to that platform).
  3. Import storage locations as one or more CMIS Repositories, which are created as children of the CMIS Connection.
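
The relationship between these three steps can be sketched as a small data model. This is only an illustration of the parent/child structure described above, not Grooper's actual object model: the class names, field names, and the UNC path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CmisRepository:
    """A storage location (typically a folder) exposed through a connection."""
    name: str
    path: str


@dataclass
class CmisConnection:
    """Hypothetical stand-in for a CMIS Connection object."""
    name: str
    connection_type: str  # which platform to connect to, e.g. "NTFS" or "Exchange"
    settings: dict = field(default_factory=dict)      # platform-specific settings
    repositories: list = field(default_factory=list)  # child CMIS Repositories

    def import_repository(self, name: str, path: str) -> CmisRepository:
        # "Importing" a repository registers a location under the connection.
        # It does not import any documents into a Batch.
        repo = CmisRepository(name=name, path=path)
        self.repositories.append(repo)
        return repo


# Steps 1 and 2: create the connection and choose its Connection Type.
connection = CmisConnection(name="File Server", connection_type="NTFS")
# Step 3: import a folder location as a child CMIS Repository (path is made up).
repo = connection.import_repository("Import and Export", r"\\server\share\Import and Export")
```

Import and export operations would then point at the child repository, while the connection only holds what is needed to reach the platform.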

FYI

Importing a CMIS Repository is not the same as importing documents to a new Batch.

  • "Importing" here is more like bringing the repository into a framework Grooper can use.
  • Upon importing the repository, Grooper has full file access to that location in the storage platform.

CMIS Bindings (AKA Connection Types)

How you configure a CMIS Connection differs from one CMIS Binding to the next, because each platform is connected to in a different way.

  • You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example.
  • Thus, the property configuration for the Exchange binding is different from the NTFS binding.


A CMIS Binding provides connectivity to external storage platforms for content import and export. Each individual CMIS Binding contains the settings and logic required to exchange documents between Grooper and each distinct platform.

  • The Exchange Binding contains all the information Grooper uses to connect to Microsoft Exchange email servers (i.e. Outlook inboxes).
  • The AppXtender Binding contains all the information Grooper uses to connect to the ApplicationXtender content management system.
  • The NTFS Binding contains all the information Grooper uses to connect to a Windows file system.
  • And so on.

When creating a CMIS Connection, the first step is to configure the Connection Type property.

  • When you select a Connection Type you're selecting which platform you want to connect to (using a CMIS Binding).
    • First, you select which platform you want to connect to (which CMIS Binding you want to use)
    • Then, you enter connection settings unique to the platform (any values the CMIS Binding needs to connect to the platform, like login information for many platforms)

Current CMIS Connection Types

Grooper can connect to the following storage platforms using CMIS Bindings:

Most Commonly Used

Somewhat Commonly Used

Less Commonly Used

  • FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) servers.
  • IMAP mail servers

Least Used

  • Content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The FileBound document management platform.
  • The IBM FileNet platform.

About CMIS Import

The CMIS Import provider is split into two different Import Providers

  • Import Descendants
  • Import Query Results

These providers are designed to import files from a folder structure of an on-premise or cloud-based document storage platform. This is the primary method of Batch creation when importing digital documents into Grooper to process them with a Batch Process.

In order to do this, a few requirements must be met first.

  1. A CMIS Connection object must be made and configured. This will connect Grooper to the document storage platform.
    • This may be a connection to a Windows folder, an email inbox, a true CMIS content management system, or other document storage platforms. What the CMIS Connection connects to is determined by the CMIS Binding selected when configuring the Connection Type property of the CMIS Connection object.
  2. A CMIS Repository must be imported. This will create an object Grooper can use to import documents from the folders in the document storage platform.
    • This acts as a "go-between" or a "hub" for Grooper to pull in documents from the content's source. Or, you may think of this as Grooper's representation of a folder location in the document storage platform.

For more information on adding a CMIS Connection and importing a CMIS Repository, visit the CMIS Connection article.

As for the difference between the Import Descendants and Import Query Results providers, you can think of Import Query Results as a more specialized version of Import Descendants.

  • Import Descendants is intended to import the full contents of a folder location. It imports the "descendant" files of a parent folder.
  • Import Query Results allows you to selectively import files using a SQL-like query (called a CMISQL query). Only files returned by the query will be imported. For example, using an Exchange or IMAP CMIS Connection, you could query an inbox for emails from a specific sender and only import those emails.
    • Note: There are some import filtering capabilities available to Import Descendants as well using a SQL-like query. However, the CMISQL querying capabilities of Import Query Results are much more robust.
    • That said, only certain CMIS Bindings can take advantage of this increased CMISQL query functionality. The following CMIS Bindings are not currently suitable for the Import Query Results provider.
      • FTP
      • SFTP
      • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)
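
The guidance above amounts to a simple decision rule, sketched below. The function name and parameters are hypothetical; only the list of unsuitable bindings comes from this article.

```python
# Bindings this article lists as unsuitable for Import Query Results.
NON_QUERYABLE = {"FTP", "SFTP"}


def choose_provider(binding: str, needs_query: bool, ntfs_indexed: bool = False) -> str:
    """Return the provider name suggested by the article's guidance."""
    if not needs_query:
        # Importing the full contents of a folder needs no query.
        return "Import Descendants"
    if binding in NON_QUERYABLE:
        return "Import Descendants"
    if binding == "NTFS" and not ntfs_indexed:
        # NTFS is only queryable when Windows Search indexes the folder.
        return "Import Descendants"
    return "Import Query Results"


print(choose_provider("Exchange", needs_query=True))  # Import Query Results
print(choose_provider("SFTP", needs_query=True))      # Import Descendants
```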

Import Descendants

Configuration Panel

General Settings

Back on the Import Descendants configuration screen, the CMIS Repository object is used to point Grooper to this folder location for import.

  • The Repository property is configured to assign the CMIS Repository where the documents are located.
    • Here, the CMIS Repository named "Import and Export" connects to the "Import and Export" folder of the local drive.
  • The Base Folder property is configured to traverse the folder structure of the CMIS Repository.
    • Here, we don't want to import all documents from every folder in the "Import and Export" folder. We just want to import from the "Grooper Import Folder".
  • The Import Filter property allows you to perform some basic import filtering to selectively choose which documents you want to import.
    • SELECT * FROM File is the default filter. It will import all files from the selected folder location.
    • This is a SQL-like query to specify conditions for document import. However, the Import Query Results provider was created to expand on this functionality and provides more filtering options as well as a simpler interface to perform the query (for the CMIS Bindings capable of utilizing this functionality).
  • The Content Type property allows you to optionally assign the incoming documents with a Document Type.
    • You can use this property to assign a default classification for all incoming documents.
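
As a rough local-filesystem analogy for how Repository, Base Folder, and Import Filter narrow down what gets imported, consider the sketch below. It is not how Grooper performs the import: the function name is hypothetical, and a simple file-extension check stands in for the SQL-like Import Filter.

```python
from pathlib import Path
import tempfile


def list_descendants(repository_root: Path, base_folder: str, extension: str = "") -> list:
    """Return files under base_folder (recursively), optionally filtered by extension."""
    start = repository_root / base_folder
    # rglob("*") visits every sub-folder, i.e. all "descendant" files.
    files = [p for p in start.rglob("*") if p.is_file()]
    if extension:
        files = [p for p in files if p.suffix.lower() == extension.lower()]
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # plays the role of the "Import and Export" repository
    (root / "Grooper Import Folder" / "sub").mkdir(parents=True)
    (root / "Other").mkdir()
    (root / "Grooper Import Folder" / "a.pdf").write_text("x")
    (root / "Grooper Import Folder" / "sub" / "b.pdf").write_text("x")
    (root / "Grooper Import Folder" / "notes.txt").write_text("x")
    (root / "Other" / "c.pdf").write_text("x")
    # Only PDFs under the base folder are selected; "Other" is never visited.
    found = sorted(p.name for p in list_descendants(root, "Grooper Import Folder", ".pdf"))
    print(found)
```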

Processing Options Settings

The most important part of the Processing Options property section is the Import Mode property. The Import Mode property allows control over the connections Grooper makes and/or retains to the imported documents. For importing, documents contain two important sets of information:
  • Content - Images and native text data
  • Properties - Metadata associated with the file. Digital information, such as the document's filename, file type, creation date, and more.
Depending on the Import Mode selected, all, some, or none of this information will be copied to your Grooper Repository's file store (in the case of the document's content) and database (in the case of the document's properties). See below for a more in-depth explanation of each of the Import Mode options.

Copy

  • Both properties and content will be loaded. This is a total duplication of the document from its source to your Grooper Repository's local file store. This is the slowest import mode, because the full content of each document is copied during a single-threaded import process. As such, this mode is not well-suited for high-volume imports, but provides some useful advantages in low-volume import scenarios.
  • For example, Copy mode allows items to be deleted immediately on import. Also, Copy mode avoids the need for any follow-up content loading operations in the Batch Process.
  • This mode was called Full in older versions of Grooper.

Sparse

  • Properties will be loaded, but content will not. This mode is much faster than a Copy import, because no content files are copied into your local Grooper file store. Instead, a link is saved on each Grooper document, and content is retrieved on demand directly from the CMIS Repository. This type of document is often referred to as a "sparse" document. Sparse documents can be used just like any other document, with the caveat that display and processing speeds may be reduced. Grooper has to traverse the document link in order to display or process the document's image.
  • However, after a Sparse import, document content can be loaded multi-threaded using the Execute activity in a Batch Process. Overall, this can load a document's content faster than a Copy import.
    • Choose CMIS Document Link as the Object Type and Load Content as the Command

Link Only

  • No content or properties will be loaded, making this the fastest import mode. It imports nothing more than a link to each document, and offloads all property and content loading to parallel operations in the Batch Process.
  • However, this does not produce a usable document in Grooper. After a Link Only import, document content must be loaded using the Execute activity in a Batch Process.
    • Choose CMIS Document Link as the Object Type and Load Content as the Command
  • You can think of the Link Only option as an even sparser sparse import.


See the table below for a summary of the Import Mode options.

Import Mode | Speed | Comments
Copy (formerly Full) | Slow | Full import of documents' content and properties.
  • Required if deleting content from the source on import.
Sparse | Fast | Imports a link to the document's source and its properties, but not its content.
  • This produces a usable document in Grooper without copying the full content into Grooper, saving time upon import.
  • This mode is the same as enabling the old Sparse Import property in previous versions.
Link Only | Fastest | Only imports a link to the document's source.
  • Does not produce a usable document. The document's properties and content must be loaded in a later step of the Batch Process.
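
The three modes can also be summarized as data. The sketch below only restates the descriptions above; the enum, table, and helper names are hypothetical and do not correspond to Grooper code.

```python
from enum import Enum


class ImportMode(Enum):
    COPY = "Copy"            # called "Full" in older versions
    SPARSE = "Sparse"
    LINK_ONLY = "Link Only"


# What each mode loads at import time, per the descriptions above.
LOADS = {
    ImportMode.COPY: {"properties": True, "content": True},
    ImportMode.SPARSE: {"properties": True, "content": False},
    ImportMode.LINK_ONLY: {"properties": False, "content": False},
}


def needs_load_step(mode: ImportMode) -> bool:
    """True if a Batch Process step must load data before the document is usable."""
    return mode is ImportMode.LINK_ONLY


def can_delete_on_import(mode: ImportMode) -> bool:
    """Deleting the source is only safe when the full content was copied."""
    return mode is ImportMode.COPY


for mode in ImportMode:
    print(mode.value, LOADS[mode], needs_load_step(mode), can_delete_on_import(mode))
```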

Disposition Settings

The Disposition property settings allow you to do something with the source documents after importing them into Grooper, namely delete them, move them, or do nothing and just leave them alone where they came from. This is often leveraged with the Import Watcher Grooper service to prevent repeatedly importing the same document. In our example here, the Move to Folder property is configured to move the PDF documents to a folder named "Imported Documents".
  • The folder location you're moving documents to must be accessible via the connected CMIS Repository.
If using the Copy Import Mode (called Full in older versions), you can enable the Delete Item property to delete each document after it is imported into the Grooper Batch.
  • This property is ONLY available when choosing the Copy Import Mode. A sparsely imported document needs to call back to the source storage location in order to load the document's image for display or processing. If you deleted the document upon import, you wouldn't be able to view it or do anything with it.
The Update Properties property allows you to alter the document's property values upon import. Property values are updated using a list of "key-value pairs" where the "key" is the name of the property and the "value" is what change you want to make to that property. You can type one entry per line in the format key=value.
  • Examples:
  • Archive=true Sets the archive attribute on a file
  • Status=PENDING Sets the "Status" field on ApplicationXtender documents.
  • Imported=true Sets the "Imported" field on SharePoint documents.
  • IsRead=true Sets the "IsRead" flag on an Exchange message.
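
To make the key=value format concrete, here is a minimal parsing sketch. It assumes one entry per line as described above; the function name is hypothetical, and how Grooper itself handles blank lines or malformed entries is not covered by this article.

```python
def parse_update_properties(text: str) -> dict:
    """Parse one key=value entry per line into a dict of property updates."""
    updates = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines (an assumption, not documented behavior)
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Expected key=value, got: {line!r}")
        updates[key.strip()] = value.strip()
    return updates


example = """Archive=true
Status=PENDING"""
print(parse_update_properties(example))  # {'Archive': 'true', 'Status': 'PENDING'}
```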

Batch Creation Settings

It's likely you're importing documents because you want to run them through a Batch Process. The Batch Creation property settings allow you to define which Batch Process will process the imported documents. This is done using the Starting Step property, which selects a Batch Process Step from one of the published Batch Processes in the Grooper Repository. Upon import, a new Batch is created with each document as a Batch Folder, and the selected Batch Process is assigned to the Batch. Further properties control Batch creation. You can limit the number of documents imported per Batch using the Maximum Items per Batch property. By default, new Batches are named with a date/time stamp, but the Batch Name Prefix property lets you add a prefix to the Batch's name for easier identification. If the Start Paused property is set to False, the Batch Process starts automatically once the Batch is created.
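
The effect of Maximum Items per Batch and Batch Name Prefix can be pictured with the sketch below. It is an assumption-laden illustration: the function name, the timestamp format, and the way the prefix is joined to the timestamp are all made up, and Grooper's real naming scheme may differ.

```python
from datetime import datetime
from typing import Optional


def plan_batches(items: list, max_items: int, prefix: str = "",
                 now: Optional[datetime] = None) -> list:
    """Split items into batches of at most max_items and name each batch."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Hypothetical name format: optional prefix followed by a date/time stamp.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    name = f"{prefix} {stamp}".strip()
    batches = []
    for start in range(0, len(items), max_items):
        batches.append({"name": name, "items": items[start:start + max_items]})
    return batches


plan = plan_batches(["a.pdf", "b.pdf", "c.pdf", "d.pdf", "e.pdf"], max_items=2,
                    prefix="Invoices", now=datetime(2023, 1, 2, 3, 4, 5))
for batch in plan:
    print(batch["name"], batch["items"])
```

With five documents and a limit of two, this yields three planned Batches holding two, two, and one documents.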

Import Query Results

The Same, But Different

The Import Query Results provider's configuration panel is almost identical to the Import Descendants provider's configuration panel. Both providers share the same Processing Options, Disposition, and Batch Creation property settings. See the Import Descendants section for brief descriptions of these property sections.


The big difference between the two providers is the highlighted CMIS Query property. This allows users to enter a SQL-like query (called a CMISQL query) to selectively import documents from their source, based on certain metadata properties. Only files returned by the query will be imported.
  • For example, you may want to only import documents of a certain file type(s). You could include the file extension(s) as the query condition (or one of many conditions).
  • For another example, you can use CMISQL queries to easily filter email messages when importing from an inbox. If you only wanted to import messages from a certain sender, from a certain folder, with a certain subject line and only ones that have not been read, you could filter out any emails that didn't meet those query conditions by comparing metadata properties (like "Sender" and "Subject") to your criteria.
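
The file-type example above might look like the query built in the sketch below. The helper function is hypothetical, and the type and property names (cmis:document, cmis:name) are the ones defined by the CMIS standard; the names a particular Grooper CMIS Binding exposes may differ (the default Import Descendants filter, for instance, queries a type named File), so check the CMIS Query Editor for what is actually available.

```python
def build_cmis_query(type_name: str, conditions: list) -> str:
    """Join conditions with AND into a SELECT * query against type_name."""
    query = f"SELECT * FROM {type_name}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


# Standard CMIS names; a Grooper binding may expose different ones.
print(build_cmis_query("cmis:document", ["cmis:name LIKE '%.pdf'"]))
# With no conditions, this reduces to the default filter quoted earlier.
print(build_cmis_query("File", []))
```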

Only certain external storage platforms are currently queryable with the CMIS Query property. The following CMIS Binding sources cannot be queried currently. As such, they are not suitable for Import Query Results. You should instead use Import Descendants for the following CMIS Bindings.
  • FTP
  • SFTP
  • NTFS (If the folder path is not indexed by the Windows Search service and/or Windows Search is not running on the storage server)



Just like with Import Descendants, there are some minimum requirements before configuring Import Query Results. A CMIS Connection object must be created and a CMIS Repository must be imported.



CMIS Query Configuration

Upon pressing the ellipsis button at the end of the CMIS Query property, the CMIS Query Editor window will appear.

This interface allows you to configure the CMISQL query based on available metadata from the CMIS Binding. For example, the Exchange binding has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received.

For an in-depth explanation of the CMIS Query Editor and how to use it to craft a CMISQL query, please visit the CMIS Query article.

Version Differences

New CMIS Query Editor (2021)

Grooper 2021 introduced a new and improved CMIS Query Editor. This editor was designed to simplify construction of CMISQL queries using a property grid. For more information, please visit the CMIS Query article.

Box Integration (2.90)

Grooper 2.90 adds the Box.com document storage platform to the CMIS fold via the Box (CMIS Binding).

Legacy Providers (2.72)

Old import and export providers should be replaced with this new functionality. While Grooper's older import and export providers are available as "Legacy Import" and "Legacy Export" providers, these components are deprecated. They will still function but will no longer be upgraded in future versions of Grooper.

Grooper can import documents using CMIS Connections via Import Descendants and Import Query Results. Grooper can export via the CMIS Export providers, Mapped Export and Unmapped Export.

New Connection Types (2.72)

By creating the CMIS+ architecture, we have been able to create new connections between Grooper and content management systems. Grooper can now connect to Microsoft OneDrive, SharePoint, and Exchange via new CMIS Bindings. Since these were created as CMIS Bindings, they can be used by the CMIS Import and CMIS Export providers. Instead of having to create three new import providers and three new export providers for a total of six brand new components, we can use the already established CMIS import and export providers in the CMIS+ framework. A user can create a CMIS Connection using the OneDrive, SharePoint or Exchange bindings, and use the same import and export providers for them as any of the other CMIS Bindings.

This will also allow Grooper to create CMIS Bindings to connect to currently unavailable content management systems in the future much quicker and easier.

Import Mode (2.72)

In version 2.72 the Import Mode property replaces previous versions' Sparse Import property.

Import Disposition (2.72)

Version 2.72 adds the Import Disposition property to CMIS Import. This allows you to change a document's disposition upon importing it into Grooper. You can delete it, move it to a folder, or update one or more properties on the document itself. This can be leveraged with Import Watcher to prevent repeatedly importing the same document.

Glossary

AppXtender: AppXtender is a CMIS Connection Type that connects Grooper to the AppEnhancer (formerly ApplicationXtender) content management system for import and export operations.

Batch Folder: Batch Folder objects are defined as container objects within a Batch that are used to represent and organize both folders and pages. They can hold other Batch Folders or Batch Page objects as children. The Batch Folder acts as an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents.

  • Batch Folders are frequently referred to simply as "documents".

Batch Process Step: Batch Process Step objects are specific actions within the sequence defined by a Batch Process. A Batch Process Step plays a critical role in automating and managing the flow of documents through the various stages of processing within Grooper.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process objects are crucial components in Grooper's architecture. A Batch Process orchestrates the document processing strategy and ensures each Batch of documents is managed systematically and efficiently.

  • Batch Processes by themselves do nothing. Instead, the workflows they execute are designed by adding child Batch Process Steps.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch objects are fundamental in Grooper's architecture as they are the containers of documents that get moved through Grooper's workflow mechanisms known as Batch Processes.

Box: Box is a CMIS Connection Type that connects Grooper to the Box content management system for import and export operations.

CMIS Connection Type: A CMIS Connection Type is defined when creating a CMIS Connection. The CMIS Connection Type (formerly CMIS Binding) establishes the communication protocols used to connect Grooper with content management systems (CMS) adhering to the CMIS standard. Even when connecting to CMS platforms that are not true CMIS systems, Grooper normalizes connection to them as if they were. This allows Grooper to use CMIS Import and CMIS Export for all content management systems.

CMIS Connection: CMIS Connection node objects provide a standardized way of connecting to various content management systems (CMS). These objects allow Grooper to communicate with multiple external storage platforms, enabling access to documents and content that reside outside of Grooper's immediate environment.

  • For those that support the CMIS standard, the CMIS Connection connects to the CMS using the CMIS standard.
  • For those that do not, the CMIS Connection normalizes connection and transfer protocol as if they were a CMIS platform.

CMIS Export: CMIS Export is an Export Definition available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

CMIS Import: CMIS Import refers to two Import Providers used to import content over a CMIS Connection: Import Descendants and Import Query Results. CMIS Imports allow users to import from various on-premise and cloud-based storage platforms.

CMIS Query: A CMIS Query (aka CMISQL Query) is Grooper's way of searching for documents in CMIS Repositories and filtering them upon import when using the Import Query Results Import Provider. CMIS queries are based on a subset of the SQL-92 syntax for querying databases, with some specialized extensions added to support querying CMIS sources.

CMIS Repository: CMIS Repository node objects in Grooper allow access to external documents through a CMIS Connection. They allow managing and interacting with those documents within Grooper's framework as if they were local. They are created as a child object of a CMIS Connection and used for various Activities.

CMIS+: CMIS+ is a conceptual term that refers to Grooper's connectivity architecture to external storage platforms. CMIS+ standardizes connections to a variety of content management systems based on the CMIS standard. This provides a standardized setup to allow Grooper to interoperate with both CMIS-compliant systems and non-CMIS systems. It further provides normalized access to document content and metadata for import (CMIS Import) and export (CMIS Export) operations.

CMIS: CMIS (Content Management Interoperability Services) is an open standard allowing different content management systems to "interoperate", sharing files, folders and their metadata as well as programmatic control of the platform over the internet.

Content Type: Content Type refers to objects in Grooper used to classify Batch Folders. These include: Content Models, Content Categories, and Document Types.

Document Type: Document Type objects represent a distinct type of document, like an invoice or contract. Document Types are created as children of a Content Model or a Content Category and are used to classify individual Batch Folders. Each Document Type in the hierarchy defines the Data Elements and Behaviors that apply to Batch Folders of that specific classification.

Exchange: Exchange is a CMIS Connection Type that connects Grooper to Microsoft Exchange email servers (including Outlook servers) for import and export operations.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

FTP: FTP is a CMIS Connection Type that connects Grooper to FTP directories for import and export operations.

Grooper Repository: A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

IMAP: IMAP is a CMIS Connection Type that connects Grooper to email messages and folders through an IMAP email server for import and export operations.

Import Descendants: Import Descendants is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Descendants imports files or folders in a CMIS Repository folder location, including any files or folders in any sub-folders (i.e. "descendant" files or folders).

Import Provider: Import Providers enable Grooper to import file-based content from a variety of sources, such as file systems, mail servers, and content repositories. An Import Provider is selected and configured when configuring Import Jobs. Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button. Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Polling Loop or Specific Times specification. In all cases, the Import Provider is selected using the Provider property.

Import Query Results: Import Query Results is one of two Import Providers that use CMIS Connections to import document content into Grooper. Import Query Results imports files or folders in a CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

NTFS: NTFS is a CMIS Connection Type that connects Grooper to files and folders in the Microsoft Windows NTFS file system for import and export operations.

OneDrive: OneDrive is a CMIS Connection Type that connects Grooper to Microsoft OneDrive cloud services for import and export operations.

Repository: A "repository" is a general term in computer science referring to where files and/or data is stored and managed. In Grooper, the term "repository" may refer to:

Service: Grooper Service is a conceptual term that refers to the various executable programs that run as Windows Services to facilitate Grooper processing. Service instances are installed, started and stopped using Grooper Command Console.

SFTP: SFTP is a CMIS Connection Type that connects Grooper to SFTP directories for import and export operations.