CMIS Export
CMIS Export is one of the Export Providers available to the Document Export activity. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.
CMIS Exports can be either "Mapped" or "Unmapped".
- Unmapped Export is a simple export of files to folders. Metadata (such as data extracted from documents used to populate a Data Model) must be exported as a "buddy file", such as an XML or CSV file. This is appropriate for simple storage platforms such as the Windows NTFS file system.
- Mapped Export allows you to export files and their metadata to a repository that allows for metadata storage as well. Many content management systems allow for document storage as well as storing metadata in fields in the storage platform. This is done by pointing the extracted and available document metadata from Grooper to corresponding locations within the content management system. This is set up using the CMIS Content Type objects in the CMIS Repository object, mapping a connection between objects in a Content Model within Grooper (such as Data Fields in a Data Model) and their corresponding locations in the content management system (such as a column in a SharePoint site).
About CMIS
CMIS stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to interoperate over the Internet. This standard protocol allows Grooper to use many different platforms for importing and exporting documents and their contents. Once a CMIS Connection object is created, Grooper can exchange documents with these platforms. "Interoperability" means Grooper has the same access to control the system as a human being does. It is a "one-to-one" connection to the platform, allowing full and total control.
Upon connecting to an external content management system, Grooper will be able to see the "repositories" associated with it. A repository, in computer science, is a general term for a location where data lives. Different systems refer to "repositories" in different ways. An email inbox could be a repository. A folder in Windows could be a repository. A cabinet in ApplicationXtender could be a repository. It's a place to put things. We standardize the various terms used by various storage platforms to simply "repository".
These repositories are "imported" into Grooper as a CMIS Repository object, as a child of the CMIS Connection object. This doesn't import data into Grooper in the traditional sense of importing documents into a batch. "Importing" here is more like bringing the repository into a framework Grooper can use (creating the CMIS Repository object). Upon importing the repository, Grooper has full file access to that location in the storage platform.
For our purposes, repositories are like filing cabinets full of documents. Once a connection is established, it's like giving Grooper a key to that cabinet. You can open the various drawers of that cabinet. You can pull files out and put files in. The storage platform or content management system is like the cabinet. The CMIS Connection object is like the key. The CMIS Repository object is like a drawer in the cabinet. You "connect" to the cabinet by turning the key. You "import" the repository by opening the drawer. Now you can see there are documents in there! You can take them out. You can read them and put them back in. You can put new ones in. You can use this "open" connection to the "drawer" however you need.
CMIS+ Architecture
Grooper expanded on this idea in version 2.72 to create our CMIS+ architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints. Prior to version 2.72, there was only one type of CMIS Connection, a true CMIS connection using CMIS 1.0 or CMIS 1.1 servers. Now, connections to additional non-CMIS document storage platforms can be made via "CMIS Bindings". This provides standardized access to document content and metadata across a variety of external storage platforms.
Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms. You now use the CMIS Import and CMIS Export providers, regardless of the storage platform. They connect to a CMIS Repository imported from a CMIS Connection and use that as Grooper's import or export path.
How you create a CMIS Connection differs only from one CMIS Binding to the next, as each binding has its own way of connecting to its platform. You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example.
CMIS Bindings
A CMIS Binding provides connectivity logic for external storage platforms, allowing CMIS Connection objects to import and export content. Grooper's CMIS+ architecture expands connectivity from traditional CMIS servers to a variety of on-premise and cloud-based storage platforms by exposing connections to these platforms as CMIS Bindings. Each individual CMIS Binding contains the settings and logic required to exchange documents between Grooper and each distinct platform. For example, the AppXtender Binding contains all the information Grooper uses to connect to the ApplicationXtender content management system.
CMIS Bindings are used when creating a CMIS Connection object. The first step in creating a CMIS Connection is to configure the Connection Type property. Here, the user selects which CMIS Binding to use, and therefore which storage platform to connect to. The second step is to enter the connection settings for that binding, such as login information for many bindings.
Current CMIS Bindings
Grooper can connect to the following storage platforms using CMIS Bindings:
- The ApplicationXtender document management platform.
- The Box cloud storage platform.
- The FileBound document management platform.
- Content management systems using CMIS 1.0 or CMIS 1.1 servers.
- The following Microsoft content platforms:
  - The Microsoft Exchange mail server platform.
  - The Microsoft OneDrive cloud storage platform.
  - Microsoft SharePoint sites.
- FTP (File Transfer Protocol) and SFTP (SSH File Transfer Protocol) servers.
- IMAP mail servers.
- The Microsoft Windows NTFS file system.
About CMIS Export
Unmapped Export
Unmapped Export is the simpler of the two. Both Unmapped and Mapped Exports will connect to CMIS Repositories imported from a CMIS Connection. However, Unmapped Export exports content as simple files and folders. You cannot utilize your Content Model to map metadata values to files and folders. This is a much more "raw" export. However, you can still export metadata "buddy files" along with your documents. If you want to export a JSON, XML, or CSV file of a document's extracted fields, you'll need to use Unmapped Export to get this file.
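To make the "buddy file" idea concrete, here is a minimal Python sketch of a JSON file written alongside an exported document. It is only an illustration: the function name, the JSON layout, and the field names are hypothetical, and the actual structure of the files Grooper produces through Metadata Export is not described in this article.

```python
import json

def build_buddy_file(document_name: str, fields: dict) -> tuple[str, str]:
    """Return (buddy file name, JSON text) for one exported document.

    Hypothetical layout: Grooper's real buddy file format may differ.
    """
    # Name the buddy file after the document so the pair stays together in the folder.
    stem = document_name.rsplit(".", 1)[0] if "." in document_name else document_name
    payload = {"document": document_name, "fields": fields}
    return stem + ".json", json.dumps(payload, indent=2, sort_keys=True)

# Example values are made up for illustration.
name, text = build_buddy_file(
    "Purchase Order (1).pdf",
    {"PO Number": "123456", "Vendor Name": "ACME"},
)
print(name)
print(text)
```

The point of the sketch is that the metadata travels in a second file next to the document, rather than being written into fields on the storage platform as it is in a Mapped Export.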
Generally, Unmapped Export is designed for use with CMIS Repositories using a hierarchical file system (HFS), where folders and files are represented by simple object types. The Microsoft Windows file system, NTFS, is an example of an HFS. These are basic folder and file structures.
The following CMIS Bindings utilize simple HFS repositories.
- NTFS
- FTP
- SFTP
- OneDrive
However, you can still use Unmapped Export to export documents to the other more robust CMIS Bindings as well. It's just not very commonly done. It's much more typical to use Mapped Export for non-HFS platforms, taking full advantage of the metadata mapping capabilities available to those content storage platforms.
Furthermore, you can use Mapped Export with HFS repositories to perform simple mappings, such as renaming exported files and folders using extracted fields in a Content Model's Data Model. Unmapped Export will simply name the exported document according to its Grooper folder name (e.g. "Document (1)" or, if it's classified, "Document Type Name (1)") or, if a native file exists on the document folder, according to the native file's name (e.g. "imported document 123.pdf").
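The naming rule described above can be sketched in a few lines of Python. This only restates the rule from the paragraph; the function and parameter names are hypothetical and are not part of Grooper.

```python
from typing import Optional

def unmapped_export_name(folder_name: str, native_file_name: Optional[str] = None) -> str:
    """Pick the exported name the way the paragraph above describes."""
    # A native file on the document folder takes priority over the folder name.
    if native_file_name:
        return native_file_name
    # Otherwise fall back to the Grooper folder name, e.g. "Document (1)".
    return folder_name

print(unmapped_export_name("Document (1)"))
print(unmapped_export_name("Document (1)", "imported document 123.pdf"))
```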
How To: Configure Document Export with Unmapped Export
Establish the CMIS Connection and CMIS Repository
Before configuring the Document Export activity, you must have created a CMIS Connection and imported a CMIS Repository. For more information on how to create a CMIS Connection and import a CMIS Repository refer to the CMIS Connection article. For this example, we will simply export to a Windows folder on a local drive.
Assign the Export Provider
The Document Export activity uses Export Providers to control where and how processed documents are exported from Grooper. Unmapped Export is one of these Export Providers. For this example, we will configure a Document Export activity in a Batch Process (however, these Document Export configuration steps would apply to other more ad-hoc methods of executing the activity as well). This is a simple Batch Process used to import purchase order documents, recognize their text, and extract some basic data from them. The last step in this Batch Process is a Document Export step.
Configure Unmapped Export
From here, you will configure the Unmapped Export of your documents.
The most important part of Unmapped Export is telling Grooper where you want your documents to go. This is what the CMIS Repository property is for. Here, you will point to the CMIS Repository you've established earlier.
At this point, this is all you need to do to meet the bare minimum requirements of performing an Unmapped Export. The remaining property configurations are present to customize your export. The Content Type property will typically be set to File as you are exporting processed documents as files. The Update Link property will create a connection link between the exported document in its exported location and the document in the Batch in Grooper. You can disable this if you desire. The Content Format property allows you to customize what file format the exported document takes (for example exporting as a TIF or PDF document).
Metadata Export allows you to export a metadata "buddy file" with the document, such as an XML or CSV file. Path Cloning allows you to use part or all of the imported path as the output path. This can be useful for duplicating an imported folder structure upon output. Ignore Mapped Items allows you to skip over any documents that have their Document Types mapped for Mapped Export (more on how to do this in the Mapped Export section of the article). Press "Ok" when finished configuring the export.
Reviewing the Export
Because nothing is mapped in an Unmapped Export, you have no control over what the document is named. Unmapped Export is a very simple or "raw" form of exporting to a storage platform. How the document is named will differ somewhat according to the original content exported from Grooper.
Mapped Export
The Mapped Export provider is much more powerful. Mapped Export exports documents to a connected CMIS Repository just like Unmapped Export, with the added bonus of exporting the document's metadata as well. Using this Export Provider, you can copy relevant data you've collected in Grooper to the connected storage platform. In other words, you can map properties from your Content Model, such as Data Fields in a Data Model, to an external repository. Because it can export both the document and its metadata, Mapped Export is considered a "full content" export.
Similar to the pre-2.72 "CMIS Export" Export Provider, Mapped Export uses user-defined mappings to accomplish this. These mappings are set on a CMIS Repository's CMIS Content Type objects. Here, we will match fields or properties from the external storage platform with the associated fields, variables and values from a Content Model in Grooper. The Content Type assigned to the document Batch Folder (i.e. its classified Document Type) should then match up with the corresponding mappings. For example, if a document has a Data Field for the "Document Date", it could be mapped to a "Document Date" field in the external storage platform.
Because Grooper needs content information from a Content Type in order to set up the mappings, all documents processed through Mapped Export must be classified. A document folder won't have a Document Type assigned to it without classifying the Batch. So, Grooper wouldn't have any information on how to map the documents during the Document Export activity.
To what extent you can map metadata to an external storage platform depends entirely on the platform's capabilities. Grooper has the ability to connect to a variety of storage platforms using CMIS Bindings. Each of them will have their own endpoints to which Grooper metadata is mapped. However, CMIS Bindings using a simple hierarchical file system (HFS), where folders and files are represented by simple object types, are severely limited when it comes to Mapped Export. HFS platforms are basic folder and file structures. The Microsoft Windows file system, NTFS, is an example of an HFS.
The following CMIS Bindings utilize simple HFS repositories.
- NTFS
- FTP
- SFTP
- OneDrive
These repositories aren't equipped to fully map metadata from a Content Model. A plain file system has no metadata fields to serve as endpoints for the mapping.
However, you can use Mapped Export with HFS repositories to perform simple mappings, such as renaming exported files and folders using extracted fields in a Content Model's Data Model. If you want to perform simple document indexing to use extracted data to name exported files and folders, you can still take advantage of Mapped Export, even for more basic HFS repositories.
How To: Set up Mappings for a Mapped Export
Import CMIS Content Types (If necessary)
To perform a Mapped Export, you must enable exporting on the repository's CMIS Content Type. These are objects created and imported after importing a CMIS Repository from a CMIS Connection. The CMIS Content Type objects house all the information Grooper needs concerning the readable and writable properties for files and folders in the storage platform. Depending on the CMIS Binding used, these objects will have different accessible properties.
For simpler CMIS Bindings (such as the NTFS binding) you may only have two Content Types that are imported by default, one that corresponds to documents in Grooper Batches and one for folders. For example, the NTFS binding has two CMIS Content Types: File for files in the Windows file system and Folder for folders.
More robust CMIS Bindings have more properties and metadata Grooper has access to. These CMIS Bindings will have "sub-types" of the default CMIS Content Types that need to be imported in order to fully utilize them for import and export. For example, the Exchange connection type, used to connect Grooper to Microsoft Exchange email servers, has two default CMIS Content Types: Item and Folder. However, each of these has its own sub-types. Item houses various email inbox items, such as email messages, contacts, appointments and tasks. Folder houses the various folders in an Exchange inbox.
To import a CMIS Content Type, first select the one you wish to import.
For this tutorial, we are going to use the Box.com connection type to export documents and extracted data using Mapped Export. The Box storage platform has the capability of adding "Metadata Templates" to folder locations. Using Mapped Export, we can map the data Grooper extracts from a document, stored in a Data Model's Data Fields, to the document's endpoint in Box. In Grooper, we will extract the purchase order number, purchase order date, vendor name, and purchase total from a series of purchase orders. Upon executing the Document Export activity using the Mapped Export provider, we will populate each corresponding metadata location in Box with the Grooper-extracted value.
Enable Export
If you want to perform a Mapped Export, you must enable exporting for a CMIS Content Type. This will allow you to map properties to and from the storage platform and Grooper.
Enabling export for a CMIS Content Type will allow you to use a Grooper Content Model to map extracted data to the storage platform endpoint. Setting Export Enabled to True gives us access to all the configuration settings we need to map Grooper metadata to the Box metadata.
Configure Field Mappings
Now we're ready to map Grooper properties to the corresponding metadata locations of the Box.com repository.
With the appropriate level of a Content Model selected, next you will configure the property mappings on the CMIS Content Type.
Auto Mapping
Grooper also has an "Auto Mapping" capability. If the Data Field and its corresponding mapped location share a similar name, you can assign all mappings at once with the click of a button.
Properties that share similar names are then mapped automatically.
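As a rough illustration of name-based auto mapping, the Python sketch below pairs Data Field names with repository property names. The article does not say how Grooper decides two names are "similar", so the rule used here (compare names case-insensitively after stripping spaces and punctuation) is an assumption, and every function and property name is hypothetical.

```python
import re

def normalize(name: str) -> str:
    # Assumed similarity rule: ignore case, spaces, underscores and punctuation.
    return re.sub(r"[^a-z0-9]", "", name.lower())

def auto_map(data_fields: list[str], repo_properties: list[str]) -> dict[str, str]:
    """Map each repository property to the Data Field whose normalized name matches."""
    by_key = {normalize(prop): prop for prop in repo_properties}
    mappings = {}
    for field in data_fields:
        match = by_key.get(normalize(field))
        if match is not None:
            mappings[match] = field
    return mappings

# Hypothetical Data Fields and Box metadata template properties.
mappings = auto_map(
    ["PO Number", "PO Date", "Vendor Name", "Total"],
    ["po_number", "po_date", "vendorName", "notes"],
)
print(mappings)
```

Anything without a counterpart ("Total" and "notes" in this example) is left unmapped and would need to be mapped by hand or with an expression.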
Expression Mapping
Expressions are snippets of .NET code, allowing Grooper to do various things outside its "normal" parameters. Grooper gives you the capability to use expressions when establishing export mappings. We will show you how to create a custom mapping for the file's name upon export. We will use the values of three Data Fields to form the file's name. If the vendor's name is "ACME", the purchase order date is "06/12/2020", and the purchase order number is "123456", the expression we will write will make the document's file name "ACME - 2020-06-12 - PO 123456" upon export.
But we're going to skip to the end. This is the full expression we're going to use.
Putting it all together, we will end up with file names such as "ACME - 2020-06-12 - PO 123456" upon export.
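The string logic behind that file name can be sketched in Python. This is not the Grooper expression itself, which is written as .NET code; it only shows the same transformation. The function name is hypothetical, and the sketch assumes the extracted date arrives as month/day/year text.

```python
from datetime import datetime

def export_file_name(vendor: str, po_date: str, po_number: str) -> str:
    """Build '<vendor> - <yyyy-MM-dd> - PO <number>' from three field values."""
    # Assumption: the extracted date is formatted month/day/year, as in the example above.
    iso_date = datetime.strptime(po_date, "%m/%d/%Y").strftime("%Y-%m-%d")
    return f"{vendor} - {iso_date} - PO {po_number}"

print(export_file_name("ACME", "06/12/2020", "123456"))
# prints: ACME - 2020-06-12 - PO 123456
```

Reformatting the date to year-month-day keeps exported files in chronological order when sorted by name within each vendor.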
Folder Mappings
You can also configure CMIS Content Types for folder creation and naming. Let's say we want to make a folder for every vendor for these purchase orders and put all documents for the vendor in their vendor folder. First we need to tell Grooper we want to make a folder in the storage platform endpoint.
Just like with fields, you can use information available to a Content Model to map how this folder level is created.
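The effect of a vendor folder mapping can be pictured with a short Python sketch that groups exported file names under one folder per vendor. The function name, the base folder, and the sample documents are all hypothetical; in Grooper the folder level is configured on the CMIS Content Type rather than written as code.

```python
from collections import defaultdict

def group_by_vendor_folder(documents, base_folder=""):
    """Group (file name, vendor) pairs into one folder path per vendor."""
    folders = defaultdict(list)
    for file_name, vendor in documents:
        # The vendor value becomes a folder level beneath the base folder.
        path = f"{base_folder}/{vendor}" if base_folder else vendor
        folders[path].append(file_name)
    return dict(folders)

# Made-up documents for illustration.
layout = group_by_vendor_folder(
    [
        ("ACME - 2020-06-12 - PO 123456.pdf", "ACME"),
        ("ACME - 2020-07-01 - PO 123999.pdf", "ACME"),
        ("Initech - 2020-06-15 - PO 555.pdf", "Initech"),
    ],
    base_folder="/Purchase Orders",
)
print(layout)
```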
Other Export Configuration Considerations
The CMIS Content Type objects also allow you to configure export settings usually configured during the Document Export activity's configuration. These configurations can still be done when configuring Document Export, or they can be configured here. The choice is up to you (if assigned here, they can also be overridden when configuring Document Export if you so choose).
How To: Configure Document Export with Mapped Export
Now that we have the mappings assigned, configuration of the Document Export activity is much the same for Mapped Export as it is for Unmapped Export.
Assign the Export Provider
The Document Export activity uses Export Providers to control where and how processed documents are exported from Grooper. Mapped Export is one of these Export Providers. For this example, we will configure a Document Export activity in a Batch Process (however, these Document Export configuration steps would apply to other more ad-hoc methods of executing the activity as well). This is a simple Batch Process used to import purchase order documents, recognize their text, and extract some basic data from them. The last step in this Batch Process is a Document Export step.
Configure Mapped Export
From here, you will configure the Mapped Export of your documents.
Much like with Unmapped Export, the most important part of Mapped Export is telling Grooper where you want your documents to go. This is what the CMIS Repository property is for. Here, you will point to the CMIS Repository you established earlier, whose CMIS Content Type object you configured with export mappings.
At this point, this is all you need to do to meet the bare minimum requirements of performing a Mapped Export. All the mappings have already been configured. Grooper properties will be mapped to the external storage platform according to how you set them up on the CMIS Content Type. The remaining property configurations are present to customize your export. The Target Folder property will dictate where in the external repository's folder structure the documents are exported. In our example, we set a default target folder using the CMIS Content Type's Default Base Folder property. Setting the Target Folder property will override that setting. If both are left blank, documents will be exported to the root folder of the repository. The Content Format property allows you to customize what file format the exported document takes (for example, exporting as a TIF or PDF document).
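The precedence between Target Folder and Default Base Folder described above can be written out as a small Python sketch. The function and parameter names are hypothetical, and representing the repository root as "/" is an assumption made for the example.

```python
def resolve_target_folder(target_folder: str = "", default_base_folder: str = "") -> str:
    """Return the folder an export would land in, following the rule above."""
    # Target Folder on the Document Export activity wins when it is set.
    if target_folder:
        return target_folder
    # Otherwise use the Default Base Folder set on the CMIS Content Type.
    if default_base_folder:
        return default_base_folder
    # With both blank, documents go to the repository root (shown here as "/").
    return "/"

print(resolve_target_folder("/Exports/2020", "/Purchase Orders"))
print(resolve_target_folder("", "/Purchase Orders"))
print(resolve_target_folder())
```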
Path Cloning allows you to use part or all of the imported path as the output path. This can be useful for duplicating an imported folder structure upon output. Ignore Unmapped Items allows you to skip over any documents that do not have mappings configured on the CMIS Content Type. Name Conflict Resolution is important for avoiding overwriting existing files and for resolving duplicates. This can be set to:
Press "Ok" when finished configuring the export.
Reviewing the Export
Upon export, our documents and their metadata will land in the external storage platform (in this case Box.com) according to the mappings we defined on the CMIS Content Type object.
Version Differences
Box Integration (2.90)
Grooper 2.90 sees the addition of the Box.com document storage platform into the CMIS fold via the Box CMIS Binding.
Legacy Providers (2.72)
Old import and export providers should be replaced with this new functionality. While Grooper's older import and export providers are available as "Legacy Import" and "Legacy Export" providers, these components are deprecated. They will still function but will no longer be upgraded in future versions of Grooper.
Grooper can import documents using CMIS Connections via Import Descendants and Import Query Results. Grooper can export via the CMIS Export providers, Mapped Export and Unmapped Export.
New Connection Types (2.72)
By creating the CMIS+ architecture, we have been able to create new connections between Grooper and content management systems. Grooper can now connect to Microsoft OneDrive, SharePoint, and Exchange via new CMIS Bindings. Since these were created as CMIS Bindings, they can be used by the CMIS Import and CMIS Export providers. Instead of having to create three new import providers and three new export providers for a total of six brand new components, we can use the already established CMIS import and export providers in the CMIS+ framework. A user can create a CMIS Connection using the OneDrive, SharePoint or Exchange bindings, and use the same import and export providers for them as any of the other CMIS Bindings.
This will also allow Grooper to create CMIS Bindings for currently unavailable content management systems much more quickly and easily in the future.
Link Expression (2.72)
New to 2.72, the "Link" object can expose several new CMIS properties to field expressions.
- Link.Name - Returns cmis:name
- Link.CreatedBy - Returns cmis:createdBy
- Link.CreatedTime - Returns cmis:creationDate
- Link.LastModifiedBy - Returns cmis:lastModifiedBy
- Link.LastModifiedTime - Returns cmis:lastModificationDate
- Link.AsDocument.MimeType - Returns cmis:contentStreamMimeType
- Link.AsDocument.Filename - Returns cmis:contentStreamFilename
There is also a new method available to return CMIS properties in field expressions:
- Link.GetCustomValue(Name)
  - Returns the value of a CMIS property
  - "Name" can be the ID, query name, or display name of a property
  - Ex: Link.GetCustomValue("invoice_no")
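The lookup behavior described for this method, where one name can match a property's ID, query name, or display name, can be modeled with a short Python sketch. This is an illustration only, not Grooper's implementation: the class, the function, the sample property, and its value are all hypothetical, and returning the first match is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class CmisProperty:
    id: str
    query_name: str
    display_name: str
    value: Any

def get_custom_value(properties: list[CmisProperty], name: str) -> Optional[Any]:
    """Return the value of the first property whose ID, query name, or display name equals `name`."""
    for prop in properties:
        if name in (prop.id, prop.query_name, prop.display_name):
            return prop.value
    # No property matched the given name.
    return None

# Made-up property for illustration.
props = [CmisProperty("invoice_no", "invoice_no", "Invoice Number", "INV-001")]
print(get_custom_value(props, "invoice_no"))
print(get_custom_value(props, "Invoice Number"))
```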