CMIS Export - 2021

From Grooper Wiki

CMIS Export is one of the Export Types available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

CMIS Connections allow Grooper to standardize most, if not all, export configuration for a variety of storage platforms. This object can connect Grooper to cloud-based storage platforms, such as true CMIS content management systems, a Microsoft OneDrive account, or an online Exchange email server, as well as on-premise platforms, such as a Windows file system or an on-premise Exchange server. It standardizes access to these platforms by exposing them as if they were CMIS endpoints using the CMIS standard.

The CMIS Connection connects to an individual platform using a CMIS Binding, which defines the logic required for document interchange between Grooper and the storage platform. For example, the NTFS binding is used to connect to a Windows file system for import and export operations.

CMIS Export allows for the most advanced types of document export. It lets you use document metadata and the data Grooper extracts in a variety of ways. Many content management systems store metadata in fields alongside the documents themselves. For applicable CMIS Bindings, CMIS Export can map document metadata and extracted data to corresponding locations within the content management system. This creates a mapping between objects or properties in a Content Model within Grooper (such as Data Fields in a Data Model) and their counterparts in the content management system (such as a column in a SharePoint site). Even for simpler platforms (like an NTFS file system), metadata can be used for file name and folder indexing.
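To make the idea of metadata-driven file and folder naming concrete, here is a minimal Python sketch. It is not Grooper's implementation: the function names and the folder layout (Document Type, then year, then PO Number) are assumptions chosen for illustration, using field names that appear in this article's sample Content Model.

```python
import re
from datetime import date
from pathlib import PureWindowsPath


def safe_name(value: str) -> str:
    """Replace characters that Windows rejects in file and folder names."""
    return re.sub(r'[<>:"/\\|?*]', "_", value).strip()


def build_export_path(doc_type: str, doc_date: date, po_number: str) -> PureWindowsPath:
    """Hypothetical layout: <Document Type>\\<year>\\<PO Number>.pdf"""
    folder = PureWindowsPath(safe_name(doc_type)) / str(doc_date.year)
    return folder / f"{safe_name(po_number)}.pdf"


# The slash in the PO number is replaced so it cannot create an extra folder level.
print(build_export_path("Invoice", date(2021, 3, 15), "PO/1001"))
# prints Invoice\2021\PO_1001.pdf
```

The point of the sketch is only that extracted values can decide where a file lands and what it is called, even when the target platform has no metadata fields of its own.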


About CMIS and CMIS Connections


You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s), a Content Model, and a Batch Process discussed in this article.

CMIS stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to interoperate over the Internet. This standard protocol allows Grooper to use many different platforms for importing and exporting documents and their contents. Once a CMIS Connection is created, Grooper can exchange documents with these platforms. "Interoperability" means Grooper has the same access to control the system as a human being does. It is a "one-to-one" connection to the platform, allowing full and total control.

Upon connecting to an external content management system, Grooper will be able to see the "repositories" associated with it.  A repository, in computer science, is a general term for a location where data lives. Different systems refer to "repositories" in different ways.  An email inbox could be a repository. A folder in Windows could be a repository. A cabinet in ApplicationXtender could be a repository. It's a place to put things. We standardize the various terms used by various storage platforms to simply "repository".

2021-cmis-connection-about-01.png

These repositories are "imported" into Grooper as a CMIS Repository object, as a child of the CMIS Connection object. This doesn't import data into Grooper in the traditional sense of importing documents into a new Batch. "Importing" here is more like bringing the repository into a framework Grooper can use. Upon importing the repository, Grooper has full file access to that location in the storage platform.

For our purposes, repositories are like filing cabinets full of documents. Once a connection is established, it's like giving Grooper a key to that cabinet. You can open the various drawers of that cabinet. You can pull files out and put files in. The storage platform or content management system is like the cabinet.

  • The CMIS Connection object is like the key.
  • The CMIS Repository object is like a drawer in the cabinet.
  • You "connect" to the cabinet by turning the key. You "import" the repository by opening the drawer. Now you can see there are documents in there! You can take them out. You can read them and put them back in. You can put new ones in. You can use this "open" connection to the "drawer" however you need.
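The cabinet analogy can be sketched as a tiny in-memory model in Python. The class and method names below (`Connection`, `Repository`, `import_repository`, `put`, `get`) are hypothetical and are not Grooper objects or APIs. The sketch only illustrates the relationship: a connection knows which repositories exist, and importing one gives you something you can read from and write to.

```python
class Repository:
    """The 'drawer': a named place that holds documents (in memory here)."""

    def __init__(self, name: str):
        self.name = name
        self._documents = {}

    def put(self, filename: str, content: bytes) -> None:
        self._documents[filename] = content

    def get(self, filename: str) -> bytes:
        return self._documents[filename]

    def list_documents(self) -> list:
        return sorted(self._documents)


class Connection:
    """The 'key': knows which repositories the platform exposes."""

    def __init__(self, platform_repositories):
        self._available = set(platform_repositories)
        self.imported = {}

    def import_repository(self, name: str) -> Repository:
        """'Open the drawer': make a repository usable. No documents are copied."""
        if name not in self._available:
            raise KeyError(f"The platform has no repository named {name!r}")
        return self.imported.setdefault(name, Repository(name))


connection = Connection(["Grooper Import Export"])
drawer = connection.import_repository("Grooper Import Export")
drawer.put("invoice-001.pdf", b"%PDF-")
print(drawer.list_documents())  # prints ['invoice-001.pdf']
```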

CMIS+ Architecture

Grooper expanded on this idea in version 2.72 to create our CMIS+ architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints. Prior to version 2.72, there was only one type of CMIS Connection, a true CMIS connection using CMIS 1.0 or CMIS 1.1 servers. Now, connections to additional non-CMIS document storage platforms can be made via "CMIS Bindings". This provides standardized access to document content and metadata across a variety of external storage platforms.

Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms. You now use CMIS Import as the Import Provider and CMIS Export as the Export Type, regardless of the storage platform. They connect to a CMIS Repository imported from a CMIS Connection and use that as Grooper's import or export path.

How you create a CMIS Connection differs only from CMIS Binding to CMIS Binding, as each binding connects to its platform in a different way. You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example. Thus, the property configuration for the Exchange binding is different from the NTFS binding.

CMIS Bindings

A CMIS Binding provides connectivity to external storage platforms for content import and export. Grooper's CMIS+ architecture expands connectivity from traditional CMIS servers to a variety of on-premise and cloud-based storage platforms by exposing connections to these platforms as CMIS Bindings.

Each individual CMIS Binding contains the settings and logic required to exchange documents between Grooper and each distinct platform. For example, the AppXtender Binding contains all the information Grooper uses to connect to the ApplicationXtender content management system.

CMIS Bindings are used when creating a CMIS Connection object. The first step in creating a CMIS Connection is to configure the Connection Type property. Here you select which CMIS Binding to use, and therefore which storage platform to connect to. The second step is to enter the connection settings for that binding, such as login information for many bindings.
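The two-step idea (pick a binding, then fill in that binding's own settings) can be pictured with a short Python sketch. Everything here is an assumption made for illustration: the class names, the setting names (`base_path`, `server_url`, `username`, `password`), the folder path, and the example server URL are hypothetical and do not reflect Grooper's actual property names.

```python
from dataclasses import dataclass
from typing import ClassVar, Union


@dataclass(frozen=True)
class NtfsSettings:
    binding: ClassVar[str] = "NTFS"
    base_path: str                      # hypothetical setting: a folder to expose


@dataclass(frozen=True)
class ExchangeSettings:
    binding: ClassVar[str] = "Exchange"
    server_url: str                     # hypothetical settings: a mail server
    username: str                       # plus login information
    password: str


@dataclass(frozen=True)
class ConnectionSketch:
    name: str
    settings: Union[NtfsSettings, ExchangeSettings]

    @property
    def connection_type(self) -> str:
        """Step 1 (which binding) is implied by the kind of settings supplied in step 2."""
        return self.settings.binding


ntfs = ConnectionSketch("NTFS - Grooper Import Export",
                        NtfsSettings(base_path=r"C:\Grooper Import Export"))
mail = ConnectionSketch("Mailbox",
                        ExchangeSettings("https://mail.example.com", "user", "secret"))
print(ntfs.connection_type, mail.connection_type)  # prints NTFS Exchange
```

Each binding carries a different set of fields, which mirrors why the property configuration differs from one Connection Type to the next.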

Current CMIS Bindings

Grooper can connect to the following storage platforms using CMIS Bindings:

  • AppXtender - Defining connection to the ApplicationXtender document management platform.
  • Box - Defining connection to the Box.com cloud storage platform.
  • FileBound - Defining connection to the FileBound document management platform.
  • IBM FileNet Connector - Defining connection to the FileNet content management platform.
  • CMIS - Defining connection to any content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The following Microsoft content platforms
    • Exchange - Defining connection to the Microsoft Exchange mail server platform (i.e. Outlook mailboxes).
    • OneDrive - Defining connection to the OneDrive cloud storage platform.
    • SharePoint - Defining connection to Microsoft SharePoint sites.
  • FTP - Defining connection to an FTP (File Transfer Protocol) server.
  • SFTP - Defining connection to an SFTP (SSH File Transfer Protocol) server.
  • IMAP - Defining connection to IMAP mail servers.
  • NTFS - Defining connection to the Microsoft Windows file system.

How To

Prereqs: Understanding the Content Model and Documents Used in These Tutorials


You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.

In the following "how to" tutorials, we will use a simple Content Model used for purchase order and invoice processing. First, we should familiarize ourselves with the Content Model and some of the documents. Understanding our content, both in terms of the documents themselves as well as the Content Type and Data Model hierarchy of the Content Model will make it easier to follow along with the subsequent tutorials.

The Documents

In our sample test Batch, we have a series of documents we ultimately want to export in one way or another, using CMIS Export. In this Batch you will find the following kinds of documents:

2021-cmis-export-how-to-understanding-content-1.png

2021-cmis-export-how-to-understanding-content-2.png

2021-cmis-export-how-to-understanding-content-3.png

  1. If you've imported the zip for this tutorial, you will find a sample test Batch by navigating the following path in the Node Tree:
    • Root Node > Batch Processing > Batches > Test > Export Activity > Sample Export Batch - POs and Invoices

2021-cmis-export-how-to-understanding-content-4.png


The Content Model

These documents are modeled by our example Content Model named "Export Example Model - POs and Invoices".

  1. If you've imported the zip for this tutorial, you will find this Content Model by navigating the following path in the Node Tree:
    • Root Node > Content Model > Export Activity > Export Example Model - POs and Invoices

2021-cmis-export-how-to-understanding-content-5.png

Our document set is represented by the Content Type hierarchy of our Content Model.

  1. The invoices from various vendors are modeled by the "Invoice" Document Type.
  2. The purchase orders are modeled by the "Purchase Order" Document Type.
  3. All the different pricing letters, notifying a vendor of a price increase, a price decrease, or a promotional price, are modeled by child Document Types of the "Price Letters" Content Category.
  4. The "Price Decrease Letter", "Price Increase Letter", and "Promo Price Letter" Document Types model these corresponding kinds of price letters.

2021-cmis-export-how-to-understanding-content-6.png


The Data Model Hierarchy

Part of modeling a document set with a Content Model and its component Document Types is modeling the data elements you wish to extract. This is done with one or more Data Models in the Content Model's hierarchy.

Any Content Type can have a child Data Model. Data you wish to extract is defined by adding child Data Elements to the Data Model, such as Data Field and Data Table objects. These objects are then configured with extractors to parse a Batch Folder's text data and return a value, stored as the document's index data when the Extract activity is executed.

Ultimately, understanding a Document Type's Data Model and how it inherits Data Elements from parent Content Types will be critical for configuring CMIS Export (and truly, any Export Type). We can use extracted data in a variety of ways from document folder pathing and naming to mapping extracted data to storage locations in content management systems (for those that support it). Understanding how the data flows through a Content Model's Content Type and thus Data Model hierarchy is necessary to understand how to call it out later on down the line during export.

  1. In our case the parent Content Model has a Data Model with a single child Data Field named "Document Date"
    • Extraction logic is already configured for this Data Field to return a date for any of our documents, such as an invoice date for our invoices or the letter date for our pricing letters.

2021-cmis-export-how-to-understanding-content-7.png

All child Document Types will inherit the Data Elements of their parent Content Type's Data Model. This means all Document Types will extract the "Document Date" Data Field when the Extract activity runs.

  1. For example, the "Invoice" Document Type has its own Data Model, with its own Data Elements.
    • These are various Data Fields and a Data Table (the one named "Invoice Line Items") that only relate to invoices.
  2. This kind of data is specific to invoices and will only be extracted if a Batch Folder is assigned the "Invoice" Document Type during extraction.
    • We can see these Data Elements in the Data Model preview panel with the Data Model selected in the Node Tree.
  3. However, the "Invoice" Document Type's Data Model inherits any Data Elements from any parent Content Type.
    • In this case, the "Document Date" Data Field is inherited from the parent Content Model's Data Model.
    • This Data Field also shows up in the "Invoice" Data Model's preview panel. In essence, it becomes a part of the "Invoice" Document Type's Data Model.

2021-cmis-export-how-to-understanding-content-8.png

  1. Similarly, the "Purchase Order" Document Type has its own child Data Model with its own Data Elements relating just to purchase orders.
    • We only want these Data Elements extracted if the Batch Folder is classified as the "Purchase Order" Document Type.
  2. However, it too is a child of the parent Content Model. As such, it inherits the Content Model's Data Model as well.
    • So, it too has a "Document Date" Data Field as part of its Data Model.
FYI: You may have noticed the "Invoice" Document Type and "Purchase Order" Document Type both have a "PO Number" and "Vendor" Data Field in their Data Models.

Be aware, these are two separate objects in the Node Tree. They have different extractors extracting their data. These Data Fields extract data in different ways depending on the Batch Folder's Document Type. They just happen to share the same name.

However, they are in totally different locations in the Content Model's hierarchy, and thus are distinct objects.

2021-cmis-export-how-to-understanding-content-9.png

  1. For the three different kinds of pricing letters, a Data Model is added to the "Price Letters" Content Category.
    • This Data Model has its own pricing letter related Data Elements.
  2. Remember, Data Elements flow through a Content Model's Content Type hierarchy. The "Price Letters" Content Category is the parent Content Type of the three pricing letter Document Types ("Price Decrease Letter", "Price Increase Letter", and "Promo Price Letter").
  3. As such, any Batch Folder assigned one of these three Document Types will inherit the Content Category's Data Elements for its Data Model.
    • Furthermore, for our made-up use case here, the individual pricing letter Document Types don't have their own Data Models. We don't need them! For each of these three Document Types we want to extract the same set of data. The parent "Price Letters" Content Category's Data Model will apply to all three Document Types. Creating a unique Data Model for each Document Type would be a waste of time, in our case.
  4. Not only that, any grandparent Data Elements are inherited as well, such as the top-level Content Model's "Document Date" Data Field.
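The inheritance rule described in this section can be sketched in a few lines of Python. This is a toy model, not Grooper's object model: the `ContentType` class is hypothetical, and "Effective Date" is a made-up stand-in for the pricing letter Data Elements, which this article does not name. The other names come from the sample Content Model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    name: str
    parent: Optional["ContentType"] = None
    own_elements: list = field(default_factory=list)

    def data_elements(self) -> list:
        """Inherited elements first (from the top of the hierarchy down), then this type's own."""
        inherited = self.parent.data_elements() if self.parent else []
        return inherited + self.own_elements


model = ContentType("Export Example Model - POs and Invoices",
                    own_elements=["Document Date"])
invoice = ContentType("Invoice", model,
                      ["PO Number", "Vendor", "Invoice Line Items"])
# "Effective Date" is a hypothetical field used only for this sketch.
price_letters = ContentType("Price Letters", model, ["Effective Date"])
increase = ContentType("Price Increase Letter", price_letters)  # no Data Model of its own

print(invoice.data_elements())
# prints ['Document Date', 'PO Number', 'Vendor', 'Invoice Line Items']
print(increase.data_elements())
# prints ['Document Date', 'Effective Date']
```

Note that the "Price Increase Letter" type defines nothing itself, yet it still ends up with both its parent's and its grandparent's elements.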

2021-cmis-export-how-to-understanding-content-10.png


The Batch Process

  1. If you've imported the zip for these tutorials, you will find this Batch Process by navigating the following path in the Node Tree:
    • Root Node > Batch Processing > Processes > Working > Export Activity > Export Process - POs and Invoices
    • This is a simple Batch Process used to process our document set. It will recognize their text, classify them according to our Content Model's configuration, and extract data (as described in the previous section). The documents in the sample test Batch provided have been processed according to this Batch Process.
  2. The last step in this Batch Process is an Export activity step.
    • As we discuss CMIS Export set up in the following how-to tutorials, keep in mind the Export activity is the activity that drives document export. It's all part of a process (a Batch Process) where content is ingested into a Batch, processed by various activities, and ultimately exported by the Export activity.
  3. The Export activity exports Batch Folder document content according to an Export Behavior configuration.
    • CMIS Export is one way to get that done, exporting content over a CMIS Connection. Specifically, as will be described in the next tutorial, CMIS Export is an Export Type configuration for an Export Behavior.
    • FYI: The Export step in this Batch Process will be unconfigured if you imported the zip for these tutorials. Part of configuring CMIS Export involves connecting to external systems. Obviously we can't connect to your personal storage environments. However, this content will get you started to follow along using the subsequent lessons.
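As a rough mental model of a Batch Process ending in an export step, consider the following Python sketch. It is a toy pipeline under stated assumptions: the step functions, the keyword-based classification rule, and the date pattern are all invented for illustration and have nothing to do with how Grooper's activities work internally.

```python
import re


def recognize(doc):
    # Stand-in for OCR: the toy document already carries its text.
    doc["text"] = doc["raw"]


def classify(doc):
    # Toy keyword rule, not Grooper's classification logic.
    doc["document_type"] = "Invoice" if "invoice" in doc["text"].lower() else "Purchase Order"


def extract(doc):
    # Pull the first ISO-style date as the "Document Date" value.
    match = re.search(r"\d{4}-\d{2}-\d{2}", doc["text"])
    doc["fields"] = {"Document Date": match.group(0) if match else None}


exported = []


def export(doc):
    # The last step hands the processed document to its destination.
    exported.append((doc["document_type"], doc["fields"]))


BATCH_PROCESS = [recognize, classify, extract, export]


def run(batch):
    for step in BATCH_PROCESS:   # steps run in order
        for doc in batch:        # each step visits every document in the batch
            step(doc)


run([{"raw": "INVOICE dated 2021-03-15"}])
print(exported)  # prints [('Invoice', {'Document Date': '2021-03-15'})]
```

The export step can only use what earlier steps produced, which is why classification and extraction come first.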

2021-cmis-export-how-to-understanding-content-11.png


Basic CMIS Export

CMIS Exports can range from very simple exports of Batch Folder content, to more complex exports, utilizing Grooper extracted content in a variety of ways. We will start with the most basic configuration of a CMIS Export. These steps will be largely applicable to any CMIS Export. By the end of this tutorial, we will export PDF files generated from the image and OCR text content of the Batch Folders in our Batch, as well as an XML metadata file generated from the extracted Data Model Elements for each Batch Folder.

Establish the CMIS Connection and CMIS Repository

Before configuring a CMIS Export, you must have created a CMIS Connection and imported a CMIS Repository. For more information on how to create a CMIS Connection and import a CMIS Repository refer to the CMIS Connection article.

For this example, we will simply export to a Windows folder on a local drive.

  1. We have created a CMIS Connection using the NTFS Connection Type.
  2. We have imported a CMIS Repository connecting Grooper to a folder named "Grooper Import Export".
  3. And we will be exporting to this subfolder named "Export".

2021-cmis-export-how-to-understanding-content-30.png


Configure an Export Behavior

CMIS Export is one of the Export Type options when configuring an Export Behavior. Export Behaviors control what document content for a Batch Folder is exported where, according to its classified Document Type. As such, in order to configure a CMIS Export, you must first configure an Export Behavior for a Content Type (a Content Model or its child Content Categories or Document Types).

Export Behaviors can be configured in one of two ways:

  1. Using the Behaviors property of a Content Type object
    • A Content Model
    • A Content Category
    • Or, a Document Type
  2. As part of the Export activity's property configuration
FYI: In general, users will choose to configure Export Behaviors either on the Content Type object it applies to or local to the Export activity step in a Batch Process.

This may just boil down to personal preference. There is no functional difference between an Export Behavior configured on a Content Type or an Export Behavior configured on an Export Step, upon completing their configuration. In either case, they will accomplish the same goal.

However, it is possible to configure Export Behaviors in both locations. If you do this, you will need to understand the Export activity's Shared Behavior Mode property options. This will affect if and how two Export Behaviors configured for the same Content Type will execute. Please visit the Export article for more information.

Option 1: Content Type Export Behaviors

An Export Behavior configuration can be added to any Content Type object (i.e. Content Models, Content Categories, and Document Types) using its Behaviors property. Doing so will control how a Document Type "behaves" upon export.

  1. For example, here we have a Content Model selected in the Node Tree.
  2. To add an Export Behavior first select the Behaviors property.
  3. Then, press the ellipsis button at the end of the property.

Export-export-behaviors-1.png

  1. This will bring up the Behaviors collection editor window.
  2. Press the "Add" button.
  3. Select Export Behavior.
    • FYI: Child Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
    • Also, you can only configure one Export Behavior per Content Type object. However, you can configure an Export Behavior for any Content Type in a Content Model. Functionally, this is how you add multiple Export Behaviors for a single Content Model.
      • For example, if every Document Type needed a unique Export Behavior configuration, you could configure the Behaviors property for each one, adding one Export Behavior to the Behaviors list for each one.

Export-export-behaviors-2.png

  1. You will see the Export Behavior added to the Behaviors list.
  2. Selecting it, you can now add one or more Export Definitions with the Export Definitions property.

Export-export-behaviors-3.png

Option 2: Export Activity Export Behaviors

Export Behaviors can also be configured as part of the Export activity's configuration. These are called "local" Export Behaviors. They are local to the Export activity in the Batch Process.

  1. For example, here we have a working Batch Process selected in the Node Tree.
    • This is a simple Batch Process used to import purchase order, invoice, and other related documents, recognize their text, and extract some basic data from them. The last step in this Batch Process is an Export step.
  2. Select the Export step of the Batch Process.
  3. To add an Export Behavior, select the Export Behaviors property.
  4. Then, press the ellipsis button at the end of the property.

Export-export-behaviors-4.png

  1. This will bring up the Export Behaviors collection editor window.
  2. Press the "Add" button to add a new Export Behavior.
  3. An Export Behavior will be added to the list.
  4. With the Export Behavior selected you must define which Content Type the behavior applies to using the Content Type property.
    • Note in both cases, a Content Type is involved in configuring Export Behaviors. Whether local to the Export activity or as part of a Content Model's configuration, Grooper needs to know what to do upon export, given a certain Content Type (and its children Content Types if scoped to a Content Model or Content Category). Once Grooper knows what kind of document it's looking at, we can then inform it what to do in terms of exporting its document content.
  5. Using the dropdown menu, select which Content Type scope should utilize the Export Behavior by selecting either a top-level parent Content Model or one of its child Content Categories or Document Types.
    • Keep in mind you can only select a single Content Type here. You can only configure one Export Behavior per Content Type object.
    • Again, child Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration. However, multiple Export Behaviors may be added locally to the Export activity.
      • For example, if every Document Type needed a unique Export Behavior configuration, you could add one Export Behavior to the list for each one.
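The scoping rules above (one Export Behavior per Content Type, with child Content Types inheriting from their parents) can be sketched as a simple lookup in Python. This rests on an assumption the article does not state: that a Content Type's own Export Behavior takes precedence over one inherited from an ancestor. It also ignores the Shared Behavior Mode property entirely. The function name and the behavior labels are hypothetical.

```python
from typing import Optional

MODEL = "Export Example Model - POs and Invoices"

# Parent of each Content Type in the sample Content Model.
PARENT_OF = {
    "Invoice": MODEL,
    "Purchase Order": MODEL,
    "Price Letters": MODEL,
    "Price Decrease Letter": "Price Letters",
    "Price Increase Letter": "Price Letters",
    "Promo Price Letter": "Price Letters",
}


def resolve_export_behavior(content_type: str, behaviors: dict) -> Optional[str]:
    """Walk up the hierarchy and return the first Export Behavior found.

    ASSUMPTION: the nearest Content Type with a behavior wins.
    """
    current = content_type
    while current is not None:
        if current in behaviors:
            return behaviors[current]
        current = PARENT_OF.get(current)
    return None


# A dict has one value per key, mirroring "one Export Behavior per Content Type".
behaviors = {MODEL: "model-wide export"}
print(resolve_export_behavior("Promo Price Letter", behaviors))  # prints model-wide export

behaviors["Invoice"] = "invoice-only export"
print(resolve_export_behavior("Invoice", behaviors))         # prints invoice-only export
print(resolve_export_behavior("Purchase Order", behaviors))  # prints model-wide export
```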

Export-export-behaviors-5.png

  1. Once a Content Type is selected, you can add one or more Export Definitions with the Export Definitions property.

Export-export-behaviors-6.png


Add an Export Definition

  1. Going forward in this tutorial, we will scope our Export Behavior to the parent Content Model "Export Example Model - POs and Invoices".
    • What Content Type scope in your Content Model you choose will be paramount if you want to use extracted Data Element values for data mapping purposes. However, we do not need to concern ourselves with that for this tutorial. This is the most basic (or "unmapped") version of a CMIS Export. For more information on data mapping, visit the Perform a Mapped CMIS Export tutorial later in this article.
  2. We will choose to configure the Export Behavior using "Option 1", adding it to the Content Model's set of Behaviors properties.

2021-cmis-export-how-to-understanding-content-12.png

Whether you configure the Export Behavior on a Content Type object or local to the Export activity's configuration, your next step is adding an Export Definition.

  1. Once you've added an Export Behavior, select the Export Definitions property.
  2. To add an Export Definition, press the ellipsis button at the end of the property.

2021-cmis-export-how-to-understanding-content-13.png

  1. This will bring up an Export Definition list editor to add one or more Export Types.

2021-cmis-export-how-to-understanding-content-14.png


Add a CMIS Export

Export Definitions functionally determine three things:

  1. Location - Where the document content ends up upon export. In other words, the storage platform you're exporting to.
  2. Content - What document content is exported: image content, full text content, and/or extracted data content.
  3. Format - What format the exported content takes, such as a PDF file or XML data file.

Export Definitions do this by adding one or more Export Type configurations to the definition list. The Export Type you choose determines how you want to export content to which platform. In our case, we want to use a CMIS Connection to export content to a connected CMIS Repository. We will add a CMIS Export to the definition list.

  1. To do this, press the "Add" button.
  2. Choose CMIS Export from the list.

2021-cmis-export-how-to-understanding-content-15.png

  1. This will add an unconfigured CMIS Export to the Export Definitions list.
  2. For all CMIS Export configurations, you must choose which CMIS Repository to export to using the CMIS Repository property.
  3. Location! Location! Location! This is how Grooper knows where you want to export your document content. In our case, we want to export documents to a folder in the "Grooper Import Export" folder on this machine's Windows hard drive.
    • Using the drop down menu, we will select the "NTFS - Grooper Import Export" CMIS Connection's CMIS Repository named "Grooper Import Export".

2021-cmis-export-how-to-understanding-content-16.png

Selecting an Export Subfolder Location

With a CMIS Repository selected, a set of Filing Location properties will appear.

  1. The Target Folder property allows you to select a subfolder in the repository as the export path, rather than just putting documents at the root of the repository.
  2. Selecting this property, you can press the ellipsis button at the end to bring up a navigation window to select a subfolder.
  3. This will bring up the following window. The CMIS Repository you selected for the CMIS Repository property will be at the root of this node tree. Expand the root and subsequent nodes to explore the folder structure of the CMIS Repository.
  4. In our case, we selected the subfolder here, named "Export".

2021-cmis-export-how-to-understanding-content-17.png

Now that we know what location we're exporting to, we need to define what content we want to export and what format that content should take.

  1. In a very general way, that's what the Object Data set of properties is all about.
    • More specifically, we will use the Export Formats property to export a simple PDF for each Batch Folder in our sample test Batch.
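The three things an export configuration pins down (location, content, and format) can be pictured as a small record. The Python sketch below is an illustration only: the class name, attribute names, and the `C:\Grooper Import Export` path are assumptions, not Grooper property names or the actual location of the sample folder.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class CmisExportSketch:
    repository_root: str      # location: where the imported repository points
    target_folder: str = ""   # optional subfolder; empty means the repository root
    # content and format: which files are produced for each document
    export_formats: list = field(default_factory=lambda: ["Attachment"])

    def destination(self) -> PureWindowsPath:
        """Folder that exported files are written to."""
        return PureWindowsPath(self.repository_root) / self.target_folder


# The repository path here is hypothetical.
definition = CmisExportSketch(
    repository_root=r"C:\Grooper Import Export",
    target_folder="Export",
    export_formats=["PDF Format", "XML Metadata"],
)
print(definition.destination())   # prints C:\Grooper Import Export\Export
print(definition.export_formats)  # prints ['PDF Format', 'XML Metadata']
```

Leaving `target_folder` empty resolves to the repository root, which matches the idea that the Target Folder is an optional refinement of the location.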

2021-cmis-export-how-to-understanding-content-19.png


Configure Export Formats

Next, we need to tell Grooper what content we want to export and how we want to export it. Keep in mind, we're exporting to a Windows file folder. Regardless of the storage system, we are always in some way limited by the constraints of the storage system. Some have greater capabilities to house customized metadata fields, for example. The NTFS file system, however, is pretty basic. It is a hierarchical file system to store and organize files.

So, what can we export? Files. The good news is there are all kinds of different file formats out there. Even just exporting simple files, we can get Grooper processed content (the document's image, the document's full text data, and the document's index data) out of Grooper. To do this, we will add Export Formats to dictate what content exports to what file type.

There are a variety of Export Formats available to generate and export content from Grooper.

  1. PDF Format - This will output a PDF file from the Batch Folder content. This includes capabilities to embed full text data obtained from the Recognize activity.
  2. XML Metadata - This will output extracted Data Model values to an XML file.
  3. JSON Metadata - This will output extracted Data Model values to a JSON file.
  4. Simple Metadata - This will output extracted Data Model values to a text file.
    • This file formats Data Fields and their values as simple "key-value pairs".
  5. Delimited Metadata - This will also output extracted Data Model values to a text file.
    • This formats Data Field values as a delimiter-separated value array.
  6. TIF Format - This will output image content only as a TIF file.
  7. Text Format - This will output full text content only, generated from OCR data, as a text file.
  8. Attachment - For document files that were imported from a digital source, this will output the Batch Folder's attachment file. This option can also output any file attached to a Batch Folder by referencing a filename. This is how Grooper exports custom generated files from activities such as XML Transform or custom scripted activities.
    • If the Batch Folder has no attachment, this option will generate an image version of the document from all child Batch Pages in the folder.
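To give a feel for how the four metadata formats differ, the Python sketch below serializes the same set of field values four ways using only the standard library. The element names, the `=` separator, and the sample values are assumptions made up for this illustration. The files Grooper actually produces will have their own layout, so treat this as a comparison of the general shapes and nothing more.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Made-up sample values for fields named in this article's Content Model.
fields = {"Document Date": "2021-03-15", "PO Number": "1001", "Vendor": "Acme, Inc."}


def to_xml(fields: dict) -> str:
    """Nested markup: one element per field (element names are hypothetical)."""
    root = ET.Element("Document")
    for name, value in fields.items():
        ET.SubElement(root, "Field", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(fields: dict) -> str:
    """A JSON object keyed by field name."""
    return json.dumps(fields, indent=2)


def to_simple(fields: dict) -> str:
    """One key-value pair per line."""
    return "\n".join(f"{name}={value}" for name, value in fields.items())


def to_delimited(fields: dict, delimiter: str = ",") -> str:
    """Values only, in field order; csv quotes values that contain the delimiter."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerow(fields.values())
    return buffer.getvalue()


print(to_simple(fields))
print(to_delimited(fields), end="")  # prints 2021-03-15,1001,"Acme, Inc."
```

The delimited form drops the field names, so whoever reads it has to know the column order, while the other three formats are self-describing.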

We will export PDF files generated from the image and OCR text content of the Batch Folders in our Batch, as well as an XML metadata file generated from the extracted Data Model Elements for each Batch Folder.

  1. To add an Export Format, first select the Export Formats property.
  2. Press the ellipsis button at the end of the property.

2021-cmis-export-how-to-understanding-content-20.png

  1. This will bring up the Export Formats collection editor.
  2. By default, there will always be an Attachment Export Format present in the list.
  3. We're going to use a different format. We will get rid of it by selecting it in the list, and pressing the "Delete" button.

2021-cmis-export-how-to-understanding-content-21.png

  1. To add a new Export Format, press the "Add" button.
  2. Select the format you wish to output from the list.
    • We will first choose the PDF Format.

2021-cmis-export-how-to-understanding-content-22.png

  1. This will add a PDF Format to the list of Export Formats.
  2. With an Export Format selected, the right panel will allow you to further configure the exported file.
    • For example, in our case, we've enabled the Searchable property under Build Options. This will embed the full text data generated by the Recognize activity in our Batch Process into each page in the PDF.

2021-cmis-export-how-to-understanding-content-23.png

You can add as many Export Formats as you want. This allows you to export multiple files generated from the Batch Folder content in your Batch. For example, we've extracted data from our documents, using the Extract activity of our Batch Process. We can create an XML metadata file with all that data using the XML Metadata Export Format.

  1. To add a new Export Format to the list, press the "Add" button again.
  2. Select the additional file format you wish to output from the list.
    • We will choose XML Metadata.

2021-cmis-export-how-to-understanding-content-24.png

Upon executing the Export activity, two files will now be exported for each Batch Folder in the Batch, one for each Export Format in our list.

  1. A PDF file generated from the PDF Format.
  2. An XML file generated from the XML Metadata.
  3. Press "OK" on this and all subsequent windows to save your changes.

2021-cmis-export-how-to-understanding-content-25.png

Export the Documents

With the Export Behavior configured, we can now test our export.

  1. The Export activity in our Batch Process will apply our Export Behavior to every Batch Folder in the Batch.
  2. FYI: Because we configured the Export Behavior on our Content Model (using its Behaviors property editor), we do not have to configure the Export activity's local properties.
    • We've given Grooper all the information it needs to export content. The Export activity will go through every Batch Folder, one by one, in the Batch. It will see the Batch Folders are classified with one of the Document Types in our Content Model. Since we configured the Export Behavior on the Content Model, all child Document Types will use its configuration settings to export document content.

2021-cmis-export-how-to-understanding-content-26.png

We will test our export using the Export activity's "Unattended Activity Tester" tab.

  1. Expand the Batch Process to reveal its child Batch Step nodes.
  2. Select the Export activity step.
  3. Switch to the "Unattended Activity Tester" tab.
  4. Press the "Process All..." button.
    • On the subsequent screen, press the "Start" button to start processing the Batch. This will apply the Export activity, as configured in the Batch Process, to all items in the activity's scope (Folder Level 1 in our case).

2021-cmis-export-how-to-understanding-content-27.png

Success! We exported the documents in our Batch! All of this was made possible by our Content Model's Export Behavior using the CMIS Export.

  1. All files are exported to the connected CMIS Repository location using our NTFS CMIS Connection.
  2. For each Batch Folder, two files were exported, one for each Export Format added and configured.
    • A PDF file from the PDF Format
    • An XML file from the XML Format

2021-cmis-export-how-to-understanding-content-28.png

FYI This truly is about the most basic export you could do using CMIS Export. There's a lot more functionality available to CMIS Export to get data out of Grooper and use that data to index your documents better.
  1. If nothing else, these exported document filenames could stand improvement. Because these files were originally brought into Grooper as imported PDF files, and given how we configured CMIS Export, the generated filenames are simply copied from the original files' names.

In the next tutorial, we will introduce the concept of "data mapping". We will use extracted data to form folder levels and filenames, mapping Grooper extracted metadata to folder and file metadata upon export.

2021-cmis-export-how-to-understanding-content-29.png

Intro to Data Mapping: Folder Pathing and File Naming

Now that we know the basics of exporting using CMIS Export, we can get into some more advanced functionality. CMIS Export is the most powerful method of exporting document folder content for two main reasons:

  1. It leverages the capabilities of multiple different storage platforms by exposing them to Grooper as CMIS Repositories of CMIS Connections.
  2. Data Mapping

"Data Mapping" refers to the concept of exporting data, in one way or another, by mapping Grooper extracted document data (such as Data Field values collected from the Extract activity) to file, folder, or storage location metadata. In this tutorial, we will map data to create subfolders and file names for our exported documents. This will allow us to better index our file paths and file names upon export.

We will edit the CMIS Export configuration in the previous tutorial to do this.

The Subfolder Path Property

First, we will use CMIS Export's Subfolder Path to establish subfolder locations for our files.

  1. In the Filing Location set of properties of a CMIS Export property panel, select Subfolder Path.
  2. Press the ellipsis button at the end of the property.

2021-cmis-export-how-to-data-mapping-01.png

  1. This will bring up a "Path Expression" editor window.
    • Using .NET expressions, you can dynamically path exports using document metadata available to Grooper.
  2. The expression here CurrentDocument.ContentTypeName would update the export path to a subfolder whose name matches the exported Batch Folder's Document Type assigned during classification.
    • For example, our export path before was C:/Grooper Import Export/Export. Now, documents assigned the "Invoice" Document Type will export to C:/Grooper Import Export/Export/Invoice.
    • Furthermore, if the folder does not exist, Grooper will create a new folder for the exported files.

2021-cmis-export-how-to-data-mapping-02.png

Additional subfolder levels can be created with a simple addition to the expression.

  1. To path the export to another subfolder (or create a new one) add the following to the end of the expression:
    • + "/" +
  2. As you type the expression, IntelliSense will help you along to complete the code. It will show you potential code snippets you could add to complete the expression.
    • For example, any Data Elements available for use. We could use the "Document Date" Data Field for our next folder level, creating a new subfolder for every document from the date Grooper extracts during the Extract activity.

2021-cmis-export-how-to-data-mapping-03.png

  1. The complete expression here, CurrentDocument.ContentTypeName + "/" + Document_Date.ToString("yyyy-MM-dd"), would path the export to an additional subfolder level, based on the Batch Folder's Document Type and the extracted "Document Date" Data Field's value.
    • For example, if you have a purchase order (assigned the "Purchase Order" Document Type) and Grooper extracted the purchase order date as "01/31/2021" it would be exported to the following folder path:
    • C:/Grooper Import Export/Export/Purchase Order/2021-01-31
    • FYI: The .ToString("yyyy-MM-dd") portion of the expression converts the Document_Date value from a date value type to a string value type and reformats the date to a "year-month-day" format.

2021-cmis-export-how-to-data-mapping-05.png
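
To make the path logic concrete, here is a minimal sketch in Python (not the .NET expression language Grooper evaluates). The function name subfolder_path and the hard-coded base path are assumptions for illustration only. Python's "%Y-%m-%d" pattern is used as the counterpart of the .NET "yyyy-MM-dd" format string, which places the month before the day.

```python
from datetime import date
from pathlib import PurePosixPath


def subfolder_path(content_type_name: str, document_date: date) -> str:
    """Python analogue of the expression:
    CurrentDocument.ContentTypeName + "/" + Document_Date.ToString("yyyy-MM-dd")
    """
    # "%Y-%m-%d" mirrors the .NET "yyyy-MM-dd" pattern (year, month, day).
    return content_type_name + "/" + document_date.strftime("%Y-%m-%d")


# Hypothetical base export folder, taken from the example path in this article.
base = PurePosixPath("C:/Grooper Import Export/Export")
full = base / subfolder_path("Purchase Order", date(2021, 1, 31))
print(full)  # C:/Grooper Import Export/Export/Purchase Order/2021-01-31
```

Each "/" in the expression result becomes another folder level beneath the export location.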

Upon executing the Export activity documents will be dynamically placed in folders according to the Subfolder Path property's expression.

  1. In our case, first in a folder for the Batch Folder's Document Type.
  2. Then in a second folder level from the Batch Folder's extracted "Document Date" value.

2021-cmis-export-how-to-data-mapping-06.png

Write Mappings for File Naming

Next, we will use the Write Mappings property to more dynamically name our files, using data content of the Batch Folders, such as extracted Data Field values.

Write Mappings get into the more advanced capabilities of CMIS Export. With Write Mappings, you can edit the exported file's metadata by mapping Grooper Data Model Data Elements or other Grooper generated metadata to the metadata available to the storage location. Depending on the CMIS Connection's storage location, you will have access to different document metadata properties. More robust content management systems actually allow you to store extracted field values along with something like a PDF file (We will cover this in the next how-to tutorial). Some connection types, like our NTFS CMIS Connection, are more limited in what document metadata you can edit.

However, all CMIS Connection types allow you to map data to generate a file name for exported files.

  1. Select the Write Mappings property.
  2. Press the ellipsis button at the end to configure your data mappings.

2021-cmis-export-how-to-data-mapping-08.png

  1. This will bring up the CMIS Export Map window.
  2. Any property you see here is an editable file metadata property. Using these property definitions, we can dictate the exported file's metadata by mapping these properties to Grooper extracted values.
    • Depending on the CMIS Connection's type (or CMIS Binding), different metadata properties will be editable. These are the editable file metadata for the NTFS Connection Type. You may see more for some connection types. You may see fewer. You may see a widely different set of available properties. However, for nearly all CMIS Connection types, you will always see a Cmis Name property.
    • We say "nearly" because the AppXtender connection type is the odd exception. It actually does not have a Cmis Name property available for mapping. However, we discuss this further in the next tutorial.

2021-cmis-export-how-to-data-mapping-07.png

  1. We can edit the exported files' names, using the Cmis Name property.
  2. Using the dropdown list, you can map a single Grooper extracted value to the file's name.
    • Here, you will see any Data Field available to the Export Behavior's Content Type scope. We configured this Export Behavior on our Content Model. There is only a single Data Field in its Data Model, the "Document Date" Data Field. That's why we see "Document_Date" as an option. If we chose Document_Date from this list, each file would be named as whatever date was extracted for that Data Field.
    • You may be thinking "But what about all those other Data Fields? Isn't there a Data Field for the invoice number or purchase order?" Yes there is, but those are out of scope. This is why understanding your Data Model hierarchy is critical for CMIS Export. If you need to utilize those Data Elements from those child Data Models, you will need to configure Export Behaviors at the appropriate scope. For example, on the "Invoice" and "Purchase Order" Document Types instead of on their parent Content Model.
    • You will also see some mappable expression snippets, like "CurrentDocument.ContentTypeName". Choosing CurrentDocument.ContentTypeName will name the file after the Batch Folder's classified Document Type.

2021-cmis-export-how-to-data-mapping-09.png

You can also map data by crafting an expression. This is the route we will take. What if we want to name our documents after two or more pieces of information Grooper collects? For example, if an invoice is dated 01/30/2021, we want to name the PDF file generated "Invoice - 2021-01-30.pdf", using both the Document Type and the collected "Document Date" value. We can't do that with a simple dropdown. But we can with an expression!

  1. Expand the Cmis Name property.
  2. Select the Expression property.
  3. Press the ellipsis button at the end.

2021-cmis-export-how-to-data-mapping-10.png

  1. This will bring up an expression editor window.
    • Using this expression editor, you can stitch together a custom file name, using a .NET code snippet.
  2. We've entered the following expression:
    • CurrentDocument.ContentTypeName + " - " + Document_Date.ToString("yyyy-MM-dd")
    • This will create file names as described earlier, using the Batch Folder's classified Document Type and collected "Document Date" value, separated by a hyphen. (FYI: That's why the " - " part of the expression is present. You're literally adding the characters in quotes to the file name.)
    • In other words, an invoice dated 01/30/2021, if exported as a PDF, will be named "Invoice - 2021-01-30.pdf".
  3. When finished editing the expression, press "OK" to save.

2021-cmis-export-how-to-data-mapping-11.png

  1. Upon Export, our files are now named dynamically, according to the expression we configured for the Cmis Name mapping.
  2. Furthermore, note all files generated by each Export Format will be named the same.
    • We had two Export Formats configured: one PDF Format and one XML Metadata.
    • Obviously, the extension will be different, because they are different file types. Just keep in mind, the mapping will apply to multiple files if multiple Export Formats are configured for the same CMIS Export.

2021-cmis-export-how-to-data-mapping-12.png
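
The naming behavior described above can be sketched in Python as follows. This is an illustrative model only, not Grooper code: the function names cmis_name and exported_file_names are hypothetical, and the ".pdf" and ".xml" extensions are assumed from the two Export Formats used in this tutorial.

```python
from datetime import date
from typing import List


def cmis_name(content_type_name: str, document_date: date) -> str:
    """Python analogue of the Cmis Name expression:
    CurrentDocument.ContentTypeName + " - " + Document_Date.ToString("yyyy-MM-dd")
    """
    return content_type_name + " - " + document_date.strftime("%Y-%m-%d")


def exported_file_names(name: str, extensions: List[str]) -> List[str]:
    """Apply one mapped name to every configured Export Format.
    Only the extension differs between the resulting files."""
    return [name + ext for ext in extensions]


name = cmis_name("Invoice", date(2021, 1, 30))
print(exported_file_names(name, [".pdf", ".xml"]))
# ['Invoice - 2021-01-30.pdf', 'Invoice - 2021-01-30.xml']
```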

Advanced Data Mapping: Exporting Field Data

Understanding the Endpoint

Grooper can leverage the capabilities of various storage platforms to store collected Data Field values, as long as that platform has corresponding metadata Grooper can map. Understanding the endpoint storage location and its capabilities is the first step to more advanced data mapping.

For example, the SharePoint, Box, and ApplicationXtender platforms all have capabilities to store field information, in one way or another. Each platform allows the user to create custom metadata properties, to which Grooper can map collected data.

For SharePoint, custom metadata fields are added as column values of a Document Library. On this SharePoint site, there are some custom columns added for storing data relating to purchase orders.

  1. PO Number
  2. PO Date
  3. Vendor
  4. PO Total

2021-cmis-export-how-to-data-mapping-field-mapping-01.png

For Box, custom metadata is added by creating a metadata template for documents and/or folders. This metadata template has fields that also relate to purchase orders.

  1. PO Number
  2. PO Date
  3. Vendor
  4. Total

2021-cmis-export-how-to-data-mapping-field-mapping-02.png

For ApplicationXtender, fields are added to house values when the application repository is created. This application is designed to store purchase orders, and as such has some editable purchase order related fields added.

  1. PO Date
  2. PO Number
  3. Vendor
  4. Order Total

2021-cmis-export-how-to-data-mapping-field-mapping-03.png

In this tutorial, we will review how to map Data Field values Grooper extracted to these metadata endpoints in these storage locations.

We will use the Write Mappings property of CMIS Export to export this data Grooper extracts to these metadata locations.

2021-cmis-export-how-to-data-mapping-field-mapping-04.png

Choosing the Right Scope

Arguably the most important part of data mapping a CMIS Export is choosing the right scope for your Export Behavior. This will determine which Data Elements are available for mapping.

  1. In our previous tutorials, we scoped the Export Behavior to the parent Content Model.
  2. The Export Behavior was configured using its Behaviors property.
  3. This means the only Data Elements accessible are those belonging to its direct child Data Model.
    • This isn't going to work for us. We only have access to map the "Document Date" Data Field, which extracts the purchase order date. But we also need to map the PO number, the vendor, and the order total extracted from the "Purchase Order" Document Type's Data Model.

2021-cmis-export-how-to-data-mapping-field-mapping-05.png

  1. We will need to scope our Export Behavior to the "Purchase Order" Document Type to access all the Data Fields we need.
  2. This will give us access to its direct child Data Model.
    • These Data Fields ("PO Number", "Vendor" and "Total") will now be mappable using CMIS Export.
  3. As well as inherited Data Elements from any parent Data Models.
    • i.e. the "Document Date" Data Field
  4. The good news is we are not just limited to configuring an Export Behavior on a Content Model. We can configure an Export Behavior for the "Purchase Order" Document Type using its Behaviors editor as well.
    • Furthermore, we can have both the Export Behavior configured on the parent Content Model and the Export Behavior configured on a child Content Type (such as our "Purchase Order" Document Type). The child Content Type's Export Behavior will supersede any parent Content Type's Export Behavior.

2021-cmis-export-how-to-data-mapping-field-mapping-06.png
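
The scoping rule can be pictured with a small Python model. This is a sketch under stated assumptions, not Grooper's object model: the ContentType class and the mappable_fields function are hypothetical, and the Content Model is simply labeled "Content Model" because its actual name is not given here. The field names come from this tutorial.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model or Document Type."""
    name: str
    data_fields: List[str] = field(default_factory=list)
    parent: Optional["ContentType"] = None


def mappable_fields(scope: ContentType) -> List[str]:
    """Collect the scope's own Data Fields plus those inherited from parents."""
    fields: List[str] = []
    node: Optional[ContentType] = scope
    while node is not None:
        fields = node.data_fields + fields  # inherited fields listed first
        node = node.parent
    return fields


model = ContentType("Content Model", ["Document Date"])
purchase_order = ContentType(
    "Purchase Order", ["PO Number", "Vendor", "Total"], parent=model
)

print(mappable_fields(model))           # ['Document Date']
print(mappable_fields(purchase_order))  # ['Document Date', 'PO Number', 'Vendor', 'Total']
```

Scoping to the child exposes both its own fields and the inherited one, while scoping to the parent exposes only the parent's.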

FYI If you were going the Export activity configuration route, rather than Content Type configuration route, you would select the scope when adding an Export Behavior to the Export Behaviors collection editor.
  1. Using the Content Type property, you can scope down to the appropriate level in the Content Model's hierarchy to access the Data Model Elements you need.
  2. In our case, we would scope to the "Purchase Order" Document Type by selecting it from the dropdown.
  3. However, when adding multiple Export Behaviors to the list, they execute in list order, first to last (or top to bottom if you prefer to look at it that way).
    • Imagine this second Export Behavior was scoped to the parent Content Model. The first Export Behavior would export just the "Purchase Order" Batch Folders. Then, the second Export Behavior would export any Batch Folders of any other Document Type in the Content Model.
    • If the order was reversed, the Content Model scoped Export Behavior would fire first, exporting all Batch Folders of all its Document Types, including "Purchase Order" Batch Folders. Then, the "Purchase Order" scoped Export Behavior (now second in list order in this scenario) would do nothing. The "Purchase Order" Batch Folders would have already been exported by the first Export Behavior.

2021-cmis-export-how-to-data-mapping-field-mapping-07.png
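
The effect of list order can be modeled with a short Python sketch. This is a simplified illustration of the behavior described above, not Grooper's implementation: the function run_export_behaviors, the behavior labels, and the folder names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def run_export_behaviors(
    behaviors: List[Tuple[str, Set[str]]],
    folders: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Run behaviors in list order. Each one exports the in-scope folders
    that an earlier behavior has not already exported."""
    exported: Dict[str, str] = {}  # folder name -> label of exporting behavior
    for label, in_scope_types in behaviors:
        for folder_name, doc_type in folders:
            if folder_name not in exported and doc_type in in_scope_types:
                exported[folder_name] = label
    return exported


folders = [("Folder 1", "Purchase Order"), ("Folder 2", "Invoice")]
po_scoped = ("Purchase Order behavior", {"Purchase Order"})
model_scoped = ("Content Model behavior", {"Purchase Order", "Invoice"})

# Purchase Order scope listed first: each behavior exports its own share.
print(run_export_behaviors([po_scoped, model_scoped], folders))
# {'Folder 1': 'Purchase Order behavior', 'Folder 2': 'Content Model behavior'}

# Order reversed: the Content Model behavior exports everything first.
print(run_export_behaviors([model_scoped, po_scoped], folders))
# {'Folder 1': 'Content Model behavior', 'Folder 2': 'Content Model behavior'}
```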

CMIS Connection Considerations

Depending on the storage location you're connecting to, you may need to be aware of some additional property settings in order to properly map and export data. Below, you will see how we configured the CMIS Connection for each of three CMIS Bindings: SharePoint, Box, and AppXtender.

SharePoint

Connecting Grooper to SharePoint sites is mostly straightforward, but there are some key things to keep in mind when you want to map data to a SharePoint folder.

  1. First, you must define where the site is! That's what our Base URL property is doing for us.
    • This is a SharePoint site we use for testing and demos here at Grooper.
  2. Next, you'll need to authenticate the connection with username and password information. This is what ultimately allows Grooper to connect to SharePoint. It's acting in your stead to pull and push documents to the site.
  3. In most cases, you'll want to turn Enable Subsites to True.
    • In our case the Document Library we are exporting to is a subsite of our parent site listed in the Base URL. For sure, we need to enable this property.
  4. Turning Enable Library Types to True is critical for mapping data from Grooper to column values in SharePoint.
    • This property will expose the various sites' and subsites' additional metadata properties to Grooper as file subtype objects, allowing us to map data.
  5. Last, but not least, we've imported the root of this SharePoint site as our CMIS Repository. When configuring CMIS Export, we'll end up exporting to a subsite/subfolder location.

2021-cmis-export-how-to-data-mapping-field-mapping-08.png

Box

Configuring a Box connection is even less involved, but there is one key property you'll need to enable as well.

  1. All you need to connect to a Box.com account is a Box User ID.
    • The account must be an enterprise account. You can find this number in your Box account's "Account Settings" under "Account ID". Visit the Box article for more information.
  2. You must turn the Use Metadata property to True to map data.
    • This will expose those Box Metadata Templates to Grooper as file subtype objects, allowing Grooper to point to them and their metadata property locations.
  3. Much like with our SharePoint connection, we imported the root of the Box account as our CMIS Repository. We'll end up exporting to a subfolder location as well.

2021-cmis-export-how-to-data-mapping-field-mapping-09.png

ApplicationXtender

When mapping data over an AppXtender connection, the only thing you really need to worry about is which repository you're connecting to.

  1. As far as the connection settings go, all we need to concern ourselves with for our purposes is the basic connection requirements: the web services URL for your account, the data source you want to connect to, and logon information.
  2. The key point is importing the right CMIS Repository.
    • Metadata fields are defined when these repositories are created in ApplicationXtender. So we just need to import the right one, so Grooper can map to the right metadata locations. In our case, this repository named "WIKI" is the one we've set up for our purchase order export demonstration.

2021-cmis-export-how-to-data-mapping-field-mapping-10.png

Add a CMIS Export

Mapping data to export to our field storage location endpoints is part of the CMIS Export configuration. So, we need to add a CMIS Export definition!

  1. As discussed previously, we're scoping this export to the "Purchase Order" Document Type so we can access its Data Model's Data Fields for mapping.
  2. Using the Behaviors property, we will first add an Export Behavior.
  3. Here, we've added the Export Behavior to the Behaviors list.
  4. CMIS Exports are then added as an Export Definition.
  5. Here, we've added a CMIS Export to the Export Definitions list.
  6. Next, we will need to configure the export, first selecting the CMIS Repository we want to export to.

2021-cmis-export-how-to-data-mapping-field-mapping-11.png

Mapping Configuration Examples

SharePoint

  1. For the CMIS Repository, we've selected our SharePoint CMIS Repository.
  2. Using the Target Folder property, we've selected the subsite we're exporting documents and data to.
    • The target folder must be a Document Library subsite in SharePoint. In our case, it is that Document Library seen earlier with custom metadata columns relating to purchase order data.
  3. This auto-populates the Object Type property.
    • This gives Grooper an object we can map metadata properties to, ultimately exporting data to the custom metadata columns in the SharePoint Document Library.
  4. We've also added a PDF Format to export a PDF file for our Export Format.
  5. Next, we will configure our data mappings using the Write Mappings property.

2021-cmis-export-how-to-data-mapping-field-mapping-12.png

  1. These mappable properties, "PO Number", "PO Date", "PO Total" and "Vendor", are all those editable column properties in the SharePoint site.
  2. Since we've scoped this export to the "Purchase Order" Document Type, we now have access to map Data Fields from its Data Model.
    • Furthermore, we also have access to the "Document Date" Data Field inherited from the parent Content Model's Data Model.

2021-cmis-export-how-to-data-mapping-field-mapping-13.png

  1. As far as our mappings go, all we have to do is select the Grooper Data Field that corresponds to the SharePoint metadata property, using the dropdown menu.
    • For example, the PO Total mapping, here, would populate the "PO Total" column value in SharePoint with the Grooper extracted value for the "Total" Data Field, for each exported document.
  2. Furthermore, since we're at a lower level in our Content Model's hierarchy, we have more access to more Data Elements when writing expression based mappings as well. For the Cmis Name mapping, we used the following expression:
    • CurrentDocument.ContentTypeName + " - " + Document_Date.ToString("yyyy-MM-dd") + " - " + PO_Number
    • Now we have access to the "PO Number" Data Field, whereas we did not in our earlier examples.

2021-cmis-export-how-to-data-mapping-field-mapping-14.png

When the Export activity processes the "Purchase Order" Batch Folders, the PDF files are exported to the SharePoint site. As seen highlighted in yellow, the extracted values for the mapped Data Fields populate the corresponding column values in SharePoint.

2021-cmis-export-how-to-data-mapping-field-mapping-15.png

Box

  1. For the CMIS Repository, we've selected our Box CMIS Repository.
  2. Using the Target Folder property, we've selected a subfolder location.
  3. For Box connections, metadata templates must be manually selected using the Secondary Types property.
    • You can select any metadata template created in your Box account. It's up to you to choose the right one that corresponds to the fields extracted by your Data Model.
    • In our case, we've selected the "Purchase Order" metadata template, seen earlier in this tutorial.
  4. We've also added a PDF Format to export a PDF file for our Export Format.
  5. Next, we will configure our data mappings using the Write Mappings property.

2021-cmis-export-how-to-data-mapping-field-mapping-16.png

  1. These mappable properties, "PO Number", "PO Date", "Vendor" and "Total", are the editable metadata fields in the metadata template we selected using the Secondary Types property.

Now the process is essentially the same as it was for our SharePoint example. Map the Grooper Data Fields to the corresponding storage location property.

There's also a shortcut we can use through Grooper's "Auto Map" feature. Auto-mapping will look for Grooper Data Fields and storage location property names that match. If they match, Grooper will automatically assign the mapping, without you having to select it from a dropdown.

  1. Right click any property and select "Auto Map...".

2021-cmis-export-how-to-data-mapping-field-mapping-17.png

  1. Three out of five of the Box metadata field names match our Grooper Data Field names.
    • All three are automatically populated by the Auto Map feature.

2021-cmis-export-how-to-data-mapping-field-mapping-18.png

  1. That just leaves the remaining two properties (Cmis Name and PO Date) to be mapped manually.

2021-cmis-export-how-to-data-mapping-field-mapping-19.png
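
The idea behind Auto Map can be sketched in Python as a name comparison. This is only an illustration: the auto_map function is hypothetical, and it assumes an exact name match, which may differ from how Grooper actually compares names. The property and field names come from this tutorial.

```python
from typing import Dict, List, Tuple


def auto_map(
    target_properties: List[str], data_fields: List[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map each storage property to a Data Field with the same name
    (exact match assumed). Return the mappings and the leftovers."""
    available = set(data_fields)
    mapped = {prop: prop for prop in target_properties if prop in available}
    unmapped = [prop for prop in target_properties if prop not in available]
    return mapped, unmapped


box_properties = ["Cmis Name", "PO Number", "PO Date", "Vendor", "Total"]
grooper_fields = ["Document Date", "PO Number", "Vendor", "Total"]

mapped, unmapped = auto_map(box_properties, grooper_fields)
print(mapped)    # {'PO Number': 'PO Number', 'Vendor': 'Vendor', 'Total': 'Total'}
print(unmapped)  # ['Cmis Name', 'PO Date']
```

The two leftover properties are the ones that still need a manual mapping: "PO Date" has no identically named Data Field (the date is held in "Document Date"), and Cmis Name is built from an expression.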

When the Export activity processes the "Purchase Order" Batch Folders, the PDF files are exported to the selected Box folder.

  1. Viewing a document in Box, you can view the exported metadata by pressing the "Metadata" icon.
  2. The extracted values for the mapped Data Fields populate the corresponding metadata fields, using the "Purchase Order" metadata template.

2021-cmis-export-how-to-data-mapping-field-mapping-20.png

ApplicationXtender

  1. For the CMIS Repository, we've selected our AppXtender CMIS Repository.
  2. Using the Target Folder property, ensure a folder location is selected.
    • You may need to manually select the root of the connected repository, if this property value is blank.
  3. This auto-populates the Object Type property.
    • This gives Grooper an object we can map metadata properties to, ultimately exporting data to the custom metadata fields in the AX application cabinet.
  4. We've also added a PDF Format to export a PDF file for our Export Format.
  5. Next, we will configure our data mappings using the Write Mappings property.

2021-cmis-export-how-to-data-mapping-field-mapping-21.png

  1. Here, we map the AX data fields to the Grooper Data Fields just as described for the previous two examples, which will export the Grooper extracted Data Field values to their corresponding field in the AX cabinet.
FYI As mentioned in the Folder Pathing and File Naming tutorial, the AppXtender binding does not have a mappable Cmis Name property. However, users who set up AX application cabinets will often add a "TITLE" field (or "DOCUMENT TITLE" or "DOC NAME" or some other variation).

You can use expression based mappings for any metadata property. So, if there is a "TITLE" field (as is the case here) you can use an expression to generate a custom title. We used the exact same expression here we used for the SharePoint and Box examples' Cmis Name mapping.

2021-cmis-export-how-to-data-mapping-field-mapping-22.png

When the Export activity processes the "Purchase Order" Batch Folders, the PDF files are exported to the AX cabinet. As seen highlighted in yellow, the extracted values for the mapped Data Fields populate the corresponding ApplicationXtender fields.

2021-cmis-export-how-to-data-mapping-field-mapping-23.png
