2021:CMIS Export (Export Definition)

From Grooper Wiki

Revision as of 13:34, 28 September 2021

CMIS Export is one of the Export Types available when configuring an Export Behavior. It exports content over a CMIS Connection, allowing users to export documents and their metadata to various on-premise and cloud-based storage platforms.

CMIS Connections allow Grooper to standardize most, if not all, export configuration for a variety of storage platforms. This object can connect Grooper to cloud-based storage platforms, such as true CMIS content management systems, a Microsoft OneDrive account, or an Exchange Online email server, as well as on-premise platforms, such as a Windows file system or an on-premise Exchange server. It standardizes access to these platforms by exposing them as if they were CMIS endpoints using the CMIS standard.

The CMIS Connection connects to an individual platform using a CMIS Binding, which defines the logic required for document interchange between Grooper and the storage platform. For example, the NTFS binding is used to connect to a Windows file system for import and export operations.

CMIS Export allows for the most advanced types of document export. It lets you use document metadata and the data Grooper extracts in a variety of ways. Many content management systems store metadata in fields alongside the documents themselves. For applicable CMIS Bindings, CMIS Export can map document metadata and extracted data to those fields, linking objects or properties of a Content Model in Grooper (such as Data Fields in a Data Model) to their corresponding locations in the content management system (such as a column in a SharePoint site). Even for simpler platforms (like an NTFS file system), metadata can be used for file naming and folder indexing.
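
As a rough illustration of the mapping idea, the Python sketch below pairs Data Field names with property names in a storage platform. This is not Grooper's API. The function `map_fields`, the sample values, and the target column names are all hypothetical and only show the general shape of a field-to-column mapping.

```python
def map_fields(extracted, mapping):
    """Return target properties for the extracted values that have a mapping.

    extracted: Data Field name -> extracted value
    mapping:   Data Field name -> property name in the storage platform
    Fields without a mapping are skipped rather than exported.
    """
    return {mapping[name]: value for name, value in extracted.items() if name in mapping}


# Hypothetical extracted values for one document.
extracted = {"Document Date": "2021-09-28", "Vendor": "Acme", "PO Number": "PO-1001"}

# Hypothetical mapping to column names in a content management system.
mapping = {"Document Date": "DocDate", "Vendor": "VendorName"}

properties = map_fields(extracted, mapping)
print(properties)
```

In this sketch only mapped fields reach the target system, so "PO Number" is left out until a mapping is added for it.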


About CMIS and CMIS Connections

CMIS stands for "Content Management Interoperability Services". It is an open standard that allows different content management systems to interoperate over the Internet. This standard protocol allows Grooper to use many different platforms for importing and exporting documents and their contents. Once a CMIS Connection is created, Grooper can exchange documents with these platforms. "Interoperability" means Grooper has the same access to control the system as a human being does. It is a "one-to-one" connection to the platform, allowing full and total control.

Upon connecting to an external content management system, Grooper will be able to see the "repositories" associated with it.  A repository, in computer science, is a general term for a location where data lives. Different systems refer to "repositories" in different ways.  An email inbox could be a repository. A folder in Windows could be a repository. A cabinet in ApplicationXtender could be a repository. It's a place to put things. We standardize the various terms used by various storage platforms to simply "repository".

These repositories are "imported" into Grooper as a CMIS Repository object, as a child of the CMIS Connection object. This doesn't import data into Grooper in the traditional sense of importing documents into a new Batch. "Importing" here is more like bringing the repository into a framework Grooper can use. Upon importing the repository, Grooper has full file access to that location in the storage platform.

For our purposes, repositories are like filing cabinets full of documents. Once a connection is established, it's like giving Grooper a key to that cabinet. You can open the various drawers of that cabinet. You can pull files out and put files in. The storage platform or content management system is like the cabinet.

  • The CMIS Connection object is like the key.
  • The CMIS Repository object is like a drawer in the cabinet.
  • You "connect" to the cabinet by turning the key. You "import" the repository by opening the drawer. Now you can see there are documents in there! You can take them out. You can read them and put them back in. You can put new ones in. You can use this "open" connection to the "drawer" however you need.
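
The cabinet analogy can be sketched in a few lines of Python. This is only a toy model using local folders, not how Grooper implements its objects. The `Connection` and `Repository` classes and their method names are hypothetical stand-ins for the "key" and the "drawer".

```python
import tempfile
from pathlib import Path


class Repository:
    """The 'drawer': one storage location opened through a connection."""

    def __init__(self, root):
        self.root = Path(root)

    def put(self, name, content):
        (self.root / name).write_text(content)

    def get(self, name):
        return (self.root / name).read_text()

    def names(self):
        return sorted(p.name for p in self.root.iterdir() if p.is_file())


class Connection:
    """The 'key': knows how to reach the platform and lists its repositories."""

    def __init__(self, platform_root):
        self.platform_root = Path(platform_root)

    def repositories(self):
        return sorted(p.name for p in self.platform_root.iterdir() if p.is_dir())

    def import_repository(self, name):
        return Repository(self.platform_root / name)


# A temporary folder plays the role of the cabinet (the storage platform).
with tempfile.TemporaryDirectory() as cabinet:
    (Path(cabinet) / "Invoices").mkdir()
    connection = Connection(cabinet)                    # turn the key
    found = connection.repositories()                   # see the drawers
    drawer = connection.import_repository("Invoices")   # open one drawer
    drawer.put("inv-001.txt", "example invoice text")   # put a document in
    listing = drawer.names()
    text = drawer.get("inv-001.txt")                    # take it back out
```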

CMIS+ Architecture

Grooper expanded on this idea in version 2.72 to create our CMIS+ architecture. CMIS+ unifies all content platforms under a single framework as if they were traditional CMIS endpoints. Prior to version 2.72, there was only one type of CMIS Connection, a true CMIS connection using CMIS 1.0 or CMIS 1.1 servers. Now, connections to additional non-CMIS document storage platforms can be made via "CMIS Bindings". This provides standardized access to document content and metadata across a variety of external storage platforms.

Using this architecture, Grooper is able to create a simpler and more efficient import and export workflow, using a variety of storage platforms. You now use the "CMIS Import" Import Provider and the "CMIS Export" Export Type, regardless of the storage platform. They connect to a CMIS Repository imported from a CMIS Connection and use that as Grooper's import or export path.

The only thing that differs when creating a CMIS Connection is the CMIS Binding, as each binding connects to its platform in its own way. You don't connect to an Outlook inbox the same way you connect to a Windows file folder, for example. Thus, the property configuration for the Exchange binding is different from that of the NTFS binding.

CMIS Bindings

A CMIS Binding provides connectivity to external storage platforms for content import and export. Grooper's CMIS+ architecture expands connectivity from traditional CMIS servers to a variety of on-premise and cloud-based storage platforms by exposing connections to these platforms as CMIS Bindings.

Each individual CMIS Binding contains the settings and logic required to exchange documents between Grooper and each distinct platform. For example, the AppXtender Binding contains all the information Grooper uses to connect to the ApplicationXtender content management system.

CMIS Bindings are used when creating a CMIS Connection object. The first step in creating a CMIS Connection is to configure the Connection Type property. This is where you select which CMIS Binding to use, and therefore which storage platform to connect to. The second step is to enter the connection settings for that binding, such as login information for many bindings.
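
The two-step pattern (pick a binding, then supply that binding's settings) might be sketched as follows. This is an assumption-laden illustration, not Grooper's property grid. The setting names in `REQUIRED_SETTINGS`, the function `create_connection`, and the example path are hypothetical.

```python
# Hypothetical required settings per binding; real bindings define their own properties.
REQUIRED_SETTINGS = {
    "NTFS": {"base_path"},
    "Exchange": {"server", "username", "password"},
}


def create_connection(connection_type, **settings):
    """Step 1: pick the binding. Step 2: supply that binding's settings."""
    if connection_type not in REQUIRED_SETTINGS:
        raise ValueError(f"Unknown binding: {connection_type}")
    missing = REQUIRED_SETTINGS[connection_type] - set(settings)
    if missing:
        raise ValueError(f"{connection_type} is missing settings: {sorted(missing)}")
    return {"connection_type": connection_type, "settings": settings}


# An NTFS-style connection only needs a folder path in this sketch (path is hypothetical).
ntfs = create_connection("NTFS", base_path=r"C:\Grooper Import Export")
```

The point of the sketch is that the binding choice determines which settings are required, which is why the Exchange and NTFS configurations look different.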

Current CMIS Bindings

Grooper can connect to the following storage platforms using CMIS Bindings:

  • AppXtender - Defining connection to the ApplicationXtender document management platform.
  • Box - Defining connection to the Box.com cloud storage platform.
  • FileBound - Defining connection to the FileBound document management platform.
  • IBM FileNet Connector - Defining connection to the FileNet content management platform.
  • CMIS - Defining connection to any content management systems using CMIS 1.0 or CMIS 1.1 servers.
  • The following Microsoft content platforms
    • Exchange - Defining connection to the Microsoft Exchange mail server platform (i.e. Outlook mailboxes).
    • OneDrive - Defining connection to the OneDrive cloud storage platform.
    • SharePoint - Defining connection to Microsoft SharePoint sites.
  • FTP - Defining connection to an FTP (File Transfer Protocol) server.
  • SFTP - Defining connection to an SFTP (SSH File Transfer Protocol) server.
  • IMAP - Defining connection to IMAP mail servers.
  • NTFS - Defining connection to the Microsoft Windows file system.

How To

Prereqs: Understanding the Content Model and Documents Used in These Tutorials

You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.

  • [[]]

In the following "how to" tutorials, we will use a simple Content Model used for purchase order and invoice processing. First, we should familiarize ourselves with the Content Model and some of the documents. Understanding our content, both in terms of the documents themselves as well as the Content Type and Data Model hierarchy of the Content Model will make it easier to follow along with the subsequent tutorials.

The Documents

In our sample test Batch, we have a series of documents we ultimately want to export in one way or another, using CMIS Export. In this Batch you will find the following kinds of documents:

  1. If you've imported the zip for this tutorial, you will find a sample test Batch by navigating the following path in the Node Tree:
    • Root Node > Batch Processing > Batches > Test > Export Activity > Sample Export Batch - POs and Invoices

The Content Model

These documents are modeled by our example Content Model named "Export Example Model - POs and Invoices".

  1. If you've imported the zip for this tutorial, you will find this Content Model by navigating the following path in the Node Tree:
    • Root Node > Content Model > Export Activity > Export Example Model - POs and Invoices

Our document set is represented by the Content Type hierarchy of our Content Model.

  1. The invoices from various vendors are modeled by the "Invoice" Document Type.
  2. The purchase orders are modeled by the "Purchase Order" Document Type.
  3. All the different pricing letters, which notify a vendor of a price increase, a price decrease, or a promotional price, are modeled by child Document Types of the "Price Letters" Content Category.
  4. The "Price Decrease Letter", "Price Increase Letter", and "Promo Price Letter" Document Types model these corresponding kinds of price letters.

The Data Model Hierarchy

Part of modeling a document set with a Content Model and its component Document Types is modeling the data elements you wish to extract. This is done with one or more Data Models in the Content Model's hierarchy.

Any Content Type can have a child Data Model. Data you wish to extract is defined by adding child Data Elements to the Data Model, such as Data Field and Data Table objects. These objects are then configured with extractors to parse a Batch Folder's text data and return a value, stored as the document's index data when the Extract activity is executed.

Ultimately, understanding a Document Type's Data Model and how it inherits Data Elements from parent Content Types will be critical for configuring CMIS Export (and truly, any Export Type). We can use extracted data in a variety of ways from document folder pathing and naming to mapping extracted data to storage locations in content management systems (for those that support it). Understanding how the data flows through a Content Model's Content Type and thus Data Model hierarchy is necessary to understand how to call it out later on down the line during export.

  1. In our case the parent Content Model has a Data Model with a single child Data Field named "Document Date"
    • Extraction logic is already configured for this Data Field to return a date for any of our documents, such as an invoice date for our invoices or the letter date for our pricing letters.

All child Document Types will inherit the Data Elements of their parent Content Type's Data Model. This means all Document Types will extract the "Document Date" Data Field when the Extract activity runs.

  1. For example, the "Invoice" Document Type has its own Data Model, with its own Data Elements.
    • These are various Data Fields and a Data Table (the one named "Invoice Line Items") that only relate to invoices.
  2. This kind of data is specific to invoices and will only be extracted if a Batch Folder is assigned the "Invoice" Document Type during extraction.
    • We can see these Data Elements in the Data Model preview panel with the Data Model selected in the Node Tree.
  3. However, the "Invoice" Document Type's Data Model inherits any Data Elements from any parent Content Type.
    • In this case, the "Document Date" Data Field is inherited from the parent Content Model's Data Model.
    • This Data Field also shows up in the "Invoice" Data Model's preview panel. In essence, it becomes a part of the "Invoice" Document Type's Data Model.

  1. Similarly, the "Purchase Order" Document Type has its own child Data Model with its own Data Elements relating just to purchase orders.
    • We only want these Data Elements extracted if the Batch Folder is classified as the "Purchase Order" Document Type.
  2. However, it too is a child of the parent Content Model. As such, it inherits the Content Model's Data Model as well.
    • So, it too has a "Document Date" Data Field as part of its Data Model.
FYI You may have noticed the "Invoice" Document Type and "Purchase Order" Document Type both have a "PO Number" and "Vendor" Data Field in their Data Models.

Be aware, these are two separate objects in the Node Tree. They have different extractors extracting their data. These Data Fields extract data in different ways depending on the Batch Folder's Document Type. They just happen to share the same name.

However, they are in totally different locations in the Content Model's hierarchy, and thus are distinct objects.
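
To make the "same name, different object" point concrete, here is a small Python sketch. The `DataField` class and the slash-separated path format are hypothetical. They simply model the idea that a Data Field is identified by where it lives in the hierarchy, not by its name alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataField:
    name: str
    path: str  # location in the Content Model's hierarchy (hypothetical format)


invoice_po = DataField(
    "PO Number", "Export Example Model - POs and Invoices/Invoice/PO Number"
)
order_po = DataField(
    "PO Number", "Export Example Model - POs and Invoices/Purchase Order/PO Number"
)

same_name = invoice_po.name == order_po.name   # the names match
same_object = invoice_po == order_po           # but the objects do not
```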

  1. For the three different kinds of pricing letters, a Data Model is added to the "Price Letters" Content Category.
    • This Data Model has its own pricing letter related Data Elements.
  2. Remember, Data Elements flow through a Content Model's Content Type hierarchy. The "Price Letters" Content Category is the parent Content Type of the three pricing letter Document Types ("Price Decrease Letter", "Price Increase Letter", and "Promo Price Letter").
  3. As such, any Batch Folder assigned one of these three Document Types will inherit the Content Category's Data Elements for its Data Model.
    • Furthermore, for our made-up use case here, the individual pricing letter Document Types don't have their own Data Models. We don't need them! For each of these three Document Types we want to extract the same set of data. The parent "Price Letters" Content Category's Data Model will apply to all three Document Types. Creating a unique Data Model for each Document Type would be a waste of time in our case.
  4. Not only that, any Data Elements from grandparent Content Types are inherited as well, such as the top-level Content Model's "Document Date" Data Field.
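
The inheritance described in this section can be summarized with a short Python sketch. It is a simplified model, not Grooper's object model: the `ContentType` class is hypothetical, the "Invoice" list only includes the Data Elements named in this article, and the "Effective Date" field on "Price Letters" is a made-up placeholder.

```python
class ContentType:
    def __init__(self, name, data_elements=(), parent=None):
        self.name = name
        self.data_elements = list(data_elements)
        self.parent = parent

    def effective_data_elements(self):
        """Inherited elements first (top of the hierarchy down), then this type's own."""
        inherited = self.parent.effective_data_elements() if self.parent else []
        return inherited + self.data_elements


model = ContentType("Export Example Model - POs and Invoices", ["Document Date"])
invoice = ContentType("Invoice", ["PO Number", "Vendor", "Invoice Line Items"], parent=model)
# "Effective Date" is a hypothetical field name used only for illustration.
price_letters = ContentType("Price Letters", ["Effective Date"], parent=model)
# This Document Type has no Data Model of its own; it relies entirely on inheritance.
promo = ContentType("Promo Price Letter", parent=price_letters)

print(invoice.effective_data_elements())
print(promo.effective_data_elements())
```

Here "Promo Price Letter" ends up with both its parent's and its grandparent's Data Elements even though it defines none itself.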


Perform a Basic CMIS Export

CMIS Exports can range from very simple exports of Batch Folder content to more complex exports that use Grooper-extracted content in a variety of ways. We will start with the most basic configuration of a CMIS Export. These steps will be applicable to any CMIS Export.

Establish the CMIS Connection and CMIS Repository

Before configuring a CMIS Export, you must have created a CMIS Connection and imported a CMIS Repository. For more information on how to create a CMIS Connection and import a CMIS Repository refer to the CMIS Connection article.

For this example, we will simply export to a Windows folder on a local drive.

  1. We have created a CMIS Connection using the NTFS Connection Type.
  2. We have imported a CMIS Repository connecting Grooper to a folder named "Grooper Import Export".
  3. And we will be exporting to this folder named "Export".
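
Conceptually, the end result of this basic export is a file landing in the "Export" folder of the repository. The Python sketch below imitates that outcome with a plain file copy. It does not use Grooper or any CMIS library. The function `export_document` and the sample file name are hypothetical, and a temporary folder stands in for the real drive.

```python
import shutil
import tempfile
from pathlib import Path


def export_document(source_file, repository_root, subfolder="Export"):
    """Copy one file into <repository_root>/<subfolder>, creating the folder if needed."""
    target_dir = Path(repository_root) / subfolder
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source_file, target_dir))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the folder the CMIS Repository points at.
    repo_root = Path(tmp) / "Grooper Import Export"
    repo_root.mkdir()

    # Stand-in for a document's exported file.
    source = Path(tmp) / "invoice-001.pdf"
    source.write_bytes(b"%PDF-1.4 placeholder")

    exported = export_document(source, repo_root)
    exported_relative = exported.relative_to(repo_root).as_posix()
    exported_exists = exported.exists()
```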

TBD

Configure an Export Behavior

CMIS Export is one of the Export Type options when configuring an Export Behavior. Export Behaviors control what document content for a Batch Folder is exported where, according to its classified Document Type. As such, in order to configure a CMIS Export, you must first configure an Export Behavior for a Content Type (a Content Model or its child Content Categories or Document Types).

Export Behaviors can be configured in one of two ways:

  1. Using the Behaviors property of a Content Type object
    • A Content Model
    • A Content Category
    • Or, a Document Type
  2. As part of the Export activity's property configuration

Option 1: Content Type Export Behaviors

An Export Behavior configuration can be added to any Content Type object (i.e. Content Models, Content Categories, and Document Types) using its Behaviors property. Doing so will control how a Document Type "behaves" upon export.

  1. For example, here we have a Content Model selected in the Node Tree.
  2. To add an Export Behavior, first select the Behaviors property.
  3. Then, press the ellipsis button at the end of the property.

  1. This will bring up the Behaviors collection editor window.
  2. Press the "Add" button.
  3. Select Export Behavior.
    • You can only configure one Export Behavior per Content Type object.
    • Child Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
    • However, multiple Export Behaviors may be added by configuring the Behaviors property of multiple Content Types. For example, if every Document Type needed a unique Export Behavior configuration, you could configure the Behaviors property for each one, adding one Export Behavior to the Behaviors list for each one.

  1. You will see the Export Behavior added to the Behaviors list.
  2. Selecting it, you can now add one or more Export Definitions with the Export Definitions property.
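
One way to picture how a document finds its export settings is a walk up the Content Type hierarchy. The sketch below is a simplification under a stated assumption: a Content Type's own Export Behavior is used if present, otherwise the nearest parent's. The `ContentType` class and the behavior labels are hypothetical, and Grooper's actual resolution rules may be richer than this.

```python
class ContentType:
    def __init__(self, name, parent=None, export_behavior=None):
        self.name = name
        self.parent = parent
        self.export_behavior = export_behavior  # at most one per Content Type

    def resolve_export_behavior(self):
        """Use this type's own Export Behavior if set, otherwise the nearest ancestor's."""
        node = self
        while node is not None:
            if node.export_behavior is not None:
                return node.export_behavior
            node = node.parent
        return None


model = ContentType("Export Example Model - POs and Invoices", export_behavior="model-level export")
invoice = ContentType("Invoice", parent=model, export_behavior="invoice-specific export")
purchase_order = ContentType("Purchase Order", parent=model)

print(invoice.resolve_export_behavior())
print(purchase_order.resolve_export_behavior())
```

In this model "Purchase Order" has no Export Behavior of its own, so it falls back to the one configured on the Content Model.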


FYI When configured using the Behaviors property of a Content Type object, the Export activity will export Batch Folder content in a Batch according to the Export Definition settings configured for the Batch Folder's assigned Document Type, or for its parent Content Category or parent Content Model, depending on which Content Type's Behaviors property is configured in the Content Model's hierarchy.

Option 2: Export Activity Export Behaviors

Export Behaviors can also be configured as part of the Export activity's configuration. These are called "local" Export Behaviors. They are local to the Export activity in the Batch Process.

  1. For example, here we have a working Batch Process selected in the Node Tree.
    • This is a simple Batch Process used to import purchase order, invoice, and other related documents, recognize their text, and extract some basic data from them. The last step in this Batch Process is an Export step.
  2. Select the Export step of the Batch Process.
  3. To add an Export Behavior, select the Export Behaviors property.
  4. Then, press the ellipsis button at the end of the property.

  1. This will bring up the Export Behaviors collection editor window.
  2. Press the "Add" button to add a new Export Behavior.
  3. An Export Behavior will be added to the list.
  4. With the Export Behavior selected, you must define which Content Type the behavior applies to using the Content Type property.
    • Note that in both cases, a Content Type is involved in configuring Export Behaviors. Whether local to the Export activity or part of a Content Model's configuration, Grooper needs to know what to do upon export, given a certain Content Type (and its child Content Types if scoped to a Content Model or Content Category). Once Grooper knows what kind of document it's looking at, we can then tell it how to export that document's content.
  5. Using the dropdown menu, select which Content Type scope should use the Export Behavior by selecting either a top-level parent Content Model or one of its child Content Categories or Document Types.
    • Keep in mind you can only select a single Content Type here. You can only configure one Export Behavior per Content Type object.
    • Child Content Type objects will inherit export settings from their parent Content Type's Export Behavior configuration.
    • However, multiple Export Behaviors may be added locally to the Export activity. For example, if every Document Type needed a unique Export Behavior configuration, you could add one Export Behavior to the list for each one.

  1. Once a Content Type is selected, you can add one or more Export Definitions with the Export Definitions property.

Going forward in this tutorial, we will scope our Export Behavior to the parent Content Model "Export Example Model - POs and Invoices".
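
The "one Export Behavior per Content Type" rule for local Export Behaviors could be modeled roughly like this. The `ExportActivity` class, its method name, and the way duplicates are rejected are all hypothetical. The sketch only illustrates the rule and is not how Grooper's collection editor behaves internally.

```python
class ExportActivity:
    def __init__(self):
        # Content Type name -> list of Export Definition names
        self.export_behaviors = {}

    def add_export_behavior(self, content_type, export_definitions):
        """Add a local Export Behavior, allowing only one per Content Type."""
        if content_type in self.export_behaviors:
            raise ValueError(f"An Export Behavior already exists for {content_type!r}")
        self.export_behaviors[content_type] = list(export_definitions)


export_step = ExportActivity()
# Scope a single behavior to the parent Content Model, as in this tutorial.
export_step.add_export_behavior("Export Example Model - POs and Invoices", ["CMIS Export"])

# A second behavior for the same Content Type is refused in this sketch.
try:
    export_step.add_export_behavior("Export Example Model - POs and Invoices", ["CMIS Export"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```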

Folder Indexing: Using the Subfolder Path Property

Perform a Mapped CMIS Export