2021:CMISQL Query: Difference between revisions

From Grooper Wiki
Line 315: Line 315:
#* This will negate the predicate's logic.  If the email contains the search term, it will ''not'' be retrieved.
#* This will negate the predicate's logic.  If the email contains the search term, it will ''not'' be retrieved.
|valign=top|
|valign=top|
[[File:2021-cmis-query-10.png]]
[[File:2021-cmis-query-11.png]]
|}
|}

Revision as of 10:32, 1 June 2022

CMIS Queries are utilized to search documents in CMIS Repositories and to filter documents upon import when using the Import Query Results provider.

Also called a "CMISQL query", the querying language CMIS Queries use is based on a subset of the SQL-92 grammar. Where SQL is a querying language to search and select data in a database, "CMISQL" is a querying language to search and select documents (and their metadata properties) in a storage location, represented by a CMIS Repository in Grooper.

About

CMIS Queries use clauses to retrieve content from a CMIS Repositories based on certain metadata property values, much like a SQL query uses clauses to retrieve data from a database based on column values. In both cases, a SELECT clause is used to select content based on certain conditions determined by a WHERE clause

In general the CMISQL statement takes the following form:

SELECT * FROM <Content Type>

WHERE <Conditions>

ORDER BY <Sort Criteria>

For example, if you wanted to retrieve all email messages in your Inbox folder over an Exchange CMIS Connection from a particular sender and order results by the date you received the email (oldest first), you could use the following query:

SELECT * FROM Message

WHERE (Sender LIKE '%user@example.com' AND IN_FOLDER('/Inbox'))

ORDER BY DateTimeReceived ASC

Next, let's break down this query further and look at what each clause is doing.

Clause Information

SELECT

SELECT * FROM Message

WHERE (Sender LIKE '%user@example.com' AND IN_FOLDER('/Inbox'))

ORDER BY DateTimeReceived ASC


The SELECT clause defines a set of property values to be returned with each query result. "Property values" refers to the available metadata properties for the object type selected for the From statement. For example, when querying email messages using in an Exchange CMIS Repository, there are metadata properties for the sender, the subject line, the date the email was sent, and more. All these metadata properties are part and parcel to the email message file itself.

From a practical standpoint, your CMIS Query is ultimately returning documents. Files in a digital storage location. You aren't retrieving individual metadata values, rather the whole file with all its various metadata included. To that end you will use the * character to select all property values, when crafting a CMISQL query.

You will always start your CMIS Query with SELECT * when using a CMIS Query to import documents.

FROM

SELECT * FROM Message

WHERE (Sender LIKE '%user@example.com' AND IN_FOLDER('/Inbox'))

ORDER BY DateTimeReceived ASC


The FROM clause determines what type of content you're searching for, such as a document or a folder. What type of object you select determines what metadata properties you have access to when building the conditional logic in your query's WHERE clause.

Depending on the CMIS Connection type, you will have different options to select from. Some do have simple "Document" and "Folder" object types, representing documents and folders respectively. Some have metadata properties specific to the CMIS Binding.

  • For example, the Exchange binding's Message content type has property values relating to email messages, such as Subject, Sender and DateTimeSent.
  • For example, The SharePoint binding's content types can be Document Libraries, giving you access to custom metadata properties you've created on the SharePoint site.

In our example, we wanted to query email messages in an Exchange repository. So, we selected the Message type by using FROM Message in our query.

WHERE

SELECT * FROM Message

WHERE (Sender LIKE '%user@example.com' AND IN_FOLDER('/Inbox'))

ORDER BY DateTimeReceived ASC


The WHERE clause defines what search conditions must be met for a document to be retrieved. Each condition is defined by a "predicate". You can use any metadata available to the content type (as determined by the FROM clause) in your query conditions. Multiple conditions can be joined with the AND or OR or NOT operators. You can change the order of operations by using nested parenthesis.

The following is a list of predicates:

Predicate Description Example
Comparison Predicate

Specifies a condition for an individual property using comparisons, such as "equals to" = or "less than" <.

  • The LIKE and IS operators are also a comparison predicates.
invoice_date<'12/31/2007'
In Predicate Specifies a list of allowed values for a property. This list is separated by commas. FileExtension IN ('.pdf', '.docx', '.xlsx')
Contains Predicate Specifies a full-text query.
  • You can use AND, OR and NOT operators when crafting search parameters.
CONTAINS('mortgage AND payment AND NOT vehicle')
Scope Predicate Restricts the search scope to a specific folder (IN_FOLDER) or a specific folder including its subfolders(IN_TREE) IN_FOLDER(/Inbox)
  • Note: Not every property type may be able to utilize every predicate/operators within a predicate. For example, the Subject property on the Exchange binding cannot use the = operator when crafting a Comparison Predicate.
  • Note: The NOT operator cannot be used with the IN_FOLDER or IN_TREE Scope Predicate.

In our example, we used a combination of the Comparison Predicate and Scope Predicate to only retrieve email messages from certain sender (Sender LIKE '%user@example.com') and in our inbox folder ((IN_FOLDER('/Inbox/')) Since we wanted to filter content by multiple criteria, the two conditions were joined with the AND operator.

ORDER BY

SELECT * FROM Message

WHERE (Sender LIKE '%user@example.com' AND IN_FOLDER('/Inbox'))

ORDER BY DateTimeReceived ASC

The ORDER BY clause is an optional clause which allows you to specify the order in which results are returned.  You can sort by multiple properties using a comma separated list.  Each property name may be followed by ASC or DESC to indicate ascending or descending sort direction.

  • The default sort direction is ascending. However, it is still considered best practice to include the ASC tag to properties you wish to sort in ascending order.

In our example, we wanted to sort our incoming messages by oldest received first.

CMIS Query Editor

If you are not familiar with the CMISQL syntax (or even SQL), you may find it easier to use the CMIS Query Editor, instead of writing the full query yourself. This allows you to configure the CMISQL query using a series of property grids and menu selections instead of writing out the query from scratch.

The CMIS Query Editor is accessible in two locations:

  1. When configuring the Import Query Results Import Provider's CMIS Query property.
  2. Using a CMIS Repository's "Search Repository" tab.

How to Access the CMIS Query Editor

Import Query Results > CMIS Query

Most commonly, you will use a CMISQL query to import documents using the Import Query Results provider.

FYI The Import Query Results provider is one of two Import Providers that use a CMIS Connection to bring document content into Grooper. If you need more information on Import Query Results, please visit the CMIS Import article.


  1. When configuring the import settings, you will first select which CMIS Repository you want to query by selecting one using the Repository property.
  2. Then, select the CMIS Query property.
    • Note: Some CMIS Bindings are not CMISQL queryable (For example, the FTP and SFTP CMIS Bindings). If the CMIS Repository is not queryable, you will not see this CMIS Query property.
  3. Press the ellipsis button at the end of the property to bring up the CMIS Query Editor.


This is the CMIS Query Editor. Using this interface you can craft a CMISQL query in one of two ways:

  1. You can use this property grid to configure the various clauses of the query statement.
  2. You can free-type the query in this text editor.


We will demonstrate how to use this query editor in the next section

The Search Repository Tab

You can also use the CMIS Query Editor when searching for documents in a CMIS Repository using the "Search Repository" tab.


  1. Select the CMIS Repository you want to query in the Node Tree.
  2. Navigate to the "Search Repository" tab.
    • Note: Some CMIS Bindings are not CMISQL queryable (For example, the FTP and SFTP CMIS Bindings). If the CMIS Repository is not queryable, you will not see this "Search Repository" tab.
  3. You can use this property grid to configure the various clauses of the query statement.
  4. You can free-type the query in this text editor.


We will demonstrate how to use this query editor in the next section

How to Use the CMIS Query Editor

Next, we're going to illustrate how to use the CMIS Query Editor in order to filter email messages based on certain criteria. The CMIS Query Editor's interface allows you to configure the CMISQL query based on available metadata to the CMIS Binding.

We will use the Exchange binding, which has a selection of queryable metadata for email messages, such as the email's subject, sender and date the message was received. Our query will import email messages based on the following conditions:

  1. Only messages are to be imported (as opposed to other Exchange content, such as appointments or tasks).
  2. Only messages in a certain folder will be imported.
  3. Only messages containing specific text search terms will be imported.
  4. Only messages from a certain sender will be imported.
  5. Only messages that have not been read yet will be imported.

This will get us through the basics of building a CMISQL query using the CMIS Query Editor.

The SELECT and FROM Clauses

The property grid is divided into the various clauses that form the CMISQL query. Adjusting these properties will adjust the CMISQL query.

  1. The Select property will adjust the SELECT clause.
    • By default this property is set to *.
    • This forms the start of the query SELECT *
  2. The From property will adjust the FROM clause.
    • Each CMIS Binding will default to a different content type. Generally, the default content type corresponds to a "document" in the storage location.
    • The Exchange binding's default is Item. However, we actually want to change this. We want to import email messages and filter our import using metadata specifically accessible to the Message content type.

Changing Content Types in the FROM Clause

We will use the CMIS Query Editor's property panel to edit our query going forward.

  1. Since we want to adjust the content type in our FROM clause, we will select the From property.
  2. Using the dropdown menu, we've selected Message.
  3. This automatically updates the full text of our CMISQL query.
    • SELECT * FROM Item changed to SELECT * FROM Message


As we use the property grid to further configure our query, the query's text will update automatically.

  • This gives you a method of editing the CMISQL query without free-typing the query.
  • However, you certainly can type out the full query text if you prefer not to use the property grid to enter the CMISQL query.
    • You can also go back and forth between the property grid and the text editor.

The WHERE Clause

Next, we will use the Where property to configure the various filter conditions we've set for ourselves.

  1. Select the Where property.
  2. Using the dropdown list, select the search predicate you wish to use.
    • If you are filtering based of multiple criteria. you will ALWAYS choose Predicate Collection.
    • We need to filter based on the values of multiple metadata properties, and we need to use both a Comparison Predicate and a Scope Predicate. So, we have selected Predicate Collection.

Using Predicate Collection for Multiple Search Conditions

  1. Expand the Where sub-properties to add your query criteria.
  2. The default Logical Operator is AND.
    • This is appropriate for our situation, as we want to use the AND operator. ALL conditions must be met in our case.
    • But be aware you also have access to the OR operator.
  3. Next, we will add our various query conditions using the Expressions property.
  4. Press the ellipsis button at the end to bring up the Where Predicate Expressions collection editor.

Adding Search Predicates

  1. Press the Add button to add a predicate to the list.
  2. We will start by adding a Scope Predicate to restrict the search scope to a specific folder.

The Scope Predicate

The Scope Predicate allows you to choose a folder from which you want to retrieve documents.

  1. Select the Search Scope property and press the ellipsis button at the end.
  2. This will bring up a folder browser, representing your connected CMIS Repository's hierarchical folder structure.
  3. We're going to restrict our scope to this subfolder named "Wiki" in the connected email account.
  4. Press OK to finalize your selection.
FYI

If Include Subfolders is False, the IN_FOLDER() predicate will be used.

• This will restrict the query's search to the single selected folder in the folder hierarchy.

If Include Subfolders is True, the IN_TREE() predicate will be used.

• This will restrict the query's search to the selected folder and search any of its subfolders in the folder hierarchy.

The Contains Predicate

If the storage location your CMIS Repository is connected to supports full text searching, you can use the Contains Predicate to query documents based on specific text search terms.

For example, we want to retrieve email messages pertaining to "Wiki Vitals", a specific kind of report generated nightly about the Grooper wiki and sent to my email.

To add a Contains Predicate':

  1. Press the Add button and select Contains Predicate
  2. This adds a new Contains Predicate to our list of predicates
  3. Using the Search String property, enter the text you would like to search for.
    • In our case, only messages with the text Wiki Vitals somewhere in the subject line or email body or otherwise in the message's metadata will be retrieved.
    • Notice Grooper automatically generates the property syntax for the query, CONTAINS('Wiki AND Vitals').

The NOT Operator

All predicates can be logically negated using the NOT operator. For example, most of these "Wiki Vitals" reports are daily reports. These are the emails I want to process. However, once a month, there is a monthly report. I DO NOT want to process those documents.

The only way I know it's a monthly report is the term "monthly report" is listed in the body of the email. I can use the Contains Predicate and the NOT operator to exempt these monthly reports from the query.

To logically negate a predicate:

  1. Add the predicate using the Add button.
  2. We've added another Contains Predicate.
  3. Configure the predicate's search parameters.
    • What do we not want to retrieve in this case? Messages with the text Monthly Report somewhere in their body. So, we've configured the Search String property to search for that term.
  4. Change the Not property to True.
    • This will negate the predicate's logic. If the email contains the search term, it will not be retrieved.