Import Query Results (Import Provider): Difference between revisions

From Grooper Wiki
No edit summary
 
(7 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<blockquote>{{#lst:Glossary|Import Query Results}}</blockquote>
== About ==
== About ==


'''Import Query Results''' is an Import Provider used to search a repository for documents you want to import, given whatever set of criteria you define.  This is done by performing a CMISQL query into a batch and importing the results of that query. This is similar to the pre-2.72 CMIS Import provider. 
"Import Query Results" is one of the [[CMIS Import]] providers in Grooper. It is used to import files from '''CMIS Repositories''' for Batch processing in Grooper. It will import files from a folder structure of an on-premise or cloud-based document storage platform.
 
:*<li class="fyi-bullet"> While less common, Import Query Results can also import ''folders'' from CMIS Repositories. However, since importing files is most common, we focus on importing ''files'' in this article.
{|cellspacing="5" cellpadding="10"
|-style="background-color:#f89420; color:white"
|style="font-size:14pt"|'''!'''||The NTFS, FTP, SFTP, and OneDrive bindings are not currently queryable.  Therefore, these bindings are not suitable for the "Import Query Results" provider.
|}
 
== Version Differences ==
 
2.72 has expanded query capabilities:
 
* All bindings except NTFS, FTP, SFTP and OneDrive support full text search via the <code>CONTAINS</code> predicate.  The OneDrive binding will support full-text search in the next release.
* Expanded CMISQL Syntax:  This adds support for CMISQL to provide several new features.
** <code>CONTAINS</code> predicate - Allows full text search
** <code>IN</code> predicate - Specifies a list of allowed values for a property, such as defining a file extension (i.e: <code>FileExtension IN ('.pdf', '.docx', '.xlsx'</code>). 
** <code>IS</code> predicate - Either <code>IS NULL</code> or <code>IS NOT NULL</code>
** <code>NOT</code> operator - Inverts the logic of individual predicates or a group enclosed by parentheses.  Note some CMIS Bindings have limited support for the NOT operator.
** Unlimited nesting,
 
== How To: Import Items Via Import Query Results ==
 
 
<tabs>
<tab name="Step 1">
===== Create a Batch =====
 
To use Import Query Results you must first create a batch.
* In Grooper Design Studio or Grooper Dashboard, create a new batch by pressing "Batch > CMIS Import > Import Query Results...".
 
 
[[image:001.png|center]]
 
 
</tab>
<tab name="Step 2">
===== Choose the Repository =====
 
If you have multiple CMIS repositories available, you'll need to choose the one from which you wish to import.
* Select the appropriate repository from the "Repository" property.
 
{|style="text-align:center" cellpadding="15"
|-
|Find the "Repository" property||Select your Repository
|-valign="top"
|[[image:importQueryResults002.png|center|500px]]||[[image:importQueryResults003.png|center|500px]]
|}
 
 
</tab>
<tab name="Step 3">
===== Configure Query =====
 
Now we need to write a CMIS Query
* Open the "CMIS Query" dialog by clicking the dotted box at the right side of the property
* Configure your query by either:
** Using the "Basic Search" tab, or
** Using the "CMSQL" tab
 
The settings for the "Basic Search" tab will alter the query in the "CMSQL" tab and vice versa. You can use these to get a feel for what's going on.
 
{|style="text-align:center" cellpadding="15"
|-
|colspan="2"|Find the "CMIS Query" property
|-
|colspan="2"|[[image:importQueryResults004.png|center|900px]]
|-
|The "Basic Search" tab||The "CMISQL" tab
|-valign="top"
|[[image:importQueryResults005.png|center|500px]]||[[image:importQueryResults006.png|center|500px]]
 
</tab>
<tab name="Step 4">
===== Analyze and Import =====
 
Now we can test it out.
* Pressing "Analyze" will run a search against your repository with your configured settings.
* Pressing "Import" will create a batch of documents that were imported using your configured settings.
 
Using the "Starting Step" property, you can choose from your published Batch Processes the step where your batch should be created. For example, in these screenshots, the batch will be created at the "Image Review" step of a Batch Process.
 
 
[[image:1556632834217-872.png|center|900px]]
 
 
</tab>
</tabs>
 
== Example Queries ==
 
These are samples of a query string.  They take the following general form.
 
<code>SELECT * FROM <ContentType> WHERE <Criteria> ORDER BY <Sort></code>
 
Let's break each component, or "clause", to get a better idea of how this works
 
=== SELECT ===
 
This specifies which properties are to be returned with query results.
 
If you are querying all properties the asterisk or <code>*</code> will indicate all properties should be returned.
 
For example: <code>SELECT *</code>
 
Otherwise, you will list them out separated by commas.
 
For example:  Let's say you are querying an Exchange repository and you only want to search the sender and recipients properties.  You'd type <code>SELECT Sender, ToRecipients, CcRecipients, BccRecipients</code> to limit your query to only those four properties.
 
=== FROM ===
 
This clause indicates the type of object to search for.  This will be a content type defined in the CMIS Repository.  If the content type is document based, the query result will be a CMIS Document.  If it is folder based, it will be a CMIS Folder.


The content type specified in the <code>FROM</code> clause has two jobs.  One, it defines what properties are available to the other clauses. Two, it limits the scope of the search to only objects of the type specified in the clause.


For example:  Let's say you are querying an Exchange repository and want to search email messages and not contacts or tasks or appointments. You'd type <code>FROM Message</code> to limit your query to just the Message content type.
Just like any other Import Provider, Import Descendants is used to submit "'''Import Jobs'''". Import Jobs are how Grooper brings in files from a storage location for processing. For example, it's how PDFs from a Windows folder get into Grooper or messages from an email inbox get into Grooper. When an Import Job runs, Grooper first creates a Batch and then creates a Batch Folder for each imported file. A copy of the file is attached to the Batch Folder. This becomes the Batch Folder's "attachment" and is used when applying activities like "Split Pages".
:*<li class="fyi-bullet"> When files are imported into Grooper, a link to that file is stored on the Batch Folder. This link maintains a connection between the file's source location and the document in Grooper. This link also makes "Sparse" imports possible. [[#Import Mode (and "Sparse" imports)|See below for more.]]


=== WHERE ===


This is how you define what search conditions must be met to be included in your set of returns.  Multiple conditions can be joined with the AND or OR or NOT operators.  You can change the order of operations by using nested parenthesis. Each condition is followed by a predicate.  The following is a list of predicates.  Note not every property type may be able to utilize every predicate.  For example, the Subject property on the Exchange binding cannot use the "=" operator.
Import Jobs are submitted in one of two ways:
* '''By a user from the Imports page''': Ad-hoc or "user directed" Import Jobs are submitted from the [[Imports Page]], using the "Submit Import Job" button.
* '''From an Import Watcher service''': Automated or "scheduled" Import Jobs are submitted by an '''[[Import Watcher]]''' service according to its Poling Loop or Specific Times specification.
In both cases, an "Import Descendants" can be selected and configured using using the "Provider" property.


{|cellspacing="5" cellpadding="10"
{{#lst:CMIS Import|import_query_results_and_descendants_similarities}}
|'''Predicate'''||'''Description'''||'''Example'''
|-style="background-color:#ddf5f5"
|Comparison Predicate||Specifies a condition for an individual property using comparisons, such as "equals to" or "less than".  The <code>LIKE</code> and <code>IS</code> operators are also a comparison predicates.||<code>invoice_date<'12/31/2007'</code>
|-style="background-color:#ddf5f5"
|<code>IN</code> Predicate||Specifies a list of allowed values for a property.  This list is separated by commas.||<code>FileExtension IN ('.pdf', '.docx', '.xlsx')</code>
|-style="background-color:#ddf5f5"
|<code>CONTAINS</code> Predicate||Specifies a full-text query.  You can use AND, OR and NOT operators.||<code>CONTAINS('mortgage AND payment AND NOT vehicle')</code>
|-style="background-color:#ddf5f5"
|Scope Predicate||Restricts the search scope to children or descendants of a folder||<code>IN_FOLDER(/Inbox)</code>
|}


Note: The NOT operator cannot be used with the <code>IN_FOLDER</code> or <code>IN_TREE</code> predicates.
== Prereqs: CMIS Repository ==


For example:  Let's say you are querying an Exchange repository and want to find an email which contains the words "cake" and "free" but not "birthday" in it, that was not received last Christmas Day found in the inbox folder and has attachments.  That would look something like <code>WHERE IN_FOLDER(/Inbox) AND CONTAINS('cake AND free AND NOT birthday') AND (DateTimeReceived<>'12/25/2018') AND (HasAttachments=False)</code>
A CMIS Repository allows Grooper access to files and folders within a storage platform.


=== ORDER BY ===
Because Import Query Results imports from a CMIS Repository, you can import from numerous storage platforms determined by the "CMIS Binding" used. These CMIS Bindings include:
* [[NTFS]] to connect to Windows folders
* [[FTP]] to connect to FTP directories
* [[SFTP]] to connect to SFTP directories
* [[Exchange]] to connect to Outlook inboxes
* [[SharePoint]] to connect to SharePoint sites (and document libraries)
* [[OneDrive]] to connect to OneDrive drives
* [[Box]] to connect to Box accounts
* [[AppXtender]] to connect to AppEnhancer applications


This is an optional clause which allows you to specify the order in which results are returned.  You can sort by multiple properties using a comma separated list.  Optionally, each property name may be followed by <code>ASC</code> or <code>DESC</code> to indicate ascending or descending sort direction.  The default sort direction is ascending.
Before you can import files from these platforms using Import Query Results, there's some setup required in the Grooper Design page. You must:
# Create and configure a CMIS Connection.
# Import a folder location as a CMIS Repository.


For example:  If you wanted to sort a query of an Exchange repository by both whether they have attachments and by size in descending order you would type <code>ORDER BY HasAttachments, Size DESC</code>
This will allow you to import files from folders accessed by the CMIS Repository. For information on CMIS Connections and CMIS Repositories, including how to create them in Grooper, visit the [[CMIS Connection]] page.


=== Putting It All Together ===
[https://app.supademo.com/demo/cm8ddjcr91rb92ugqp7k9phqo Click here for an interactive walkthrough.]


Let's mash all our examples together and search for email messages in the Inbox that have the words "cake" and "free"  but not "birthday" in the body, received any day besides Christmas Day.  I'm going to go ahead and search all the properties available to me, and I want to sort the results by whether the message has attachments and by size in descending order.  This would be the resulting query
== Example Import Query Results configuration ==


<code>
Import Query Results relies on a "CMIS Query" to import files from a CMIS Repository. The CMIS Query (aka CMISQL Query) uses a syntax structure similar to a SQL query. Instead of querying rows in a database based on column values, you're querying documents in a storage location based on file property and metadata values.
SELECT * FROM Message  
* The general CMIS Query format is: <code>'''SELECT * FROM''' ''<a type of document in the CMIS Repository>'' '''WHERE''' ''<according to certain search conditions>''</code>
WHERE IN_FOLDER(/Inbox) AND CONTAINS('cake AND free AND NOT birthday') AND (DateTimeReceived<>'12/25/2018')
* What "type of document" in can search for in the FROM clause is determined by the storage platform and the CMIS Binding.
ORDER BY HasAttachments, Size DESC
** Example: For the NTFS binding you select "<code>FROM File</code>" for files in a Windows folder.
</code>
** Example: For the Exchange binding you select "<code>FROM Message</code>" to search for email messages.
* In the <code>WHERE</code> clause, you can set search parameters based on file properties and metadata values called "CMIS properties". What CMIS properties are "queryable" will also depend on the the CMIS Repository and its CMIS Binding.
** Example: The Exchange binding has a queryable "Subject" property.
** Example: Fields in a Box metadata template are queryable for the Box binding.
**<li class="fyi-bullet">Which CMIS properties are queryable can be determined by (1) navigating to the CMIS Repository in the Grooper node tree (2) going to the "Types" tab (3) selecting the CMIS document type whose properties you want to inspect and (4) reviewing the "Queryable" column for each CMIS property.
* The <code>WHERE</code> clause is also used to set the folder scope, using the <code>IN_FOLDER</code> and <code>IN_TREE</code> predicates (where supported).
* More information on CMIS Queries (including unsupported query configurations for various CMIS Bindings) can be found in the [[CMIS Query]] article.


== Query Examples ==
<big>Example: Submitting Import Query Results from the Imports Page</big>


Below are a few more examples of queries.
[https://app.supademo.com/demo/cm8ergggh03ij12zdakzyt3jp Click here for a step by step walkthrough.]


{|cellspacing="5" cellpadding="10"
# Go to the Imports Page.
|'''Filter'''||'''Description'''
# Press the "New Import Job" button.
|-style="background-color:#ddf5f5"
# This brings up the "Submit Import Job" editor.
|<code>SELECT * FROM File</code>||Import all descendant files.  This will import all files in the repository without any foldering.
# Enter a description in the Description property (This is required).
|-style="background-color:#ddf5f5"
# Open the "Provider" dropdown (Press the "☰" button).
|<code>SELECT * FROM File WHERE AT_LEVEL(1)</code>||Import files which are immediate children.  This will only import files at that level, not from subsequent levels.
# Select "Import Query Results" from the dropdown list.
|-style="background-color:#ddf5f5"
# Expand the Provider settings to configure it.
|<code>SELECT * FROM Folder</code>||Import folders which are immediate children.  This will import both files and their foldering.
# Open the "Repository" node selector (Press the "☰" button).
|-style="background-color:#ddf5f5"
# Select the CMIS Repository you wish to import from.
|<code>SELECT * FROM File WHERE cmis:name MATCHES '^\d{4}-\d{2}-\d{2}'</code>||Import files with a specific naming pattern, using regular expression.
# Open the "CMIS Query" editor (Press the "..." button).
|-style="background-color:#ddf5f5"
# Enter the CMIS Query by either:
|<code>SELECT * FROM File WHERE cmis:name LIKE 'ca%'</code>||Import files with a name starting with ca.
#* Typing it into the Query Editor.
|-style="background-color:#ddf5f5"
#* Or, using the Query Editor's property grid to construct the query (The text will populate the Query Editor as you configure these properties).
|<code>SELECT * FROM File WHERE cmis:contentStreamLength > 10000</code>||Import files larger than 10,000 bytes.
#**<li class="fyi-bullet"> A more in depth explanation of the Query Editor and CMIS Queries can be found in the [[CMIS Query]] article.
# Configure the Import Mode property, as needed.
# Configure the Batch Creation settings, as needed.
# Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
# Configure any remaining Import Descendants properties, as needed.
# Press the "Submit" button when finished.
# Your Import Watcher service will pick up and execute the Import Job.

Latest revision as of 16:00, 28 May 2025

Import Query Results is one of two Import Providers that use cloud CMIS Connections to import document content into Grooper. Import Query Results imports files from a settings_system_daydream CMIS Repository that match a "CMISQL query" (a specialized query language based on SQL database queries).

About

"Import Query Results" is one of the CMIS Import providers in Grooper. It is used to import files from CMIS Repositories for Batch processing in Grooper. It will import files from a folder structure of an on-premise or cloud-based document storage platform.

  • While less common, Import Query Results can also import folders from CMIS Repositories. However, since importing files is most common, we focus on importing files in this article.


Just like any other Import Provider, Import Descendants is used to submit "Import Jobs". Import Jobs are how Grooper brings in files from a storage location for processing. For example, it's how PDFs from a Windows folder get into Grooper or messages from an email inbox get into Grooper. When an Import Job runs, Grooper first creates a Batch and then creates a Batch Folder for each imported file. A copy of the file is attached to the Batch Folder. This becomes the Batch Folder's "attachment" and is used when applying activities like "Split Pages".

  • When files are imported into Grooper, a link to that file is stored on the Batch Folder. This link maintains a connection between the file's source location and the document in Grooper. This link also makes "Sparse" imports possible. See below for more.


Import Jobs are submitted in one of two ways:

  • By a user from the Imports page: Ad-hoc or "user directed" Import Jobs are submitted from the Imports Page, using the "Submit Import Job" button.
  • From an Import Watcher service: Automated or "scheduled" Import Jobs are submitted by an Import Watcher service according to its Poling Loop or Specific Times specification.

In both cases, an "Import Descendants" can be selected and configured using using the "Provider" property.


Similarities and differences between Import Query Results and Import Descendants

Overall, "Import Descendants" is a "simpler" version of "Import Query Results".

  • We advise to use Import Query Results over Import Descendants, when possible.
    • Import Query Results can do everything Import Descendants can do and more.
    • Import Query Results has more robust file filtering capabilities. This allows for more targeted, selective imports.
    • Import Query Results is newer (and better maintained) than Import Descendants.
    • There are only a handful of scenarios where Import Descendants must be used over Import Query Results.


Similarities

  • Both providers import files from a CMIS Repository.
  • Both providers have the same Batch Creation settings.
  • Both providers are capable of "Sparse" imports by changing the "Import Mode" to "Sparse".
  • Both providers can dispose of files on import (using the "Delete Item", "Move Item", or "Update Properties")

Differences

The biggest difference is in how the providers determine which files are imported (import criteria).

  • Import Descendants will import all files from a target location. This includes all files in all subfolders if present. You can, however, set a "Base Folder" within the CMIS Repository.
  • Import Query Results will import files that match a CMIS Query. This is a specialized query language based on SQL syntax. This gives you many more options for import conditions, using a "WHERE" clause in the query. CMIS Queries also give you the capability to restrict imports to a folder location without importing files in subfolders (This is something Import Descendants cannot do).
  • Import Descendants does have an "Import Filter" it can use to set import conditions. It also uses a SQL-like syntax. However, it is not as advanced as the CMIS Queries that Import Query Results uses.


CMIS Repositories that can only use Import Descendants

Certain CMIS Bindings are not queryable using CMIS Queries. Because of this, certain CMIS Repositories cannot utilize Import Query Results. The following CMIS Repositories must use Import Descendants to import file content:

  • FTP
  • SFTP
  • NTFS (only if the directory has not been indexed by the Windows Search service or the Windows Search service is not running)


Prereqs: CMIS Repository

A CMIS Repository allows Grooper access to files and folders within a storage platform.

Because Import Query Results imports from a CMIS Repository, you can import from numerous storage platforms determined by the "CMIS Binding" used. These CMIS Bindings include:

  • NTFS to connect to Windows folders
  • FTP to connect to FTP directories
  • SFTP to connect to SFTP directories
  • Exchange to connect to Outlook inboxes
  • SharePoint to connect to SharePoint sites (and document libraries)
  • OneDrive to connect to OneDrive drives
  • Box to connect to Box accounts
  • AppXtender to connect to AppEnhancer applications

Before you can import files from these platforms using Import Query Results, there's some setup required in the Grooper Design page. You must:

  1. Create and configure a CMIS Connection.
  2. Import a folder location as a CMIS Repository.

This will allow you to import files from folders accessed by the CMIS Repository. For information on CMIS Connections and CMIS Repositories, including how to create them in Grooper, visit the CMIS Connection page.

Click here for an interactive walkthrough.

Example Import Query Results configuration

Import Query Results relies on a "CMIS Query" to import files from a CMIS Repository. The CMIS Query (aka CMISQL Query) uses a syntax structure similar to a SQL query. Instead of querying rows in a database based on column values, you're querying documents in a storage location based on file property and metadata values.

  • The general CMIS Query format is: SELECT * FROM <a type of document in the CMIS Repository> WHERE <according to certain search conditions>
  • What "type of document" in can search for in the FROM clause is determined by the storage platform and the CMIS Binding.
    • Example: For the NTFS binding you select "FROM File" for files in a Windows folder.
    • Example: For the Exchange binding you select "FROM Message" to search for email messages.
  • In the WHERE clause, you can set search parameters based on file properties and metadata values called "CMIS properties". What CMIS properties are "queryable" will also depend on the the CMIS Repository and its CMIS Binding.
    • Example: The Exchange binding has a queryable "Subject" property.
    • Example: Fields in a Box metadata template are queryable for the Box binding.
    • Which CMIS properties are queryable can be determined by (1) navigating to the CMIS Repository in the Grooper node tree (2) going to the "Types" tab (3) selecting the CMIS document type whose properties you want to inspect and (4) reviewing the "Queryable" column for each CMIS property.
  • The WHERE clause is also used to set the folder scope, using the IN_FOLDER and IN_TREE predicates (where supported).
  • More information on CMIS Queries (including unsupported query configurations for various CMIS Bindings) can be found in the CMIS Query article.

Example: Submitting Import Query Results from the Imports Page

Click here for a step by step walkthrough.

  1. Go to the Imports Page.
  2. Press the "New Import Job" button.
  3. This brings up the "Submit Import Job" editor.
  4. Enter a description in the Description property (This is required).
  5. Open the "Provider" dropdown (Press the "☰" button).
  6. Select "Import Query Results" from the dropdown list.
  7. Expand the Provider settings to configure it.
  8. Open the "Repository" node selector (Press the "☰" button).
  9. Select the CMIS Repository you wish to import from.
  10. Open the "CMIS Query" editor (Press the "..." button).
  11. Enter the CMIS Query by either:
    • Typing it into the Query Editor.
    • Or, using the Query Editor's property grid to construct the query (The text will populate the Query Editor as you configure these properties).
      • A more in depth explanation of the Query Editor and CMIS Queries can be found in the CMIS Query article.
  12. Configure the Import Mode property, as needed.
  13. Configure the Batch Creation settings, as needed.
  14. Configure the file disposition options (Delete Item, Move To Folder, or Update Properties), as needed.
  15. Configure any remaining Import Descendants properties, as needed.
  16. Press the "Submit" button when finished.
  17. Your Import Watcher service will pick up and execute the Import Job.