Email Processing

From Grooper Wiki

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

This article seeks to provide guidance for users processing documents coming from an email inbox. Grooper can ingest email messages, condition them for further processing, and process the email's body and/or any attachments (just like any document in Grooper).

However, there are several considerations when processing documents that come in from an email source.

  1. Import considerations - Will you import emails manually, or do you want an Import Watcher service to poll an import source for new emails periodically or import them at scheduled times?
  2. Attachment considerations - Does the email have attachments that need to be processed?
  3. Body considerations - Do you want to process the email body? If so, do you need to just process the body's text? Do you need to process the rendered HTML seen in an email client? Does the email have any images you need to process?
  4. Conditioning considerations - Based on your answers to questions 2 and 3, the Batch Process will need to be adjusted to accommodate the scenario.
    • This article will step through a Batch Process that accommodates all common scenarios. This will give you a starting point to process email content, normalizing source content for further processing.

Import considerations

When importing emails, you should use one of the two CMIS Import providers: Import Descendants or Import Query Results.

Of the two, Import Query Results is more common for importing email messages. This article will focus on using this provider.

Main import considerations

When configuring email import, there are three main considerations:

  • How do you want to connect to the email source? Grooper will use a "CMIS Connection" to do this.
  • Do you want to perform a user-directed (ad-hoc) import? Users will perform the import from the Imports Page.
  • Do you want to perform automated (scheduled) imports? The import will be performed by an Import Watcher service.


Common secondary considerations

After resolving your main import considerations, you should evaluate these secondary considerations as well.

  • Are you going to filter the import by message properties like the sender or sent date or text in the subject line? This is where Import Query Results really shines. It will selectively import files that match criteria set by the search query. Searches are defined by a SQL-like query called a "CMISQL Query" that uses file metadata and folder hierarchies to set search parameters.
  • Are you going to dispose of the emails after importing them? If so, how? Disposing of imported files is particularly important for automated imports. Files can be "disposed" by deleting them, moving them, or updating their metadata.
  • Is "sparse import" right for you? Sparse imports can speed up import time. Instead of copying file content from the import source into the Grooper Repository, sparse import establishes links to file content when adding document folders to the Batch. However, there are other considerations besides import speed you will need to evaluate when performing a sparse import.
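To make the filtering idea concrete, here is a minimal Python sketch that assembles a CMISQL query string from a few optional filters (sender, sent date, subject text). It assumes the message content type is named "Message", as in the Search Folder notes later in this article. The property names `Sender`, `DateTimeSent`, and `Subject` and the helper functions are hypothetical placeholders, so check which property names your CMIS Repository actually exposes before using a query like this in Import Query Results. The string and TIMESTAMP literal formats follow general CMIS query language conventions, which is also an assumption about how Grooper's CMISQL handles them.

```python
from datetime import datetime, timezone


def cmis_string(value: str) -> str:
    """Quote a value as a CMIS SQL string literal, escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def cmis_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a CMIS SQL TIMESTAMP literal in UTC."""
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return "TIMESTAMP '" + utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z'"


def build_message_query(sender=None, sent_after=None, subject_contains=None):
    """Build a query selecting messages that match every filter that was given.

    Sender, DateTimeSent and Subject are hypothetical property names.
    LIKE wildcards (% and _) inside subject_contains are not escaped.
    """
    predicates = []
    if sender is not None:
        predicates.append(f"Sender = {cmis_string(sender)}")
    if sent_after is not None:
        predicates.append(f"DateTimeSent >= {cmis_timestamp(sent_after)}")
    if subject_contains is not None:
        predicates.append(f"Subject LIKE {cmis_string('%' + subject_contains + '%')}")
    query = "SELECT * FROM Message"
    if predicates:
        query += " WHERE " + " AND ".join(predicates)
    return query


print(build_message_query(
    sender="invoices@example.com",
    sent_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
    subject_contains="Invoice",
))
```

Each filter becomes one predicate, and the predicates are joined with AND, so a message must satisfy all of them to be imported.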

Creating a CMIS Connection

To use Import Query Results, you will first need a CMIS Connection. A CMIS Connection is what Grooper uses to connect to external content management systems, including email servers. There are two "CMIS Connection Types" that can be used for email imports.

  • Exchange - Connects Grooper to Exchange email servers. This is used to connect to Outlook inboxes.
  • IMAP - Connects Grooper to any email server that supports the IMAP protocol.

The Exchange connection is both more common and more fully featured. For these reasons, we will focus on importing emails from an Outlook inbox in this article.

Step 1: Create a new CMIS Connection

  1. Right click a Project in your Grooper Repository.
  2. Select "Add", then "CMIS Connection..."
  3. In the pop-up window, name your CMIS Connection.
  4. Click "Execute" to finish.

Step 2: Configure the CMIS Connection

  1. Select the CMIS Connection and make sure you're on the "CMIS Connection" tab.
  2. Select the Connection Settings property and press its dropdown list button.
  3. Select "Exchange" from the list.
  4. Expand the Connection Settings properties and enter the Exchange server's host name or IP address in the Host Name property.
    • For Microsoft 365 Outlook users, enter outlook.office365.com
  5. Configure the Authentication Method you are using to log into the email client.
    • Exchange OAuth is the easiest and most common method.
  6. Use the Mailbox List editor to enter at least one mailbox (even if this is simply your own email address).
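The Step 2 settings can also be treated as a checklist. The Python sketch below is purely illustrative: `ExchangeConnectionSettings` and `problems()` are hypothetical names, not part of any Grooper API, and the checks cover only the two requirements stated in the steps above (a host name and at least one mailbox). The default authentication method reflects this article's recommendation of Exchange OAuth, not necessarily Grooper's own default.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangeConnectionSettings:
    """Hypothetical checklist mirroring the Step 2 properties (not a Grooper API)."""

    host_name: str  # e.g. "outlook.office365.com" for Microsoft 365
    # Defaulted to the method this article recommends, not Grooper's own default.
    authentication_method: str = "Exchange OAuth"
    mailbox_list: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return the Step 2 requirements that are not yet satisfied."""
        issues = []
        if not self.host_name.strip():
            issues.append("Host Name is empty")
        if not self.mailbox_list:
            issues.append("Mailbox List needs at least one mailbox")
        return issues


settings = ExchangeConnectionSettings(host_name="outlook.office365.com")
print(settings.problems())  # the Mailbox List is still empty
```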

FYI

The Use Search Folder property will enable an Exchange "Search Folder". This will enhance the query capabilities of Import Query Results and the imported CMIS Repository's "Search" tab.

  • Grooper will automatically create a Search Folder named "Grooper Search" the first time a CMIS query is executed from Grooper.
  • The Search Folder will only be used when all of the following are true:
    • The content type being queried is "Message"
    • The query applies to the entire inbox (no IN_FOLDER or IN_TREE predicates are used)
    • The WHERE clause does not include a CONTAINS predicate
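The Search Folder conditions above can be read as a single yes/no check. The following Python sketch encodes them with simple regular expressions. The function name `uses_search_folder` is hypothetical, and because it does not truly parse CMISQL (for example, a keyword inside a quoted string would fool it), it is only a rough way to reason about whether a query qualifies, not a reproduction of Grooper's own logic.

```python
import re


def uses_search_folder(query: str) -> bool:
    """Roughly apply the three Search Folder conditions to a CMISQL query string."""
    # Condition 1: the content type being queried must be "Message".
    from_match = re.search(r"\bFROM\s+(\w+)", query, re.IGNORECASE)
    if from_match is None or from_match.group(1) != "Message":
        return False
    # Condition 2: the query must cover the entire inbox (no IN_FOLDER / IN_TREE).
    if re.search(r"\b(IN_FOLDER|IN_TREE)\s*\(", query, re.IGNORECASE):
        return False
    # Condition 3: the WHERE clause must not use a CONTAINS predicate.
    where_match = re.search(r"\bWHERE\b(.*)", query, re.IGNORECASE | re.DOTALL)
    if where_match and re.search(r"\bCONTAINS\s*\(", where_match.group(1), re.IGNORECASE):
        return False
    return True


print(uses_search_folder("SELECT * FROM Message WHERE Subject LIKE '%Invoice%'"))
print(uses_search_folder("SELECT * FROM Message WHERE CONTAINS('invoice')"))
```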

Step 3: Tie the mailbox to Grooper by importing it as a CMIS Repository

  1. With the CMIS Connection selected and configured, press the "List Repositories" button in the upper right corner of the "Repositories" panel.
  2. Select a mailbox from the list.
  3. Press the "Import Repositories" button in the upper right corner of the "Repositories" panel.
  4. This will add a CMIS Repository object to the CMIS Connection (as a child node).
    • The CMIS Repository is a direct representation of the mailbox in Grooper.
    • Through the CMIS Repository, Grooper can interact with messages much like a user does in an email client.
    • The CMIS Repository is needed to configure the import provider used to import emails into Grooper.

User-directed (ad-hoc) email imports

User-directed imports (a.k.a. "ad-hoc" imports) are import jobs submitted manually by a user from the Imports Page. User-directed imports are useful for:

  • Bulk imports: When a large number of files need to be imported into Grooper in a single, one-time job.
  • Sporadic imports: When files need to be imported into Grooper from time to time, but not at any set schedule.

If these scenarios are right for you, you should follow the advice below to import email messages from the Imports Page.

If, on the other hand, you need to import emails regularly, either on a set schedule or immediately as they come in, you should instead follow the instructions in the "Automated (scheduled) email imports" section of this article.

Automated (scheduled) email imports

Automated (a.k.a. scheduled) imports are import jobs submitted by a Grooper Import Watcher service. An Import Watcher will automatically import file content according to a predefined schedule. This schedule will be executed in one of two ways:

  • Using a "Polling Loop" - Grooper will import files from a location on a continuous loop at a set interval (every 30 seconds, every 5 mins, every 24 hours, etc.)
  • Using "Specific Times" - Grooper will import files from a location at set days and times. For example, this can be used to run the import every Monday and Wednesday at 6:00 AM.

Automated imports are useful whenever files arrive at an import source (like an email inbox) continuously or at regular intervals. The Import Watcher checks that import source on its schedule, so Grooper can process incoming content without a user submitting each import job by hand.
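The difference between the two schedule types comes down to how the next import time is computed. The Python sketch below is an illustration under simple assumptions, not the Import Watcher's actual implementation: a Polling Loop adds a fixed interval to the previous run, while Specific Times looks ahead for the next matching weekday and time of day. The function names and the weekday numbering (Python's Monday=0 through Sunday=6) are assumptions made for the example.

```python
from datetime import datetime, time, timedelta


def next_polling_run(last_run: datetime, interval: timedelta) -> datetime:
    """Polling Loop: the next import starts one interval after the previous one."""
    return last_run + interval


def next_specific_time_run(now: datetime, weekdays: set[int], at: time) -> datetime:
    """Specific Times: the first matching weekday and time strictly after `now`.

    `weekdays` uses Python's numbering (Monday=0 ... Sunday=6).
    """
    # Checking today plus the next 7 days always finds a match if weekdays is non-empty.
    for days_ahead in range(8):
        day = (now + timedelta(days=days_ahead)).date()
        candidate = datetime.combine(day, at, tzinfo=now.tzinfo)
        if day.weekday() in weekdays and candidate > now:
            return candidate
    raise ValueError("weekdays must contain at least one value from 0 to 6")


# Every Monday and Wednesday at 6:00 AM, checked on Monday 2024-01-01 at 7:00 AM.
print(next_specific_time_run(datetime(2024, 1, 1, 7, 0), {0, 2}, time(6, 0)))
```

For the "every Monday and Wednesday at 6:00 AM" example in the list above, a check made on a Monday after 6:00 AM rolls forward to that Wednesday at 6:00 AM.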

Attachment considerations

Body considerations

Conditioning considerations