Conditioning Emails (Simple Functionality)
Introduction
This article is built around a "one size fits all" Batch Process that conditions emails for standard document processing operations in Grooper.
Everything in the "Email Conditioning Process" up to "Split Pages" is a series of steps designed to transform documents contained in or attached to an email into usable PDF documents in Grooper. Everything from "Split Pages" onward consists of fairly typical Grooper Batch Process steps, with one exception: "Extract Email Data", which extracts data from the email file itself, such as the Sender and Subject.
More information on email processing can be found in the Grooper Wiki at:
Email Processing
The companion Batch in this Project has several examples of common email scenarios:
- When the document is attached to the email.
- When several documents are included in an attached ZIP file.
- When the document is an embedded image in the email's body.
- When the document is the email's body itself.
- (Least common) When the document is inside a "nested email": the document is attached to an email, and that email is itself attached to the email being processed.
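Each of these scenarios corresponds to a different place in the email's MIME structure where the document can sit. As an illustration only (this is not Grooper's own logic), the Python sketch below uses the standard library `email` package to walk a message and label each document-bearing part. The function names `classify_parts` and `classify_eml_bytes` and the labels are hypothetical, and the rules are simplified assumptions: mail clients do not always set `Content-Disposition` consistently, so a production classifier would need more checks.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def classify_parts(msg: EmailMessage) -> list[str]:
    """Label each document-bearing part of an email (hypothetical labels)."""
    labels: list[str] = []
    _visit(msg, labels)
    return labels


def _visit(part: EmailMessage, labels: list[str]) -> None:
    ctype = part.get_content_type()
    if ctype == "message/rfc822":
        # An attached email; a real pipeline would recurse into it and condition it too.
        labels.append("nested_email")
        return
    if part.is_multipart():
        # Containers (multipart/mixed, /related, /alternative) hold no document themselves.
        for sub in part.get_payload():
            _visit(sub, labels)
        return
    disposition = part.get_content_disposition()  # "attachment", "inline", or None
    filename = (part.get_filename() or "").lower()
    if disposition == "attachment":
        if ctype in ("application/zip", "application/x-zip-compressed") or filename.endswith(".zip"):
            labels.append("zip")
        else:
            labels.append("attachment")
    elif part.get_content_maintype() == "image":
        # Inline images are usually referenced from the HTML body through cid: URLs.
        labels.append("embedded_image")
    elif ctype in ("text/plain", "text/html"):
        labels.append("body")


def classify_eml_bytes(data: bytes) -> list[str]:
    """Parse raw .eml bytes with the modern policy and classify their parts."""
    return classify_parts(message_from_bytes(data, policy=policy.default))
```

For example, a message built with `EmailMessage.set_content()` plus one `add_attachment()` call for a PDF is labeled `["body", "attachment"]`. A `multipart/alternative` body that carries both plain-text and HTML renderings yields two `"body"` labels, one per rendering.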
Setup for AI Extract
This portion of the article focuses on configuring Grooper’s AI Extract capability so documents can be analyzed by a Large Language Model (LLM) and mapped into a Data Model. It involves setting up an LLM Connector within the Grooper Repository and selecting an appropriate model through the Data Model’s Fill Methods.
The goal of this configuration is to enable Grooper to interpret document content and populate generic fields—such as document identifiers, dates, and party information—without relying on rigid, template-based extraction. This setup establishes the connection between Grooper and the external LLM provider, ensuring AI Extract can execute during Batch Processing.
- Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
- In the "Options" editor, add an "LLM Connector" option, then configure it.
- The most important setting is the Service Provider property: choose your LLM service provider, then configure that provider's settings.
- Expand the Node Tree and select the Data Model from the provided "Conditioning Emails" Project, then click the ellipsis button for the Fill Methods property to open the "Fill Methods" editor.
- Expand the Generator sub-properties and select the model you want to use for the Model property.
Setup for Azure DI OCR
This section covers configuring the Azure DI OCR Profile, which converts image-based content into machine-readable text. Once you supply an Azure Document Intelligence API key and match the correct region, Grooper can use Azure DI's OCR engine to process scanned or image-only documents.
This step ensures that all documents—whether they contain embedded text or not—have usable text content for downstream processing. OCR output is critical not only for AI Extract, but also for search indexing, as it provides the textual data that both extraction models and search engines rely on.
- Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
- In the "Options" editor, add an "Azure Document Intelligence" option, then properly configure it.
- The most important property is the API Key.
- Expand the Node Tree and right-click the "Azure OCR" OCR Profile from the provided "Conditioning Emails" Project, then select "Rename" from the pop-out menu.
- Set the New Name property to "Azure DI OCR".
- Right-click the OCR Engine property, then select "Reset" from the pop-out menu.
- Set the OCR Engine property to "Azure DI OCR".
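Azure Document Intelligence authenticates each request with the API key of a specific Azure resource, and that resource lives in a specific region. To make the role of the key and endpoint concrete, here is a minimal Python sketch that builds (but does not send) an OCR request to the "prebuilt-read" model using only the standard library. It assumes the v4.0 REST API (`api-version=2024-11-30`), the `Ocp-Apim-Subscription-Key` header, and a hypothetical endpoint name; Grooper makes its own calls internally, so this is not its implementation. Check Microsoft's Document Intelligence REST reference for the api-version your resource supports.

```python
import base64
import json
import urllib.request

# Assumption: Document Intelligence v4.0 REST API; confirm the version your resource supports.
API_VERSION = "2024-11-30"


def build_read_request(endpoint: str, api_key: str, document: bytes) -> urllib.request.Request:
    """Build (without sending) an OCR request for the prebuilt-read model."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"prebuilt-read:analyze?api-version={API_VERSION}"
    )
    # The document travels inline as base64; a reachable URL ("urlSource") is the alternative.
    body = json.dumps({"base64Source": base64.b64encode(document).decode("ascii")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            # The key ties the request to one Azure resource, and therefore to its region.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen()` should return HTTP 202 with an `Operation-Location` header that is then polled for the recognized text. A 401 response typically means the key does not belong to the resource behind that endpoint, which is the same key/region mismatch the steps above guard against.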
Batch Process testing pt.1
In this section we'll step through a series of Batch Process Steps that highlight the use of the Execute activity and several of its commands.