Conditioning Emails (Simple Functionality): Difference between revisions

From Grooper Wiki
{{AutoVersion}}


{|class="download-box"
|
[[File:Asset 22@4x.png]]
|
You may download the ZIP files below for use in your own Grooper environment (version 2025). There is a Project and a Batch ZIP file.
* [https://www.bisok.com/resources/example-projects/simple-functionality/conditioning-emails/2025-Project_Conditioning-Emails.zip 2025-Project_Conditioning-Emails.zip]
* [https://www.bisok.com/resources/example-projects/simple-functionality/conditioning-emails/2025-Batch_Example-Email-Scenarios.zip 2025-Batch_Example-Email-Scenarios.zip]
|}


== Introduction ==


== Setup for AI Extract ==
This portion of the article focuses on configuring Grooper’s AI Extract capability so documents can be analyzed by a [https://en.wikipedia.org/wiki/Large_language_model Large Language Model (LLM)] and mapped into a [[Data Model]]. It involves setting up an LLM Connector within the [[Repository|Grooper Repository]] and selecting an appropriate model through the Data Model’s [[Fill Method|Fill Methods]].


The goal of this configuration is to enable Grooper to interpret document content and populate generic fields—such as document identifiers, dates, and party information—without relying on rigid, template-based extraction. This setup establishes the connection between Grooper and the external LLM provider, ensuring AI Extract can execute during Batch Processing.


== Batch Process testing pt.1 ==
In this section we'll step through a series of [[Batch Process Step|Batch Process Steps]] that highlight the use of the [[Execute]] activity and several of its commands.


# Expand the Node Tree and select the "Expand Email Attachments & Inline Images" Batch Process Step from the provided "Conditioning Emails" Project. Notice the Activity is "Execute", the Scope is "Folder", and the Folder Level is "1".
# The Execute Command is "Mail Message - Expand Attachments", and Expand Attachments and Expand Inline Images are set to "True". This means it is targeting .eml files. If any .eml file exists at Folder Level 1 and it has attachments or images embedded in its email body, they'll be expanded.
# Click the "Submit Job" button.
# In the "Submit Job" window, click the "OK" button.
# You'll be taken to the [[Jobs Page]]. Look for Progress to be "100%" and Status to be "Completed" in the "Processing" properties, then click the "[[Design Page]]" button to go back to Design.
# You'll see the attachments and inline images of the EML files for documents 1 through 4 are expanded and at Folder Level 2.
# Select the "Expand Email Body (if no attachments)" Batch Process Step. This is an Execute activity with a Scope of "Folder" and a Folder Level of "1".


<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.78; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmo1wgdev00hlyv0jylde51o9?embed_v=2&utm_source=embed" loading="lazy" title="Conditioning Emails - Batch Process Testing Pt.1" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
== Batch Process testing pt.2 ==
In this section we will complete testing of the Conditioning Emails Batch Process Steps.
# Select the "Split Pages" Batch Process Step. The "Split Pages" activity is used at a Scope of "Folder" and a Folder Level of "2".
# Click the Activity Tester tab.
# Click the "Submit Job" button.
# The Batch Folders at Folder Level 2 now have child Batch Page objects.
# Select the "Recognize" Batch Process Step. The "[[Recognize]]" activity is used at a Scope of "Page".
# The "Azure DI OCR" OCR Profile is being used. Click the Activity Tester tab.
# Click the "Submit Job" button.
# The Batch Pages now have recognized text.
# Expand the Node Tree and select one of the [[Data Field|Data Fields]] within the Data Model of the "Email" Document Type. These fields are using Default Value expressions to collect the email metadata of "From", "Date", and "Subject".
# Select the "Extract Email Data" Batch Process Step. The "Extract" activity is used at a Scope of "Folder" and a Folder Level of "1".
# The Default Content Type is "Email". Folder Level 1 Batch Folders will be targeted for extraction, and if they are not classified, they will be assigned the "Email" Document Type. Click the Activity Tester tab.
# Click the "Submit Job" button.
# Select the "Extract Document Data" Batch Process Step. The "Extract" activity is used at a Scope of "Folder" and a Folder Level of "2".
# Make sure the Default Content Type is set to the "Emailed Doc" Document Type. Folder Level 2 Batch Folders will be targeted for extraction, and if they are not classified, they will be assigned the "Emailed Doc" Document Type. Click the Activity Tester tab.
# Click the "Submit Job" button.
# Select the "Review" Batch Process Step. The "Review" activity is used at a Scope of "Batch".
# Click the ellipsis button for the Views property.
# Select the "Data View" View. Make sure the Processing Level is "Level2" and the Display Parents property is set to "True". One Review Job will be created for the entire Batch and will look at the data of Folder Level 2 Batch Folders. With Display Parents enabled, the extracted data of the Folder Level 1 Batch Folders will be visible as well. These will appear as tabs in the Data View.
# Click the Activity Tester tab.
# Select the Batch root folder, then click the "Test Activity" button.
# Click the "Email" tab.
# Here you can view the email metadata collected from the expressions seen earlier for the "Email" Document Type.
# Click the "Emailed Doc" tab.
# Here you can view the data collected with AI Extract for the "Emailed Doc" Document Type.
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.78; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmo1wghux002vup0iyhzvpu3t?embed_v=2&utm_source=embed" loading="lazy" title="Conditioning Emails - Batch Process Testing Pt.2" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
== For more information ==
* [[AI Extract]]
* [[Activity Processing]]
* [[Azure DI OCR]]
* [[Batch]]
* [[Batch Folder]]
* [[Batch Page]]
* [[Batch Process]]
* [[Batch Process Step]]
* [[Behaviors]]
* [[Content Model]]
* [[Data Field]]
* [[Data Model]]
* [[Design Page]]
* [[Execute]]
* [[Extract]]
* [[Fill Method]]
* [[Import Watcher]]
* [[Jobs Page]]
* [[LLM Connector]]
* [[Machine]]
* [[Node Tree]]
* [[OCR Profile]]
* [[Project]]
* [[Recognize]]
* [[Repository]]
* [[Review]]
* [[Root]]
* [[Search Page]]
* [[Split Pages]]

Revision as of 15:22, 16 April 2026

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

You may download the ZIP files below for use in your own Grooper environment (version 2025). There is a Project and a Batch ZIP file.

Introduction

This article is oriented around a "one size fits all" Batch Process designed to condition emails for standard document processing operations in Grooper.

Every step in the "Email Conditioning Process" up to "Split Pages" is designed to transform documents contained in or attached to an email into usable PDF documents in Grooper. The steps from "Split Pages" onward are fairly typical of a Grooper Batch Process. The one exception is "Extract Email Data", which extracts data from the email file itself, such as the Sender and Subject.

More information on email processing can be found in the Grooper Wiki at:
Email Processing

The companion Batch in this Project has several examples of common email scenarios:

  1. When the document is attached to the email.
  2. When several documents are included in an attached ZIP file.
  3. When the document is an embedded image in the email's body.
  4. When the document is the email's body itself.
  5. (Least common) When the document is in a "nested email": it is attached to an email that is itself attached to the original email.
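
To make these scenarios concrete, the following is a minimal Python sketch, using only the standard library's email package, that labels which scenarios a raw .eml file matches. It only illustrates the idea and is not how Grooper's Execute commands are implemented. The function names and the label strings are hypothetical.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def iter_leaf_parts(part: EmailMessage):
    """Yield body and attachment parts without descending into nested emails."""
    if part.get_content_type() == "message/rfc822":
        yield part  # an attached email is treated as a single unit
    elif part.is_multipart():
        for sub in part.iter_parts():
            yield from iter_leaf_parts(sub)
    else:
        yield part


def classify_email(raw_eml: bytes) -> set[str]:
    """Return hypothetical scenario labels for one raw .eml message."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    labels: set[str] = set()
    for part in iter_leaf_parts(msg):
        content_type = part.get_content_type()
        filename = (part.get_filename() or "").lower()
        if content_type == "message/rfc822" or filename.endswith(".eml"):
            labels.add("nested email")
        elif content_type == "application/zip" or filename.endswith(".zip"):
            labels.add("zip attachment")
        elif part.is_attachment():
            labels.add("attached document")
        elif part.get_content_maintype() == "image":
            labels.add("inline image")
    # No attachments or inline images: the body itself is the document.
    return labels or {"body only"}
```

A single email can match more than one label, which is why the Batch Process runs each expansion step in turn rather than picking just one.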

Setup for AI Extract

This portion of the article focuses on configuring Grooper’s AI Extract capability so documents can be analyzed by a Large Language Model (LLM) and mapped into a Data Model. It involves setting up an LLM Connector within the Grooper Repository and selecting an appropriate model through the Data Model’s Fill Methods.

The goal of this configuration is to enable Grooper to interpret document content and populate generic fields—such as document identifiers, dates, and party information—without relying on rigid, template-based extraction. This setup establishes the connection between Grooper and the external LLM provider, ensuring AI Extract can execute during Batch Processing.

  1. Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
  2. Add an LLM Connector, then be sure to properly configure it.
    • The most important configuration is choosing a service provider for the Service Provider property, and properly configuring it.
  3. Expand the Node Tree and select the Data Model from the provided "Conditioning Emails" Project, then click the ellipsis button for the Fill Methods property to open the "Fill Methods" editor.
  4. Expand the Generator sub-properties and be sure to select a desired model for the Model property.

Setup for Azure DI OCR

This section covers configuring the Azure DI OCR Profile, which is responsible for converting image-based content into machine-readable text. By supplying an Azure Computer Vision API key and matching the correct region, Grooper can leverage Azure DI's OCR engine to process scanned or image-only documents.

This step ensures that all documents—whether they contain embedded text or not—have usable text content for downstream processing. OCR output is critical not only for AI Extract, but also for search indexing, as it provides the textual data that both extraction models and search engines rely on.

  1. Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
  2. In the "Options" editor, add an "Azure Document Intelligence" option, then properly configure it.
    • The most important property is the API Key.
  3. Expand the Node Tree and right-click the "Azure OCR" OCR Profile from the provided "Conditioning Emails" Project, then select "Rename" from the pop-out menu.
  4. Set the New Name property to "Azure DI OCR".
  5. Right-click the OCR Engine property, then select "Reset" from the pop-out menu.
  6. Set the OCR Engine property to "Azure DI OCR".

Batch Process testing pt.1

In this section we'll step through a series of Batch Process Steps that highlight the use of the Execute activity and several of its commands.

  1. Expand the Node Tree and select the "Expand Email Attachments & Inline Images" Batch Process Step from the provided "Conditioning Emails" Project. Notice the Activity is "Execute", the Scope is "Folder", and the Folder Level is "1".
  2. The Execute Command is "Mail Message - Expand Attachments", and Expand Attachments and Expand Inline Images are set to "True". This means it is targeting .eml files. If any .eml file exists at Folder Level 1 and it has attachments or images embedded in its email body, they'll be expanded.
  3. Click the Activity Tester tab.
  4. Click the "Select Batch" button in the Batch Viewer, and make sure to select the provided "Example Email Scenarios" Batch.
  5. Click the "Submit Job" button.
  6. In the "Submit Job" window, click the "OK" button.
  7. You'll be taken to the Jobs Page. Look for Progress to be "100%" and Status to be "Completed" in the "Processing" properties, then click the "Design Page" button to go back to Design.
  8. You'll see the attachments and inline images of the EML files for documents 1 through 4 are expanded and at Folder Level 2.
  9. Select the "Expand Email Body (if no attachments)" Batch Process Step. This is an Execute activity with a Scope of "Folder" and a Folder Level of "1".
  10. It's using a Should Submit Expression to find Folder Level 1 Batch Folders with no child objects. The Execute Command is "Mail Message - Expand Attachments".
  11. The Body Expansion property is set to "Prefer HTML". So, if a Folder Level 1 Batch Folder has no children, its email body will be expanded as an HTML file. Click the Activity Tester tab.
  12. Click the "Submit Job" button.
  13. The email body of the EML file is expanded as an HTM file at Folder Level 2.
  14. Select the "Convert HTML Body to PDF" Batch Process Step. The Execute activity is used at a Scope of "Folder" and a Folder Level of "2".
  15. The Execute Command is "HTML Document - Convert to PDF". This will take HTML files at Folder Level 2 and convert them to a PDF. This is important because Grooper cannot natively recognize the text of an HTML file. Click the Activity Tester tab.
  16. Click the "Submit Job" button.
  17. Select the "Body.htm" Folder Level 2 document of "Document (5)". You'll notice it has a PDF rendition now.
  18. Select the "Unzip ZIP Attachments" Batch Process Step. The "Execute" activity is used at a Scope of "Folder" and a Folder Level of "2".
  19. The Command is "ZIP Archive - Unzip". If a ZIP file exists at Folder Level 2, its contents will be expanded. Click the Activity Tester tab.
  20. Click the "Submit Job" button.
  21. The contents of the ZIP attachment at Folder Level 2 of "Document (2)" are expanded at Folder Level 3.
  22. Select the "Expand Nested Email" Batch Process Step. The "Execute" activity is used at a Scope of "Folder" and a Folder Level of "2".
  23. The Command is "Mail Message - Expand Attachments". If an EML file exists at Folder Level 2 and it has attachments, they will be expanded out. Click the Activity Tester tab.
  24. Click the "Submit Job" button.
  25. The attachment of the EML file at Folder Level 2 of "Document (4)" was expanded at Folder Level 3.
  26. Select the "Remove ZIP & Nested Email Folders" Batch Process Step. The "Remove Level" activity is used at a Scope of "Folder" and a Folder Level of "1".
  27. The Level Count is "1" and the File Extension is ".eml, .zip". The target of the activity is Folder Level 1, and because of the Level Count property, it is looking one folder level within Folder Level 1. Therefore, if a Folder Level 1 Batch Folder has a child folder that is an EML or ZIP file, that child folder is removed and its contents move up one level. Click the Activity Tester tab.
  28. Click the "Submit Job" button.
  29. The .zip attachment of "Document (2)" that was at Folder Level 2 was removed, putting the PDFs that were at Folder Level 3 at Folder Level 2 instead. The EML attachment of "Document (4)" that was at Folder Level 2 was removed, putting its TIF attachment that was at Folder Level 3 at Folder Level 2 instead.
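
The effect of the "Remove ZIP & Nested Email Folders" step can be pictured with a small in-memory model. The sketch below is only an illustration under stated assumptions: `Node` and `remove_level` are hypothetical names, not Grooper types or APIs, and the tree stands in for Batch Folders at Folder Levels 1 through 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Batch Folder; not a Grooper type."""
    name: str
    children: list["Node"] = field(default_factory=list)


def remove_level(parent: Node, extensions: tuple[str, ...]) -> None:
    """Replace matching children of `parent` with their own children, in place.

    A matching child with no children of its own simply disappears.
    """
    promoted: list[Node] = []
    for child in parent.children:
        if child.name.lower().endswith(extensions):
            # Drop the container and lift its contents up one level.
            promoted.extend(child.children)
        else:
            promoted.append(child)
    parent.children = promoted


# Model of "Document (2)": a ZIP at Level 2 holding two PDFs at Level 3.
document_2 = Node("Document (2)", [Node("docs.zip", [Node("a.pdf"), Node("b.pdf")])])
remove_level(document_2, (".eml", ".zip"))
```

After the call, the two PDF nodes are direct children of `document_2`, mirroring how the PDFs moved from Folder Level 3 to Folder Level 2 in the step above.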

Batch Process testing pt.2

In this section we will complete testing of the Conditioning Emails Batch Process Steps.

  1. Select the "Split Pages" Batch Process Step. The "Split Pages" activity is used at a Scope of "Folder" and a Folder Level of "2".
  2. Click the Activity Tester tab.
  3. Click the "Submit Job" button.
  4. The Batch Folders at Folder Level 2 now have child Batch Page objects.
  5. Select the "Recognize" Batch Process Step. The "Recognize" activity is used at a Scope of "Page".
  6. The "Azure DI OCR" OCR Profile is being used. Click the Activity Tester tab.
  7. Click the "Submit Job" button.
  8. The Batch Pages now have recognized text.
  9. Expand the Node Tree and select one of the Data Fields within the Data Model of the "Email" Document Type. These fields are using Default Value expressions to collect the email metadata of "From", "Date", and "Subject".
  10. Select the "Extract Email Data" Batch Process Step. The "Extract" activity is used at a Scope of "Folder" and a Folder Level of "1".
  11. The Default Content Type is "Email". Folder Level 1 Batch Folders will be targeted for extraction, and if they are not classified, they will be assigned the "Email" Document Type. Click the Activity Tester tab.
  12. Click the "Submit Job" button.
  13. Select the "Extract Document Data" Batch Process Step. The "Extract" activity is used at a Scope of "Folder" and a Folder Level of "2".
  14. Make sure the Default Content Type is set to the "Emailed Doc" Document Type. Folder Level 2 Batch Folders will be targeted for extraction, and if they are not classified, they will be assigned the "Emailed Doc" Document Type. Click the Activity Tester tab.
  15. Click the "Submit Job" button.
  16. Select the "Review" Batch Process Step. The "Review" activity is used at a Scope of "Batch".
  17. Click the ellipsis button for the Views property.
  18. Select the "Data View" View. Make sure the Processing Level is "Level2" and the Display Parents property is set to "True". One Review Job will be created for the entire Batch and will look at the data of Folder Level 2 Batch Folders. With Display Parents enabled, the extracted data of the Folder Level 1 Batch Folders will be visible as well. These will appear as tabs in the Data View.
  19. Click the Activity Tester tab.
  20. Select the Batch root folder, then click the "Test Activity" button.
  21. Click the "Email" tab.
  22. Here you can view the email metadata collected from the expressions seen earlier for the "Email" Document Type.
  23. Click the "Emailed Doc" tab.
  24. Here you can view the data collected with AI Extract for the "Emailed Doc" Document Type.
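
For readers curious where the "From", "Date", and "Subject" values come from, they are ordinary headers in the .eml file. The following minimal Python sketch reads them with the standard library's email package. It is not the Default Value expression syntax used in the Project, and `email_metadata` is a hypothetical helper name.

```python
from email import message_from_bytes, policy


def email_metadata(raw_eml: bytes) -> dict[str, str]:
    """Return the From, Date, and Subject headers of a raw .eml message."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    # A missing header is returned as an empty string.
    return {name: str(msg.get(name, "")) for name in ("From", "Date", "Subject")}
```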

For more information