Invoice Processing (Use Case)

This article is about the current version of Grooper.

Note that some content may still need to be updated.

You may download the ZIP files below for use in your own Grooper environment (version 2025). These are Project ZIP files.

This is a Batch with example email scenarios:

2025 Batch – Example Email Scenarios

This is a normal ZIP file containing multiple image-based invoice examples:

Sample Invoices – Image-Based PDFs

Introduction

Invoice Processing showcases how Grooper can automate the capture, understanding, validation, and organization of invoice documents using a combination of DI OCR, data extraction, review workflows, and AI-enabled capabilities. This article demonstrates a realistic business use case that reflects how organizations process accounts payable documents in production environments.

The intention of this article is to move beyond isolated feature demonstrations and show how Grooper’s technologies work together as part of a complete invoice processing solution. Rather than focusing on a single Activity or configuration object, this guide illustrates how invoices move through a coordinated workflow—from document ingestion and recognition to structured data extraction, validation, review, and downstream use.

This use case highlights several core Grooper concepts, including AI Extract and Azure DI OCR. The provided Projects take a generic, one-size-fits-all approach to invoice processing.

By the end of this guide, readers will have a foundational understanding of how Grooper can be used to build an end-to-end invoice processing solution and how the platform’s modular architecture supports scalable, production-ready document automation workflows.

Setup for AI Extract

This portion of the article focuses on configuring Grooper’s AI Extract capability so documents can be analyzed by a Large Language Model (LLM) and mapped into a Data Model. It involves setting up an LLM Connector within the Grooper Repository and selecting an appropriate model through the Data Model’s Fill Methods.

The goal of this configuration is to enable Grooper to interpret document content and populate generic fields—such as document identifiers, dates, and party information—without relying on rigid, template-based extraction. This setup establishes the connection between Grooper and the external LLM provider, ensuring AI Extract can execute during Batch Processing.

Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
Add an LLM Connector, then configure it.
- The most important setting is the Service Provider property: choose a service provider, then configure that provider's settings.
Expand the Node Tree and select the Data Model from the provided "AI Invoice Processing (File Import)" Project, then click the ellipsis button for the Fill Methods property to open the "Fill Methods" editor.
Expand the Generator sub-properties and select the desired model for the Model property.
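
To make the idea behind this setup more concrete, here is a conceptual sketch. It is not Grooper's implementation; it only illustrates how a list of generic fields might be turned into a prompt and how a JSON reply from an LLM could be mapped back onto those fields. The field names, `build_prompt`, `parse_reply`, and the canned reply are all hypothetical. In Grooper, this work is handled by the AI Extract Fill Method and the LLM Connector configured above.

```python
import json

# Hypothetical field names standing in for a Data Model's generic fields.
FIELDS = ["invoice_number", "invoice_date", "vendor_name"]


def build_prompt(fields, document_text):
    """Ask the model for one JSON object keyed by the requested field names."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the document and reply with a "
        f"single JSON object using exactly these keys: {field_list}.\n"
        "Use null for any value that is not present.\n\n"
        f"Document:\n{document_text}"
    )


def parse_reply(fields, reply_text):
    """Map the model's JSON reply onto the fields, ignoring unexpected keys."""
    data = json.loads(reply_text)
    return {name: data.get(name) for name in fields}


# A canned reply stands in for the external LLM call (assumption for illustration).
sample_text = "Invoice INV-1001 dated 2025-01-15 from Acme Supply"
prompt = build_prompt(FIELDS, sample_text)
canned_reply = '{"invoice_number": "INV-1001", "invoice_date": "2025-01-15", "extra": 1}'
record = parse_reply(FIELDS, canned_reply)
print(record)
```

Fields the reply does not mention come back as `None`, which loosely mirrors a field left blank for a user to fill in during Review.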

Setup for Azure DI OCR

This section covers configuring the Azure DI OCR Profile, which is responsible for converting image-based content into machine-readable text. By supplying an Azure Computer Vision API key and matching the correct region, Grooper can leverage Azure DI's OCR engine to process scanned or image-only documents.

This step ensures that all documents—whether they contain embedded text or not—have usable text content for downstream processing. OCR output is critical not only for AI Extract, but also for search indexing, as it provides the textual data that both extraction models and search engines rely on.

Select the Root node, then click the ellipsis button for the Options property to open the Options editor.
In the "Options" editor, add an "Azure Document Intelligence" option, then properly configure it.
- The most important property is the API Key.
Expand the Node Tree and right-click the "Azure OCR" OCR Profile from the provided "AI Invoice Processing (File Import)" Project, then select "Rename" from the pop-out menu.
Set the New Name property to "Azure DI OCR".
Right-click the OCR Engine property, then select "Reset" from the pop-out menu.
Set the OCR Engine property to "Azure DI OCR".
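
The role OCR plays in providing usable text can be pictured with a small conceptual sketch. This is an illustration only, not how Grooper or Azure is implemented: a page that already carries embedded text is used as-is, and an OCR engine is called only when no text is available. `Page`, `text_for_page`, and `fake_ocr` are hypothetical names, and the stand-in OCR function returns a fixed string rather than calling any real service.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    """Hypothetical page: raw image bytes plus any text embedded in the file."""
    image: bytes
    embedded_text: Optional[str] = None


def text_for_page(page: Page, ocr: Callable[[bytes], str]) -> str:
    """Return embedded text when present; otherwise fall back to OCR."""
    if page.embedded_text and page.embedded_text.strip():
        return page.embedded_text
    return ocr(page.image)


def fake_ocr(image: bytes) -> str:
    # Stand-in for a real OCR engine; no external service is called.
    return "TEXT FROM OCR"


digital = Page(image=b"", embedded_text="Invoice 1001")
scanned = Page(image=b"\x89PNG")
print(text_for_page(digital, fake_ocr))
print(text_for_page(scanned, fake_ocr))
```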

Final setup

The final section brings all components together into a complete, operational workflow. It covers preparing the necessary services (such as Activity Processing and Import Watcher), publishing the Batch Process, and configuring document ingestion from a file system.

This Batch Process orchestrates the full pipeline: importing documents, performing OCR, executing AI Extract, pausing for user validation in Review, and finally exporting the documents and data to a file system. After export is complete, documents can be viewed in the location structured by the Export Behavior.

This portion emphasizes how individual configurations—AI Extract, DI OCR, and export—work together as a cohesive system, enabling a seamless transition from raw document ingestion to fully searchable, structured content.

Select the Machines folder node. Verify that an Activity Processing service and an Import Watcher service are installed and running.
- These services are only needed if you wish to run a Batch through production in an automated fashion, starting with an import. For our purposes, we'll use the Batch Process Step tester tabs to check each step individually.
Expand the Node Tree and select the "Invoices Model (File Import)" Content Model, then click the ellipsis button for the Behaviors property to open the "Behaviors" editor.
Select the "Export Behavior", then click the ellipsis button for the Export Definitions property to open the "Export Definitions" editor.
Supply a fully qualified UNC path for the Target Folder property, then click the ellipsis button for the Relative Path property to open the "Relative Path" editor.
Notice the expression used. The first portion defines the base folder, subsequent variables define sub-folders, and the final variable defines the name of the files.
Back in the "Export Definitions" editor, click the ellipsis button for the Export Formats property to open the "Export Formats" editor.
Notice that a searchable PDF and a JSON metadata file are used for export.
Expand the Node Tree to the Test folder of the Batches node, then add a new "Test" Batch. Add one or more documents you wish to test with.
- In this example we'll use a single document to test.
Expand the Node Tree and select the "Split Pages" Batch Process Step from the provided "Ingest and Index (File Import)" Project, then click the Activity Tester tab.
Click the "Select Batch" button in the Batch Viewer, then be sure to select the Batch you recently created.
Select the Folder Level 1 Batch Folder (or folders) in the Batch Viewer, then click the "Test Activity" button.
- If you have an Activity Processing service running, you can instead use the "Submit Job" button. This will be true for all steps moving forward.
Select the "Recognize" Batch Process Step from the Node Tree, then expand the Batch Folder contents in the Batch Viewer.
Select the Batch Page in the Batch Viewer, then click the "Test Activity" button.
Select the "Extract" Batch Process Step from the Node Tree.
Select the Folder Level 1 Batch Folder from the Batch Viewer, then click the "Test Activity" button.
Select the "Review" Batch Process Step from the Node Tree.
Select the Batch root from the Batch Viewer, then click the "Test Activity" button.
Review the extracted data in the Data Viewer, then click the "Back to Design Page" button.
Select the "Export" Batch Process Step.
Select the Folder Level 1 Batch Folder in the Batch Viewer, then click the "Test Activity" button.
In the specified output folder, you will see the sub-folders created by the Relative Path expression, along with the exported PDF and JSON metadata files.
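
As a rough illustration of how a Relative Path expression shapes the output, the sketch below builds a path from a base folder, two sub-folder values, and a file name, then derives one path per export format. This is plain Python, not Grooper's expression syntax. The folder values (`Invoices`, a vendor name, a year) and the `clean` and `export_paths` helpers are hypothetical, and the actual expression in the provided Project may use different variables.

```python
from pathlib import PurePosixPath


def clean(part: str) -> str:
    """Replace characters that are not safe in folder or file names."""
    return "".join(c if c.isalnum() or c in " -_." else "_" for c in part).strip()


def export_paths(base: str, subfolders: list[str], name: str) -> dict[str, PurePosixPath]:
    """Compose base/sub-folders/name and return one path per export format."""
    folder = PurePosixPath(clean(base), *(clean(s) for s in subfolders))
    stem = clean(name)
    return {ext: folder / f"{stem}.{ext}" for ext in ("pdf", "json")}


# Hypothetical values: base folder, then vendor and year sub-folders, then file name.
paths = export_paths("Invoices", ["Acme Supply", "2025"], "INV-1001")
print(paths["pdf"])
print(paths["json"])
```

Cleaning each segment before joining keeps a stray slash or colon in an extracted value from creating unexpected folders under the Target Folder.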

Considering emails and scanning

In this final section we'll take a quick look at the other two provided sample Projects and see how their Batch Processes differ when considering email processing and scanning.

A Project similar to the "File Import" Project is provided, but it is suited for email processing.
Before the "Split Pages" Batch Process Step are several Batch Process Steps that are specific to email processing.
- Feel free to review the configuration of these steps to learn more about them. Not every step is needed for every type of email processing; this is a generic Batch Process built as a "one size fits all" scenario. To use this Batch Process, you'll need to use the Imports Page with a CMIS Connection configured for your email system.
There is also a Project provided that is specific to scanning documents.
Split Pages is not needed for this type of processing. Instead, the Batch Process uses a Review activity with the Scan Viewer, as well as an Image Processing activity to clean up the scanned pages.
- You'll also notice a Separate activity, which turns the loose pages into Batch Folders. Keep in mind that the Scan Viewer requires Grooper Desktop to be installed on the system doing the scanning.

More information on Email processing and Scanning can be found with these links:

For More Information