Extract Data with AI Extract (Simple Functionality)

From Grooper Wiki
Revision as of 14:19, 25 February 2026 by Randallkinard

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

A Project ZIP file for this article is available for download and use in your own Grooper environment (version 2025).

Introduction

This article demonstrates how to use Grooper’s AI Extract capability to automatically analyze documents and populate a Data Model using a Large Language Model (LLM). This article focuses on the core mechanics of AI-based extraction—showing how Grooper combines OCR, Content Models, Data Models, and an LLM Connector to collect structured data from unstructured documents.

The intention of this article is to provide a minimal, easy-to-understand example of AI Extract in action. The included “Generic Document” configuration uses broadly applicable fields such as document ID, document date, and party information so it can work with almost any document type. The goal is not to model a specific form, but to illustrate how AI Extract interprets document content and fills Data Fields dynamically. Readers are encouraged to modify or replace these generic fields to match their own document requirements.
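To make the idea concrete, the sketch below shows (in plain Python, not Grooper's internal API) how a generic field schema like the one described above can be turned into a prompt and how an LLM's JSON reply can be parsed back into field values. The field names, prompt wording, and the `stub_llm` callable are all illustrative assumptions; in Grooper the equivalent work is performed by the Data Model's LLM-backed Fill Method.

```python
import json

# Hypothetical generic field schema, mirroring the article's "Generic
# Document" example. Names and descriptions are illustrative only.
GENERIC_FIELDS = {
    "DocumentID": "A unique identifier printed on the document",
    "DocumentDate": "The primary date of the document (ISO 8601)",
    "PartyName": "The main person or organization named in the document",
}

def build_prompt(document_text: str, fields: dict) -> str:
    """Ask the model to return one JSON object keyed by field name."""
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following fields from the document text.\n"
        f"{field_lines}\n"
        "Respond with a single JSON object keyed by field name; "
        "use null for any field that is not present.\n\n"
        f"Document text:\n{document_text}"
    )

def fill_data_model(document_text: str, fields: dict, llm) -> dict:
    """Send the prompt to an LLM callable and parse its JSON reply."""
    reply = llm(build_prompt(document_text, fields))
    values = json.loads(reply)
    # Keep only the fields we asked for, in schema order.
    return {name: values.get(name) for name in fields}

# Stub LLM so the sketch runs without a provider; a real deployment
# would call the configured LLM Connector instead.
def stub_llm(prompt: str) -> str:
    return json.dumps({
        "DocumentID": "INV-1041",
        "DocumentDate": "2025-02-25",
        "PartyName": "Acme Corp",
    })

result = fill_data_model("Invoice INV-1041, dated 2025-02-25 ...",
                         GENERIC_FIELDS, stub_llm)
```

Because the prompt asks for null where a field is absent, the same schema can be pointed at almost any document type, which is the property the "Generic Document" configuration relies on.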

This guide also demonstrates two execution methods within Grooper:

  • Using Batch Process Step Testers from the Design page to observe and validate each stage of processing, including OCR, Document Type assignment, and AI-driven extraction.
  • Using the "Upload Documents" button on the Batches page to experience how the same process runs in a streamlined, production-style workflow.

By the end of this article, readers will understand how to configure an LLM provider, associate an LLM with a Data Model’s Fill Method, execute AI Extract within a Batch Process, and review the structured data produced by the model. This example serves as a foundational pattern that can be expanded into more advanced AI-powered document processing solutions.

Test using Batch Process Step Testers

This portion of the article demonstrates how to execute the "Generic Document Extraction" Batch Process manually from the Design page using the Activity Tester tabs of its Batch Process Steps. This approach is intended for development, validation, and learning. It allows you to observe how each processing stage contributes to AI-driven data extraction.

Users create a test Batch, import one or more documents, and then run each activity individually—Split Pages, Recognize, Extract, and Review. This makes it possible to confirm that:

  • Pages are properly separated into Batch Page nodes.
  • Text is successfully collected through native text extraction or OCR (using the configured Azure OCR Profile).
  • The correct Document Type is assigned.
  • AI Extract executes the Data Model’s Fill Method using the selected LLM model.
  • Structured field values are written to the Data Model and can be inspected in the Data Viewer of the Review activity.
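The sequence of activities above can be sketched as a simple pipeline. This is a conceptual illustration only; the step names match the article, but each function body is a placeholder standing in for the real Grooper activity.

```python
# Minimal sketch of the Batch Process flow: Split Pages, Recognize,
# Extract, Review. All data shapes and values here are illustrative.
def split_pages(batch):
    # Separate the input into one node per page.
    batch["pages"] = [{"text": None, "raw": p} for p in batch["input"]]

def recognize(batch):
    # Collect text natively or via OCR (here the raw text stands in).
    for page in batch["pages"]:
        page["text"] = page["raw"]

def extract(batch):
    # Stand-in for AI Extract: a real run invokes the Data Model's
    # LLM-backed Fill Method against the recognized text.
    batch["data"] = {"DocumentDate": "2025-02-25"}  # illustrative value

def review(batch):
    # Pause point where a user inspects the populated Data Model.
    batch["status"] = "awaiting review"

batch = {"input": ["page one text", "page two text"]}
for activity in (split_pages, recognize, extract, review):
    activity(batch)
```

Running the activities one at a time, as the testers allow, amounts to stepping through this loop manually and inspecting `batch` after each call.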

This method emphasizes visibility and diagnostic control. It is especially useful for verifying LLM connectivity, testing prompt behavior, confirming field population logic, and troubleshooting extraction results before publishing the process for broader use.

Test with the "Upload Documents" button

This portion of the article demonstrates running the same AI Extract workflow from the Batches page using the "Upload Documents" button. After the Batch Process is published, users can upload documents and select the "Generic Document Extraction" process from a dropdown list.

Once started, the Batch executes automatically through Split Pages, Recognize, and Extract, pausing at Review. The Activity Processing service handles each step without requiring manual interaction. Users then complete the Review step and inspect the extracted values using the Data Viewer.

This approach highlights the production experience. It reflects how a deployed AI Extract solution behaves in real-world use: users upload documents, processing runs in the background, and structured data is presented for validation. The complexity of OCR, LLM integration, and Data Model execution is encapsulated within the Batch Process, providing a simple and accessible workflow for end users.

For more information

Please review the following articles for more information on these specific topics: