Azure Document Intelligence (Repository Option)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

Azure Document Intelligence is a Repository Option that connects Grooper to an existing Document Intelligence service in an Azure environment. This enables several features that use Azure Document Intelligence for document analysis, including machine print and handwritten text extraction (OCR), layout data collection, and document structure analysis.

Overview of Azure Document Intelligence in Grooper

Azure Document Intelligence is a cloud service from Microsoft that provides intelligent document processing capabilities, including text extraction, layout analysis, and semantic understanding. Grooper connects to a Document Intelligence service when you enable and configure the Azure Document Intelligence Repository Option. The option is configured on the Grooper database Root node. You establish the connection by entering the service's API key and resource name.

With the Azure Document Intelligence option added and configured, Grooper leverages the Document Intelligence service in two primary ways:

  • The Azure DI OCR engine - Used for text extraction and layout data collection by the Recognize activity.
  • The DI Analyze activity - Used for comprehensive document analysis that can be leveraged by Grooper's AI-enabled features (including AI Extract).
    • This analysis results in a JSON data file that is used by the DI Layout Quoting Method when configuring AI-enabled features.
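To give a feel for the kind of data such a JSON file can hold, here is a minimal Python sketch that reads a response shaped like the Azure Document Intelligence analyze result. It assumes the file Grooper saves follows the service's own response shape (`analyzeResult`, `pages`, `lines`, `selectionMarks`), which this article does not confirm. The sample data and the `summarize` helper are hypothetical.

```python
import json

# Sample shaped like an Azure Document Intelligence analyze response.
# Assumption: the JSON file Grooper saves follows this shape.
SAMPLE = json.loads("""
{
  "status": "succeeded",
  "analyzeResult": {
    "modelId": "prebuilt-layout",
    "content": "Invoice 1001\\nTotal: 42.00",
    "pages": [
      {
        "pageNumber": 1,
        "lines": [
          {"content": "Invoice 1001"},
          {"content": "Total: 42.00"}
        ],
        "selectionMarks": [
          {"state": "selected"},
          {"state": "unselected"}
        ]
      }
    ]
  }
}
""")


def summarize(result: dict) -> dict:
    """Collect each page's line text and count its selected checkboxes."""
    analyze = result.get("analyzeResult", {})
    pages = {}
    for page in analyze.get("pages", []):
        lines = [line["content"] for line in page.get("lines", [])]
        checked = sum(
            1
            for mark in page.get("selectionMarks", [])
            if mark.get("state") == "selected"
        )
        pages[page["pageNumber"]] = {"lines": lines, "checked": checked}
    return pages


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The per-page grouping mirrors how the analysis is described here: text and layout elements such as checkboxes are reported page by page.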

FYI

What if I want to use both DI Analyze and Azure DI OCR? Does that mean the document gets sent to Azure twice?

Nope. You just need to run DI Analyze first.

Running DI Analyze creates a set of JSON files for each document and its child pages. When an OCR Profile uses Azure DI OCR, it first checks whether one of these files exists. If so, it gets the text data from the file rather than making a duplicate call to the Document Intelligence service.
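The "check for an existing file first" behavior is a cache-first lookup. The Python sketch below illustrates the pattern only. The file name `di_analysis.json`, the `get_text` function, and the stand-in service call are hypothetical and do not reflect Grooper's internal storage or code.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_text(page_dir: Path, call_service: Callable[[], dict]) -> str:
    """Return page text, preferring a saved analysis file over a new call."""
    cached = page_dir / "di_analysis.json"  # hypothetical file name
    if cached.exists():
        # A prior analysis exists: reuse it, no service call is made.
        result = json.loads(cached.read_text(encoding="utf-8"))
    else:
        # No prior analysis: fall back to calling the service.
        result = call_service()
    return result["analyzeResult"]["content"]


def demo() -> tuple:
    """Run the lookup before and after an analysis file is present."""
    calls = []

    def fake_service() -> dict:
        # Stand-in for a real Document Intelligence request.
        calls.append(1)
        return {"analyzeResult": {"content": "from service"}}

    with tempfile.TemporaryDirectory() as tmp:
        page_dir = Path(tmp)
        first = get_text(page_dir, fake_service)  # no file yet
        (page_dir / "di_analysis.json").write_text(
            json.dumps({"analyzeResult": {"content": "from file"}}),
            encoding="utf-8",
        )
        second = get_text(page_dir, fake_service)  # file now exists
    return first, second, len(calls)


if __name__ == "__main__":
    print(demo())
```

In the demo, the service is called once: the second lookup is answered from the saved file, which is the reason for running DI Analyze before Recognize.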

Key similarities and differences between DI Analyze and Azure DI OCR

Similarities

Azure DI OCR and DI Analyze have several things in common.

  • Both utilize the Document Intelligence service Grooper connects to using the Azure Document Intelligence option added to the Grooper Root.
  • Both have access to the same models (although they utilize them differently).
    • Be aware, Grooper's current integration with Azure Document Intelligence has focused on using the prebuilt-read and prebuilt-layout models.
  • Both can process page images and a Batch Folder's attachment file.

Differences

While both methods utilize Azure Document Intelligence, they differ in scope, output, and intended use:

Azure DI OCR
  • Focuses on OCR (text recognition) for machine print and handwriting, plus layout data collection.
    • Using the prebuilt-read model performs text recognition only. Using the prebuilt-layout model also collects layout data: lines, checkboxes and (optionally) barcodes are saved to the layout data file created by Recognize.
  • Configured as an OCR Engine within Grooper's OCR Profile.
  • Results can be used with Grooper's Value Extractions (Pattern Match, Labeled OMR, Labeled Value, etc.).
  • Aligns Azure OCR results with Grooper's internal OCR engines for enhanced accuracy.

DI Analyze
  • Performs full document analysis, extracting text, layout, style, and semantic data.
  • Enables advanced AI workflows, including injecting the analysis results into LLM prompts and "spatial grounding" to improve document highlighting when aligning an LLM's response back to the Grooper document.
  • When run on the folder level, can be configured to prefer the folder's child pages (default) or attachment file.
  • When run on the folder level, DI layout data is saved to both the folder and its child pages.
  • Using the DI Layout Quoting Method, AI-enabled features can access results in text, markdown and HTML formats.
  • Results cannot be used with Grooper's Value Extractions (Pattern Match, Labeled OMR, Labeled Value, etc.).
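At the service level, the model choice and the optional barcode collection described above correspond to parts of the Document Intelligence analyze request. The Python sketch below builds that request URL. The API version `2024-11-30`, the `analyze_url` helper, and the resource name `contoso-di` are assumptions for illustration. This article does not state which API version Grooper calls or how it builds its requests.

```python
from urllib.parse import urlencode

# Assumption: the API version Grooper uses is not stated in this article.
API_VERSION = "2024-11-30"


def analyze_url(resource_name: str, model_id: str, barcodes: bool = False) -> str:
    """Build an analyze URL for a model such as prebuilt-read or prebuilt-layout."""
    # Standard public-cloud endpoint pattern; other clouds use other domains.
    base = f"https://{resource_name}.cognitiveservices.azure.com"
    path = f"/documentintelligence/documentModels/{model_id}:analyze"
    params = {"api-version": API_VERSION}
    if barcodes:
        # Barcode detection is an optional add-on feature of the service.
        params["features"] = "barcodes"
    return f"{base}{path}?{urlencode(params)}"


if __name__ == "__main__":
    # "contoso-di" is a placeholder resource name.
    print(analyze_url("contoso-di", "prebuilt-read"))
    print(analyze_url("contoso-di", "prebuilt-layout", barcodes=True))
```

Only the model ID and the optional `features` parameter change between the two calls, which matches the description of prebuilt-read as text only and prebuilt-layout as text plus layout data.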

Adding and configuring the Azure Document Intelligence option

You must add the Azure Document Intelligence option to the Grooper Root before using Azure DI OCR or DI Analyze. These features will not function without it.

Adding and configuring the Azure Document Intelligence option in Grooper is simple.

  1. From the Design page, go to the database Root node.
  2. Open the "Options" editor (Press the "..." button).
  3. Press the Add button and select "Azure Document Intelligence" from the dropdown.
  4. In the "API Key" property, enter your Document Intelligence service's API key (from your Azure portal).
  5. In the "Resource Name" property, enter the Document Intelligence service's resource name (from your Azure portal).
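Before entering the two values in Grooper, it can help to confirm that they work against the service directly. The Python sketch below builds a request to the service's resource information operation using only the standard library. It is not part of Grooper. The `build_check_request` helper, the API version, the loose name check, and the placeholder values are assumptions for illustration.

```python
import re
import urllib.request

# Assumption: any supported API version works for this check.
API_VERSION = "2024-11-30"


def build_check_request(api_key: str, resource_name: str) -> urllib.request.Request:
    """Build a GET request for the service's resource information operation."""
    name = resource_name.strip()
    # Loose sanity check (an assumption, not Azure's exact naming rule):
    # the value should be a bare resource name, not a full endpoint URL.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9-]*", name):
        raise ValueError("expected a bare resource name, not a URL")
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    url = (
        f"https://{name}.cognitiveservices.azure.com"
        f"/documentintelligence/info?api-version={API_VERSION}"
    )
    # The key is sent in the Ocp-Apim-Subscription-Key header.
    return urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}, method="GET"
    )


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your Azure portal.
    request = build_check_request("your-api-key", "contoso-di")
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. A successful response suggests the key and resource name match, while an HTTP 401 error usually points to a wrong key.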