Azure DI OCR (OCR Engine)

Latest revision as of 16:07, 15 January 2026

Overview of Azure Document Intelligence in Grooper

Azure Document Intelligence is a cloud service from Microsoft that enables optical character recognition (OCR) and document analysis. Grooper's Azure Document Intelligence integration allows organizations to leverage Azure’s advanced machine learning models for raw text extraction, layout and structure analysis, and semantic understanding. A wide variety of document types are supported, including both machine-printed and handwritten content.

Grooper connects to an Azure Document Intelligence service through the "Azure Document Intelligence" Repository Option. This option is added on the Grooper database Root node and is configured by entering the service's API key and resource name.

With the "Azure Document Intelligence" option added and configured, Grooper leverages the Document Intelligence service in two primary ways:

The Azure DI OCR engine – Used for text extraction and layout data collection by the Recognize activity.
The DI Analyze activity – Used for comprehensive document analysis that can be leveraged by Grooper's AI-enabled features (including AI Extract).
- This analysis produces JSON data files that are used by the DI Layout quoting method when configuring AI-enabled features.

FYI

What if I want to use both DI Analyze and Azure DI OCR? Does that mean the document gets sent to Azure twice?

No. However, DI Analyze must be run first.

Running DI Analyze generates a set of JSON files for each document and its child pages. When an OCR Profile uses Azure DI OCR, it first checks for the existence of these files. If they are found, Grooper retrieves the text and layout data from the JSON rather than making a duplicate call to the Azure Document Intelligence service.
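
The "check first, call only if needed" behavior described above can be sketched as follows. This is an illustration only, not Grooper's implementation: the file name `di_analyze_result.json`, the one-folder-per-page layout, and the `call_azure` callback are all hypothetical stand-ins.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical file name; this article does not state what DI Analyze names its JSON files.
ANALYZE_FILE = "di_analyze_result.json"


def get_page_result(page_dir: Path, call_azure: Callable[[Path], dict]) -> tuple[dict, bool]:
    """Return (result, reused): reuse saved analysis when present, else call the service."""
    cached = page_dir / ANALYZE_FILE
    if cached.exists():
        # DI Analyze already ran: read text and layout from its JSON, no second Azure call.
        return json.loads(cached.read_text(encoding="utf-8")), True
    # No saved analysis was found, so fall back to a live request.
    return call_azure(page_dir), False
```

The second element of the returned tuple makes it easy to confirm, when testing, that a page processed after DI Analyze did not trigger a duplicate request.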

What is Azure DI OCR?

Azure DI OCR is an OCR Engine in Grooper that leverages text recognition models in an Azure Document Intelligence service.

It is designed to recognize machine print and handwritten text from scanned documents, forms and images.
Multiple languages are supported.
Azure DI OCR further improves Document Intelligence's results by aligning and correcting OCR results with Grooper's internal OCR engines.
When configured appropriately, it recognizes checkboxes and barcodes innately (no IP Profile necessary).
- Grooper IP is still needed to detect line segments (such as table lines). However, Grooper's Line Detection runs alongside Document Intelligence thanks to Azure DI OCR's OCR Data Aligner.

Connecting Grooper to a Document Intelligence resource in Azure

There are two primary steps required to connect Grooper to Azure Document Intelligence.

1. Create an Azure Document Intelligence resource: Azure Document Intelligence is a cloud-based service and must be provisioned in the Azure portal before it can be used by Grooper. Instructions for creating a Document Intelligence resource are available in Microsoft’s "Create a Document Intelligence resource" article.

2. Add and configure an Azure Document Intelligence repository option in Grooper: Once the Document Intelligence resource is available in Azure, Grooper can be connected to it using the Azure Document Intelligence repository option. Configuration is straightforward and requires the service’s API key and resource name, both of which can be obtained from the Azure portal.
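
As a rough illustration of what those two values are used for, the sketch below builds the URL and authentication header for a direct call to Azure's REST API. It assumes the resource uses the default `https://<resource name>.cognitiveservices.azure.com` endpoint form and API version `2024-11-30`; resources with custom domains or older API versions will differ. The helper name and sample values are hypothetical, and this is not a description of how Grooper builds its own requests.

```python
from urllib.parse import urlencode

API_VERSION = "2024-11-30"  # assumption: confirm which version your resource supports


def build_analyze_request(resource_name: str, api_key: str, model_id: str = "prebuilt-read"):
    """Return (url, headers) for an analyze request against a Document Intelligence resource."""
    # Assumes the default endpoint form derived from the resource name.
    endpoint = f"https://{resource_name}.cognitiveservices.azure.com"
    query = urlencode({"api-version": API_VERSION})
    url = f"{endpoint}/documentintelligence/documentModels/{model_id}:analyze?{query}"
    # The API key from the Azure portal is sent in this header.
    headers = {"Ocp-Apim-Subscription-Key": api_key}
    return url, headers


url, headers = build_analyze_request("my-di-resource", "<your-api-key>")
print(url)
```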

Purpose and Benefits

The main purpose of Azure DI OCR is to provide robust, cloud-powered OCR for document-centric workflows. Key benefits include:

High Accuracy: Azure’s models are trained on vast datasets, improving recognition of diverse fonts, layouts, and languages.
Hand Print Support: In addition to machine print, Azure DI OCR can extract handwritten text, expanding use cases to forms and notes.
Scalability: Cloud-based processing allows for rapid, parallel analysis of large document batches.
Advanced Layout Detection: Beyond text, Azure DI OCR can detect checkboxes and barcodes, supporting complex extraction scenarios.
- Azure DI OCR offers the best OMR checkbox detection in Grooper to date.

About Document Intelligence models

Azure Document Intelligence differs greatly from traditional OCR engines. It utilizes a combination of AI models and techniques working together, including CNNs (convolutional neural networks) and computer vision, to turn raw images into structured text data.

Grooper's initial Document Intelligence integration focuses on two prebuilt models.

The Read model (`prebuilt-read`)

Purpose: High-quality text extraction

What it does:

Extracts printed and handwritten text
Preserves language, font style, and confidence scores
Supports many languages
Detects barcodes (when the "barcode extraction" add-on feature is enabled)

Benefits:

Cost effective
Best model to use if you just need text

The Layout model (`prebuilt-layout`)

Purpose: High-quality text extraction + Document structure analysis

What it does:

Everything Read does plus:
Analyzes a document's structure for use by the DI Layout quoting method
- Detects paragraphs, titles, headers, footers
- Identifies tables, rows, and columns
- Keeps reading order
- The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Locates OMR checkboxes (Azure calls these "selection marks")

Benefits:

Best model to use with DI Analyze for DI Layout injection
Best model to use for checkbox detection
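
For readers inspecting raw Layout model output, the sketch below pulls checkbox states out of an analysis result. It assumes the Azure response shape in which each page carries a `selectionMarks` list whose entries have a `state` of `selected` or `unselected` plus a `confidence`. The sample data is invented and the function name is hypothetical; Grooper performs its own mapping internally.

```python
def checkbox_states(analyze_result: dict, min_confidence: float = 0.0) -> list[tuple[int, str, float]]:
    """List (page number, state, confidence) for each selection mark at or above a threshold."""
    marks = []
    for page in analyze_result.get("pages", []):
        for mark in page.get("selectionMarks", []):
            confidence = mark.get("confidence", 0.0)
            if confidence >= min_confidence:
                marks.append((page["pageNumber"], mark["state"], confidence))
    return marks


# Invented sample shaped like a (heavily trimmed) Layout result.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "selectionMarks": [
                {"state": "selected", "confidence": 0.98},
                {"state": "unselected", "confidence": 0.42},
            ],
        }
    ]
}
print(checkbox_states(sample, min_confidence=0.5))
```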

Cost:

The Read model is substantially cheaper than the Layout model.
At the time this article was written, the Layout model costs about $0.01 per page.
See Azure's Document Intelligence pricing page for the full pricing model.
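
To get a feel for what the per-page figure means at volume, a back-of-the-envelope estimate can be sketched like this. It uses only the approximate $0.01 per page Layout price quoted above; actual Azure pricing varies by tier, region, and add-ons, so treat the constant as a placeholder.

```python
from decimal import Decimal

# Approximate Layout model price quoted in this article; verify against current Azure pricing.
LAYOUT_PRICE_PER_PAGE = Decimal("0.01")


def estimate_cost(pages: int, price_per_page: Decimal = LAYOUT_PRICE_PER_PAGE) -> Decimal:
    """Estimate the Azure charge, in dollars, for processing a number of pages."""
    return pages * price_per_page


print(estimate_cost(25_000))  # a 25,000-page batch at the quoted Layout price
```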

About document structure analysis

The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Layout-enabled models (such as prebuilt-layout) go beyond simple text recognition and perform structure analysis, which focuses on understanding how information is organized on a page, not just what the text says. Using computer vision, these models analyze spatial layout, reading order, and visual cues to identify:

Paragraphs
Headers and footers
Lists
Tables
Form field groupings (when the "key-value pair" add-on feature is enabled)

These structural relationships are critical for accurate downstream data extraction during AI Extract. The DI Layout quoting method passes this structural information to a large language model (LLM), providing essential spatial context for more accurate Data Model extraction.

Add-on features

For full details, see Microsoft's documentation on Document Intelligence add-on capabilities.

Microsoft has developed several optional capabilities for its Document Intelligence service. These features can be enabled or disabled to meet the needs of specific document scenarios.

In Grooper, these add-ons are enabled in the Azure DI OCR engine's set of Features properties.

Not all models can utilize every add-on. For more information see Azure's model overview documentation. Add-on features unsupported by the Read model are documented below.

Is there a cost to using add-on features?

Monetary cost - Some add-ons incur additional cost in your Azure billing. See Azure's Document Intelligence pricing page for the full pricing model.
Processing cost - All add-on features generate additional data, higher quality results, or both. Either way, this takes additional processing time.

Add-on features relevant to Azure DI OCR

Azure DI OCR is an OCR engine in Grooper that uses Document Intelligence. It is used by the Recognize activity when referenced in an OCR Profile. As such, its primary focus is text and layout data.

DI Analyze is an activity in Grooper that generates a robust data file from Document Intelligence analysis. Several add-ons are only used by DI Analyze; the files it produces are in turn used by the DI Layout quoting method.

Relevant to Azure DI OCR - Enabling these features will affect results Azure DI OCR produces

Barcodes - Impacts the Grooper layout data Recognize generates (Grooper.Layout.json)
OCR High Resolution - Impacts the Grooper text data Recognize generates (Grooper.Characters.txt)

Irrelevant to Azure DI OCR - There is no need to enable these features for Azure DI OCR. These features only pertain to the data the DI Analyze activity generates.

Formulas
Key Value Pairs
Languages
Query Fields
Style Font

Barcodes

Incurs additional cost in Azure? No

Azure DI OCR stores detected barcodes in the Grooper Layout Data file (Grooper.Layout.json) created during Recognize. This extends barcode data to all mechanisms Grooper uses to evaluate and return barcodes from layout data, such as the Find Barcode extractor.

Supported barcode types:

QR Code
Code 39
Code 93
Code 128
UPC (UPC-A & UPC-E)
PDF417
EAN-8
EAN-13
Codabar
Databar
ITF
Data Matrix

From Azure:

The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top level object under content. Inside the content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind and the embedded barcode content as value along with its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded as 1.
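
To make the fields in that description concrete, the sketch below reads `kind`, `value`, and `polygon` from barcode entries. It assumes the entries sit in a `barcodes` list on each page of the result; check that placement against the response your API version returns. The sample data and function name are invented, and Grooper writes this information to its own layout data file rather than exposing the raw JSON.

```python
def extract_barcodes(analyze_result: dict) -> list[dict]:
    """Collect the type, decoded content, and location of each barcode in the result."""
    found = []
    for page in analyze_result.get("pages", []):
        # Assumption: barcode entries are stored per page under "barcodes".
        for barcode in page.get("barcodes", []):
            found.append(
                {
                    "page": page.get("pageNumber"),
                    "kind": barcode["kind"],      # barcode type
                    "value": barcode["value"],    # embedded barcode content
                    "polygon": barcode.get("polygon", []),
                }
            )
    return found


# Invented sample; the kind string and coordinates are illustrative only.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "barcodes": [
                {"kind": "QRCode", "value": "INV-0042", "polygon": [1, 1, 2, 1, 2, 2, 1, 2], "confidence": 1}
            ],
        }
    ]
}
for item in extract_barcodes(sample):
    print(item["page"], item["kind"], item["value"])
```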

OCR High Resolution

Incurs additional cost in Azure? Yes

This add-on can be used on large-format documents (such as engineering drawings) and high-resolution documents to improve OCR accuracy. It has also been shown to improve OCR accuracy on "normal" documents, but it does incur an additional cost from Azure.

From Azure:

The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.

Languages

Language data is "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Incurs additional cost in Azure? No

From Azure:

Adding the languages feature to the analyzeResult request predicts the detected primary language for each text line along with the confidence in the languages collection under analyzeResult.
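
As an illustration of how that collection might be consumed from the JSON that DI Analyze produces, the sketch below totals the characters attributed to each detected language. It assumes each entry in `languages` has a `locale`, a `confidence`, and a list of `spans` with `length` values; the sample data and function name are invented.

```python
from collections import Counter


def language_coverage(analyze_result: dict, min_confidence: float = 0.5) -> Counter:
    """Total the characters attributed to each locale, skipping low-confidence entries."""
    totals = Counter()
    for entry in analyze_result.get("languages", []):
        if entry.get("confidence", 0.0) < min_confidence:
            continue
        totals[entry["locale"]] += sum(span["length"] for span in entry.get("spans", []))
    return totals


# Invented sample shaped like the languages collection of an analysis result.
sample = {
    "languages": [
        {"locale": "en", "confidence": 0.95, "spans": [{"offset": 0, "length": 120}]},
        {"locale": "es", "confidence": 0.90, "spans": [{"offset": 120, "length": 40}]},
        {"locale": "fr", "confidence": 0.20, "spans": [{"offset": 160, "length": 10}]},
    ]
}
print(language_coverage(sample).most_common(1))
```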

Key Value Pairs

Key value pairs are not supported by the Read model.
Key value pairs are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Incurs additional cost in Azure? No

From Azure:

Key-value pairs are specific spans within the document that identify a label or key and its associated response or value. In a structured form, these pairs could be the label and the value the user entered for that field. In an unstructured document, they could be the date a contract was executed on based on the text in a paragraph. The AI model is trained to extract identifiable keys and values based on a wide variety of document types, formats, and structures.

Keys can also exist in isolation when the model detects that a key exists, with no associated value or when processing optional fields. For example, a middle name field can be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. For documents where the same value is described in different ways, for example, customer/user, the associated key is either customer or user (based on context).

Style Font (font recognition)

Font recognition is "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Incurs additional cost in Azure? Yes

From Azure:

The ocr.font capability extracts all font properties of text extracted in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties such as similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for color of the text, and backgroundColor for color of the text bounding box.

Formulas

Formulas are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Incurs additional cost in Azure? Yes

From Azure:

The ocr.formula capability extracts all identified formulas, such as mathematical equations, in the formulas collection as a top level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula that includes the formula type as inline or display, and its LaTeX representation as value along with its polygon coordinates. Initially, formulas appear at the end of each page.

Query Fields

Query fields are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Incurs additional cost in Azure? Yes

UNDER DEVELOPMENT: This parameter is present in the Grooper property grid but is not fully implemented. Enabling this feature has no useful effect at this time and is likely to throw an error.

How Azure DI OCR Utilizes the OCR Data Aligner

A unique feature of Grooper’s Azure DI OCR implementation is the use of the "OCR Data Aligner". This component coordinates the alignment and correction of OCR results between Azure Document Intelligence and Grooper’s internal OCR engines (such as Transym or Tesseract).

The OCR Data Aligner performs several key functions:

Alignment: It matches and merges text segments from Azure with those recognized by Grooper’s traditional OCR engines, improving consistency and accuracy.
Correction: When Azure and Grooper disagree on a segment, the aligner uses configurable vocabulary, preferred patterns (defined by regular expressions), and confidence thresholds to select the best result.
Diagnostics: The aligner provides advanced annotation and logging, helping users review and troubleshoot recognition results.
Layout Data Augmentation: It supplements the layout data with additional features detected by Azure, such as checkboxes and barcodes, ensuring comprehensive coverage.

By leveraging the OCR Data Aligner, Grooper ensures that the final OCR output is both accurate and contextually appropriate, even in challenging scenarios where multiple engines produce conflicting results.
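
The correction step can be pictured with the sketch below. It is a conceptual illustration only and not Grooper's algorithm: the `Candidate` type, the function name, and the priority order (preferred pattern first, then vocabulary, then confidence threshold, then raw confidence) are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # 0.0 to 1.0
    engine: str        # label for the engine that produced the text


def choose_segment(azure: Candidate, local: Candidate, vocabulary: set[str],
                   preferred_patterns: list[re.Pattern], min_confidence: float = 0.8) -> Candidate:
    """Pick one of two disagreeing OCR candidates using patterns, vocabulary, and confidence."""
    if azure.text == local.text:
        return azure  # no disagreement, nothing to correct

    def score(candidate: Candidate) -> tuple:
        matches_pattern = any(p.fullmatch(candidate.text) for p in preferred_patterns)
        in_vocabulary = candidate.text.lower() in vocabulary
        confident = candidate.confidence >= min_confidence
        # Assumed priority order; tuples compare left to right.
        return (matches_pattern, in_vocabulary, confident, candidate.confidence)

    # On an exact tie, max() keeps the first argument (the Azure candidate).
    return max((azure, local), key=score)


date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
winner = choose_segment(
    Candidate("2O24-01-15", 0.91, "azure"),   # letter O misread in place of zero
    Candidate("2024-01-15", 0.74, "local"),
    vocabulary={"invoice"},
    preferred_patterns=[date_pattern],
)
print(winner.engine, winner.text)
```

In this example the lower-confidence candidate wins because it is the only one that matches the preferred date pattern, which is the kind of case where pattern and vocabulary hints matter more than raw confidence.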
