Azure DI OCR (OCR Engine)

From Grooper Wiki

Azure DI OCR is Grooper’s integration with Microsoft’s Azure Document Intelligence service, enabling cloud-based optical character recognition (OCR) and document analysis. This feature allows organizations to leverage Azure’s advanced machine learning models for extracting text, layout, and semantic data from a wide variety of documents, including both machine print and hand print.

What is Azure DI OCR?

Azure DI OCR is an OCR Engine in Grooper that leverages text recognition models in an Azure Document Intelligence service.

  • It is designed to recognize machine print and handwritten text from scanned documents, forms and images.
  • It recognizes layout data, such as lines, checkboxes and barcodes, natively (no IP Profile necessary).
  • Multiple languages are supported.
  • Azure DI OCR further improves Document Intelligence's results by aligning and correcting OCR results with Grooper's internal OCR engines.

Connecting Grooper to a Document Intelligence resource in Azure

There are two primary steps required to connect Grooper to Azure Document Intelligence.

1. Create an Azure Document Intelligence resource
Azure Document Intelligence is a cloud-based service and must be provisioned in the Azure portal before it can be used by Grooper. Instructions for creating a Document Intelligence resource are available in Microsoft’s Create a Document Intelligence resource article.
2. Add and configure an Azure Document Intelligence repository option in Grooper
Once the Document Intelligence resource is available in Azure, Grooper can be connected to it using the Azure Document Intelligence repository option. Configuration is straightforward and requires the service’s API key and resource name, both of which can be obtained from the Azure portal.
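
For orientation, the API key and resource name map onto an Azure REST call roughly as sketched below. This is not Grooper's implementation. The endpoint pattern `https://<resource name>.cognitiveservices.azure.com`, the `2024-11-30` API version, and the helper name `build_analyze_request` are all assumptions; confirm the actual endpoint for your resource in the Azure portal.

```python
from urllib.parse import quote, urlencode


def build_analyze_request(resource_name, api_key, model_id="prebuilt-read",
                          api_version="2024-11-30", features=()):
    """Return (url, headers) for a Document Intelligence analyze call.

    Assumes the default endpoint pattern for a resource; custom domains
    and sovereign clouds use a different host name.
    """
    base = f"https://{resource_name}.cognitiveservices.azure.com"
    path = f"/documentintelligence/documentModels/{quote(model_id)}:analyze"
    query = {"api-version": api_version}
    if features:
        # Add-on features are passed as one comma-separated query value.
        query["features"] = ",".join(features)
    url = f"{base}{path}?{urlencode(query, safe=',')}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # the key from the Azure portal
        "Content-Type": "application/octet-stream",
    }
    return url, headers


url, headers = build_analyze_request("my-resource", "MY-KEY",
                                     features=("barcodes", "languages"))
```

The function only builds the request; it makes no network call, so it can be used to check a configuration before any pages are sent to Azure.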

Purpose and Benefits

The main purpose of Azure DI OCR is to provide robust, cloud-powered OCR for document-centric workflows. Key benefits include:

  • High Accuracy: Azure’s models are trained on vast datasets, improving recognition of diverse fonts, layouts, and languages.
  • Hand Print Support: In addition to machine print, Azure DI OCR can extract handwritten text, expanding use cases to forms and notes.
  • Scalability: Cloud-based processing allows for rapid, parallel analysis of large document batches.
  • Advanced Layout Detection: Beyond text, Azure DI OCR can detect lines, checkboxes, and barcodes, supporting complex extraction scenarios.
    • Azure DI OCR offers the best OMR checkbox detection in Grooper to date.

About Document Intelligence models

Azure Document Intelligence differs greatly from traditional OCR engines. It utilizes a combination of AI models and techniques working together, including CNNs (convolutional neural networks) and computer vision, to turn raw images into structured text data.

Grooper's initial Document Intelligence integration focuses on two prebuilt models.

The Read model (prebuilt-read)

Purpose: High-quality text extraction

What it does:

  • Extracts printed and handwritten text
  • Preserves language, font style, and confidence scores
  • Supports many languages
  • Detects barcodes (when the "barcode extraction" add-on feature is enabled)

Benefits:

  • Cost effective
  • Best model to use if you just need text

The Layout model (prebuilt-layout)

Purpose: High-quality text extraction + Document structure analysis

What it does:

  • Everything Read does plus:
  • Analyzes a document's structure for use by the DI Layout quoting method
    • Detects paragraphs, titles, headers, footers
    • Identifies tables, rows, and columns
    • Keeps reading order
    • The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
  • Locates OMR checkboxes (Azure calls these "selection marks")

Benefits:

  • Best model to use with DI Analyze for DI Layout injection
  • Best model to use for checkbox detection
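
As an illustration of what checkbox detection returns, the sketch below reads selection marks out of a Layout-style result. It assumes the raw Azure JSON shape (`pages[].selectionMarks[]` with `state` and `confidence`); the sample data is hand-written, and Grooper itself surfaces these marks through its layout data rather than through this code.

```python
def selection_marks(analyze_result):
    """Yield (page_number, state, confidence) for each selection mark."""
    for page in analyze_result.get("pages", []):
        for mark in page.get("selectionMarks", []):
            yield page.get("pageNumber"), mark.get("state"), mark.get("confidence")


# Hand-written sample shaped like a Layout response; not real service output.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "selectionMarks": [
                {"state": "selected", "confidence": 0.98},
                {"state": "unselected", "confidence": 0.91},
            ],
        }
    ]
}

checked = [m for m in selection_marks(sample) if m[1] == "selected"]
```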

Cost:

  • The Read model is substantially cheaper than the Layout model.
  • At the time this article was written, the Layout model cost about $0.01 per page.
  • Azure's full pricing is listed on Microsoft's Azure Document Intelligence pricing page.
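
A quick way to budget for a batch is to multiply the page count by the per-page rate. The sketch below uses the approximate $0.01 Layout figure quoted above as a default; treat it as a placeholder and substitute current Azure pricing. The function name is hypothetical.

```python
from decimal import Decimal

# Approximate Layout rate quoted in this article; replace with current pricing.
LAYOUT_PRICE_PER_PAGE = Decimal("0.01")


def estimate_layout_cost(page_count, price_per_page=LAYOUT_PRICE_PER_PAGE):
    """Return the estimated charge, in dollars, for a number of pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return Decimal(page_count) * price_per_page


estimate = estimate_layout_cost(250_000)  # 250,000 pages at about $0.01 each
```

`Decimal` is used instead of `float` so that currency arithmetic does not pick up binary rounding error.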

Document structure analysis

  • The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

Layout-enabled models (such as prebuilt-layout) go beyond simple text recognition and perform structure analysis, which focuses on understanding how information is organized on a page, not just what the text says. Using computer vision, these models analyze spatial layout, reading order, and visual cues to identify:

  • Paragraphs
  • Headers and footers
  • Lists
  • Tables
  • Form field groupings (when the "key-value pair" add-on feature is enabled)

These structural relationships are critical for accurate downstream data extraction during AI Extract. The DI Layout quoting method passes this structural information to a large language model (LLM), providing essential spatial context for more accurate Data Model extraction.
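
To make the idea of structure analysis concrete, the sketch below rebuilds a table grid and groups paragraphs by role from a Layout-style result. It assumes the raw Azure JSON shape (`tables[].cells[]` with `rowIndex` and `columnIndex`, and `paragraphs[]` with an optional `role`). The `"body"` label for paragraphs without a role is a placeholder chosen for this example, and the code is not how DI Analyze or the DI Layout quoting method is implemented.

```python
def table_to_grid(table):
    """Rebuild a row-by-column grid of cell text from a flat cell list."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


def paragraphs_by_role(analyze_result):
    """Group paragraph text by role; 'body' is this example's own label."""
    grouped = {}
    for para in analyze_result.get("paragraphs", []):
        grouped.setdefault(para.get("role", "body"), []).append(para["content"])
    return grouped


# Hand-written sample shaped like a Layout response; not real service output.
sample = {
    "paragraphs": [
        {"role": "title", "content": "Purchase Order"},
        {"content": "Ship to the address below."},
        {"role": "pageFooter", "content": "Page 1 of 1"},
    ],
    "tables": [
        {
            "rowCount": 2,
            "columnCount": 2,
            "cells": [
                {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
                {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
                {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
                {"rowIndex": 1, "columnIndex": 1, "content": "4"},
            ],
        }
    ],
}

grid = table_to_grid(sample["tables"][0])
roles = paragraphs_by_role(sample)
```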

Add-on features

Microsoft's full documentation on these features is available in its Document Intelligence add-on capabilities article.

Microsoft has developed several optional capabilities for its Document Intelligence service. These features can be enabled or disabled to meet the needs of specific document scenarios.

In Grooper, these add-ons can be enabled in the Azure DI OCR engine's set of Features properties.

  • Not all models can utilize every add-on. For more information see Azure's model overview documentation. Add-on features unsupported by the Read model are documented below.

Is there a cost to using add-on features?

  • Monetary cost - Some add-ons incur additional cost in your Azure billing. Azure's full pricing is listed on Microsoft's Azure Document Intelligence pricing page.
  • Processing cost - Add-on features generate additional data, higher quality results, or both. Either way, they add processing time.
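
The cost and compatibility notes given for each add-on in the sections that follow can be collected into a small lookup for reviewing a configuration. This is a sketch only: the dictionary holds just what this article states (Style Font is left out because no cost is listed for it), and the names `EXTRA_COST`, `READ_UNSUPPORTED`, and `review_addons` are hypothetical, not Grooper property names.

```python
# Whether each add-on incurs additional Azure cost, as stated in this article.
EXTRA_COST = {
    "Barcodes": False,
    "Languages": False,
    "Key Value Pairs": False,
    "OCR High Resolution": True,
    "Query Fields": True,
}

# Add-ons this article documents as unsupported by the Read model.
READ_UNSUPPORTED = {"Key Value Pairs"}


def review_addons(model_id, addons):
    """Return human-readable notes about a model and add-on selection."""
    notes = []
    for name in addons:
        if model_id == "prebuilt-read" and name in READ_UNSUPPORTED:
            notes.append(f"{name}: not supported by the Read model")
        cost = EXTRA_COST.get(name)
        if cost is True:
            notes.append(f"{name}: incurs additional Azure cost")
        elif cost is None:
            notes.append(f"{name}: cost not listed here; check Azure pricing")
    return notes


notes = review_addons(
    "prebuilt-read", ["Barcodes", "Key Value Pairs", "OCR High Resolution"]
)
```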

Barcodes

Incurs additional cost in Azure? No

Azure DI OCR stores detected barcodes in the Grooper Layout Data file (Grooper.LayoutData.json) created during Recognize. This extends this data to all mechanisms Grooper uses to evaluate and return barcodes from layout data, such as the Find Barcode extractor.

Supported barcode types:

  • QR Code
  • Code 39
  • Code 93
  • Code 128
  • UPC (UPC-A & UPC-E)
  • PDF417
  • EAN-8
  • EAN-13
  • Codabar
  • Databar
  • ITF
  • Data Matrix

From Azure:

The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top level object under content. Inside the content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind and the embedded barcode content as value along with its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded for as 1.
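
The sketch below shows how the `kind`, `value`, and polygon fields described in the quote might be read from a raw result. It assumes barcodes are listed per page under `pages[].barcodes[]`, which may differ between API versions. The sample is hand-written, and Grooper users would normally reach this data through layout data and the Find Barcode extractor instead.

```python
def collect_barcodes(analyze_result):
    """Return one dict per detected barcode, tagged with its page number."""
    found = []
    for page in analyze_result.get("pages", []):
        for barcode in page.get("barcodes", []):
            found.append({
                "page": page.get("pageNumber"),
                "kind": barcode.get("kind"),
                "value": barcode.get("value"),
                "polygon": barcode.get("polygon", []),
            })
    return found


# Hand-written sample; not real service output.
sample = {
    "pages": [
        {
            "pageNumber": 1,
            "barcodes": [
                {
                    "kind": "QRCode",
                    "value": "https://example.com/doc/123",
                    "polygon": [10, 10, 60, 10, 60, 60, 10, 60],
                    "confidence": 1,
                }
            ],
        },
        {"pageNumber": 2},
    ]
}

barcodes = collect_barcodes(sample)
```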

Languages

Incurs additional cost in Azure? No

From Azure:

Adding the languages feature to the analyzeResult request predicts the detected primary language for each text line along with the confidence in the languages collection under analyzeResult.
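
One use of the languages collection is to find the dominant language of a document. The sketch below assumes each entry carries a `locale`, a `confidence`, and a list of `spans` with a `length`. It weights each locale by the amount of text it covers and skips low-confidence predictions; the 0.5 threshold and the function name are arbitrary choices for this example.

```python
from collections import Counter


def dominant_locale(analyze_result, min_confidence=0.5):
    """Return the locale covering the most text, or None if none qualify."""
    totals = Counter()
    for language in analyze_result.get("languages", []):
        if language.get("confidence", 0.0) < min_confidence:
            continue  # ignore predictions the service was unsure about
        totals[language["locale"]] += sum(
            span["length"] for span in language.get("spans", [])
        )
    return totals.most_common(1)[0][0] if totals else None


# Hand-written sample; not real service output.
sample = {
    "languages": [
        {"locale": "en", "confidence": 0.95,
         "spans": [{"offset": 0, "length": 120}, {"offset": 200, "length": 30}]},
        {"locale": "es", "confidence": 0.80,
         "spans": [{"offset": 120, "length": 40}]},
        {"locale": "fr", "confidence": 0.30,
         "spans": [{"offset": 230, "length": 500}]},
    ]
}

locale = dominant_locale(sample)
```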

Key Value Pairs

Incurs additional cost in Azure? No

  • Key value pairs are not supported by the Read model.
  • Key value pairs are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

From Azure:

Key-value pairs are specific spans within the document that identify a label or key and its associated response or value. In a structured form, these pairs could be the label and the value the user entered for that field. In an unstructured document, they could be the date a contract was executed on based on the text in a paragraph. The AI model is trained to extract identifiable keys and values based on a wide variety of document types, formats, and structures.

Keys can also exist in isolation when the model detects that a key exists, with no associated value or when processing optional fields. For example, a middle name field can be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. For documents where the same value is described in different ways, for example, customer/user, the associated key is either customer or user (based on context).
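
Because a key can appear without a value, code that consumes key-value pairs should treat the value as optional. The sketch below assumes the raw result lists pairs under `keyValuePairs[]`, each with a `key` object and an optional `value` object that carry a `content` string. The sample is hand-written, and in Grooper this data is only produced by the DI Analyze activity.

```python
def key_value_pairs(analyze_result):
    """Return (key, value, confidence) tuples; value is None for lone keys."""
    pairs = []
    for pair in analyze_result.get("keyValuePairs", []):
        value = pair.get("value")
        pairs.append((
            pair["key"]["content"],
            value["content"] if value else None,
            pair.get("confidence"),
        ))
    return pairs


# Hand-written sample; not real service output.
sample = {
    "keyValuePairs": [
        {"key": {"content": "First name"}, "value": {"content": "Ada"},
         "confidence": 0.97},
        {"key": {"content": "Middle name"}, "confidence": 0.88},
    ]
}

pairs = key_value_pairs(sample)
```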

OCR High Resolution

Incurs additional cost in Azure? Yes

This can be used on larger documents (such as engineering drawings) and high resolution documents to improve OCR accuracy. It has also been shown to improve OCR accuracy on "normal" documents, but it does incur an additional cost from Azure.

From Azure:

The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.

Style Font (font recognition)

  • Font recognition is "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.

From Azure:

The ocr.font capability extracts all font properties of text extracted in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties such as similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for color of the text, and backgroundColor for color of the text bounding box.
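
Each style object points at text through spans rather than containing the text itself, so reading a font property means slicing the document content. The sketch below pulls out bold text, assuming `styles[]` entries with `fontWeight` and `spans` (`offset` and `length`) that index into the top-level `content` string. The sample is hand-written.

```python
def bold_text(analyze_result):
    """Return the text runs whose style marks them as bold."""
    content = analyze_result.get("content", "")
    runs = []
    for style in analyze_result.get("styles", []):
        if style.get("fontWeight") != "bold":
            continue
        for span in style.get("spans", []):
            start = span["offset"]
            # Assumes offsets count the same units as Python string indexes.
            runs.append(content[start:start + span["length"]])
    return runs


# Hand-written sample; not real service output.
sample = {
    "content": "Invoice Total: 42.00",
    "styles": [
        {"fontWeight": "bold", "confidence": 0.9,
         "spans": [{"offset": 0, "length": 13}]},
        {"fontStyle": "italic", "confidence": 0.8,
         "spans": [{"offset": 15, "length": 5}]},
    ],
}

bold = bold_text(sample)
```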

Query Fields

Incurs additional cost in Azure? Yes

UNDER DEVELOPMENT: This parameter is present in the Grooper property grid but is not fully implemented. Enabling this feature will do nothing at this time.

How Azure DI OCR Utilizes the OCR Data Aligner

A unique feature of Grooper’s Azure DI OCR implementation is the use of the "OCR Data Aligner". This component coordinates the alignment and correction of OCR results between Azure Document Intelligence and Grooper’s internal OCR engines (such as Transym or Tesseract).

The OCR Data Aligner performs several key functions:

  • Alignment: It matches and merges text segments from Azure with those recognized by Grooper’s traditional OCR engines, improving consistency and accuracy.
  • Correction: When Azure and Grooper disagree on a segment, the aligner uses configurable vocabulary, preferred patterns (defined by regular expressions), and confidence thresholds to select the best result.
  • Diagnostics: The aligner provides advanced annotation and logging, helping users review and troubleshoot recognition results.
  • Layout Data Augmentation: It supplements the layout data with additional features detected by Azure, such as checkboxes and barcodes, ensuring comprehensive coverage.

By leveraging the OCR Data Aligner, Grooper ensures that the final OCR output is both accurate and contextually appropriate, even in challenging scenarios where multiple engines produce conflicting results.
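
The correction step can be pictured as a ranking between two candidate readings of the same segment. The sketch below is purely illustrative and is not Grooper's algorithm: the `Candidate` type, the `pick_segment` function, the ordering of the rules (preferred pattern first, then vocabulary, then the confidence threshold, then raw confidence), and the tie-break in favor of Azure are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    confidence: float
    engine: str


def pick_segment(azure, local, vocabulary=frozenset(),
                 preferred_patterns=(), min_confidence=0.5):
    """Choose between two readings of one segment (hypothetical rule order)."""
    if azure.text == local.text:
        return azure  # the engines agree, so nothing needs correcting

    def score(candidate):
        matches_pattern = any(
            re.fullmatch(pattern, candidate.text) for pattern in preferred_patterns
        )
        in_vocabulary = candidate.text.lower() in vocabulary
        confident = candidate.confidence >= min_confidence
        # Tuples compare left to right, so earlier rules outrank later ones.
        return (matches_pattern, in_vocabulary, confident, candidate.confidence)

    # On a complete tie, max() keeps the first argument (the Azure reading).
    return max((azure, local), key=score)


by_pattern = pick_segment(
    Candidate("1NV-0042", 0.90, "azure"),
    Candidate("INV-0042", 0.70, "local"),
    preferred_patterns=(r"INV-\d{4}",),
)
by_vocabulary = pick_segment(
    Candidate("receive", 0.60, "azure"),
    Candidate("recelve", 0.80, "local"),
    vocabulary=frozenset({"receive"}),
)
```

In the first call the lower-confidence reading wins because it matches the preferred pattern; in the second, the vocabulary match outweighs the higher raw confidence.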