Azure DI OCR (OCR Engine)
Azure DI OCR is Grooper’s integration with Microsoft’s Azure Document Intelligence service, enabling cloud-based optical character recognition (OCR) and document analysis. This feature allows organizations to leverage Azure’s advanced machine learning models for extracting text, layout, and semantic data from a wide variety of documents, including both machine print and hand print.
What is Azure DI OCR?
Azure DI OCR is an OCR Engine in Grooper that leverages text recognition models in an Azure Document Intelligence service.
- It is designed to recognize machine print and handwritten text from scanned documents, forms and images.
- It recognizes layout data, such as lines, checkboxes and barcodes, innately (no IP Profile necessary).
- Multiple languages are supported.
- Azure DI OCR further improves Document Intelligence's results by aligning and correcting OCR results with Grooper's internal OCR engines.
Connecting Grooper to a Document Intelligence resource in Azure
There are two primary steps required to connect Grooper to Azure Document Intelligence.
1. Create an Azure Document Intelligence resource
   - Azure Document Intelligence is a cloud-based service and must be provisioned in the Azure portal before Grooper can use it. Instructions for creating a Document Intelligence resource are available in Microsoft’s "Create a Document Intelligence resource" article.
2. Add and configure an Azure Document Intelligence repository option in Grooper
   - Once the Document Intelligence resource is available in Azure, Grooper can be connected to it using the Azure Document Intelligence repository option. Configuration requires only the service’s API key and resource name, both of which can be obtained from the Azure portal.
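To illustrate why the API key and resource name are enough to reach the service, here is a minimal Python sketch of how those two values typically map to a REST endpoint and an authentication header. It is not Grooper's implementation. The class name `DocumentIntelligenceConnection` is hypothetical, and the sketch assumes a resource in the public Azure cloud whose custom subdomain matches the resource name (sovereign clouds use different domains).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentIntelligenceConnection:
    """Hypothetical holder for the two values copied from the Azure portal."""

    resource_name: str
    api_key: str

    @property
    def endpoint(self) -> str:
        # Assumption: public Azure cloud, custom subdomain equal to the resource name.
        return f"https://{self.resource_name}.cognitiveservices.azure.com"

    def headers(self) -> dict:
        # Key-based authentication header used by Azure AI services REST APIs.
        return {"Ocp-Apim-Subscription-Key": self.api_key}


# Placeholder values; substitute the ones shown for your own resource.
conn = DocumentIntelligenceConnection("my-di-resource", "<your-api-key>")
print(conn.endpoint)
```

If a connection fails, comparing the endpoint shown in the Azure portal against the resource name entered in Grooper is a reasonable first check.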
Purpose and Benefits
The main purpose of Azure DI OCR is to provide robust, cloud-powered OCR for document-centric workflows. Key benefits include:
- High Accuracy: Azure’s models are trained on vast datasets, improving recognition of diverse fonts, layouts, and languages.
- Hand Print Support: In addition to machine print, Azure DI OCR can extract handwritten text, expanding use cases to forms and notes.
- Scalability: Cloud-based processing allows for rapid, parallel analysis of large document batches.
- Advanced Layout Detection: Beyond text, Azure DI OCR can detect lines, checkboxes, and barcodes, supporting complex extraction scenarios.
- Azure DI OCR offers the best OMR checkbox detection in Grooper to date.
About Document Intelligence models
Azure Document Intelligence differs greatly from traditional OCR engines. It combines several AI models and techniques, including convolutional neural networks (CNNs) and computer vision, to turn raw images into structured text data.
Grooper's initial Document Intelligence integration focuses on two prebuilt models.
The Read model (prebuilt-read)
Purpose: High-quality text extraction
What it does:
- Extracts printed and handwritten text
- Returns language, font style, and confidence scores
- Supports many languages
- Detects barcodes (when the "barcode extraction" add-on feature is enabled)
Benefits:
- Cost effective
- Best model to use if you just need text
The Layout model (prebuilt-layout)
Purpose: High-quality text extraction plus document structure analysis
What it does:
- Everything Read does plus:
- Detects paragraphs, titles, headers, footers
- Identifies tables, rows, and columns
- Keeps reading order
- Locates OMR checkboxes (Azure calls these "selection marks")
Benefits:
- Best model to use with DI Analyze for DI Layout injection
- Best model to use for checkbox detection
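For readers curious what a "selection mark" looks like in raw form, the following Python sketch walks one page of a Layout result and lists the detected checkboxes. It assumes the JSON shape returned by the Document Intelligence REST API (a `selectionMarks` array per page, each entry with `state`, `confidence`, and an eight-number `polygon`). The sample data and the function names are invented for illustration, and this is not how Grooper stores or consumes the data internally.

```python
def bounding_box(polygon):
    """Convert a flat [x1, y1, x2, y2, ...] polygon into (left, top, right, bottom)."""
    xs, ys = polygon[0::2], polygon[1::2]
    return (min(xs), min(ys), max(xs), max(ys))


def summarize_selection_marks(page, min_confidence=0.5):
    """Return one record per selection mark at or above the confidence cutoff."""
    records = []
    for mark in page.get("selectionMarks", []):
        if mark["confidence"] < min_confidence:
            continue  # skip marks the service was unsure about
        records.append(
            {
                "page": page["pageNumber"],
                "checked": mark["state"] == "selected",
                "box": bounding_box(mark["polygon"]),
                "confidence": mark["confidence"],
            }
        )
    return records


# Invented sample: one checked box, one unchecked box, one low-confidence mark.
sample_page = {
    "pageNumber": 1,
    "selectionMarks": [
        {"state": "selected", "confidence": 0.95,
         "polygon": [1.0, 1.0, 1.2, 1.0, 1.2, 1.2, 1.0, 1.2]},
        {"state": "unselected", "confidence": 0.90,
         "polygon": [1.0, 2.0, 1.2, 2.0, 1.2, 2.2, 1.0, 2.2]},
        {"state": "selected", "confidence": 0.30,
         "polygon": [1.0, 3.0, 1.2, 3.0, 1.2, 3.2, 1.0, 3.2]},
    ],
}

marks = summarize_selection_marks(sample_page)
for record in marks:
    print(record)
```

The `min_confidence` cutoff of 0.5 is an arbitrary value chosen for the example, not a recommended setting.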
Document structure analysis
Layout-enabled models (such as prebuilt-layout) go beyond simple text recognition and perform structure analysis, which focuses on understanding how information is organized on a page, not just what the text says. Using computer vision, these models analyze spatial layout, reading order, and visual cues to identify:
- Paragraphs
- Headers and footers
- Lists
- Tables
- Form field groupings (when the "key-value pair" add-on feature is enabled)
These structural relationships are critical for accurate data extraction downstream in AI Extract. The DI Layout quoting method passes this structural information to a large language model (LLM), giving it the spatial context it needs to extract a Data Model more accurately.
- BE AWARE: The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
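As a rough picture of the structure data described above, this Python sketch groups paragraphs by role and rebuilds a table grid from a Layout-style result. It assumes the REST response shape, in which paragraphs carry an optional `role` (for example `title`, `pageHeader`, `pageFooter`) and tables list their cells with `rowIndex` and `columnIndex`. The sample result is invented, the `"body"` label for role-less paragraphs is this example's own convention, and none of this reflects how DI Analyze represents the data inside Grooper.

```python
from collections import defaultdict


def paragraphs_by_role(result):
    """Group paragraph text by role; paragraphs without a role go under 'body'."""
    grouped = defaultdict(list)
    for paragraph in result.get("paragraphs", []):
        grouped[paragraph.get("role", "body")].append(paragraph["content"])
    return dict(grouped)


def table_to_grid(table):
    """Rebuild a row-by-column grid of cell text (merged cells are not expanded)."""
    grid = [["" for _ in range(table["columnCount"])]
            for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


# Invented sample resembling a small slice of a Layout result.
sample_result = {
    "paragraphs": [
        {"role": "pageHeader", "content": "ACME Corp"},
        {"role": "title", "content": "Invoice"},
        {"content": "Thank you for your business."},
        {"role": "pageFooter", "content": "Page 1 of 1"},
    ],
    "tables": [
        {
            "rowCount": 2,
            "columnCount": 2,
            "cells": [
                {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
                {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
                {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
                {"rowIndex": 1, "columnIndex": 1, "content": "3"},
            ],
        }
    ],
}

roles = paragraphs_by_role(sample_result)
grid = table_to_grid(sample_result["tables"][0])
print(roles)
print(grid)
```

Keeping headers, footers, and table cells separate from body text in this way is the kind of context that plain text alone does not carry.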
Add-on features
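The two add-on features mentioned on this page ("barcode extraction" and "key-value pair") correspond to optional flags on a Document Intelligence analyze request. The Python sketch below shows how such a request URL might be composed. It assumes the v4.0 REST API (`api-version=2024-11-30`) and its `features` query parameter with the values `barcodes` and `keyValuePairs`. The function name, the mapping dictionary, and the example endpoint are hypothetical, and Grooper builds its own requests, so this is only meant to show what enabling an add-on means at the service level.

```python
from urllib.parse import urlencode

# Assumption: v4.0 GA REST API; older API versions use a different path and feature set.
API_VERSION = "2024-11-30"

# Maps the add-on names used on this page to the REST API's feature identifiers.
ADD_ON_FEATURES = {
    "barcode extraction": "barcodes",
    "key-value pair": "keyValuePairs",
}


def build_analyze_url(endpoint, model_id, add_ons=()):
    """Compose an analyze URL for a prebuilt model with optional add-on features."""
    query = {"api-version": API_VERSION}
    if add_ons:
        query["features"] = ",".join(ADD_ON_FEATURES[name] for name in add_ons)
    base = f"{endpoint.rstrip('/')}/documentintelligence/documentModels/{model_id}:analyze"
    # safe="," keeps the comma-separated feature list readable in the URL.
    return f"{base}?{urlencode(query, safe=',')}"


# Hypothetical endpoint used only for illustration.
endpoint = "https://my-di-resource.cognitiveservices.azure.com"
read_url = build_analyze_url(endpoint, "prebuilt-read", ["barcode extraction"])
layout_url = build_analyze_url(endpoint, "prebuilt-layout", ["key-value pair"])
print(read_url)
print(layout_url)
```

Add-on features are billed and enabled per request by Azure, so it is worth turning on only the ones a document set actually needs.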
How Azure DI OCR Utilizes the OCR Data Aligner
A unique feature of Grooper’s Azure DI OCR implementation is the use of the "OCR Data Aligner". This component coordinates the alignment and correction of OCR results between Azure Document Intelligence and Grooper’s internal OCR engines (such as Transym or Tesseract).
The OCR Data Aligner performs several key functions:
- Alignment: It matches and merges text segments from Azure with those recognized by Grooper’s traditional OCR engines, improving consistency and accuracy.
- Correction: When Azure and Grooper disagree on a segment, the aligner uses configurable vocabulary, preferred patterns (defined by regular expressions), and confidence thresholds to select the best result.
- Diagnostics: The aligner provides advanced annotation and logging, helping users review and troubleshoot recognition results.
- Layout Data Augmentation: It supplements the layout data with additional features detected by Azure, such as checkboxes and barcodes, ensuring comprehensive coverage.
By leveraging the OCR Data Aligner, Grooper ensures that the final OCR output is both accurate and contextually appropriate, even in challenging scenarios where multiple engines produce conflicting results.
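To make the correction step more concrete, here is a simplified Python sketch of how two engines' readings of the same segment could be arbitrated using a vocabulary, preferred regular expression patterns, and a confidence threshold. This is an illustration only and not the OCR Data Aligner's actual algorithm. The `Candidate` class, the `choose_segment` function, the priority order (pattern match, then vocabulary, then confidence), and the 0.8 threshold are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # 0.0 to 1.0
    engine: str


def choose_segment(primary, secondary, vocabulary, preferred_patterns,
                   min_confidence=0.8):
    """Pick one of two readings of the same segment (hypothetical rule order)."""
    if primary.text == secondary.text:
        return primary  # engines agree, nothing to correct

    def score(candidate):
        matches_pattern = any(p.fullmatch(candidate.text) for p in preferred_patterns)
        in_vocabulary = candidate.text.lower() in vocabulary
        is_confident = candidate.confidence >= min_confidence
        # Tuples compare left to right, so earlier criteria outrank later ones.
        return (matches_pattern, in_vocabulary, is_confident, candidate.confidence)

    # max() returns the first item on a tie, so the primary engine wins ties.
    return max((primary, secondary), key=score)


vocabulary = {"invoice", "total"}
date_pattern = [re.compile(r"\d{2}/\d{2}/\d{4}")]

# The cloud reading contains a letter "O" where a zero belongs.
date_pick = choose_segment(
    Candidate("01/15/2O24", 0.91, "azure"),
    Candidate("01/15/2024", 0.74, "local"),
    vocabulary, date_pattern,
)
word_pick = choose_segment(
    Candidate("Invoice", 0.95, "azure"),
    Candidate("lnvoice", 0.60, "local"),
    vocabulary, date_pattern,
)
print(date_pick.text, word_pick.text)
```

In the first case the lower-confidence reading wins because it matches the preferred date pattern, which is the kind of outcome a pattern-aware correction pass is meant to produce.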