Azure DI OCR (OCR Engine)
Dgreenwood (talk | contribs) |
Latest revision as of 15:07, 15 January 2026
Overview of Azure Document Intelligence in Grooper
Azure Document Intelligence is a cloud service from Microsoft that enables optical character recognition (OCR) and document analysis. Grooper's Azure Document Intelligence integration allows organizations to leverage Azure’s advanced machine learning models for raw text extraction, layout and structure analysis, and semantic understanding. A wide variety of document types are supported, including both machine-printed and handwritten content.
Grooper connects to an Azure Document Intelligence service through the "Azure Document Intelligence" Repository Option. This option is added on the Grooper database Root node, and connectivity is established by entering the service's API key and resource name.
With the "Azure Document Intelligence" option added and configured, Grooper leverages the Document Intelligence service in two primary ways:
- The Azure DI OCR engine – Used for text extraction and layout data collection by the Recognize activity.
- The DI Analyze activity – Used for comprehensive document analysis that can be leveraged by Grooper's AI-enabled features (including AI Extract).
- This analysis produces JSON data files that are used by the DI Layout quoting method when configuring AI-enabled features.
FYI
What if I want to use both DI Analyze and Azure DI OCR? Does that mean the document gets sent to Azure twice? No. However, DI Analyze must be run first. Running DI Analyze generates a set of JSON files for each document and its child pages. When an OCR Profile uses Azure DI OCR, it first checks for the existence of these files. If they are found, Grooper retrieves the text and layout data from the JSON rather than making a duplicate call to the Azure Document Intelligence service.
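The check-before-calling behavior described above can be sketched in Python. This is only an illustration of the idea, not Grooper's implementation: the file name `DI.Analyze.json`, the function `load_or_request`, and the `call_azure` callback are hypothetical names invented for the example.

```python
import json
from pathlib import Path

# Hypothetical file name; the actual name of the DI Analyze output is not given here.
DI_RESULT_NAME = "DI.Analyze.json"


def load_or_request(page_dir: Path, call_azure) -> tuple[dict, bool]:
    """Return (result, called_azure), reusing saved DI Analyze JSON when present."""
    cached = page_dir / DI_RESULT_NAME
    if cached.exists():
        # DI Analyze already ran: read its output instead of calling Azure again.
        return json.loads(cached.read_text(encoding="utf-8")), False
    # No saved analysis was found: fall back to a live request.
    return call_azure(), True
```

The returned flag makes it easy to confirm that a page which already has analysis data does not trigger a second call to the service.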
What is Azure DI OCR?
Azure DI OCR is an OCR Engine in Grooper that leverages text recognition models in an Azure Document Intelligence service.
- It is designed to recognize machine print and handwritten text from scanned documents, forms and images.
- Multiple languages are supported.
- Azure DI OCR further improves Document Intelligence's results by aligning and correcting OCR results with Grooper's internal OCR engines.
- When configured appropriately, it recognizes checkboxes and barcodes innately (no IP Profile necessary).
- Grooper IP is still needed to detect line segments (such as table lines). However, Grooper's Line Detection runs alongside Document Intelligence thanks to Azure DI OCR's OCR Data Aligner.
Connecting Grooper to a Document Intelligence resource in Azure
There are two primary steps required to connect Grooper to Azure Document Intelligence.
- 1. Create an Azure Document Intelligence resource
- Azure Document Intelligence is a cloud-based service and must be provisioned in the Azure portal before it can be used by Grooper. Instructions for creating a Document Intelligence resource are available in Microsoft’s Create a Document Intelligence resource article.
- 2. Add and configure an Azure Document Intelligence repository option in Grooper
- Once the Document Intelligence resource is available in Azure, Grooper can be connected to it using the Azure Document Intelligence repository option. Configuration is straightforward and requires the service’s API key and resource name, both of which can be obtained from the Azure portal.
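For readers curious what an API key and resource name amount to on the wire, here is a minimal Python sketch of an analyze request against the Document Intelligence REST API. It is an illustration under assumptions, not a description of how Grooper builds its calls: the endpoint pattern assumes the resource uses the default `cognitiveservices.azure.com` domain, the API version shown is an assumption, and `build_analyze_request` and the sample document URL are hypothetical. The request is only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_VERSION = "2024-11-30"  # assumed; use the version your resource supports


def build_analyze_request(resource_name, api_key, model_id="prebuilt-read",
                          features=(), document_url="https://example.com/sample.pdf"):
    """Build (but do not send) an analyze request for one document."""
    # Assumes the default endpoint pattern for a Document Intelligence resource.
    endpoint = f"https://{resource_name}.cognitiveservices.azure.com"
    query = {"api-version": API_VERSION}
    if features:
        # Add-on features are passed as a comma-separated list.
        query["features"] = ",".join(features)
    url = (f"{endpoint}/documentintelligence/documentModels/"
           f"{model_id}:analyze?{urlencode(query, safe=',')}")
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,  # the API key from the Azure portal
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

Calling `build_analyze_request("my-resource", "KEY", model_id="prebuilt-layout")` yields a POST request whose URL carries the resource name and model ID, with the key in a request header.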
Purpose and Benefits
The main purpose of Azure DI OCR is to provide robust, cloud-powered OCR for document-centric workflows. Key benefits include:
- High Accuracy: Azure’s models are trained on vast datasets, improving recognition of diverse fonts, layouts, and languages.
- Hand Print Support: In addition to machine print, Azure DI OCR can extract handwritten text, expanding use cases to forms and notes.
- Scalability: Cloud-based processing allows for rapid, parallel analysis of large document batches.
- Advanced Layout Detection: Beyond text, Azure DI OCR can detect checkboxes and barcodes, supporting complex extraction scenarios.
- Azure DI OCR offers the best OMR checkbox detection in Grooper to date.
About Document Intelligence models
Azure Document Intelligence differs greatly from traditional OCR engines. It utilizes a combination of AI models and techniques working together, including CNNs (convolutional neural networks) and computer vision, to turn raw images into structured text data.
Grooper's initial Document Intelligence integration focuses on two prebuilt models.
The Read model (prebuilt-read)
Purpose: High-quality text extraction
What it does:
- Extracts printed and handwritten text
- Preserves language, font style, and confidence scores
- Supports many languages
- Detects barcodes (when the "barcode extraction" add-on feature is enabled)
Benefits:
- Cost effective
- Best model to use if you just need text
The Layout model (prebuilt-layout)
Purpose: High-quality text extraction + Document structure analysis
What it does:
- Everything Read does plus:
- Analyzes a document's structure for use by the DI Layout quoting method
- Detects paragraphs, titles, headers, footers
- Identifies tables, rows, and columns
- Keeps reading order
- The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
- Locates OMR checkboxes (Azure calls these "selection marks")
Benefits:
- Best model to use with DI Analyze for DI Layout injection
- Best model to use for checkbox detection
Cost:
- The Read model is substantially cheaper than the Layout model.
- At the time this article was written, the Layout model cost about $0.01 per page.
- Azure's full Document Intelligence pricing model is found here.
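As a rough budgeting aid, the per-page figure above can be turned into a quick estimate. The Python sketch below assumes the approximate $0.01 per page Layout price quoted in this article. Actual pricing varies by tier and region and should be taken from Azure's pricing page. The function name `estimate_layout_cost` is hypothetical.

```python
from decimal import Decimal


def estimate_layout_cost(page_count: int,
                         price_per_page: Decimal = Decimal("0.01")) -> Decimal:
    """Estimate Layout model cost in USD for a number of pages."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    # Decimal avoids binary floating point rounding in currency math.
    return price_per_page * page_count


print(estimate_layout_cost(25_000))  # 250.00
```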
About document structure analysis
- The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Layout-enabled models (such as prebuilt-layout) go beyond simple text recognition and perform structure analysis, which focuses on understanding how information is organized on a page, not just what the text says. Using computer vision, these models analyze spatial layout, reading order, and visual cues to identify:
- Paragraphs
- Headers and footers
- Lists
- Tables
- Form field groupings (when the "key-value pair" add-on feature is enabled)
These structural relationships are critical for accurate downstream data extraction during AI Extract. The DI Layout quoting method passes this structural information to a large language model (LLM), providing essential spatial context for more accurate Data Model extraction.
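To make "structural information" more concrete, the Python sketch below flattens the `paragraphs` collection of a Layout result into role-tagged lines of text. The `content` and `role` fields follow the Azure response format, but the tagged output format and the function `render_structure` are invented for illustration. The actual text that the DI Layout quoting method sends to an LLM is not documented here.

```python
def render_structure(analyze_result: dict) -> str:
    """Render paragraphs as '[role] text' lines, in reading order."""
    lines = []
    for paragraph in analyze_result.get("paragraphs", []):
        # Azure omits "role" for ordinary body paragraphs.
        role = paragraph.get("role", "paragraph")
        lines.append(f"[{role}] {paragraph['content']}")
    return "\n".join(lines)


# Invented sample data shaped like a Layout model response.
sample = {
    "paragraphs": [
        {"role": "title", "content": "Purchase Agreement"},
        {"content": "This agreement is made between the parties below."},
        {"role": "pageFooter", "content": "Page 1 of 3"},
    ]
}
print(render_structure(sample))
```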
Add-on features
Microsoft's full documentation on Document Intelligence add-ons can be found here.
Microsoft has developed several optional capabilities for its Document Intelligence service. These features can be enabled or disabled to meet the needs of specific document scenarios.

In Grooper, these add-ons can be enabled in the Azure DI OCR engine's set of Features properties.
- Not all models can utilize every add-on. For more information see Azure's model overview documentation. Add-on features unsupported by the Read model are documented below.
Is there a cost to using add-on features?
- Monetary cost - Some add-ons incur additional cost in your Azure billing. Azure's full Document Intelligence pricing model is found here.
- Processing cost - All add-on features generate additional data and/or higher quality results. This takes additional processing time in both cases.
Add-on features relevant to Azure DI OCR
Azure DI OCR is an OCR engine in Grooper that uses Document Intelligence. It is used by the Recognize activity when referenced in an OCR Profile. As such, its primary focus is text and layout data.
DI Analyze is an activity in Grooper that generates a robust data file from Document Intelligence analysis. Several add-ons are only utilized by DI Analyze and the files it produces, which are in turn used by the DI Layout quoting method.
- Relevant to Azure DI OCR - Enabling these features will affect results Azure DI OCR produces
- Barcodes - Impacts the Grooper layout data Recognize generates (Grooper.Layout.json)
- OCR High Resolution - Impacts the Grooper text data Recognize generates (Grooper.Characters.txt)
- Irrelevant to Azure DI OCR - There is no need to enable these features for Azure DI OCR. These features only pertain to the data the DI Analyze activity generates.
- Formulas
- Key Value Pairs
- Languages
- Query Fields
- Style Font
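The split above can be summarized as a small lookup table. In the Python sketch below, the `azure_name` values are the feature names accepted by the Document Intelligence REST API's `features` parameter, and the cost flags mirror the notes in this article. The `AddOn` class and `features_for_ocr` function are hypothetical helpers, not part of Grooper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddOn:
    azure_name: str    # value used in the REST "features" parameter
    extra_cost: bool   # per the cost notes in this article
    used_by_ocr: bool  # True if it changes Azure DI OCR output


ADD_ONS = {
    "Barcodes": AddOn("barcodes", False, True),
    "OCR High Resolution": AddOn("ocrHighResolution", True, True),
    "Formulas": AddOn("formulas", True, False),
    "Key Value Pairs": AddOn("keyValuePairs", False, False),
    "Languages": AddOn("languages", False, False),
    "Query Fields": AddOn("queryFields", True, False),
    "Style Font": AddOn("styleFont", True, False),
}


def features_for_ocr(enabled):
    """Keep only the enabled add-ons that affect Azure DI OCR results."""
    return [ADD_ONS[name].azure_name for name in enabled
            if ADD_ONS[name].used_by_ocr]
```

For example, `features_for_ocr(["Barcodes", "Languages"])` keeps only `barcodes`, since language data is only used by DI Analyze.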
Barcodes
Incurs additional cost in Azure? No
Azure DI OCR stores detected barcodes in the Grooper Layout Data file (Grooper.Layout.json) created during Recognize. This extends barcode data to all mechanisms Grooper uses to evaluate and return barcodes from layout data, such as the Find Barcode extractor.
Supported barcode types:
- QR Code
- Code 39
- Code 93
- Code 128
- UPC (UPC-A & UPC-E)
- PDF417
- EAN-8
- EAN-13
- Codabar
- Databar
- ITF
- Data Matrix
From Azure:
- The ocr.barcode capability extracts all identified barcodes in the barcodes collection as a top level object under content. Inside the content, detected barcodes are represented as :barcode:. Each entry in this collection represents a barcode and includes the barcode type as kind and the embedded barcode content as value along with its polygon coordinates. Initially, barcodes appear at the end of each page. The confidence is hard-coded for as 1.
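The fields Azure describes can be read from a response with a few lines of Python. The sketch below assumes the `barcodes` collection sits on each page object of `analyzeResult`, as it does in the REST responses. The sample values are invented, and `list_barcodes` is a hypothetical helper rather than anything Grooper exposes.

```python
# Invented sample shaped like an analyze response with the barcode add-on enabled.
SAMPLE = {
    "analyzeResult": {
        "content": "Invoice 1001\n:barcode:",
        "pages": [
            {
                "pageNumber": 1,
                "barcodes": [
                    {
                        "kind": "QRCode",
                        "value": "https://example.com/inv/1001",
                        "polygon": [1.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 2.0],
                        "span": {"offset": 13, "length": 9},
                        "confidence": 1,
                    }
                ],
            }
        ],
    }
}


def list_barcodes(response: dict):
    """Return (page number, kind, value) for every detected barcode."""
    found = []
    for page in response["analyzeResult"].get("pages", []):
        # Pages without barcodes may omit the collection entirely.
        for barcode in page.get("barcodes", []):
            found.append((page["pageNumber"], barcode["kind"], barcode["value"]))
    return found


print(list_barcodes(SAMPLE))
```

The `span` of each barcode points at its `:barcode:` placeholder in the page text, which is how the placeholder and the decoded value are tied together.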
OCR High Resolution
Incurs additional cost in Azure? Yes
This can be used on larger documents (such as engineering drawings) and high resolution documents to improve OCR accuracy. It has also been shown to improve OCR accuracy on "normal" documents, but it does incur an additional cost from Azure.
From Azure:
- The task of recognizing small text from large-size documents, like engineering drawings, is a challenge. Often the text is mixed with other graphical elements and has varying fonts, sizes, and orientations. Moreover, the text can be broken into separate parts or connected with other symbols. Document Intelligence now supports extracting content from these types of documents with the ocr.highResolution capability. You get improved quality of content extraction from A1/A2/A3 documents by enabling this add-on capability.
Languages
- Language data is "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Incurs additional cost in Azure? No
From Azure:
- Adding the languages feature to the analyzeResult request predicts the detected primary language for each text line along with the confidence in the languages collection under analyzeResult.
Key Value Pairs
- Key value pairs are not supported by the Read model.
- Key value pairs are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Incurs additional cost in Azure? No
From Azure:
- Key-value pairs are specific spans within the document that identify a label or key and its associated response or value. In a structured form, these pairs could be the label and the value the user entered for that field. In an unstructured document, they could be the date a contract was executed on based on the text in a paragraph. The AI model is trained to extract identifiable keys and values based on a wide variety of document types, formats, and structures.
- Keys can also exist in isolation when the model detects that a key exists, with no associated value or when processing optional fields. For example, a middle name field can be left blank on a form in some instances. Key-value pairs are spans of text contained in the document. For documents where the same value is described in different ways, for example, customer/user, the associated key is either customer or user (based on context).
Style Font (font recognition)
- Font recognition is "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Incurs additional cost in Azure? Yes
From Azure:
- The ocr.font capability extracts all font properties of text extracted in the styles collection as a top-level object under content. Each style object specifies a single font property, the text span it applies to, and its corresponding confidence score. The existing style property is extended with more font properties such as similarFontFamily for the font of the text, fontStyle for styles such as italic and normal, fontWeight for bold or normal, color for color of the text, and backgroundColor for color of the text bounding box.
Formulas
- Formulas are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Incurs additional cost in Azure? Yes
From Azure:
- The ocr.formula capability extracts all identified formulas, such as mathematical equations, in the formulas collection as a top level object under content. Inside content, detected formulas are represented as :formula:. Each entry in this collection represents a formula that includes the formula type as inline or display, and its LaTeX representation as value along with its polygon coordinates. Initially, formulas appear at the end of each page.
Query Fields
- Query fields are "structure analysis" data. The Azure DI OCR engine does not store structure analysis data. This information is only generated and returned by the DI Analyze activity.
Incurs additional cost in Azure? Yes
UNDER DEVELOPMENT: This parameter is present in the Grooper property grid but is not fully implemented. Enabling it provides no benefit at this time and is likely to throw an error.
How Azure DI OCR Utilizes the OCR Data Aligner
A unique feature of Grooper’s Azure DI OCR implementation is the use of the "OCR Data Aligner". This component coordinates the alignment and correction of OCR results between Azure Document Intelligence and Grooper’s internal OCR engines (such as Transym or Tesseract).
The OCR Data Aligner performs several key functions:
- Alignment: It matches and merges text segments from Azure with those recognized by Grooper’s traditional OCR engines, improving consistency and accuracy.
- Correction: When Azure and Grooper disagree on a segment, the aligner uses configurable vocabulary, preferred patterns (defined by regular expressions), and confidence thresholds to select the best result.
- Diagnostics: The aligner provides advanced annotation and logging, helping users review and troubleshoot recognition results.
- Layout Data Augmentation: It supplements the layout data with additional features detected by Azure, such as checkboxes and barcodes, ensuring comprehensive coverage.
By leveraging the OCR Data Aligner, Grooper ensures that the final OCR output is both accurate and contextually appropriate, even in challenging scenarios where multiple engines produce conflicting results.
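The correction step can be illustrated with a small Python sketch. This is not the OCR Data Aligner's actual algorithm. The order in which the rules are applied, the default threshold, and the names `Candidate` and `choose_segment` are all assumptions made for the example. It only shows how a vocabulary, a preferred pattern, and a confidence threshold could settle a disagreement between two engines.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # 0.0 to 1.0


def choose_segment(azure, local, vocabulary, preferred, min_confidence=0.8):
    """Pick one text when the two engines disagree (rule order is an assumption)."""
    if azure.text == local.text:
        return azure.text
    # Rule 1: a result matching the preferred pattern beats one that does not.
    a_ok = preferred.fullmatch(azure.text) is not None
    l_ok = preferred.fullmatch(local.text) is not None
    if a_ok != l_ok:
        return azure.text if a_ok else local.text
    # Rule 2: a known vocabulary word beats an unknown one.
    a_ok = azure.text.lower() in vocabulary
    l_ok = local.text.lower() in vocabulary
    if a_ok != l_ok:
        return azure.text if a_ok else local.text
    # Rule 3: keep Azure unless it is below the threshold and the
    # local engine is more confident.
    if azure.confidence < min_confidence and local.confidence > azure.confidence:
        return local.text
    return azure.text


DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
print(choose_segment(Candidate("2O24-01-15", 0.9), Candidate("2024-01-15", 0.6),
                     set(), DATE))
```

In the usage line, the Azure result contains a letter "O" in place of a zero, so the local engine's result wins because it is the only one that matches the preferred date pattern.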