Azure OCR (OCR Engine)

From Grooper Wiki

Latest revision as of 14:24, 3 April 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

Azure OCR is an OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is an AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR is better than the Read engine.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Azure OCR is different from traditional OCR Engines. It uses Microsoft Azure's Read OCR engine, which is based on a CNN (convolutional neural network). The Read engine uses a deep learning neural network to recognize characters from full images, as opposed to the segmentation-based matrix matching methods that traditional OCR engines use.

Unlike traditional OCR, Microsoft's Azure Read engine has far higher overall accuracy when recognizing text from an image. It can recognize hand-printed characters, which many traditional OCR engines cannot do at all. It is also as good as or better than traditional OCR engines at a wide variety of machine print fonts, including specialized fonts like MICR. Furthermore, due to the way its neural network has been trained, the Azure Read engine is less dependent on image pre-processing than traditional OCR engines. This eliminates the need for complicated IP Profiles when using Azure OCR in Grooper.

However, the Azure Read engine does have some drawbacks compared to traditional OCR. Unlike traditional OCR engines such as Transym, the Azure Read engine does not return pixel-accurate character positions. It only gives us an approximation. This can cause problems for extractors that rely on character/text positions, such as Labeled Value, Labeled OMR, or Tabular Layout. The Azure Read engine also does not always capture data correctly from "data dense" documents full of text in tables. This is particularly the case for tabular data with cells containing single characters (such as "0", "1", or "A"). This can make collecting certain data structures problematic.

Grooper's implementation of the Azure Read engine compensates for these shortcomings (and more). With our Azure OCR offering, a traditional OCR engine (Transym) runs in parallel with Azure's Read engine. Results from Transym supplement Azure's results with more accurate character positions and values in areas where we have found Azure to be deficient. This gives us the best of both worlds: the Azure Read engine's strengths combined with those of traditional OCR engines.

Supported image types: JPEG, PNG, BMP, PDF, and TIFF
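If you stage files outside Grooper before importing them, a quick pre-flight check can flag anything outside the supported list above. The sketch below is not part of Grooper and the function names are hypothetical. It only compares file extensions (assuming the extension reflects the real format), and the `.jpg` and `.tif` spellings are assumed aliases for JPEG and TIFF.

```python
from pathlib import Path
from typing import List

# Extensions for the supported types (JPEG, PNG, BMP, PDF, TIFF).
# The .jpg/.tif aliases are assumptions about common file naming.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a supported image type."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def unsupported_files(paths: List[str]) -> List[str]:
    """List the files that would need conversion before recognition."""
    return [p for p in paths if not is_supported_image(p)]


if __name__ == "__main__":
    files = ["invoice.TIF", "scan.jpeg", "photo.heic", "report.pdf"]
    print(unsupported_files(files))  # ['photo.heic']
```

Checking the extension case-insensitively matters because scanners often write upper-case names such as `.TIF`.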

FYI

You will need an API key from Azure when setting up an OCR Profile that uses Azure OCR. If you do not have an API key and need some instructions on how to create one in Azure, visit the following link in our "Grooper and AI" article:

Azure OCR Quickstart

Azure OCR strengths over traditional OCR

Azure OCR has many advantages over traditional OCR engines (such as Transym).

  • Azure OCR is less dependent on image processing than traditional OCR engines.
  • It can handle poor quality images and even photos taken with a digital camera or phone camera without the aid of an IP Profile.
  • Azure OCR has exceptionally good handwritten text recognition. Traditional OCR engines have poor handwriting recognition or none at all.
  • Azure OCR's machine print recognition is generally as good as or better than that of traditional OCR engines. This is particularly the case for lexical data (i.e. words).


In the screenshots below, we can see the difference between using traditional OCR and Azure OCR on a document that has small text and handwriting.

  1. In the first screenshot, we can see the result of using traditional OCR. Traditional OCR is not equipped to handle handwriting, and the small print with minimal space between characters makes this document very difficult for traditional OCR to read.


  2. In the second screenshot, we have used Azure OCR on the same document. Azure OCR relies on its trained CNN model rather than individual character analysis, and it does a much better job of returning accurate data for this document, even in the handwritten sections.

How Grooper overcomes Azure OCR drawbacks

Microsoft Azure OCR by itself is not perfect. There are things that traditional OCR is better at capturing than Azure.

Some things Azure OCR is not as adept at on its own include:

  • Pixel-perfect character positions. Azure's character position data is more of an approximation than an exact location. This can be especially problematic when using extractors that rely heavily on position data (such as the Labeled OMR extractor or the Tabular Layout table extract method).
  • Fully accurate recognition of numeric data. This is particularly the case for small numbers on highly dense documents (those with a lot of text on them). Azure often misses small digits like 0s and 1s.
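To see why approximate positions matter, consider a simplified, hypothetical model of how a position-based extractor might assign a word to a table column: by checking which column's horizontal range contains the word's center. This is not how Tabular Layout is actually implemented; the `Word` and `Column` types, the `QTY` column, and the coordinates (in inches) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    left: float   # left edge of the word's box (inches, assumed unit)
    right: float  # right edge of the word's box


@dataclass
class Column:
    name: str
    left: float
    right: float


def assign_column(word: Word, columns: List[Column]) -> Optional[str]:
    """Return the column whose x-range contains the word's horizontal center."""
    center = (word.left + word.right) / 2
    for col in columns:
        if col.left <= center < col.right:
            return col.name
    return None  # the word falls outside every column


if __name__ == "__main__":
    columns = [Column("QTY", 5.0, 6.0), Column("PAID AMT", 6.0, 7.5)]
    exact = Word("12.50", 6.2, 6.8)   # pixel-accurate box
    approx = Word("12.50", 5.4, 6.0)  # same word, box shifted 0.8" left
    print(assign_column(exact, columns))   # PAID AMT
    print(assign_column(approx, columns))  # QTY
```

The 0.8-inch shift is exaggerated for clarity. The point is that any rule keyed to coordinates inherits whatever error those coordinates carry, which is why more accurate positions from traditional OCR are valuable.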


Grooper's implementation of Azure OCR does not rely solely on Azure to recognize text. Instead, when you select Azure OCR as your OCR Engine, both Azure and a traditional OCR engine run, and Grooper combines the results.

  • By default, Grooper runs an implementation of the Transym OCR engine. However, users may customize the traditional OCR engine by selecting an OCR Profile using the Traditional OCR Profile property.

Next, we will detail an example where Azure OCR fails on its own and Grooper supplements the results with a traditional OCR engine.

  1. In the screenshot below, we are looking at the Diagnostics page after running Recognize configured with Azure OCR on a document.
  2. In the Diagnostics, the "Azure Words.tif" will show what Azure OCR by itself returned.
  3. In this case, Azure OCR did not capture two 0s at all. These small numbers were skipped.
  4. We also see that Azure OCR found all numeric values in the PAID AMT column of the table, but their position data is not accurate.


  5. If we select "Alignment.tif" in the Diagnostics tree on the left, we can see the combined result of Azure OCR and the traditional OCR Engine.
  6. The characters and text segments highlighted in orange are corrections made from the traditional OCR results. The traditional OCR Engine detected the 0s that Azure OCR missed.
  7. Grooper also determined that the traditional OCR Engine did a better job of recognizing one of the numeric values whose position Azure OCR did not detect accurately.


The result of combining both OCR Engines is what Grooper will actually recognize from the document.
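This article does not document Grooper's actual alignment logic, so the following is only a simplified, hypothetical sketch of the idea shown in the Alignment.tif walkthrough: when both engines agree on a word, keep it but take the traditional engine's box; where Azure returned nothing and the traditional engine is confident, add the traditional word. The overlap tolerance, the 0.8 confidence cutoff, and all names are assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def expanded(self, margin: float) -> "Box":
        """Grow the box on every side to tolerate approximate positions."""
        return Box(self.left - margin, self.top - margin,
                   self.right + margin, self.bottom + margin)

    def overlaps(self, other: "Box") -> bool:
        """True if the two rectangles share any area."""
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)


@dataclass(frozen=True)
class Word:
    text: str
    box: Box
    confidence: float        # 0.0 to 1.0 (assumed scale)
    corrected: bool = False  # True if traditional OCR changed or added it


def combine(azure: List[Word], traditional: List[Word],
            tolerance: float = 0.1, min_confidence: float = 0.8) -> List[Word]:
    """Hypothetical merge of Azure and traditional OCR words."""
    merged: List[Word] = []
    used = set()  # indexes of traditional words already matched
    for a in azure:
        zone = a.box.expanded(tolerance)
        match: Optional[int] = next(
            (i for i, t in enumerate(traditional)
             if i not in used and t.text == a.text and t.box.overlaps(zone)),
            None)
        if match is None:
            merged.append(a)  # keep Azure's word as-is
            continue
        used.add(match)
        t_box = traditional[match].box
        # Both engines agree on the text: borrow the traditional box,
        # assumed here to be the more precise position.
        merged.append(replace(a, box=t_box, corrected=t_box != a.box))
    for i, t in enumerate(traditional):
        if i in used or t.confidence < min_confidence:
            continue
        if not any(t.box.overlaps(a.box.expanded(tolerance)) for a in azure):
            # Azure returned nothing here (e.g. a small, isolated "0").
            merged.append(replace(t, corrected=True))
    return merged


if __name__ == "__main__":
    azure = [Word("PAID", Box(1.0, 1.0, 2.0, 1.2), 0.99),
             Word("12.50", Box(5.9, 2.0, 6.5, 2.2), 0.95)]  # shifted box
    traditional = [Word("PAID", Box(1.0, 1.0, 2.0, 1.2), 0.97),
                   Word("12.50", Box(6.1, 2.0, 6.8, 2.2), 0.96),
                   Word("0", Box(7.0, 3.0, 7.1, 3.2), 0.90)]  # Azure missed it
    for w in combine(azure, traditional):
        print(w.text, w.corrected)
```

Unlike Grooper, this sketch never replaces Azure's text with a different traditional reading (the case above where Grooper judged the traditional OCR Engine's reading of a numeric value to be better). Handling that would require a scoring rule, which is beyond this illustration.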

Azure OCR in a Docker container

Azure AI services may be hosted in a Docker container. This lets you "self-host" Azure OCR, using the same APIs available in Azure but on-premises. Reasons to do this include compliance, security, or other operational concerns.

More information on deploying Azure AI services in containers can be found in Microsoft's Azure AI containers overview.

When connecting to Azure AI Services hosted in a Docker container, you must enter the URL for the Azure AI container with the "vision" endpoint specified. The URL should look like this:

http://<container ip-address>:5000/vision

Be aware:

  • "vision" must be lower-case.
  • 5000 is the default port for the container.
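The two rules above are easy to check before pasting a URL into Grooper. The sketch below builds and sanity-checks the container URL. The helper names are hypothetical, it assumes a plain IPv4 address or hostname, and it only enforces the format shown above (a lower-case `/vision` path and an explicit port).

```python
from typing import List
from urllib.parse import urlsplit


def container_vision_url(host: str, port: int = 5000) -> str:
    """Build the URL for an Azure AI container's "vision" endpoint."""
    return f"http://{host}:{port}/vision"


def check_vision_url(url: str) -> List[str]:
    """Return a list of problems with a container URL (empty if it looks right)."""
    problems: List[str] = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// scheme")
    if parts.path != "/vision":
        if parts.path.lower() == "/vision":
            problems.append('"vision" must be lower-case')
        else:
            problems.append('path must be "/vision"')
    if parts.port is None:
        problems.append("no port given (the container default is 5000)")
    return problems


if __name__ == "__main__":
    print(container_vision_url("192.168.1.20"))  # http://192.168.1.20:5000/vision
    print(check_vision_url("http://192.168.1.20:5000/Vision"))
```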

How to

To use Azure OCR, you will need to add and configure an OCR Profile, then add that OCR Profile to your Recognize Batch Process Step. Once that is done, you can test your Step or run the Batch Process.

Setting up the OCR Profile

Adding an OCR Profile

  1. Right-click on the Project or folder inside of your Project in your Node Tree where you want to add your OCR Profile.
  2. Hover over "Add".
  3. Click on "OCR Profile..."


  4. Enter your desired name for your OCR Profile in the Name property field.
  5. Click "EXECUTE" in the top right-hand corner of the pop-up window to create your OCR Profile.


  6. You should now see the new OCR Profile in your Node Tree.


Configuring the OCR Profile

  1. Click the hamburger icon to the right of the OCR Engine property to access the drop down menu.
  2. Select Azure OCR from the drop down menu.


  3. Copy and paste your unique API key into the API Key property. Then select your API Region from the drop-down menu, accessed by clicking the hamburger icon next to the property.
  4. Optionally, select a Traditional OCR Profile. If this property is left blank, Grooper will run a basic traditional OCR engine (Transym) in addition to Azure OCR. To override this default, select a different OCR Profile here.
  5. Click the save icon in the top right of the property grid to save your changes.
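The API Key and API Region correspond to Azure's public REST interface. Grooper makes these calls for you, and this article does not say which Read API version it uses. The sketch below assumes the Read API v3.2 regional endpoint (`https://{region}.api.cognitive.microsoft.com/vision/v3.2/read/analyze`) and the `Ocp-Apim-Subscription-Key` header, which can be handy for confirming that a key and region work outside Grooper. Resources created with a custom subdomain use a different host name, and all function names here are hypothetical.

```python
import json
import time
import urllib.request
from typing import List

# Assumption: Read API v3.2 on a regional endpoint.
READ_URL = "https://{region}.api.cognitive.microsoft.com/vision/v3.2/read/analyze"


def build_read_request(api_key: str, region: str, image: bytes) -> urllib.request.Request:
    """Prepare (but do not send) the asynchronous Read request for one image."""
    return urllib.request.Request(
        READ_URL.format(region=region),
        data=image,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/octet-stream",
        },
    )


def extract_lines(result: dict) -> List[str]:
    """Pull recognized line text out of a v3.2 "get read result" response."""
    return [line["text"]
            for page in result["analyzeResult"]["readResults"]
            for line in page["lines"]]


def read_text(api_key: str, region: str, image: bytes,
              poll_seconds: float = 1.0) -> List[str]:
    """Submit an image, poll the Operation-Location URL, and return its lines."""
    with urllib.request.urlopen(build_read_request(api_key, region, image)) as resp:
        operation_url = resp.headers["Operation-Location"]  # sent with 202 Accepted
    while True:
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": api_key})
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        if result["status"] in ("succeeded", "failed"):
            break
        time.sleep(poll_seconds)  # still "notStarted" or "running"
    if result["status"] == "failed":
        raise RuntimeError("Read operation failed")
    return extract_lines(result)
```

For example, `read_text(key, "eastus", Path("page.tif").read_bytes())` would return the recognized lines. In the v3.2 response, each word also carries a `boundingBox` and a `confidence`; those boxes are the approximate positions discussed earlier in this article.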

Adding the OCR Profile to the Recognize step

You will need to add a Batch Process Step configured with the Recognize Activity to your Batch Process. In the Step Properties, ensure the Activity is set to Recognize and the Scope is appropriate for your processing level. For help with setting up your Batch Process, take a look at our Batch Process article.

  1. Add and select the Recognize step in your Batch Process in the Node Tree.
  2. Click on the hamburger icon to the right of the OCR Profile property to access the navigation drop down.
  3. Navigate to and select the OCR Profile that has been configured with the Azure OCR engine.


  4. Finish configuring your Batch Process Step, then click the save icon in the top right of the Step Properties property grid to save your changes.

Image Segmentation and Segment Reprocessing incompatibility

Azure OCR is NOT compatible with Grooper's "Image Segmentation" and "Segment Reprocessing" Synthesis features. The image snippets Grooper uses to process segments fall below Azure's minimum image dimensions, so the Read API will return an error if you configure these properties in an OCR Profile.

OUR BEST PRACTICE ADVICE: Set the "Synthesis" property to Disabled when using Azure OCR in an OCR Profile.
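The size limit behind this incompatibility is easy to check. Microsoft documents a minimum image size of 50 x 50 pixels for the Read API (verify the limit for the API version you use). The sketch below converts a region's size in inches to pixels at a given DPI and flags anything under that minimum. The helper names and example sizes are hypothetical.

```python
from typing import List, Tuple

# Documented Read API minimum (50 x 50 px); confirm for your API version.
MIN_SIDE_PX = 50


def inches_to_pixels(inches: float, dpi: int) -> int:
    """Convert a physical length to pixels at the scan resolution."""
    return round(inches * dpi)


def too_small_for_read(width_px: int, height_px: int,
                       min_side: int = MIN_SIDE_PX) -> bool:
    """True if either side is below the Read API minimum."""
    return width_px < min_side or height_px < min_side


def undersized(segments: List[Tuple[int, int]]) -> List[int]:
    """Return indexes of (width, height) segments below the documented minimum."""
    return [i for i, (w, h) in enumerate(segments) if too_small_for_read(w, h)]


if __name__ == "__main__":
    dpi = 300
    page = (inches_to_pixels(8.5, dpi), inches_to_pixels(11, dpi))    # full page
    cell = (inches_to_pixels(0.15, dpi), inches_to_pixels(0.1, dpi))  # one small cell
    print(page, cell)                # (2550, 3300) (45, 30)
    print(undersized([page, cell]))  # [1]
```

A full page is far above the limit, while a snippet the size of a single small table cell is not, which is consistent with the advice to disable Synthesis.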