DI Analyze (Activity)

Latest revision as of 10:53, 5 June 2026

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

DI Analyze is a Grooper Activity (used in a Batch Process) that submits document content to Azure Document Intelligence for document analysis and saves the results back onto the processed item as files. It is designed to capture not only recognized text, but also layout and semantic structure (for example: lines, words, paragraphs, sections, and other document elements exposed by the selected Azure model).

Introduction

DI Analyze is a Grooper Activity used in a Batch Process to submit document content (pages, folders, and in some cases attachments like PDFs/TIFFs) to Azure Document Intelligence for analysis, then save the returned analysis results back onto the item as a JSON file (for example, AzureDI_prebuilt-layout.json).

Unlike a traditional OCR step that primarily returns recognized text, DI Analyze is meant to capture and persist richer “document understanding” output—recognized text plus layout and structural elements (such as lines, words, paragraphs, and higher-level document organization). Those saved results can then be reused by downstream steps (including Azure DI OCR and extraction/validation logic) without re-analyzing, unless "Overwrite" is enabled.

DI Analyze can also optionally correct page orientation based on Azure's layout analysis, rotating PDF pages via PDF rotation and rotating image pages by saving a rotated primary image. Configuration is driven by the selected Azure model ("Model Name"), language, optional feature flags, and content format, and it relies on the repository's Azure Document Intelligence option.

Other Grooper Features that can use DI Analyze output:

Azure DI OCR
DI Layout (Quoting Method)
JSON File (Quoting Method)
Diagnostics and review outputs (for troubleshooting/validation)

What DI Analyze produces

DI Analyze produces a set of “saved results” that describe what Azure Document Intelligence found on your document, so other Grooper steps (and users) can reuse and review that information. These saved results include:

A results file (the main output)

DI Analyze saves a JSON file to the Batch Page or Batch Folder it processed.
Think of this as a “report” that contains:
- The text it was able to read
- Where that text appears on the page (layout/position)
- Page-by-page details (and other structure, depending on the model you chose)
The file name is based on the model, for example: AzureDI_prebuilt-layout.json
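
As a rough illustration of what that report holds, the sketch below reads a small, hand-written sample shaped like an Azure Document Intelligence analyze result (pages, each with a list of lines). The sample values and the helper names `unwrap`, `load_result`, and `lines_by_page` are made up for this example. Whether Grooper stores the result wrapped in an "analyzeResult" object is an assumption, so the code accepts either shape.

```python
import json
from pathlib import Path


def unwrap(data):
    """Return the analysis payload, whether or not it is wrapped.

    Assumption: the saved file may hold the payload directly or under
    an "analyzeResult" key, as Azure's REST response does.
    """
    return data.get("analyzeResult", data)


def load_result(path):
    """Read a saved analysis file, e.g. AzureDI_prebuilt-layout.json."""
    return unwrap(json.loads(Path(path).read_text(encoding="utf-8")))


def lines_by_page(result):
    """Map each page number to the text lines found on that page."""
    return {
        page["pageNumber"]: [line["content"] for line in page.get("lines", [])]
        for page in result.get("pages", [])
    }


# Invented sample in the general shape of an analyze result (real files are larger).
SAMPLE = {
    "analyzeResult": {
        "modelId": "prebuilt-layout",
        "pages": [
            {
                "pageNumber": 1,
                "lines": [
                    {"content": "Account Authorization Form"},
                    {"content": "Primary Account Holder Information"},
                ],
            }
        ],
    }
}

print(lines_by_page(unwrap(SAMPLE)))
```

For a real file, `lines_by_page(load_result("AzureDI_prebuilt-layout.json"))` would be the equivalent call, assuming the file has been exported to a local path.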

Optional “review” files (only if diagnostics are turned on)

A readable version of the same results that you can open to inspect what Azure returned.
A markdown file and an HTML file that present the extracted content in an easier-to-read format than raw results.

Optional marked-up images (only if diagnostics are turned on, and you ran it on pages)

DI Analyze can generate images that look like your original page but with boxes/overlays showing what it detected (like lines, words, and paragraphs).
These are mainly for troubleshooting—so you can quickly see whether recognition and layout detection look correct.
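
To give a sense of how such overlays can be derived, the sketch below turns a detected element's polygon into a rectangle that could be drawn on the page image. It assumes the polygon is a flat list of x/y pairs, as in Azure's analyze results, and that coordinates are in inches (typical for PDFs, while image inputs are usually already in pixels). The function names are hypothetical, and this is not a description of how Grooper renders its diagnostic images.

```python
def polygon_to_box(polygon):
    """Return (left, top, right, bottom) enclosing a flat [x1, y1, x2, y2, ...] polygon."""
    xs = polygon[0::2]  # every even position is an x coordinate
    ys = polygon[1::2]  # every odd position is a y coordinate
    return (min(xs), min(ys), max(xs), max(ys))


def to_pixels(box, dpi):
    """Scale a box given in inches to whole-pixel coordinates at the given resolution."""
    return tuple(round(value * dpi) for value in box)


# Invented polygon for a slightly skewed line of text, in inches.
line_polygon = [1.0, 1.0, 5.0, 1.1, 5.0, 1.5, 1.0, 1.4]
box = polygon_to_box(line_polygon)
print(box, to_pixels(box, dpi=200))
```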

Optional page rotation (only if “Correct Orientation” is enabled)

If the page is sideways or upside down, DI Analyze can automatically rotate it based on what it detects.
This is helpful because it can improve accuracy for later OCR/extraction steps.
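
One plausible way to turn a detected angle into a rotation decision is sketched below. It assumes the analysis reports each page's content angle in degrees, measured clockwise, and snaps it to the nearest quarter turn so that a small skew is treated as upright. The function names are hypothetical, and Grooper's actual rule for Correct Orientation may differ.

```python
def snapped_orientation(angle):
    """Snap a detected content angle (degrees, clockwise) to 0, 90, 180, or 270."""
    return (round(angle / 90) * 90) % 360


def correction_rotation(angle):
    """Clockwise rotation, in degrees, that would bring the content upright."""
    return (-snapped_orientation(angle)) % 360


# A page scanned sideways with a little skew, and a nearly upright page.
print(correction_rotation(89.6))
print(correction_rotation(0.4))
```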

Test it yourself

The Document Intelligence Studio on Microsoft Azure's website does a good job of illustrating what kinds of information DI Analyze collects from a document. If you would like to test this yourself, you can create a free account on Azure.

To get started, go to the provided Microsoft Azure URL, then click the "Subscribe to Azure" link. It is highly recommended that you read through Microsoft's documentation on Azure Document Intelligence.
This walkthrough assumes the use of a free trial subscription. You may need to alter your choices depending on your needs. Click the "start a trial subscription" link.
On the subsequent page, click the "Try Azure for free" button.
Sign in to your Microsoft account.
Check the first check box to agree to the subscription agreement as it is required to move forward, then click the "Next" button.
Identity verification needs to be done by phone and credit or debit card. First, enter your telephone information, then click your preferred method of verification.
Once received, enter the provided "Verification code" in the appropriate field, then click the "Verify code" button.
Next, we need to provide credit or debit card information. As stated, this will not charge your card. Provide the appropriate card information, then click the "Sign up" button.
Once you have signed up, go to the Azure portal page at https://portal.azure.com/#home. Click "All services" in the left menu.
Click "Document intelligences" from the AI + machine learning section.
If you haven't already done so, you'll need to create a new resource. Click the "Create" button.
Fill out the required information for the "Basics" section. You'll need to select a Subscription and a Resource group. If you do not have a Resource group, click "Create new" below the drop-down and follow the instructions. Once you've finished filling out the rest of the information, click "Next".
On the Network tab, select your preference for network security then click "Next".
If desired, edit the Identity properties. We have left everything to defaults. Click "Next".
Add Tags if desired. When finished, click "Next".
Verify all information is correct. Then click "Create".
Now, if you want to go to the Document Intelligence Studio to see what Document Intelligence recognizes from a document, click on the resource name.
Click on the "Go to Document Intelligence Studio" button in the right panel.
Click the "Start with Document Intelligence" button.
Choose the types of information you want to gather with Document Intelligence, and click "Try it out" to test.

Now that you have access to Azure Document Intelligence Studio, you can import documents, run an analysis, and see exactly what information is extracted from them. Document Intelligence can often recognize parts of a document such as titles, labels, headings, tables, and OMR boxes.

This is Azure's Document Intelligence Studio, where we can better understand what kind of information Azure DI collects. It can be found on the Microsoft Azure website.
This is a sample mock document authorization form we have created and brought into the Document Intelligence Studio to analyze.
Let's run analysis on the document and see what types of information Azure Document Intelligence can collect.
Notice the blue and yellow highlighting on the document that shows the information detected. On the right, a panel shows the analysis results.
Hovering over one of the highlighted portions of the document gives you more information about what was detected. The "Account Authorization Form" text segment was recognized as a title of the document.
"Primary Account Holder Information" was recognized as a section heading.
Tables can also be detected.
Here is what Document Intelligence detected for the table. Document Intelligence does a pretty good job of determining the structure of a table on a document.
OMR boxes were also detected. We can see which boxes were marked and recognized as "selected". On this particular document, all of the options were selected, but if any OMR boxes had been left empty, Document Intelligence would have been able to detect that as well.
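
The table and OMR results seen in the Studio also appear in the analysis output as structured data. The sketch below is a minimal illustration, assuming the output follows Azure's analyze result shape, where a table lists its cells with row and column indexes and each page lists selection marks with a state of "selected" or "unselected". The sample data and function names are invented for this example.

```python
def table_to_grid(table):
    """Rebuild a table as a list of rows from its flat list of cells."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return grid


def count_selection_marks(page):
    """Count the OMR boxes on a page by their detected state."""
    counts = {"selected": 0, "unselected": 0}
    for mark in page.get("selectionMarks", []):
        counts[mark["state"]] = counts.get(mark["state"], 0) + 1
    return counts


# Invented sample data standing in for one detected table and one page.
SAMPLE_TABLE = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Account"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Type"},
        {"rowIndex": 1, "columnIndex": 0, "content": "12345"},
        {"rowIndex": 1, "columnIndex": 1, "content": "Checking"},
    ],
}
SAMPLE_PAGE = {
    "selectionMarks": [
        {"state": "selected"},
        {"state": "selected"},
        {"state": "unselected"},
    ]
}

print(table_to_grid(SAMPLE_TABLE))
print(count_selection_marks(SAMPLE_PAGE))
```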

DI Analyze and Azure DI OCR

DI Analyze and Azure DI OCR both use Azure Document Intelligence, but they are designed for different purposes in Grooper.

DI Analyze is a Batch Process activity that sends a document to Azure Document Intelligence and saves the returned analysis results as a file on the Batch Page or Batch Folder.
Azure DI OCR is an OCR engine used through an OCR Profile to return OCR results that Grooper can use during OCR processing.

The main difference is the type of output each one is meant to provide:

DI Analyze produces a saved analysis file
Azure DI OCR produces OCR results

Although both use the same Azure service connection, they are not interchangeable.

Use DI Analyze Data for Azure DI OCR

There is one way the two work together: if you run DI Analyze before a Recognize step that is configured with an Azure DI OCR Profile, Azure DI OCR can use the information already obtained by DI Analyze instead of making a second call to Azure Document Intelligence.

Keep in mind that, for this to work, DI Analyze and Azure DI OCR must use the same Model, such as "prebuilt-layout". If DI Analyze uses "prebuilt-layout" but Azure DI OCR uses "prebuilt-read", Azure DI OCR will make a second call, because it is being asked for different information from a different Model.
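
The reuse rule can be pictured as a simple lookup by model name. The sketch below is only an illustration of that idea, not Grooper's internal logic. It assumes the saved file is always named "AzureDI_" plus the model name plus ".json" (a pattern inferred from the AzureDI_prebuilt-layout.json example), and the function names are hypothetical.

```python
def result_file_name(model_name):
    """File name assumed for a saved result, following the documented example."""
    return f"AzureDI_{model_name}.json"


def needs_new_call(saved_files, model_name, overwrite=False):
    """True when no saved result exists for this model, or Overwrite is enabled."""
    return overwrite or result_file_name(model_name) not in saved_files


# Files already saved on a page after DI Analyze ran with prebuilt-layout.
saved = {"AzureDI_prebuilt-layout.json"}

print(needs_new_call(saved, "prebuilt-layout"))  # same model: saved result can be reused
print(needs_new_call(saved, "prebuilt-read"))    # different model: a second call is needed
```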

How DI Analyze data is used during extraction

DI Analyze data can be used during extraction as a saved source of Azure Document Intelligence results. Instead of relying only on OCR output at the moment of extraction, later steps in the Batch Process can reuse the analysis file created by DI Analyze.

Saved analysis for downstream use

When DI Analyze runs, it saves the Azure analysis result as a JSON file on the processed Batch Page or Batch Folder. This gives downstream extraction steps a stored source of:

recognized text
page layout information
document structure returned by Azure

Because the result is saved on the item, later steps can use the same analysis data without needing to run the Azure analysis again.

Use in AI-assisted extraction

DI Analyze data is especially useful in AI-assisted extraction workflows. When the "Model Name" is set to prebuilt-layout, the saved result can be used with the DI Layout Quoting Method to inject layout-aware content into AI-powered extraction operations. When other prebuilt or custom models are used, the saved JSON result can be referenced through the JSON File quoting method. This allows extraction steps to work from Azure’s saved analysis output instead of relying only on plain OCR text.
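
To show why layout-aware content can help an AI model, the sketch below builds a short text quote from paragraphs and their roles. It assumes the saved result lists paragraphs with an optional role such as "title" or "sectionHeading", as Azure's layout model does. This is not the DI Layout Quoting Method itself, only a hypothetical stand-in with invented sample data.

```python
# Roles that become headings, and roles left out of the quote entirely.
PREFIXES = {"title": "# ", "sectionHeading": "## "}
SKIPPED_ROLES = {"pageHeader", "pageFooter", "pageNumber"}


def layout_quote(paragraphs):
    """Join paragraphs into text, marking titles and section headings."""
    parts = []
    for paragraph in paragraphs:
        role = paragraph.get("role")  # body paragraphs carry no role
        if role in SKIPPED_ROLES:
            continue
        parts.append(PREFIXES.get(role, "") + paragraph["content"])
    return "\n\n".join(parts)


# Invented paragraphs resembling the sample authorization form.
SAMPLE_PARAGRAPHS = [
    {"role": "pageHeader", "content": "Page 1 of 2"},
    {"role": "title", "content": "Account Authorization Form"},
    {"role": "sectionHeading", "content": "Primary Account Holder Information"},
    {"content": "Name: Jane Doe"},
]

print(layout_quote(SAMPLE_PARAGRAPHS))
```

Compared with plain OCR text, the quote keeps the distinction between the title, a section heading, and body text, which is the kind of context an extraction prompt can benefit from.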

Why this helps extraction

Using DI Analyze data during extraction can help when document layout matters.

For example, it can provide additional context about:

where text appears on the page
how text is grouped into lines or paragraphs
how content is organized across the document

This can be helpful for documents such as:

invoices
statements
forms
other documents with variable layouts

Benefits during testing and troubleshooting

Because the DI Analyze result is saved to the document, it can also help when testing or troubleshooting extraction.

Users can review:

the saved JSON result
diagnostic markdown or HTML output
diagnostic images showing detected lines, words, and paragraphs

This makes it easier to compare what Azure found with what the extraction step used.

Typical workflow

A common extraction workflow is:

Run DI Analyze early in the Batch Process.
Save the Azure analysis result to the Batch Page or Batch Folder.
Use the saved result later for Azure DI OCR, document quoting, AI-assisted extraction, or other downstream extraction steps.

In this way, DI Analyze acts as a reusable analysis step that prepares document content and layout data for later extraction.

How to

To use DI Analyze in Grooper, you must configure the Azure Document Intelligence Repository Option and then add a DI Analyze step to your Batch Process.

Configure Azure Document Intelligence (repository option)

DI Analyze requires a valid Azure Document Intelligence configuration.

To add the Azure Document Intelligence Repository Option, navigate to the Root Node in your node tree.
Click the ellipsis icon to the right of the Options property.
When the "Options" window appears, click the add icon located in the toolbar above the options list.
Select "Azure Document Intelligence" from the drop down menu.
In the property grid, enter the API Key and Resource Name for your Azure Document Intelligence subscription.
Click "OK" on the "Options" window.
Save your changes to the Root Node.

FYI

You may need to restart IIS and/or Grooper services for the changes to take effect.
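
For context on what the API Key and Resource Name are used for, the sketch below shows how those two values typically map onto a request to the Azure Document Intelligence REST API. It builds the URL and headers only and does not send anything. The API version, the placeholder resource name, and the function names are assumptions for this example, and nothing here describes how Grooper issues its requests internally.

```python
from urllib.parse import urlencode

# Assumption: adjust to an API version your Azure resource supports.
API_VERSION = "2024-11-30"


def analyze_url(resource_name, model_name, content_format="text"):
    """Build the analyze URL for a resource that uses a custom subdomain endpoint."""
    base = f"https://{resource_name}.cognitiveservices.azure.com"
    query = urlencode(
        {"api-version": API_VERSION, "outputContentFormat": content_format}
    )
    return f"{base}/documentintelligence/documentModels/{model_name}:analyze?{query}"


def request_headers(api_key):
    """Headers for posting a document's raw bytes with key-based authentication."""
    return {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/octet-stream",
    }


# "my-di-resource" and the key below are placeholders, not real values.
print(analyze_url("my-di-resource", "prebuilt-layout", "markdown"))
print(sorted(request_headers("<your-api-key>")))
```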

Add a DI Analyze step to a Batch Process

To add a DI Analyze Batch Process Step, right-click on the Batch Process. Hover over "Add Activity", hover over "Cleanup & Recognition", and then click "DI Analyze" in the flyout menu.
When the Add Activity window appears, change the Step Name if desired or leave it as its default and then click "Execute".
The order in which you run DI Analyze in your Batch Process matters. If you want to use the DI Analyze data during your Recognize Step, you will need to move your DI Analyze Step somewhere before your Recognize Step. The same applies if you want to use the DI Analyze data during Extract.
Set the Batch Process Step Scope. It is recommended to run the DI Analyze step at the Page level. This way, the data is stored on the individual pages of the document, and the step can run more efficiently by using multiple threads to process several pages of a document at once.
Next, click the ellipsis icon to the right of the Model Name property.
Choose a model. The prebuilt-layout model is recommended for DI Analyze.
- The two main models used for Document Intelligence in Grooper are prebuilt-layout and prebuilt-read. The prebuilt-read model returns text data, whereas prebuilt-layout returns both text and layout data. Keep in mind that using the prebuilt-layout model costs more. You can find the pricing differences on the Microsoft Azure website.
Next, set the Content Format property. You can choose text or markdown. Either one will work, but keep in mind that AI models can use markdown to better understand the structure of a document.
Save your changes to the Batch Process Step.
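
The benefit of Page-level scope mentioned in the steps above can be illustrated with a generic thread pool: when each page is an independent unit of work, several pages can be in flight at once. The sketch below uses a stand-in `analyze_page` function that returns a fake result instead of calling Azure. The function names and worker count are hypothetical, and Grooper's own threading is managed by the product, not by code like this.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_page(page_number):
    """Stand-in for one page-level analysis call; returns a fake result."""
    return {"pageNumber": page_number, "status": "succeeded"}


def analyze_pages(page_numbers, max_workers=4):
    """Analyze pages concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even though pages run concurrently.
        return list(pool.map(analyze_page, page_numbers))


results = analyze_pages([1, 2, 3, 4, 5])
print([result["pageNumber"] for result in results])
```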

Test your step

Click over to the "Activity Tester" tab.
Click the test icon, or submit the step as a job, to test your DI Analyze step.
Click the Diagnostics icon.
- In the diagnostics we can see what DI Analyze returned. A JSON file contains text, layout, and location information that can later be used to identify different aspects of the text on the document. It also allows us to convert the text to markdown syntax.
- Click on the markdown file. Note how the text is organized. Click the code icon in the top right corner of the diagnostics viewer to see the markdown syntax. AI models generally handle markdown syntax well, and it especially helps them understand table structure.
- Click on the HTML file. The HTML file can be used downstream to aid in determining what sections of the document to highlight when going through review.