2024:Grooper and AI

From Grooper Wiki

Grooper's groundbreaking AI-based integrations make getting good data from your documents a reality with less setup than ever before.

Artificial intelligence and machine learning technology has always been a core part of the Grooper platform. It all started with ESP Auto Separation's capability to separate loose pages into discrete documents using trained examples. As Grooper has progressed, we've integrated with other AI technologies to improve our product.

Grooper not only offers Azure's machine-learning-based OCR engine as an option, but also improves its results with supplementary data from traditional OCR engines. This gives you the "best of both worlds": highly accurate character recognition and position data for both machine-printed and handwritten text with minimal setup.

In version 2024, Grooper added a slew of features incorporating cutting-edge AI technologies into the platform. These AI-based features accelerate Grooper design and development. The end result is easier deployments, better results with less setup and, for the first time in Grooper, a document search and retrieval mechanism.


Grooper's AI integrations and features include:

  • Azure OCR - A machine learning based OCR engine offered by Microsoft Azure.
  • Large language model (LLM) based data extraction at scale
  • Document assistants allowing users to chat with their documents
  • A robust and speedy document search and retrieval mechanism


In this article, you will find:

Relevant article links

Here, you will find links to all currently available articles pertaining to Grooper AI functionality.

Azure OCR

  • Azure OCR - An OCR Engine option for OCR Profiles that utilizes Microsoft Azure's Read API. Azure's Read engine is AI-based text recognition software that uses a convolutional neural network (CNN) to recognize text. Compared to traditional OCR engines, it yields superior results, especially for handwritten text and poor-quality images. Furthermore, Grooper supplements Azure's results with those from a traditional OCR engine in areas where traditional OCR outperforms the Read engine.

LLM connectivity and constructs

Major LLM constructs

  • LLM Connector - A "Repository Option" set on the Grooper Root node. This provides connectivity to LLMs offered by OpenAI and Microsoft Azure.
  • Ask AI - An LLM-based Value Extractor. This extractor returns results by passing the document's text and a natural language prompt to a chat completion service (a chatbot). The chatbot uses the document's text and the prompt to respond with an answer. The chatbot's answer is the extractor's result.
  • AI Extract - An LLM-based Fill Method. Fill Methods are configured on Data Models and Data Sections. They perform a secondary extraction after the child Data Elements' extractors and extract methods execute. Or, they can act as the primary extraction mechanism if no extractors or extract methods are configured. AI Extract extracts the Data Model using an LLM chat completion service. In many cases, the only configuration required is Data Elements added to the model. AI Extract fills in the Data Model based on what its Data Elements are named.
  • Clause Detection - An LLM-based Data Section Extract Method. Clause Detection is designed to locate specified types of clauses in natural language documents and return them as section instances in Grooper. The user provides one or more sample clauses, then an embeddings model determines the portion of the document (or "chunk") most similar to the samples, and Grooper returns the paragraph(s) in the chunk. Data Sections using this Extract Method can then use AI Extract to return data using only the text-content of the detected paragraph.
  • AI Analyst - AI Analysts facilitate document chat in Grooper. They use an LLM service's "assistants" API to create document chatbots with domain-specific knowledge. This allows Grooper users to chat with one or more selected documents. When the AI Analyst spins up a chat session, it creates an analyst and passes the selected documents' text into its custom knowledge file, which is used to answer questions.
    • AI Dialogue - A Grooper Activity that automates a chat conversation in a Batch Process. The responses can then be accessed by a human operator in a Review step with a Chat Viewer.
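The retrieval idea behind Clause Detection can be sketched as a small loop: chunk the document, embed each chunk and the sample clause, and keep the most similar chunk. The sketch below is a minimal illustration only — the `embed()` function is a crude bag-of-words stand-in for a real embeddings model, and all names are hypothetical, not Grooper's internals.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Stand-in embedding: bag-of-words counts. A real implementation
    # would call an embeddings model via an LLM provider's API.
    vec: dict[str, float] = {}
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        if word:
            vec[word] = vec.get(word, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def most_similar_chunk(document: str, sample_clause: str) -> str:
    # "Chunks" here are simply paragraphs split on blank lines.
    chunks = [p.strip() for p in document.split("\n\n") if p.strip()]
    sample_vec = embed(sample_clause)
    return max(chunks, key=lambda chunk: cosine(embed(chunk), sample_vec))
```

With a real embeddings model, semantically related paragraphs score highly even without shared vocabulary; the bag-of-words stand-in only catches literal word overlap.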

LLM construct related properties and concepts

  • Parameters - These properties adjust how an LLM gives a response. For example, Temperature controls the "randomness" of an LLM's response.
  • Document Quoting - This property determines what text is fed to an LLM. Depending on its configuration, this could be the whole document, an extracted portion of the document, a paragraph semantically similar to a given example, or the document's previously extracted Data Model values.
  • Alignment - These properties control how Grooper highlights LLM-based results in a Document Viewer (or how it "aligns" an LLM response to what text on the document was used to give that response).
  • Prompt Engineering - This concept refers to the process of designing and refining natural language prompts to get a desired answer from an LLM chatbot.
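These concepts map onto the request a chat completion service actually receives. The hedged sketch below shows where Temperature and the quoted document text land in such a request; the field names follow the OpenAI chat completions API, while the model name, prompt, and function name are placeholders, not Grooper configuration.

```python
def build_extraction_request(document_text: str, prompt: str,
                             temperature: float = 0.0) -> dict:
    # Assemble a chat-completion request body (field names follow the
    # OpenAI chat completions API; values here are illustrative).
    return {
        "model": "gpt-4o",  # placeholder model name
        # Parameters: a low temperature keeps extraction answers stable.
        "temperature": temperature,
        "messages": [
            # Document quoting: whatever portion of the document the
            # quoting method selects is supplied as context.
            {"role": "system",
             "content": "Answer using only the document below.\n\n"
                        + document_text},
            # Prompt engineering: the refined natural-language question.
            {"role": "user", "content": prompt},
        ],
    }
```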

AI Search

AI Search - This is a larger article covering Grooper's document search and retrieval mechanism. This mechanism uses Microsoft Azure AI Search to index Grooper-processed documents for fast and efficient searching. Documents can be searched using their full text content (e.g. OCR text from Recognize) and their extracted data (e.g. Data Model values collected from Extract).

OpenAI and Azure OpenAI Service privacy policies

Here, you will find links to OpenAI and Azure OpenAI Service privacy policies.

Note: Grooper integrates with OpenAI using the API, not ChatGPT

  • OpenAI Data Processing Addendum
  • OpenAI enterprise privacy policy
  • OpenAI API Documentation - How we use your data
    Relevant excerpt:
    Your data is your data.

    As of March 1, 2023, data sent to the OpenAI API will not be used to train or improve OpenAI models (unless you explicitly opt-in to share data with us, such as by providing feedback in the Playground). One advantage to opting in is that the models may get better at your use case over time.

    To help identify abuse, API data may be retained for up to 30 days, after which it will be deleted (unless otherwise required by law). For trusted customers with sensitive applications, zero data retention may be available. With zero data retention, request and response bodies are not persisted to any logging mechanism and exist only in memory in order to serve the request.

    Note that this data policy does not apply to OpenAI's non-API consumer services like ChatGPT or DALL·E Labs.
  • OpenAI API usage policies by endpoint

OpenAI and Azure account setup

For Azure OCR connectivity

Grooper can implement Azure AI Vision's Read engine for OCR. "Read" is a machine learning based OCR engine used to recognize text from images with superior accuracy, including both machine printed and handwritten text.

Azure OCR Quickstart

In Azure:

  1. If you have not done so already, create an Azure account.
  2. Go to your Azure portal.
  3. If you have not done so already, create a "Subscription" and "Resource Group."
  4. Click the "Create a resource" button.
  5. Search for "Computer Vision".
  6. Find "Computer Vision" in the list and click the "Create" button.
  7. Follow the prompts to create the Computer Vision resource.
  8. Go to the resource.
  9. Under "Keys and endpoint", copy the API key.

In Grooper

  1. Go to an OCR Profile in Grooper.
  2. For the OCR Engine property, select Azure OCR.
  3. Paste the API key in the API Key property.
  4. Adjust the API Region property to match your Computer Vision resource's Region value.
    • You can find this by going to the resource in the Azure portal and inspecting its "Location" value.
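For reference, the API key collected above ends up in an HTTP request to Azure's Read API. The sketch below only assembles such a request (it does not send one); the endpoint and key are placeholders. Note that Read is asynchronous: after the POST, a caller polls the Operation-Location URL returned in the response headers for the recognized text.

```python
def build_read_request(endpoint: str, api_key: str, image_bytes: bytes) -> dict:
    # Assemble (but do not send) a request to the Azure Read analyze
    # endpoint. The key goes in the Ocp-Apim-Subscription-Key header.
    return {
        "method": "POST",
        "url": endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        "headers": {
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/octet-stream",
        },
        "body": image_bytes,
    }
```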
External resources

For Azure AI Search connectivity

Grooper uses Azure AI Search's API to create a document search and retrieval mechanism.

Azure AI Search Quickstart

In Azure:

  1. If you have not done so already, create an Azure account.
  2. Go to your Azure portal.
  3. If you have not done so already, create a "Subscription" and "Resource Group."
  4. Click the "Create a resource" button.
  5. Search for "Azure AI Search".
  6. Find "Azure AI Search" in the list and click the "Create" button.
  7. Follow the prompts to create the Azure AI Search resource.
    • The pricing tier defaults to "Standard". There is a "Free" option if you just want to try out this feature.
  8. Go to the resource.
  9. In the "Essentials" panel, copy the "Url" value. You will need this in Grooper.
  10. In the left-hand navigation panel, expand "Settings" and select "Keys".
  11. Copy the "Primary admin key" or "Secondary admin key." You will need this in Grooper.

In Grooper:

  1. Connect to the Azure AI Search service by adding the "AI Search" option to your Grooper Repository.
  2. From the Design page, go to the Grooper Root node.
  3. Select the Options property and open its editor.
  4. Add the AI Search option.
  5. Paste the AI Search service's URL in the URL property.
  6. Paste the AI Search service's API key in the API Key property.

Full documentation on Grooper's AI Search capabilities can be found in the AI Search article.
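The URL and admin key collected above are what authenticate REST calls to the search service. As a hedged sketch (the request is assembled only, never sent; the service URL, index name, and api-version value are placeholders), a full-text query against one index looks like this:

```python
def build_search_request(service_url: str, api_key: str,
                         index_name: str, query: str) -> dict:
    # Assemble (but do not send) a search query against one index.
    # Admin or query keys go in the "api-key" header.
    api_version = "2023-11-01"  # placeholder API version
    return {
        "method": "POST",
        "url": (service_url.rstrip("/")
                + f"/indexes/{index_name}/docs/search"
                + f"?api-version={api_version}"),
        "headers": {"api-key": api_key, "Content-Type": "application/json"},
        "body": {"search": query},
    }
```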
External resources

For LLM connectivity

Grooper can integrate with OpenAI and Microsoft Azure's LLM models to implement various LLM-based functionality. Before you can connect Grooper to these resources, you will need accounts set up with these providers. For information on how to set up these accounts/resources, refer to the following links:

OpenAI LLM Quickstart

In OpenAI:

  1. If you have not done so already, create an OpenAI account.
  2. Go to the API keys section of the OpenAI Platform.
  3. Press the "Create new secret key" button.
  4. Follow the prompts to create a new key.
  5. Copy the API key. You will not be able to view/copy it again.

In Grooper:

  1. Connect to the OpenAI API by adding an "LLM Connector" option to your Grooper Repository.
  2. From the Design page, go to the Grooper Root node.
  3. Select the Options property and open its editor.
  4. Add the LLM Connector option.
  5. Select the Service Providers property and open its editor.
  6. Add an OpenAI provider.
  7. Paste the OpenAI API key in the API Key property.
Refer to the articles linked above (Ask AI, AI Extract, Clause Detection, AI Analyst, and AI Dialogue) for more information on functionality enabled by the LLM Connector.
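The secret key created above authenticates every OpenAI API call as a bearer token. A minimal sketch of where it goes (the request is assembled only, never sent; the model name is a placeholder):

```python
def build_openai_request(api_key: str, prompt: str) -> dict:
    # Assemble (but do not send) a chat completions request. The secret
    # key goes in the Authorization header as a bearer token.
    return {
        "method": "POST",
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": "gpt-4o-mini",  # placeholder model name
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```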
External resources

Azure LLM Quickstart

Microsoft Azure offers access to several LLMs through their "Model Catalog". This includes Azure OpenAI models, Mistral AI models, Meta's Llama models, and more. To access a model from the catalog in Grooper, it must first be deployed in Azure from Azure AI Studio.

  • BE AWARE: These instructions offer general guidance for users wanting to deploy LLM models for use in Grooper, using Azure AI Studio to deploy the models. The same approach applies whether you deploy through Azure OpenAI services only or through Azure AI services to access a broader set of LLM models.
  • BE AWARE: This quickstart guide is incomplete at this time and may stop abruptly.

In Azure:

Create an "Azure AI project" in Azure AI Studio

  1. If you have not done so already, create an Azure account.
  2. If you have not done so already, create a "Subscription" and "Resource Group" in the Azure portal.
  3. Go to Azure AI Studio.
  4. From the Home page, press the "New project" button.
  5. A dialogue will appear to create a new project. Enter a name for the project.
  6. If you don't have an Azure AI hub (or simply "hub"), select "Create a new hub".
  7. Follow the prompts to create a new hub.
    • When selecting your "Location" for the hub, be aware Azure AI services availability differs per region. Certain models may not be available in certain regions.
    • When selecting "Connect Azure AI Services or Azure OpenAI", you can connect to an existing Azure OpenAI or Azure AI Services resource or create one. If you want access to all models from all providers (not just OpenAI), use Azure AI Services.

Once a project is created, you can deploy an LLM model.

  1. Select your project in Azure AI Studio.
  2. In the left-hand navigation panel, select "Deployments"
  3. Select the "Deploy model" button.
  4. Select the model you wish to use from the list.
    • Grooper's integration with Azure's Model Catalog uses either "chat completion" or "embeddings" models depending on the feature. To better narrow down your list of options, expand the "Inference tasks" dropdown and select "Chat completion" and/or "Embeddings."
    • Embeddings models are only required when using Clause Detection or the "Semantic" Document Quoting method in Grooper.
  5. Select "Confirm"
  6. Follow the onscreen prompts to deploy the selected model.
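Once deployed, the model is addressed by its deployment name. For Azure OpenAI-style deployments, the chat endpoint URL embeds that name, and requests against it authenticate with an "api-key" header. A sketch (resource endpoint, deployment name, and api-version are placeholders):

```python
def deployment_chat_url(resource_endpoint: str, deployment_name: str,
                        api_version: str = "2024-02-01") -> str:
    # Compose the chat-completions URL for an Azure OpenAI deployment.
    # The deployment name chosen in Azure AI Studio is part of the path.
    return (resource_endpoint.rstrip("/")
            + f"/openai/deployments/{deployment_name}"
            + f"/chat/completions?api-version={api_version}")
```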


External resources