Grooper on Azure Marketplace

From Grooper Wiki

Want to get started using Grooper without setting up a bunch of infrastructure on your own? Check out Grooper on the Azure Marketplace!

  • Create a Grooper VM with everything already installed and ready to go.
  • Users can choose a 60-day free trial or bring their own license key.
  • The Grooper VM comes with several example Grooper Projects and Batches to explore.

Getting Started: Create a Grooper VM from Azure Marketplace

This tutorial assumes you already have an Azure account and the ability to create resources, such as virtual machines.

  1. Go to the Microsoft Azure portal (portal.azure.com)
  2. Search for "Marketplace".
  3. Search the Marketplace for "Grooper".
  4. Select "Grooper".
  5. Using the "Subscription" dropdown, select the Azure subscription you wish to use to create the VM.
  6. Using the "Plan" dropdown, choose one of the following:
    • 60-Day Free Trial: Choose this if you do not have a Grooper license key and simply wish to evaluate the product for a trial period.
    • BYOL: Choose this if you want to "bring your own license". If you already have a Grooper license key, you can enter it into the root of the Grooper Repository (We'll show you how in subsequent steps).
  7. Press the "Create" button to start configuring the VM.
  8. Select or create a new resource group for the VM.
  9. Configure the "Instance details" as needed.
    • You must enter a name for the VM in the "Virtual machine name" field.
    • The "Image" field will be preselected based on the plan selected in step 6. Do not adjust this. Doing so will deploy a different VM image without Grooper pre-installed.
    • The "Size" field will determine the VM's processing power and memory. We recommend at least the default "Standard_B8ms", which has 8 vCPUs and 32 GB of memory. Prices for these VMs are set by Azure.
  10. Under "Administrator account" set logon credentials for the VM's admin.
  11. Press "Next: Disks" to continue.
  12. Configure disk settings as needed and select "Next: Networking" to continue.
  13. Configure networking settings as needed and select "Next: Management" to continue.
  14. Configure management settings as needed and select "Next: Monitoring" to continue.
  15. Configure monitoring settings as needed and select "Next: Advanced" to continue.
  16. Configure advanced settings as needed and select "Next: Tags" to continue.
  17. Add resource tags if desired and select "Next: Review + create" to continue.
  18. Review the VM's configuration and press "Create" to create it.
  19. It will take some time for Azure to deploy the VM. Azure will inform you of its progress as it is created.
  20. When the deployment is complete, you can select the "Go to resource" button to navigate to the VM.
  21. From the VM's "Overview" panel, you can start, stop and restart the VM.
  22. Go to the "Connect" panel to connect to the VM via RDP.
  23. Press the "Download RDP file" button to download an RDP file.
  24. Open the downloaded RDP file to connect to the VM.
  25. Enter the admin credentials to log in.
  26. Once logged in, you can immediately open the Grooper app.
  27. Open Microsoft Edge.
  28. Go to https://localhost/Grooper to open Grooper.

What comes on the Grooper VM?

Everything needed to give Grooper a test run already comes installed and configured. This includes:

  • Grooper
  • Grooper Web Client
  • SQL Express (needed to host the Grooper Repository's database)
  • Internet Information Services (needed to host the Grooper web app)
  • A Grooper Repository
  • Demo Grooper Projects and Batches

Navigating Grooper for the first time

If you're new to Grooper, you need to know about the main navigation pages:

  • Home - The landing page for a Grooper install. Users can review recent entries in the Grooper Repository's log and navigate to other pages from here.
  • Design - Provides a comprehensive user interface for developing, configuring, and testing Grooper Projects and other configuration nodes (Content Models, Data Models, Batch Processes, etc) in a Grooper Repository.
  • Batches - Provides a user interface for managing Batches (the primary container for documents in Grooper) actively being processed in production. Users can pause Batch processing, resume processing, reset steps, update a Batch's Batch Process, open paused Batches to inspect their contents, and execute ready Review tasks.
  • Tasks - Provides a user interface for filtering and performing Review tasks. This is more streamlined than the Batches page. It is well suited for end-users who just execute Review tasks in bulk.
  • Imports - Provides a user interface to manage importing files into new Grooper Batches. Files are imported by setting up "Import Jobs". Users can create ad-hoc Import Jobs from the Imports page and manage Import Jobs automated by Import Watcher services.
  • Jobs - Provides a user interface for viewing, filtering, and managing Processing Jobs in the Grooper Repository. Processing Jobs are created by Activity Processing services whenever Batch content runs through a Batch Process. As each Batch Process Step is executed, a Processing Job is created. The Processing Job executes the Activity assigned to the step with a set of tasks for all items in scope (either the Batch, Batch Folders at a set level, or Batch Pages).
  • Stats - Provides a user interface for managing and viewing saved Stats Queries. This allows users to view session statistics for Batches and their processing history.
  • Search - (Enabled after adding AI Search to the Grooper Repository) Provides a user interface for searching documents in a Grooper Repository. Documents must be added to an AI Search index before they can be searched.
  • Chat - (Enabled after adding an LLM Connector to the Grooper Repository) Allows users to chat with AI Assistants from the Grooper UI. Users can select AI Assistants, start chat sessions with them, continue existing chat sessions and view documents linked in footnotes in a Document Viewer.

Demo Projects and Batches

There are several example Projects and Batches that are already present on the Grooper VM. You can find these Batches and Projects in the Grooper Repository from the Design page.

They are organized into three folders:

  • Simple Functionality
  • Use Cases
  • Miscellaneous Demos

To find these resources:

  1. Open Grooper in a browser (http://localhost/Grooper)
  2. Go to the Design page.
  3. Expand the "Projects" folder in the Node Tree.
  4. Expand the "Simple Functionality", "Use Cases" and "Miscellaneous Demos" folders to view the example Projects.
  5. Expand the "Batches" folder in the Node Tree.
  6. Expand the "Test" subfolder.
  7. Expand the "Simple Functionality", "Use Cases" and "Miscellaneous Demos" folders to view the example Batches.


More information about these Projects can be found below.

Simple Functionality

These are Projects that demonstrate one or more Grooper features.

Some of these Projects have three different versions, each tailored to a different ingestion scenario. There is one version for:

  • Scanned documents
  • Documents imported from a file system
  • Documents imported from an email source

Each Project will contain one or more “READ ME” resource files with notes and instructions on how to configure nodes. Some of these Projects will also contain accompanying Grooper Batches in the Test branch that can be used to help test the Project.

Convert Image Based Documents to a Text Searchable PDF

The resources in this Project can be used to do something very basic that many Grooper users want to do: (1) Get quality OCR results from a document’s page images and (2) create a text searchable PDF version of the document.

If you don’t have an image-based PDF or TIF of your own, you can test using the ones we’ve provided below.

Extract Data with AI Extract

This Project shows off Grooper’s AI Extract functionality. AI Extract uses a large language model to extract data without the need for traditional extractor setup in Grooper. The Data Model used in this Project has some generic fields that could be extracted from many different kinds of documents. Supply your own documents to see what the Data Model collects! Then, try adding fields relevant to your documents to get exactly what you want from them.
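
AI Extract handles the prompting and field mapping for you inside Grooper. Purely to illustrate the general idea of LLM-based extraction (this is not Grooper’s implementation; the model name, field names, and sample text below are made up for the example), the approach amounts to asking a model to return the requested fields as structured data:

  # Conceptual sketch only: ask an LLM to pull a few generic fields out of document
  # text and return them as JSON. Field names and model are illustrative, not Grooper's.
  import json
  import requests

  document_text = "Invoice #1042 from Acme Supply, dated 2024-03-18, total due $1,254.00."

  response = requests.post(
      "https://api.openai.com/v1/chat/completions",
      headers={"Authorization": "Bearer sk-...", "Content-Type": "application/json"},
      json={
          "model": "gpt-4o-mini",  # example model name
          "messages": [
              {"role": "system", "content": "Return JSON with keys: document_type, date, total."},
              {"role": "user", "content": document_text},
          ],
          "response_format": {"type": "json_object"},
      },
      timeout=60,
  )
  print(json.loads(response.json()["choices"][0]["message"]["content"]))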

Ingest and Index

The resources in this Project introduce users to adding documents to a Grooper search index. Once added to an index, users can take advantage of the Search Page, Grooper’s powerful document search and retrieval tool.

There are three different versions of this Project, each tailored to a different ingestion scenario. There is one version for:

  • Scanned documents
  • Documents imported from a file system
  • Documents imported from an email source

Conditioning Emails

Importing documents from an email source is a common starting point in document processing. In Grooper, email files need to be conditioned before they can be further processed, depending on how the document is contained within the email (attached to the email, contained in a ZIP attached to the email, contained in the body, etc.). This Project contains a Batch Process designed to handle numerous different email import scenarios. Use the accompanying Batch to test the steps in the Batch Process to get a better understanding of how Grooper can handle multiple different email scenarios in a single Batch Process.

Making an AI Assistant from HTTP Import

AI Assistants are Grooper’s powerful conversational AIs. They can answer domain specific questions by accessing a specified set of knowledge resources. Websites are a wealth of knowledge that can be added to the AI Assistant’s expertise by importing them into Batches using HTTP Import. This Project introduces you to AI Assistants and HTTP Import by showing you how to import webpages, process them as documents in a Grooper Batch, add them to a search index, and hook that index up to an AI Assistant.

Use Cases

There are three sets of use-case based Projects used to demonstrate one or more Grooper features. These use cases include:

  • Invoice processing
  • Oil and gas lease processing
  • Student transcript processing

These Projects use Grooper’s AI-based document processing for data extraction and more.

Each use-case based Project has three different versions, each tailored to a different ingestion scenario. There is one Project for:

  • Scanned documents
  • Documents imported from a file system
  • Documents imported from an email source

Each Project will contain one or more “READ ME” resource files with notes and instructions on how to configure nodes. These Projects also contain accompanying Grooper Batches in the Test branch that can be used to help test the Project. If you'd like to download the PDFs used in these Batches directly, use the links below.

Miscellaneous Demos

These are miscellaneous Projects used to demonstrate one or more Grooper features. Each Project will contain a “READ ME” resource file with notes and instructions on how to configure nodes. These Projects have accompanying Grooper Batches that can be accessed from the Test branch.

Travel Expenses

This Project was mainly created to show off Grooper’s Search Page and Chat Page. Using this Project and the resources it contains, users can process a large number of receipts from a publicly available document set. After running these documents through the Batch Process and adding them to a search index, users can use the Search Page to query the processed documents in Grooper. From the Search Page, users can search the full text of each document and/or query by filtering extracted data values. Users may also build an AI Assistant to ask questions about this document set from the Grooper Chat Page.

Receipts document set

OSCN Legal Documents

This Project was created to show off Grooper’s AI Assistant capabilities. The documents used for this demo have already been processed and can be uploaded with the accompanying Grooper Batch ZIP. After creating a search index and adding these documents to the index, users can ask questions about the documents from an AI Assistant by adding a “Search Index” reference. This AI Assistant also demonstrates adding a “Web Service” reference. This Web Service reference uses a RAML definition that describes how to query dockets in the Oklahoma State Courts Network. You can ask the AI Assistant questions, and it will perform a search for case information using various oscn.net web services.

Quick Guides

Below you will find various guides to help set up resources required to execute the demo Projects in the Grooper VM.

Create an Activity Processing service

Activity Processing services are required to automate task processing. The Activity Processing service will pick up Processing Jobs for steps in a Batch's Batch Process as they become ready. One Processing Job is created for each step and the Activity Processing service executes the step's activity for each task in scope (the Batch, Batch Folders at a certain level or Batch Pages depending on the step's configuration).

Create an Import Watcher service

Import Watcher services are required to import files into a Batch in Grooper. The Import Watcher service executes "Import Jobs" that are either created by a user from the Imports page or from schedules configured in the Import Watcher itself.

Add an LLM Connector to the Grooper Repository

LLM Connectors are required to take advantage of Grooper's LLM-based features, such as AI Extract and AI Assistants. LLM Connector is a Repository Option added to the Root of a Grooper Repository. It connects Grooper to LLM chat and embeddings services, such as the OpenAI API.

This section is transcluded from the LLM Connector article.

While there are only two primary LLM Providers in Grooper (OpenAI and Azure), you can connect Grooper to several different services and model types using just these two providers.

OpenAI API

Grooper's LLM-based features were primarily designed around OpenAI's models. The Grooper development team works with OpenAI models internally via the OpenAI API. Connecting to the OpenAI API is regarded as the "standard" way of connecting Grooper to LLMs.

  • When connecting Grooper to the OpenAI API, you will need an API key. You can visit our OpenAI quickstart if you need instructions on setting up an OpenAI account and obtaining an API key.
  • Be aware! You must have a payment method in your OpenAI account to use LLM-based features (such as AI Extract) in Grooper. If you do not have a payment method, Grooper cannot return a list of models when configuring LLM features.


Connecting Grooper to the OpenAI API is simple:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. In the "API Key" property, enter your OpenAI API Key.
    • You do not need to adjust the "Authorization" property. It should be "Bearer". The OpenAI API uses bearer tokens to authenticate API calls.
  7. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  8. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
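
That is all Grooper needs. If you would like to sanity-check your OpenAI API key outside of Grooper first, the kind of request the connector sends looks like the sketch below (the model name is only an example):

  # Minimal sketch: verify an OpenAI API key with a bearer-authenticated
  # chat/completions call, the same endpoint and authorization style used above.
  import requests

  API_KEY = "sk-..."  # your OpenAI API key

  response = requests.post(
      "https://api.openai.com/v1/chat/completions",
      headers={
          "Authorization": f"Bearer {API_KEY}",  # "Bearer" authorization, as noted in step 6
          "Content-Type": "application/json",
      },
      json={
          "model": "gpt-4o-mini",  # example model name
          "messages": [
              {"role": "system", "content": "You are a helpful assistant."},
              {"role": "user", "content": "Say hello."},
          ],
      },
      timeout=60,
  )
  response.raise_for_status()
  print(response.json()["choices"][0]["message"]["content"])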

OpenAI compatible services

Grooper can connect to any LLM service that adheres to the OpenAI API standard using the "OpenAI" provider. Compatible APIs must use "chat/completions" and "embeddings" endpoints like the OpenAI API to interoperate with Grooper's LLM features.

We have confirmed the following services will integrate using the OpenAI provider:

  • Various models using Groq
  • Various models using OpenRouter
  • Various models hosted locally using LMStudio
  • Various models hosted locally using Ollama

BE AWARE: While the OpenAI API is fully compatible with all LLM constructs in Grooper, these "OpenAI compatible" services may only have partial compatibility using the OpenAI provider.
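
For example, if you are experimenting with a locally hosted model in Ollama (one of the services listed above), the same style of request is simply pointed at the local service's endpoint. The sketch below assumes Ollama is running on its default port with a model named "llama3" already pulled:

  # Minimal sketch: the same chat/completions request against an OpenAI compatible
  # local service (here, Ollama's OpenAI-compatible endpoint on its default port).
  import requests

  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # the value you would enter in the "URL" property below
      headers={"Content-Type": "application/json"},  # many local services require no API key
      json={
          "model": "llama3",  # example local model name
          "messages": [{"role": "user", "content": "Say hello."}],
      },
      timeout=120,
  )
  print(response.json()["choices"][0]["message"]["content"])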


Connecting Grooper to OpenAI compatible APIs is slightly more involved:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. Enter the API service's endpoint URL in the "URL" property.
    • This property's value defaults to the OpenAI API's base URL. You must change this to connect to a different web service.
  7. Select the "Authorization" method appropriate for your API. Grooper's supported authorization methods are:
    • None (Uncommon) - Choose this option if the API does not require any authorization.
    • Bearer - Choose this option if the API uses a bearer token to authorize web calls (Example: OpenAI API and Groq use the bearer method).
    • APIKey - Choose this option if the API uses an API key to authorize web calls (Example: Nvidia's NIM API uses the key method).
  8. In the "API Key" property, enter the API's key.
  9. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  10. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.

Azure AI Foundry deployments

Grooper connects to Microsoft Azure OpenAI models and Azure AI Foundry (formerly Azure AI Studio) model deployments by adding an "Azure" provider to an LLM Connector. Each model must be deployed in Azure before Grooper can connect to it.

There are a plethora of models you can deploy in Azure AI Foundry. This includes:

  • Azure OpenAI models
  • MistralAI models
  • Meta's llama models
  • DeepSeek models
  • xAI's grok models


Once a model is deployed in Azure, a "Service Deployment" can be defined in Grooper. There are two types of Service Deployments:

  • Chat Service - This is for Azure OpenAI models' "chat/completion" operations and the "chat completion" models in Azure AI Foundry's Model Catalog. This is required for most LLM-based functionality in Grooper, including AI Extract, AI Assistants, separating documents with Auto Separate, and classifying documents with LLM Classifier.
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "chat completions".
  • Embeddings Service - This is for Azure OpenAI models' "embeddings" operations and the "embeddings" models in Azure AI Foundry's Model Catalog. This is required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper.
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "embeddings".


To connect Grooper to an Azure model deployment:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add an "Azure" provider.
  6. Select the "Deployments" property and open its editor.
  7. Add a "Chat Service", "Embeddings Service", or both.
    • "Chat Services" are required for most LLM-based features in Grooper, such as AI Extract and AI Assistants. "Embeddings Services" are required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper.
  8. In the "Model Id" property, enter the model's name (Example: "gpt-35-turbo").
  9. In the "URL" property, enter the "Target URI" from Azure
    • For Azure OpenAI model deployments this will resemble:
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/chat/completions?api-version={api-version} for Chat Service deployments
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/embeddings?api-version={api-version} for Embeddings Service deployments
    • For other models deployed in Azure AI Foundry (formerly Azure AI Studio) this will resemble:
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/chat/completions for Chat Service deployments
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/embeddings for Embeddings Service deployments
  10. Set "Authorization" to the method appropriate for the model deployment in Azure.
    • How do I know which method to choose? In Azure, under "Keys and Endpoint", you'll typically see
      • "API Key" if the model uses API key authentication (Choose "ApiKey" in Grooper).
      • "Microsoft Entra ID" if the model supports token-based authentication via Microsoft Entra ID (Choose "Bearer" in Grooper).
    • Azure OpenAI supports both API Key and Microsoft Entra ID (formerly Azure AD) authentication. Azure AI Foundry (formerly Azure AI Studio) models often lean toward token-based (Bearer in Grooper) authentication.
  11. In the "API Key" property, enter the Key copied from Azure.
  12. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  13. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
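
If you want to confirm a deployment's Target URI and key before entering them in Grooper, a direct request along the lines of the sketch below should return a completion. This example is for an Azure OpenAI chat deployment using API key authentication; the resource name, deployment name, and api-version are placeholders you must fill in:

  # Minimal sketch: call an Azure OpenAI chat deployment directly to confirm that the
  # Target URI and key work. Resource name, deployment name, and api-version are placeholders.
  import requests

  TARGET_URI = (
      "https://{your-resource-name}.openai.azure.com/openai/deployments/"
      "{model-id}/chat/completions?api-version={api-version}"
  )
  API_KEY = "<key copied from Azure>"

  response = requests.post(
      TARGET_URI,
      headers={
          "api-key": API_KEY,  # API key authentication ("ApiKey" in Grooper)
          "Content-Type": "application/json",
      },
      json={"messages": [{"role": "user", "content": "Say hello."}]},
      timeout=60,
  )
  response.raise_for_status()
  print(response.json()["choices"][0]["message"]["content"])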

Getting started with AI Search

"AI Search" refers to the suite of Grooper features that allow users to add documents and their data to a search index. Once in a search index, users can search for documents from the Search Page and add indexed documents to an AI Assistant's base of knowledge resources.

Create an Azure AI Search service in Azure

Grooper uses Azure AI Search for the infrastructure that allows us to index and search documents. We integrate with Azure AI Search because:

  • It is fast.
  • It is easily scalable.
  • It has robust querying capabilities for both full text and metadata (including fields extracted from a document's Data Model).

To create an Azure AI Search service in Azure:

  1. If you have not done so already, create an Azure account.
  2. Go to your Azure portal.
  3. If you have not done so already, create a "Subscription" and "Resource Group."
  4. Click the "Create a resource" button.
  5. Search for "Azure AI Search".
  6. Find "Azure AI Search" in the list and click the "Create" button.
  7. Follow the prompts to create the Azure AI Search resource.
    • The pricing tier defaults to "Standard". There is a "Free" option if you just want to try this feature out.
  8. Go to the resource.
  9. In the "Essentials" panel, copy the "Url" value. You will need this in Grooper.
  10. In the left-hand navigation panel, expand "Settings" and select "Keys".
  11. Copy the "Primary admin key" or "Secondary admin key." You will need this in Grooper.
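
Before moving on, you can verify the URL and admin key by calling the search service's REST API directly. Listing the service's indexes (an empty list for a brand-new service) is a quick check; the api-version value below is only an example:

  # Minimal sketch: confirm the Azure AI Search URL and admin key by listing indexes.
  # A brand-new service returns an empty list. The api-version value is an example.
  import requests

  SEARCH_URL = "https://<your-service-name>.search.windows.net"  # the "Url" value from step 9
  ADMIN_KEY = "<primary or secondary admin key>"                  # the key from step 11

  response = requests.get(
      f"{SEARCH_URL}/indexes?api-version=2024-07-01",
      headers={"api-key": ADMIN_KEY},
      timeout=30,
  )
  response.raise_for_status()
  print([index["name"] for index in response.json()["value"]])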

Add the AI Search option to the Grooper Repository

With an Azure AI Search service set up in Azure, you can now connect Grooper to it by adding the Grooper AI Search option to the Grooper Repository.

  1. From the Design page, go to the Grooper Root node.
  2. Select the Options property and open its editor (Press the "..." button).
  3. Add the AI Search option.
  4. Paste the AI Search service's URL in the URL property.
  5. Paste the AI Search service's API key in the API Key property.

Full documentation on Grooper's AI Search capabilities can be found in the AI Search article.

Add an Indexing Behavior to a Content Type

After adding the AI Search option to the Grooper Repository, the next step is configuring an Indexing Behavior.

Indexing Behaviors are added to a Content Type in Grooper (usually a Content Model).

  1. Select the Content Type you wish to add the Indexing Behavior to.
  2. Select the "Behaviors" property and open its editor (Press the "..." button).
  3. Press the Add button (add_circle) to add a new Behavior.
  4. Select "Indexing Behavior" from the dropdown list.
  5. With the Indexing Behavior selected, configure its properties as needed. Key properties are detailed below. Visit the Indexing Behavior article for more information.
    • "Name" - Grooper will automatically name the search index based on the Content Types name, formatted to Azure's specifications. You can adjust it if you wish.
    • "Included Elements" - If you want to create metadata fields in the search index using elements from the Content Type's Data Model, select them here.
    • "Built In Fields" - These are additional metadata fields added to the search index. All are selected by default. However, "Content" should be selected to include the document's full text in the index.
    • "Vector Search" - Enable this property if you wish to activate semantic (embeddings-based) indexing. This will allow AI Assistants to use natural language searches from the Chat Page.
    • "Auto Index" - Enable this property when using an Indexing Service to add new documents to the search index automatically.
  6. When finished editing the Indexing Behavior, press "OK".
  7. Press "OK" to save changes to the "Behaviors" editor.
  8. Press the "Save" button (save) to save changes to the Content Type.
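
If "Vector Search" is unfamiliar, the underlying idea is that text is converted into embedding vectors and queries are matched by vector similarity rather than by exact keywords. The conceptual sketch below is only an illustration of that idea (not Grooper's or Azure's implementation; the vectors are made up, whereas in practice an embeddings model produces them):

  # Conceptual sketch of embeddings-based ("vector") search: documents and queries are
  # turned into vectors, and the closest vectors are the best semantic matches.
  import math

  def cosine_similarity(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

  document_vectors = {
      "travel receipt": [0.9, 0.1, 0.2],
      "oil and gas lease": [0.1, 0.8, 0.3],
  }
  query_vector = [0.85, 0.15, 0.25]  # e.g. an embedding of "hotel expenses"

  best_match = max(document_vectors, key=lambda name: cosine_similarity(query_vector, document_vectors[name]))
  print(best_match)  # the semantically closest document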

Create the search index

Once an Indexing Behavior is added, you need to create the search index. To do this, you will execute the "Create Search Index" command.

To execute the "Create Search Index" command:

  1. Right-click the Content Type where the Indexing Behavior is configured.
  2. Select "Search" then "Create Search Index".
  3. Press the "Execute" button to create the search index.

Grooper will use the Azure AI Search API to create the search index. The search index lives in Azure, but is entirely accessible in Grooper. You use Grooper to add documents and data to the search index. You use the Grooper Search page to retrieve documents added to the search index.

In most cases you will only need to create the search index once, unless you are making major changes to the index. In that case, you will need to delete it, recreate it, and reindex your documents.

Big changes to a search index require the index to be recreated. This includes anything that changes the structure of the index, such as adding metadata fields. When this happens you will need to delete the search index (execute the "Search > Delete Index" command), then recreate it (execute the "Search > Create Search Index" command), and reindex all documents (most easily done with the "Search > Submit Indexing Job" command).

Index documents

Documents will only be indexed if their Content Type is or inherits from the Content Type where the Indexing Behavior is configured.

There are four ways to index documents:

  • One-by-one: To index a single document, right-click the document and execute the "Add to Index" command.
  • In a Batch Process: Add an Execute step with an "Add to Index" command to index documents as part of a Batch Process.
  • With the "Submit Indexing Job" command: This command is executed from the Content Type with the Indexing Behavior. It will analyze Content Types in scope and update the search index with new, changed and removed documents. You can find this command by right-clicking the Content Type.
  • With the "Indexing Service": This is a Grooper service that runs in the background of your Grooper Repository. It periodically polls the Grooper Repository to determine if the search index needs to be updated. It is installed from GCC.