LLM Connector (Repository Option): Difference between revisions

From Grooper Wiki
(12 intermediate revisions by the same user not shown)

Latest revision as of 16:05, 18 March 2026

This article is about the current version of Grooper.

Note that some content may still need to be updated.


LLM Connector is a Repository Option that enables large language model (LLM) powered AI features for a Grooper Repository.

About

LLM Connectors enable Grooper's AI-based features, including AI Extract and AI Assistants. Adding an LLM Connector connects Grooper to large language models (LLMs) such as OpenAI's GPT models and models in Microsoft Azure's "Model Catalog" (including Azure OpenAI models). An LLM Connector is enabled by adding it with the Grooper Root's "Options" editor.

  • LLM Connector is a "Repository Option" in Grooper. Repository Options enable optional features in Grooper. This means enabling LLM connectivity is entirely up to you and your organization. Grooper is enhanced by AI features enabled by an LLM Connector but can operate without it.


Once added to a Grooper Repository, LLM Connector is configured by adding an "LLM Provider". The LLM Provider connects Grooper to service providers that offer LLMs, such as OpenAI, Microsoft Azure, and other providers that use OpenAI's API standard.

There are currently 3 "LLM Provider" options:

  • OpenAI - This provider connects Grooper to LLMs offered by the OpenAI API or compatible APIs.
    • Compatible APIs must use "chat/completions" and "embeddings" endpoints similar to the OpenAI API's to interoperate with Grooper's LLM features.
  • Azure - This provider connects Grooper to LLMs offered by Microsoft Azure in their "Model Catalog" (including Azure OpenAI models).
  • GCS - GCS (Grooper Cloud Services) is a prototype service for LLMs offered by and through Grooper directly.
    • This provider is still under development. Users may ignore this provider at this time.

LLM-enabled features

After an LLM Connector has been added to the Grooper Root, several LLM-enabled features are available in Grooper. This includes:

LLM-enabled extraction capabilities

Ask AI – An LLM-based Value Extractor specialized for natural-language responses. This extractor sends the document’s text and a natural-language prompt to a chat completion service. The chatbot’s response becomes the extractor’s result.
AI Schema Extractor – An LLM-based Value Extractor specialized for structured JSON responses. The AI Schema Extractor enables advanced, schema-driven extraction from unstructured or semi-structured documents, supporting scenarios such as tables, line items, and multi-field records.
AI Extract – An LLM-based Fill Method for large-scale data extraction. Fill Methods are configured on Data Models and Data Sections. They perform a secondary extraction after child Data Elements' extractors and extract methods execute, or they may act as the primary extraction mechanism when no other extractors are configured. AI Extract uses an LLM chat completion service to populate Data Models, often requiring only that Data Elements be defined.
LLM-enabled Data Section Extract Methods
AI Collection Reader – An LLM-based Section Extract Method for multi-instance section extraction. AI Collection Reader extends AI Section Reader for repeating records and is optimized for large, multi-page documents that must be processed in chunks to avoid exceeding LLM context limits.
AI Section Reader – An LLM-based Section Extract Method for single-instance section extraction. It enables advanced extraction from complex, variable, or ambiguous document layouts using generative AI.
AI Transaction Detection – An LLM-based Section Extract Method specialized for transaction-based documents such as payroll reports or EOBs. It automatically segments documents into individual transactions using detected anchors and can extract structured data from each transaction.
Clause Detection – Designed to locate specified clause types in natural-language documents. Users provide one or more sample clauses, and an embeddings model identifies the most similar document chunks. Detected sections can then use AI Extract to extract structured data from the clause text.
AI Table Reader – An LLM-based Data Table Extract Method that enables extraction of tabular data from semi-structured or unstructured documents, even when table layouts are ambiguous or inconsistent.

LLM-enabled separation and classification capabilities

AI Separate – An LLM-based Separation Provider that evaluates page meaning and structure rather than relying on fixed rules, barcodes, or control sheets.
LLM Classifier – An LLM-based Classify Method that sends document content and candidate Document Types (with descriptions) to an LLM, which selects the best match.
Mark Attachments – Assists document separation by attaching documents to parent documents. When the Generative AI option is enabled, an LLM determines whether a document should be attached to the preceding or following document.

Other LLM-enabled capabilities

AI Generator – Used on the Search page to generate text-based outputs (TXT, CSV, HTML, etc.) such as reports, summaries, or contact lists from search results.
AI Productivity Helpers – A collection of LLM-powered tools that assist Grooper Designers with tasks such as writing regular expressions, building Data Models, and composing AI Search queries.

VLM capabilities (experimental)

We are experimenting with Vision-Language Model (VLM) integration in Grooper. The following activities are available in current builds for experimentation but are not yet considered production-ready:

VLM Analyze – Analyzes pages or folders using a VLM and saves the response as structured JSON for downstream extraction. This JSON can be accessed by LLM-based constructs using the "JSON File" Quoting Method.
VLM OCR – Uses VLM models to recognize text from images, built on the VLM Analyze activity.

LLM-related properties and concepts

Parameters – Properties that adjust how an LLM generates responses (for example, Temperature controls randomness).
Quoting Method – Determines what content is provided to an LLM, such as full document text, partial text, layout data, extracted values, or combinations thereof.
Alignment – Controls how LLM-based results are highlighted and aligned in the Document Viewer.
Prompt Engineering – The practice of designing and refining prompts to obtain desired responses from an LLM.


LLM connection options

While there are two primary LLM Providers (OpenAI and Azure), there are several different providers and model types you can connect Grooper to using just these two providers.

OpenAI API

Grooper's LLM-based features were primarily designed around OpenAI's models. The Grooper development team uses OpenAI models internally via the OpenAI API. Connecting to the OpenAI API is regarded as the "standard" way of connecting Grooper to LLMs.

  • When connecting Grooper to the OpenAI API, you will need an API key. You can visit our OpenAI quickstart if you need instructions on setting up an OpenAI account and obtaining an API key.
  • Be aware! You must have a payment method in your OpenAI account to use LLM-based features (such as AI Extract) in Grooper. If you do not have a payment method, Grooper cannot return a list of models when configuring LLM features.


Connecting Grooper to the OpenAI API is simple:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. In the "API Key" property, enter your OpenAI API Key.
    • You do not need to adjust the "Authorization" property. It should be "Bearer". The OpenAI API uses bearer tokens to authenticate API calls.
  7. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  8. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
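The "Use System Messages" behavior in step 7 can be illustrated with a minimal chat/completions payload. This is a sketch of the request shape the OpenAI API expects, not Grooper's internal implementation; the model name and instruction text are illustrative placeholders.

```python
import json

# Minimal OpenAI chat/completions payload. With "Use System Messages"
# enabled, framing instructions travel in a "system" message while the
# question travels in a "user" message; with it disabled, everything is
# combined into a single "user" message.
def build_chat_payload(model, system_text, user_text, use_system_messages=True):
    if use_system_messages:
        messages = [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ]
    else:
        messages = [{"role": "user", "content": system_text + "\n\n" + user_text}]
    return {"model": model, "messages": messages}

payload = build_chat_payload(
    "gpt-4o-mini",
    "Answer using only the supplied document text.",
    "What is the invoice total?",
)
print(json.dumps(payload, indent=2))
```

Such a payload would be POSTed to https://api.openai.com/v1/chat/completions with an "Authorization: Bearer <API key>" header, matching the key entered in step 6.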

OpenAI compatible services

Grooper can connect to any LLM service that adheres to the OpenAI API standard using the "OpenAI" provider. Compatible APIs must use "chat/completions" and "embeddings" endpoints like the OpenAI API's to interoperate with Grooper's LLM features.

We have confirmed the following services will integrate using the OpenAI provider:

  • Various models using Groq
  • Various models using OpenRouter
  • Various models hosted locally using LMStudio
  • Various models hosted locally using Ollama

BE AWARE: While the OpenAI API is fully compatible with all LLM constructs in Grooper, these "OpenAI compatible" services may only have partial compatibility using the OpenAI provider.
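For OpenAI compatible services, the request shape stays the same and only the base URL (and possibly the authorization method) changes. A sketch using the services named above; the base URLs reflect each vendor's documented OpenAI-compatible endpoint at the time of writing, and the local-server ports are those tools' defaults — verify against each vendor's documentation before use.

```python
# The same chat/completions call targets different services by swapping
# the base URL entered in Grooper's "URL" property.
OPENAI_COMPATIBLE_BASES = {
    "openai":     "https://api.openai.com/v1",
    "groq":       "https://api.groq.com/openai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "lmstudio":   "http://localhost:1234/v1",   # LM Studio local server default
    "ollama":     "http://localhost:11434/v1",  # Ollama local server default
}

def chat_completions_url(service):
    return OPENAI_COMPATIBLE_BASES[service] + "/chat/completions"

print(chat_completions_url("groq"))
# -> https://api.groq.com/openai/v1/chat/completions
```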


Connecting Grooper to an OpenAI compatible API is slightly more involved:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. Enter the API service's endpoint URL in the "URL" property.
    • This property's value defaults to the OpenAI API's base URL. You must change this to connect to a different web service.
  7. Select the "Authorization" method appropriate for your API. Grooper's supported authorization methods are:
    • None (Uncommon) - Choose this option if the API does not require any authorization.
    • Bearer - Choose this option if the API uses a bearer token to authorize web calls (Example: OpenAI API and Groq use the bearer method).
    • APIKey - Choose this option if the API uses an API key to authorize web calls (Example: Nvidia's NIM API uses the key method).
  8. In the "API Key" property, enter the API's key.
  9. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  10. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
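The three "Authorization" options in step 7 correspond to different HTTP headers on each web call. A hedged sketch of the header shapes: the OpenAI API and Groq use an "Authorization: Bearer ..." header, while the exact header name for key-based services varies by vendor — the "api-key" name below is illustrative, so check your service's documentation for the real one.

```python
# Build auth headers for the three Authorization modes Grooper offers.
# "Bearer" sends the key as an OAuth-style bearer token; "APIKey" sends
# it in a dedicated header (header name illustrative, not universal).
def auth_headers(mode, api_key=None):
    if mode == "None":
        return {}
    if mode == "Bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if mode == "APIKey":
        return {"api-key": api_key}
    raise ValueError(f"unknown authorization mode: {mode}")

print(auth_headers("Bearer", "sk-example"))
# -> {'Authorization': 'Bearer sk-example'}
```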

Azure AI Foundry deployments

Grooper connects to Microsoft Azure OpenAI models and Azure AI Foundry (formerly Azure AI Studio) model deployments by adding an "Azure" provider to an LLM Connector. Each model must be deployed in Azure before Grooper can connect to it.

You can deploy a wide range of models in Azure AI Foundry, including:

  • Azure OpenAI models
  • MistralAI models
  • Meta's Llama models
  • DeepSeek models
  • xAI's Grok models


Once a model is deployed in Azure, a "Service Deployment" can be defined in Grooper. There are two types of Service Deployments:

  • Chat Service - This is for Azure OpenAI models' "chat/completion" operations and the "chat completion" models in Azure AI Foundry's Model Catalog. This is required for most LLM-based functionality in Grooper, including AI Extract, AI Assistants, separating documents with Auto Separate, and classifying documents with LLM Classifier.
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "chat completions".
  • Embeddings Service - This is for Azure OpenAI models' "embeddings" operations and the "embeddings" models in Azure AI Foundry's Model Catalog. This is required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper.
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "embeddings".
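The two Service Deployment types map to two different request shapes. A minimal sketch of each, assuming the OpenAI-style request bodies that Azure OpenAI deployments accept; all text values are placeholders.

```python
# Sketch of the two request shapes behind Grooper's Service Deployments.
def build_request(service_type, text):
    if service_type == "Chat Service":
        # chat/completions endpoint: role-tagged messages in, a
        # generated assistant message out.
        return {"messages": [{"role": "user", "content": text}]}
    if service_type == "Embeddings Service":
        # embeddings endpoint: raw text in, a vector of floats out.
        # Vector Search, Clause Detection, and Semantic quoting compare
        # these vectors for similarity.
        return {"input": [text]}
    raise ValueError(f"unknown Service Deployment type: {service_type}")

print(build_request("Embeddings Service", "Indemnification. Each party ..."))
```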


To connect Grooper to an Azure model deployment:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add an "Azure" provider.
  6. Select the "Deployments" property and open its editor.
  7. Add a "Chat Service", "Embeddings Service", or both.
    • "Chat Services" are required for most LLM-based features in Grooper, such as AI Extract and AI Assistants. "Embeddings Services" are required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper.
  8. In the "Model Id" property, enter the model's name (Example: "gpt-35-turbo").
  9. In the "URL" property, enter the "Target URI" from Azure.
    • For Azure OpenAI model deployments this will resemble:
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/chat/completions?api-version={api-version} for Chat Service deployments
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/embeddings?api-version={api-version} for Embeddings Service deployments
    • For other models deployed in Azure AI Foundry (formerly Azure AI Studio) this will resemble:
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/chat/completions for Chat Service deployments
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/embeddings for Embeddings Service deployments
  10. Set "Authorization" to the method appropriate for the model deployment in Azure.
    • How do I know which method to choose? In Azure, under "Keys and Endpoint", you'll typically see
      • "API Key" if the model uses API key authentication (Choose "ApiKey" in Grooper).
      • "Microsoft Entra ID" if the model supports token-based authentication via Microsoft Entra ID (Choose "Bearer" in Grooper).
    • Azure OpenAI supports both API Key and Microsoft Entra ID (formerly Azure AD) authentication. Azure AI Foundry (formerly Azure AI Studio) models often lean toward token-based (Bearer in Grooper) authentication.
  11. In the "API Key" property, enter the Key copied from Azure.
  12. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  13. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
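The Azure OpenAI URL patterns in step 9 can be assembled from their parts. A sketch that mirrors the two patterns above; the resource name, model id, and API version are placeholders you would replace with values copied from your own Azure deployment.

```python
# Assemble an Azure OpenAI "Target URI" from its parts, mirroring the
# two URL patterns in step 9. All inputs are placeholders.
def azure_openai_target_uri(resource, model_id, api_version, operation):
    if operation not in ("chat/completions", "embeddings"):
        raise ValueError(f"unsupported operation: {operation}")
    return (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{model_id}/{operation}?api-version={api_version}"
    )

url = azure_openai_target_uri(
    "my-resource", "gpt-35-turbo", "2024-02-01", "chat/completions"
)
print(url)
```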