LLM Connector (Repository Option)

This article is about the current version of Grooper.

LLM Connector is a Repository Option that enables large language model (LLM) powered AI features for a Grooper Repository.

About

LLM Connectors enable Grooper's AI-based features, including AI Extract and AI Assistants. Adding an LLM Connector connects Grooper to large language models (LLMs) such as OpenAI's GPT models and models in Microsoft Azure's "Model Catalog" (including Azure OpenAI models). An LLM Connector is enabled by adding it in the Grooper Root's "Options" editor.

  • LLM Connector is a "Repository Option". Repository Options enable optional features in Grooper, meaning LLM connectivity is entirely up to you and your organization. Grooper is enhanced by the AI features an LLM Connector enables but can operate without it.


Once added to a Grooper Repository, LLM Connector is configured by adding an "LLM Provider". The LLM Provider connects Grooper to service providers that offer LLMs, such as OpenAI, Microsoft Azure, and other providers that use OpenAI's API standard.

There are currently three "LLM Provider" options:

  • OpenAI - This provider connects Grooper to LLMs offered by the OpenAI API or compatible APIs.
    • Compatible APIs must expose "chat/completions" and "embeddings" endpoints similar to the OpenAI API's to interoperate with Grooper's LLM features (see the sketch after this list).
  • Azure - This provider connects Grooper to LLMs offered by Microsoft Azure in their "Model Catalog" (including Azure OpenAI models).
  • GCS - GCS (Grooper Cloud Services) is a prototype service for LLMs offered by and through Grooper directly.
    • This provider is still under development. Most users can ignore it for now.
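
The following is a minimal, illustrative sketch of the two OpenAI-style endpoints referenced above. It is not Grooper code; the base URL, model names, and API key are placeholders you would replace with your own.

  # Minimal sketch of the two OpenAI-style endpoints a compatible API must expose.
  # All values below are placeholders, not Grooper settings.
  import requests

  BASE = "https://api.openai.com/v1"  # or any OpenAI-compatible base URL
  HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

  # 1. chat/completions: backs chat-based features like AI Extract and Ask AI.
  chat = requests.post(f"{BASE}/chat/completions", headers=HEADERS, json={
      "model": "gpt-4o-mini",
      "messages": [{"role": "user", "content": "Say hello."}],
  })
  print(chat.json()["choices"][0]["message"]["content"])

  # 2. embeddings: backs vector search and Clause Detection.
  emb = requests.post(f"{BASE}/embeddings", headers=HEADERS, json={
      "model": "text-embedding-3-small",
      "input": "A paragraph of document text.",
  })
  print(len(emb.json()["data"][0]["embedding"]))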

LLM-enabled features

After an LLM Connector has been added to the Grooper Root, several LLM-enabled features are available in Grooper. This includes:

LLM-enabled extraction capabilities

  • Ask AI: An LLM-based "Value Extractor" specialized for natural language responses. This extractor returns results by passing the document's text and a natural language prompt to a chat completion service (a chatbot). The chatbot uses the document's text and the prompt to respond with an answer, and that answer is the extractor's result (the basic pattern is sketched after this list).
  • AI Schema Extractor: An LLM-based "Value Extractor" specialized for structured JSON responses. The AI Schema Extractor enables advanced, schema-driven data extraction from unstructured or semi-structured documents by leveraging generative AI. It is designed for scenarios where precise, reliable, and repeatable extraction of structured data is required, such as tables, line items, or multi-field records.
  • AI Extract: An LLM-based "Fill Method" for large-scale data extraction. Fill Methods are configured on Data Models and Data Sections. They perform a secondary extraction after the child Data Elements' extractors and extract methods execute, or they can act as the primary extraction mechanism if no extractors or extract methods are configured. AI Extract extracts the Data Model using an LLM chat completion service. In many cases, the only configuration required is the Data Elements added to the model: AI Extract fills in the Data Model based on what its Data Elements are named.
  • LLM-enabled Data Section Extract Methods
    • AI Collection Reader: An LLM-based Section Extract Method for multi-instance section extraction. The AI Collection Reader extends the capabilities of AI Section Reader to multi-instance Data Sections, which represent repeating records inside a document. Note that it is also possible to extract multi-instance Data Sections using the AI Extract fill method. The main difference is that AI Collection Reader is optimized for processing large multi-page documents that need to be processed in chunks to avoid exceeding the context length of large language models (LLMs).
    • AI Section Reader: An LLM-based Section Extract Method for single-instance section extraction. The AI Section Reader provides advanced extraction of single-instance Data Sections from documents using generative AI, leveraging large language models (LLMs) such as OpenAI's GPT series. It is designed to handle complex, variable, or ambiguous document layouts where traditional extraction methods may be insufficient.
    • AI Transaction Detection: An LLM-based Section Extract Method specialized for transaction sections found on certain kinds of documents. The AI Transaction Detection extract method enables Grooper to automatically segment documents into individual transactions, such as payroll reports, EOBs, or other document types that contain repeating data structures. It works by detecting consistent features (anchors) that mark the start of each transaction. It can also extract structured data from each detected transaction using generative AI, supporting both simple and highly complex document layouts.
    • Clause Detection: Clause Detection is designed to locate specified types of clauses in natural language documents and return them as section instances in Grooper. The user provides one or more sample clauses, then an embeddings model determines the portion of the document (or "chunk") most similar to the samples, and Grooper returns the paragraph(s) in the chunk. Data Sections using this Section Extract Method can then use AI Extract to return data using only the text-content of the detected paragraph.
  • AI Table Reader: An LLM-based Data Table Extract Method. The AI Table Reader enables advanced extraction of tabular data from documents using generative AI, powered by large language models (LLMs) such as OpenAI's GPT series. It is designed to interpret semi-structured or unstructured content and transform it into structured table instances, even when the table layout is ambiguous or not explicitly formatted.
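
To make the shared pattern concrete, here is a rough sketch of the kind of chat-completion exchange extractors like Ask AI and the AI Schema Extractor rely on. This is an illustration only, not Grooper's internal implementation; the model, prompt, and document text are placeholders.

  # Illustrative only: the general chat-completion pattern these extractors
  # rely on. Model name, prompt, and document text are placeholders.
  import requests

  document_text = "Invoice #1042 dated 2024-03-01, total due $512.40 ..."
  prompt = "Return the invoice number, date, and total as a JSON object."

  resp = requests.post(
      "https://api.openai.com/v1/chat/completions",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      json={
          "model": "gpt-4o-mini",
          "messages": [
              {"role": "system", "content": "You extract data from documents."},
              {"role": "user", "content": f"{prompt}\n\nDocument:\n{document_text}"},
          ],
          # Requesting a JSON object mirrors the structured responses the
          # AI Schema Extractor is built around.
          "response_format": {"type": "json_object"},
      },
  )
  print(resp.json()["choices"][0]["message"]["content"])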

LLM-enabled separation and classification capabilities

  • AI Separate: An LLM-based Separation Provider. Unlike traditional separation methods that rely on fixed rules, barcodes, or control sheets, AI Separate uses natural language understanding to evaluate the meaning and structure of page content. This enables robust separation even when documents lack consistent separators or when page layouts vary significantly.
  • LLM Classifier: An LLM-based Classify Method. Classification is performed by sending the document's content and a list of candidate Document Types (with their descriptions) to the LLM. The model analyzes the text and selects the best match, enabling robust classification even for complex or variable documents (this exchange is sketched after this list).
  • Mark Attachments: The Mark Attachments activity assists document separation decisioning by attaching documents to "parent" documents (such as an Exhibit that should be attached to a legal document). Mark Attachments is configured by adding one or more "Attachment Rules". When an Attachment Rule's "Generative AI" option is enabled, an LLM will analyze if a document should be attached to the one before it or after it.
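
The sketch below illustrates the classification exchange described above: the document's text plus the candidate Document Types (with descriptions) go to the model, which picks the best match. It is an illustration of the approach, not Grooper's internal code; all values are placeholders.

  # Illustrative LLM classification: candidate Document Types with descriptions
  # are sent alongside the document text; the model replies with the best match.
  import requests

  doc_types = {
      "Invoice": "A bill requesting payment for goods or services.",
      "Purchase Order": "A buyer's request to purchase goods or services.",
  }
  candidates = "\n".join(f"- {name}: {desc}" for name, desc in doc_types.items())

  resp = requests.post(
      "https://api.openai.com/v1/chat/completions",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
      json={
          "model": "gpt-4o-mini",
          "messages": [{
              "role": "user",
              "content": "Classify the document as one of the following types.\n"
                         f"{candidates}\n"
                         "Reply with the type name only.\n\nDocument:\n<document text here>",
          }],
      },
  )
  print(resp.json()["choices"][0]["message"]["content"])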

Other LLM-enabled capabilities

  • AI Assistants: AI Assistants open up the Chat page to Grooper users. From the Chat page, users can select an AI Assistant and start asking it questions about the resources it's connected to. These resources include document content in an AI Search index, databases connected with a Grooper Data Connection, and web services that can be connected using a RAML definition.
  • AI Generator: AI Generators can be used in the Search page (see AI Search and the Search page below). AI Generators use an LLM prompt to create simple text-based documents (TXT, CSV, HTML, etc.) from results in the Search page. This can include simple reports, contact lists, summaries, and more.
  • AI Productivity Helpers: There are several productivity helpers that use an LLM to assist Grooper Designers. AI Helpers can help users write regular expressions, enter AI Search queries, build Data Models, and more. A full list of helpers is found on the AI Productivity Helpers page.

LLM connection options

While there are two primary LLM Providers (OpenAI and Azure), these two providers can connect Grooper to a variety of services and model types.

OpenAI API

Grooper's LLM-based features were primarily designed around OpenAI's models. The Grooper development team uses OpenAI models internally via the OpenAI API. Connecting to the OpenAI API is regarded as the "standard" way of connecting Grooper to LLMs.

  • When connecting Grooper to the OpenAI API, you will need an API key. You can visit our OpenAI quickstart if you need instructions on setting up an OpenAI account and obtaining an API key.
  • Be aware! You must have a payment method in your OpenAI account to use LLM-based features (such as AI Extract) in Grooper. If you do not have a payment method, Grooper cannot return a list of models when configuring LLM features.
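
If you want to confirm an API key works outside Grooper, a quick model-list call is an easy test (presumably similar to the call Grooper uses to populate its model list). The key below is a placeholder.

  # A quick sanity check outside Grooper: list the models your key can access.
  # If this call fails (for example, no payment method on the account), expect
  # Grooper's model list to come back empty as well.
  import requests

  resp = requests.get(
      "https://api.openai.com/v1/models",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
  )
  print([m["id"] for m in resp.json()["data"]][:10])  # first few model ids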


Connecting Grooper to the OpenAI API is simple:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. In the "API Key" property, enter your OpenAI API Key.
    • You do not need to adjust the "Authorization" property. It should be "Bearer". The OpenAI API uses bearer tokens to authenticate API calls.
  7. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services (the distinction is sketched after these steps).
  8. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
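
The note in step 7 can be made concrete with a small sketch. This is an assumption-level illustration of the message roles involved, not a capture of Grooper's actual prompts:

  # With "Use System Messages" enabled, framing instructions travel in a
  # "system" message; with it disabled, everything is packed into "user" messages.
  with_system_messages = [
      {"role": "system", "content": "You extract data from documents. Reply as JSON."},
      {"role": "user", "content": "Document text and the question go here."},
  ]
  without_system_messages = [
      {"role": "user", "content": "You extract data from documents. Reply as JSON.\n\n"
                                  "Document text and the question go here."},
  ]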

OpenAI-compatible services

Grooper can connect to any LLM service that adheres to the OpenAI API standard using the "OpenAI" provider. Compatible APIs must expose "chat/completions" and "embeddings" endpoints like the OpenAI API's to interoperate with Grooper's LLM features.

We have confirmed the following services will integrate using the OpenAI provider:

  • Various models using Groq
  • Various models using OpenRouter
  • Various models hosted locally using LMStudio
  • Various models hosted locally using Ollama

BE AWARE: While the OpenAI API is fully compatible with all LLM constructs in Grooper, these "OpenAI-compatible" services may only be partially compatible when using the OpenAI provider. For the locally hosted options above, typical default endpoints are sketched below.
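
These defaults are assumptions based on each tool's standard configuration; check your own installation:

  # Typical default base URLs for locally hosted OpenAI-compatible services.
  # Both values are assumptions based on the tools' documented defaults.
  local_endpoints = {
      "Ollama": "http://localhost:11434/v1",
      "LM Studio": "http://localhost:1234/v1",
  }
  # Each exposes OpenAI-style routes under the base URL, e.g.:
  #   POST http://localhost:11434/v1/chat/completions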


Connecting Grooper to OpenAI-compatible APIs is slightly more involved:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add the "OpenAI" option.
  6. Enter the API service's endpoint URL in the "URL" property.
    • This property's value defaults to the OpenAI API's base URL. You must change this to connect to a different web service.
  7. Select the "Authorization" appropriate for your API. Grooper's supported authorization methods are:
    • None (Uncommon) - Choose this option if the API does not require any authorization.
    • Bearer - Choose this option if the API uses a bearer token to authorize web calls (Example: OpenAI API and Groq use the bearer method).
    • APIKey - Choose this option if the API uses an API key to authorize web calls (Example: Nvidia's NIM API uses the key method).
  8. In the "API Key" property, enter the API's key.
  9. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  10. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
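
As a rough guide to step 7, here is what the two key-based methods commonly put on the wire. The "api-key" header name is shown as one common convention (Azure OpenAI uses it), not a guarantee of what every API expects; your API's documentation is authoritative.

  # What the key from step 8 commonly looks like in the HTTP request headers.
  # Header names for the ApiKey case vary by service; "api-key" is an example
  # convention only.
  bearer_headers = {"Authorization": "Bearer YOUR_API_KEY"}
  api_key_headers = {"api-key": "YOUR_API_KEY"}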

Azure AI Foundry deployments

Grooper connects to Microsoft Azure OpenAI models and Azure AI Foundry (formerly Azure AI Studio) model deployments by adding an "Azure" provider to an LLM Connector. Each model must be deployed in Azure before Grooper can connect to it.

You can deploy a wide range of models in Azure AI Foundry, including:

  • Azure OpenAI models
  • MistralAI models
  • Meta's Llama models
  • DeepSeek models
  • xAI's Grok models


Once a model is deployed in Azure, a "Service Deployment" can be defined in Grooper. There are two types of Service Deployments:

  • Chat Service - This is for Azure OpenAI models' "chat/completion" operations and the "chat completion" models in Azure AI Foundry's Model Catalog. This is required for most LLM-based functionality in Grooper, including AI Extract, AI Assistants, separating documents with Auto Separate, and classifying documents with LLM Classifier.
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "chat completions".
  • Embeddings Service - This is for Azure OpenAI models' "embeddings" operations and the "embeddings" models in Azure AI Foundry's Model Catalog. This is required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper (an example of what an embeddings deployment enables is sketched after this list).
    • When searching for compatible models in Azure's Model Catalog, narrow the "inference tasks" to "embeddings".
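
As an example of what an Embeddings Service enables, the sketch below embeds a sample clause and some document chunks, then ranks the chunks by cosine similarity, roughly the comparison Clause Detection performs. The resource name, deployment name, api-version, and key are placeholders; this is an illustration, not Grooper's implementation.

  # Embed a sample clause and document chunks via an Azure embeddings
  # deployment, then rank chunks by cosine similarity. All names below are
  # placeholders; substitute your own deployment's values.
  import math
  import requests

  EMBED_URL = ("https://YOUR-RESOURCE.openai.azure.com/openai/deployments/"
               "YOUR-EMBEDDINGS-DEPLOYMENT/embeddings?api-version=2024-02-01")

  def embed(texts):
      resp = requests.post(EMBED_URL, headers={"api-key": "YOUR_API_KEY"},
                           json={"input": texts})
      return [d["embedding"] for d in resp.json()["data"]]

  def cosine(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
      return dot / norm

  sample, *chunks = embed([
      "Indemnification: each party shall hold the other harmless ...",
      "chunk one of the document ...",
      "chunk two of the document ...",
  ])
  best = max(range(len(chunks)), key=lambda i: cosine(sample, chunks[i]))
  print("Most similar chunk:", best)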


To connect Grooper to an Azure model deployment:

  1. Go to the Grooper Root node.
  2. Open the "Options" editor.
  3. Add the "LLM Connector" option.
  4. Open the "Service Providers" editor.
  5. Add an "Azure" provider.
  6. Select the "Deployments'" property and open its editor.
  7. Add a "Chat Service", "Embeddings Service", or both.
    • "Chat Services" are required for most LLM-based features in Grooper, such as AI Extract and AI Assistants. "Embeddings Services" are required when enabling "Vector Search" for an Indexing Behavior, when using the Clause Detection section extract method, or when using the "Semantic" Document Quoting method in Grooper.
  8. In the "Model Id" property, enter the model's name (Example: "gpt-35-turbo").
  9. In the "URL" property, enter the "Target URI" from Azure
    • For Azure OpenAI model deployments this will resemble:
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/chat/completions?api-version={api-version} for Chat Service deployments
      • https://{your-resource-name}.openai.azure.com/openai/deployments/{model-id}/embeddings?api-version={api-version} for Embeddings Service deployments
    • For other models deployed in Azure AI Foundry (formerly Azure AI Studio) this will resemble:
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/chat/completions for Chat Service deployments
      • https://{model-id}.{your-region}.models.ai.azure.com/v1/embeddings for Embeddings Service deployments
  10. Set "Authorization" to the method appropriate for the model deployment in Azure.
    • How do I know which method to choose? In Azure, under "Keys and Endpoint", you'll typically see
      • "API Key" if the model uses API key authentication (Choose "ApiKey" in Grooper).
      • "Microsoft Entra ID" if the model supports token-based authentication via Microsoft Entra ID (Choose "Bearer" in Grooper).
    • Azure OpenAI supports both API Key and Microsoft Entra ID (formerly Azure AD) authentication. Azure AI Foundry (formerly Azure AI Studio) models often lean toward token-based (Bearer in Grooper) authentication.
  11. In the "API Key" property, enter the Key copied from Azure.
  12. (Recommended) Turn on "Use System Messages" (change the property to "True").
    • When enabled, certain instructions Grooper generates and information handed to a model are sent as "system" messages instead of "user" messages. This helps an LLM distinguish between contextual information and input from a user. This is recommended for most OpenAI models, but may not be supported by all compatible services.
  13. Press "OK" buttons in each editor to confirm changes and press "Save" on the Grooper Root.
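
Once a Chat Service deployment is configured, you can sanity-check it outside Grooper with a direct call to the Target URI from step 9. The resource name, deployment name, api-version, and key below are placeholders; substitute the values from your own Azure deployment.

  # Direct call to an Azure OpenAI Chat Service deployment. The model is
  # implied by the deployment in the URL, so no "model" field is needed.
  import requests

  CHAT_URL = ("https://YOUR-RESOURCE.openai.azure.com/openai/deployments/"
              "YOUR-DEPLOYMENT/chat/completions?api-version=2024-02-01")

  resp = requests.post(
      CHAT_URL,
      headers={"api-key": "YOUR_API_KEY"},  # or a Bearer token for Entra ID auth
      json={"messages": [{"role": "user", "content": "Say hello."}]},
  )
  print(resp.json()["choices"][0]["message"]["content"])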