What's New in Grooper 2025

Revision as of 13:50, 10 June 2026

Grooper version 2025 is here!

Learn about new and improved features below.
When available, follow any links to extended articles on a topic.
Need help installing Grooper? Check out our Install and Setup article.

AI Assistants and the Chat Page

What is an AI Assistant?

AI Assistants are Grooper's conversational AI personas. They define a role to be used in Grooper Chat sessions. Each AI Assistant has access to a collection of user-defined resources.

Normally, conversational AIs ("chatbots") only have access to whatever they were trained on. User-defined resources extend the AI Assistant's ability to answer questions about domain-specific information contained in documents or databases, or retrieved from a web service.

AI Assistants use retrieval-augmented generation (RAG) — a technique that queries a knowledge base outside the LLM's core training data before generating a response. This extends the LLM's capabilities to a specific domain (like a corpus of documents) without the need to retrain the model.

FYI

AI Assistants are a replacement for the "AI Analyst" object. AI Analysts were Grooper's first attempt at a conversational AI. AI Assistants are a substantial improvement. They can access document content and data more quickly, answer questions across larger document sets (even an entire Grooper Repository), and draw on more knowledge resources, such as information obtained from a database.

How does a user interact with an AI Assistant?

In Grooper

Users access AI Assistants using the Grooper Chat page. From here, users can select an AI Assistant previously configured in Grooper Design, start new conversations, or continue conversations they have previously started.

Visit this article for more.

Outside of Grooper

Users can also extend AI Assistants to external applications, including Teams, Slack, or custom-built applications. This allows Grooper assistants to be used in multiple channels.

There are two ways to extend a Grooper AI Assistant:

Azure Bot Services
- Microsoft's Azure Bot Framework allows AI Assistants to be exposed to a multitude of applications called "channels."
  - Channels include Teams, Slack, email, SMS, and more.
  - More information on Azure Bot Service channel support can be found here.
- Communication is secured with OAuth client credentials. Users have further control over whether and how documents are linked in AI responses.
- Integrating Grooper with Azure Bot Services requires setup in Grooper, in Azure, and in your own server infrastructure. For a quick reference, visit the Azure Bot Service article.
- The AI Assistant's Bot Connector settings configure the Azure Bot integration. This includes a Bot Id (the client ID for OAuth Client Credentials authentication) and a Bot Password (the client secret).
- Document Links settings control how documents are referenced in AI responses: None (no hyperlinks), Direct (public download links), or InApp (opens the document in the Grooper UI).
Grooper Web Services (GWS) REST API
- GWS is a new API set in Grooper.
- The /assistants endpoints were specifically created for developers who want to interact with AI Assistants via web calls.
- This allows developers to use AI Assistants in their own applications.
- See below for more information on GWS.

What resources can AI Assistants connect to?

AI Assistants can connect to the following resources:

Search Index References — Allows the AI Assistant to retrieve document text content from an Azure AI Search index. Both metadata search and vector searches are supported. Vector queries enable chatting across hundreds or thousands of documents in a large Grooper Repository.
Table References — Allows the AI Assistant to retrieve data from database tables using SQL queries. Because the retrieval plan executes SQL statements, if Grooper has appropriate rights in the connected database, the AI Assistant may also write data to the database.
Web Service References — Allows the AI Assistant to retrieve data from APIs using web service calls. The web service must be described by a RAML (RESTful API Modeling Language) definition in the Web Service Reference configuration.

The AI Assistant's retrieval plan determines which of these resources to use when responding to a chat. When a user message is received, an intermediate LLM operation generates the retrieval plan — the LLM analyzes the conversation and chooses one or more retrieval actions. This allows users to query vast amounts of document text (using vector searches), extracted data (stored as metadata in a Search Index Reference), and information from external sources (SQL tables and web services) — all with a natural language prompt. No complex syntax required.

Built-in Retrieval Tools

In addition to user-defined resources, AI Assistants include several built-in retrieval tools (which can be optionally disabled):

Ask User — Gets more information from the user when needed to complete the retrieval plan.
Help Search — Performs a vector search against Grooper Help topics.
Wiki Search — Performs a vector search against Grooper Wiki articles.
Load Schema — Retrieves schemas for configured resources.
Load Web Page — Fetches the content of a web page. Issues an HTTP GET request and injects the response into the context. If HTML is received, the content is cleaned significantly: script, style, meta, link, and comment elements are removed, most attributes are stripped, and browser-like HTTP headers are included to help avoid captchas and robot detection.
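The cleanup described for Load Web Page can be illustrated with a minimal sketch using Python's standard html.parser module. This is not Grooper's implementation. The set of attributes that survive (KEPT_ATTRS) is an assumption, since the source only says "most attributes are stripped."

```python
from html import escape
from html.parser import HTMLParser

# Elements removed entirely, per the description above.
DROPPED = {"script", "style", "meta", "link"}
# meta and link are void elements: they never have a closing tag.
VOID_DROPPED = {"meta", "link"}
# Assumption: which attributes are kept is not documented; href and src are a guess.
KEPT_ATTRS = {"href", "src"}


class Cleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED:
            if tag not in VOID_DROPPED:
                self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = "".join(
            f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
            if name in KEPT_ATTRS and value is not None
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> get an opening tag only.
        if tag not in DROPPED:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROPPED:
            if tag not in VOID_DROPPED and self.skip_depth:
                self.skip_depth -= 1
            return
        if not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # handle_comment is not overridden, so comments are dropped.


def clean_html(html_text: str) -> str:
    parser = Cleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling clean_html on a fetched page returns markup with the listed elements and comments removed and only the allow-listed attributes left in place.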

Search Index Subsets

Subsets allow a single search index to be logically divided into multiple sub-indexes using an OData filter. This is useful because every Azure AI Search service has a limit on the number of search indexes.

AI Assistants can reference any number of subsets to control what knowledge they have access to.
Subsets save AI Search resources when a document set can be divided by a meaningful attribute (such as Document Type) and all subsets share the same field schema.
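As an illustration, a subset split by Document Type might use an OData equality filter like the ones built below. This is a sketch only: the DocumentType field name and the subset names are hypothetical, and how Grooper stores the filter is not shown here.

```python
def subset_filter(field: str, value: str) -> str:
    """Build an OData equality filter. Single quotes in the value are doubled."""
    escaped = value.replace("'", "''")
    return f"{field} eq '{escaped}'"


# Hypothetical subsets of one shared index, divided by a DocumentType field.
subsets = {
    name: subset_filter("DocumentType", name)
    for name in ("Invoice", "Purchase Order")
}
```

Each entry pairs a subset name with the filter that carves it out of the single underlying index.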

Vector Index Metadata

Document metadata (such as Data Fields in a Data Model) can now be vector indexed alongside the document's full text.

Why do it? It improves the accuracy of chunked indexes.
Downside? It will increase the size of the index.

What are some benefits to AI Assistants?

AI Assistants provide users with a new way to interact with documents and other connected resources such as databases.

Users can search for documents and their data using natural language.
Provides on-demand access to data inside documents without requiring a Data Model and extraction logic to be configured in advance.
Provides near-instant time-to-value. Minimal processing is required in Grooper before users can start chatting with a single document or across large document sets.
Reduces the need to extract everything up front, allowing users to gain insights into documents without complicated extraction workflows.
Extends access to external data sources, including databases and web services.

Additional AI Assistant and Chat developments

Chained Retrieval

Chained retrieval enables an AI Assistant to execute multi-step retrieval plans. Sometimes the answer to a user's question requires first retrieving content from one resource, then using that content to determine what to retrieve next.

Controlled using the Maximum Retrieval Depth property on the AI Assistant.
Example: "Show me an invoice for ACME Parts and include their line items from the accounting database." — The first retrieval step returns the invoice document and its invoice number. The second step uses that invoice number to retrieve the corresponding line items from the accounting database.
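The invoice example above can be sketched as a loop bounded by a maximum depth. Everything here is a toy stand-in: the in-memory DOCUMENTS and LINE_ITEMS tables, the plan_next_step planner, and the action names are hypothetical, and in Grooper the planning is done by an LLM rather than fixed rules.

```python
# Hypothetical stand-ins for a search index and an accounting table.
DOCUMENTS = {"ACME Parts": {"doc_id": "doc-17", "invoice_number": "INV-1001"}}
LINE_ITEMS = {"INV-1001": [("Widget", 4), ("Bracket", 10)]}


def plan_next_step(question, context):
    """Toy planner: pick the next retrieval action from what is already known."""
    if "invoice" not in context:
        return ("find_invoice", "ACME Parts")
    if "line_items" not in context:
        return ("load_line_items", context["invoice"]["invoice_number"])
    return None  # nothing left to retrieve


def run_chained_retrieval(question, max_depth):
    context = {}
    for _ in range(max_depth):
        step = plan_next_step(question, context)
        if step is None:
            break
        action, argument = step
        if action == "find_invoice":
            context["invoice"] = DOCUMENTS[argument]
        elif action == "load_line_items":
            # This step depends on the invoice number found by the previous step.
            context["line_items"] = LINE_ITEMS[argument]
    return context
```

With max_depth set to 1 the loop stops after the invoice lookup, so the line items are never retrieved. A depth of 2 allows the second, dependent step to run.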

Footnotes and Hyperlinks

To be effective, AI Assistants must refer users to the sources of their information. Grooper uses a custom URL scheme to reference in-app resources:

grooper://documents/docId — Refers to a document.
grooper://rel/Help/TopicName — Refers to a help topic.
grooper://sources/id — Refers to an injected source (system message).

Footnotes are generated in a list at the bottom of each response. The Grooper Help index has been rebuilt; topics are now HTML instead of JSON, and all hyperlinks in the HTML use the grooper:// scheme.
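For developers handling these links in their own code, a grooper:// link can be split into a resource kind and a target with Python's standard urllib. This is a sketch: the parse_grooper_link helper is hypothetical and not part of Grooper.

```python
from urllib.parse import urlparse


def parse_grooper_link(url: str) -> tuple[str, str]:
    """Return (resource kind, target) for a grooper:// link."""
    parts = urlparse(url)
    if parts.scheme != "grooper":
        raise ValueError(f"not a grooper link: {url}")
    # The first segment after grooper:// is parsed as the network location.
    return parts.netloc, parts.path.lstrip("/")
```

For example, parse_grooper_link("grooper://rel/Help/TopicName") yields the kind "rel" and the target "Help/TopicName".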

Chat History Tab

AI Assistants have a Chat History tab in Grooper Design that lets Design users view all messages.

Messages can be filtered by user and date.
Messages can be viewed in standard view or JSON view.
System messages can be hidden or shown using the Show System Messages toggle.
Messages can be deleted from this interface.

Usage Data and Access Control

Usage data — Shows total tokens consumed for a conversation. Individual messages from the assistant and system have tooltips showing input and output token usage. (Embeddings tokens consumed during vector search are not included.)
Result set — Now displays documents referenced in the conversation.
Retrieval plan — Is now included as a system message. The retrieval plan generator also knows the current date and time, which allows simpler queries and eliminates the need for the LLM to know platform-specific date/time functions.
Access Control — AI Assistant access can be controlled through Access Control Lists. Each AI Assistant has an Access List property to define Windows users and groups. Only AI Assistants a user has access to will appear on the Chat page.
Enabled property — AI Assistants can be enabled or disabled with the Enabled property.

Context Window Management

As a conversation grows, it becomes impractical to keep the entire conversation in the context window. Grooper currently applies the following rules:

The last N user and assistant message pairs are kept in scope.
All system messages referenced by those messages are kept in scope.
"N" is currently set to 3 (the last 3 user and assistant message pairs). This may be exposed as a configurable property in a future version.

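The context window rules above can be sketched as follows. The message shape (dictionaries with role, id, and refs keys) is an assumption made for illustration. The sketch also treats a "pair" as a user message plus everything up to the next user message.

```python
def trim_context(messages, n=3):
    """Keep the last n user/assistant exchanges and the system messages they reference."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    # Position of the n-th most recent user message (or 0 if there are fewer than n).
    cutoff = user_positions[-n] if len(user_positions) >= n else 0

    kept_chat = [
        m for i, m in enumerate(messages)
        if i >= cutoff and m["role"] != "system"
    ]
    # Hypothetical "refs" key: ids of system messages a chat message cites.
    referenced = {ref for m in kept_chat for ref in m.get("refs", [])}

    return [
        m for i, m in enumerate(messages)
        if (m["role"] == "system" and m.get("id") in referenced)
        or (m["role"] != "system" and i >= cutoff)
    ]
```

System messages that are only cited by older, dropped messages fall out of scope along with them.
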
Chat Console Improvements

Streaming Chat Completions

The Chat console now supports streaming chat completions, displaying responses one word at a time. This gets responses to users faster and makes it more apparent when the chatbot is processing a large response or if something has gone wrong. Footnote links are added at the end of the streamed response.

Markdown Support

The Chat console now renders Markdown in chat responses, significantly improving the readability of LLM output.

HTTP Import

HTTP Import is a new Import Provider in Grooper 2025. It allows users to import website content into Grooper Batches. HTTP Import can be used to import:

Individual webpages
Documents hosted on a website and accessible from a URL
Entire websites

Mechanisms to select links using CSS and filter pages using regular expressions are included in the HTTP Import configuration. A new HTTP Link document type supports sparse ingestion, allowing webpages to be imported first and loaded multithreaded afterward.
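The link selection and filtering idea can be sketched with Python's standard library. This is a simplification, not HTTP Import itself: instead of a configurable CSS selector it collects every a element with an href, and the include pattern and URLs in the usage note are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags (a stand-in for a CSS link selector)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def select_links(html_text, base_url, include_pattern):
    collector = LinkCollector()
    collector.feed(html_text)
    # Resolve relative links against the page URL, then filter with a regex.
    absolute = [urljoin(base_url, href) for href in collector.links]
    return [url for url in absolute if re.search(include_pattern, url)]
```

A call such as select_links(page_html, "https://example.com/", r"/title29/") would keep only links whose resolved URL contains /title29/.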

Websites are a great resource for AI Assistants, serving as one of many knowledge resources that can be used to answer questions from the Chat page.

FYI

Use case example: The HTTP Import provider was used internally to index publicly available legal and regulatory content — including the California Code of Regulations, Texas Administrative Code, Code of Federal Regulations (Title 29), Oklahoma Statutes, and the DOL Wage & Hour Division website. These were indexed and connected to an AI Assistant designed to answer questions about payroll laws and regulations.

HTML Conditioning Commands

Several new HTTP and HTML commands are available in Grooper 2025 for conditioning HTML documents for further processing. These commands are particularly useful for preparing HTML documents for use with an AI Assistant.

HTTP Link > Load Content — Allows webpages to be imported into Grooper sparsely and then loaded multithreaded.
HTML Document > Condition HTML — Provides several cleanup and normalization options for webpages.
- The Body Selector uses CSS selectors to match an element to replace the HTML body, removing unnecessary text content before feeding webpages to an AI Assistant. For example, a <body> containing <header>, <main>, and <footer> elements can be reduced to just the content of <main>, discarding the header and footer.
- The Removal Selector uses CSS selectors to remove specific HTML elements — useful for stripping marketing sidebars (<aside>), navigation bars (<nav>), and other repetitive content.
- The Site URL can be prepended to relative links in the HTML page for a better viewing experience in the Document Viewer. This ensures the browser can resolve CSS, inline images, and other linked resources.
- Attribute Rules and Wrap Rules assist in styling HTML elements.
  - These rules were developed for use cases involving XML documents converted to HTML using the XML Transform activity.
  - Attribute Rules add attributes to existing HTML elements.
  - Wrap Rules wrap text in an HTML element. Text is matched with regular expressions, then wrapped in an HTML element of your choosing.
HTML Document > Convert to PDF — Converts the HTML page to a PDF document, which Grooper can then process like any other PDF.
HTML Document > Convert to Text — Converts the HTML page to a TXT document. Useful for webpages that present as text files (for example, this page from the US Code of Federal Regulations on govinfo.gov). Removes unnecessary HTML elements and leaves plain text.
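To make the Wrap Rules idea concrete, here is a minimal sketch of "match text with a regular expression, then wrap it in an element." The apply_wrap_rule function is hypothetical. It operates on plain text and does not reproduce how Condition HTML walks the document.

```python
import re


def apply_wrap_rule(text: str, pattern: str, tag: str) -> str:
    """Wrap every regex match in <tag>...</tag>."""
    return re.sub(pattern, lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)
```

For instance, a rule with the pattern § \d+\.\d+ and the element b would turn "See § 541.100." into "See <b>§ 541.100</b>."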

Improved HTML Viewer

Highlighting in the Document Viewer's HTML Viewer has been improved. This enhances the user experience when reviewing footnote sources on the Chat page.

AI Productivity Helpers

Full article on AI productivity helpers

Grooper 2025 introduces a set of "AI productivity helpers." These features use a large language model (LLM) to assist Grooper Design users in building Grooper assets. They can assist with regular expressions, SQL queries for Database Lookups, and even creating full Data Models.

⚠	You must enable the LLM Connector option in your Grooper Repository to use these tools.

List of AI Productivity Helpers

AI Generated Schema Importer — Helps create Data Models quickly by generating Data Elements from a natural language prompt. For example, entering "Create a Data Model for invoice processing" will create unconfigured Data Sections, Data Fields, and Data Tables related to invoice processing.
AI Expression Helper — Helps users craft regular expressions for the Pattern Match extractor.
Db Lookup Helper — Helps users craft SQL queries for Database Lookups.
XSLT Helper — Found in the XML Transform activity's XSLT Tester. Generates an XSLT transform from a natural language prompt.
AI Helper — Appears throughout Grooper wherever there is a text editor. Potential uses include:
- Lexicon and Local List editors — Generate lists for List Match extractors.
- Description editors — Generate field descriptions to assist AI Extract, or descriptions for any other Grooper node.
- Code expression editors (Calculated Value, Default Value, Should Submit, etc.) — Generate expressions from natural language prompts.
- List editors — Generate list entries.
- Instruction editors — Generate instructions for AI Extract.

OAuth Support

See the OAuth Setup article for more information.

OAuth is an authorization standard that lets third-party applications access web resources on a user's behalf without the user sharing a password.

FYI

Microsoft Entra ID (formerly Azure Active Directory) is the only supported OAuth provider at this time.

Benefits of OAuth:

Security — Users do not share their passwords with third-party applications. OAuth safely encrypts transmission of data between servers, making document links secure when connecting AI Assistants to chat clients via Azure Bot Services.
Simplified logins — Users can log into multiple applications with existing accounts. In Grooper's case, with a Microsoft Entra ID account.
Integrations — OAuth is the security standard for app-to-app communication. Securing Grooper with OAuth enables new integration options, including using Azure Bot Services to extend AI Assistants to external chat channels.

Both the Grooper website and the GWS website can be configured with OAuth authentication. Both rely on settings in the web.config file, configured with values from the Azure Portal. If no ida:ClientId setting is present, authentication works the same as in previous versions of Grooper.

Grooper and OAuth — When the Grooper website is configured to use OAuth, users log in using their Entra ID credentials.
- Previous login methods are still supported. Windows authentication remains the default.
- OAuth is required if you are extending an AI Assistant to an external channel like Teams via Azure Bot Services and want to provide users with document links in chat responses.
GWS and OAuth — GWS uses OAuth client credentials to communicate with Azure Bot Services.
- Required for extending AI Assistants to external channels via Azure Bot Services.
- If providing document links in chat responses, both the Grooper website and GWS website must be secured with OAuth.

Additional setup is required to configure OAuth authentication. You must register Grooper as an application in Microsoft Entra ID and configure each website's web.config file. Full instructions are coming soon.

OAuth Service Login for Exchange

Exchange CMIS Connections now have an OAuth Service Login method for connecting Grooper to Exchange servers (Outlook inboxes).

Implements the OAuth 2.0 Client Credentials flow for secure server-to-server authentication.
Allows the application (Grooper) to authenticate using a client ID and client secret without requiring user interaction, then obtain an access token to interact with Exchange APIs on Grooper's behalf.
The previous Exchange OAuth method (a user login method requiring a Microsoft account) is still available and is fine for testing, but OAuth Service Login is preferred for production scenarios.
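For readers unfamiliar with the Client Credentials flow, the sketch below builds the token request an application would send to Microsoft Entra ID. It only constructs the URL and form body and sends nothing. The tenant ID, client ID, secret, and scope are placeholders, and the scope Grooper actually requests for Exchange is an assumption not confirmed by this article.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret, scope):
    """Return the token endpoint URL and form body for a client credentials request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",  # no user interaction involved
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body


# Placeholder values for illustration only.
token_url, token_body = build_token_request(
    "contoso-tenant-id",
    "my-client-id",
    "my-client-secret",
    "https://graph.microsoft.com/.default",
)
```

The body is sent as an application/x-www-form-urlencoded POST. The JSON response carries an access_token that is then presented as a bearer token on later API calls.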

Document indexing and search improvements

Search Page Improvements

The following improvements were made to the Search Page user interface throughout version 2025's development.

Saved Queries

Search queries are now stored in a Grooper database table ("Query") instead of the user's browser cache. Saved searches persist across browsers and machines.

A new Saved Queries side panel has been added to the left side of the Search page:

Saved queries appear in the list after pressing the save button.
Queries can be selected from the panel and run using the search button.
Queries can be renamed and deleted from the panel.
The panel can be collapsed to save screen space.

Search Parameter Editors

Editors have been added for the Filter, Select, and Order By parameters, making it easier for users unfamiliar with the search syntax to configure these parameters. Look for the "more" icon at the end of each property.

The Filter Editor supports the following component types:

Comparison — Compare a field against a value or list of values.
Group — Combine multiple conditions using AND / OR.
Lambda — Match field values inside collections.
IsMatch — Text search inside the document or a field.
GeoDistance — Match GeoPoint fields based on distance from a reference point.
GeoIntersects — Match GeoPoint[] fields based on intersection with a reference polygon.

A Show All toggle in the Filter Editor reveals all fields in the search index, giving users a simple way to browse available fields and construct basic queries.
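Assuming the Filter parameter uses Azure AI Search's OData filter syntax, the first three component types might produce expressions like those built below. The helper functions and the field names (Total, Vendor, Tags) are hypothetical and only illustrate the shape of the output.

```python
def comparison(field, op, value):
    """Comparison: a field against a string or numeric value."""
    if isinstance(value, str):
        literal = "'" + value.replace("'", "''") + "'"
    else:
        literal = str(value)
    return f"{field} {op} {literal}"


def group(operator, *conditions):
    """Group: combine conditions with 'and' or 'or'."""
    return "(" + f" {operator} ".join(conditions) + ")"


def any_match(collection, item_condition):
    """Lambda: true when any element of the collection satisfies the condition."""
    return f"{collection}/any(x: {item_condition})"


example = group(
    "and",
    comparison("Total", "gt", 1000),
    comparison("Vendor", "eq", "ACME"),
)
```

Here example evaluates to (Total gt 1000 and Vendor eq 'ACME'), and any_match("Tags", comparison("x", "eq", "urgent")) gives Tags/any(x: x eq 'urgent').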

AI Helper: Generate Filter

The Search page has a new AI Helper button for the Filter parameter. This replaces the previous AI Query Helper button and works significantly better — it focuses solely on generating a filter rather than building the whole query, and it injects more information including values for drop-down list fields.

Sort by Column Header

Users can now click a column header in the search results list to quickly sort results by that field.

Result Set Command Permissions

Result Set commands can now be secured using Permission Sets configurations, allowing administrators to control which users have access to specific commands on the Search page:

Submit Job — Enabled only if the user has access to the Jobs page.
Assistant Chat — Enabled only if the user has access to the Chat page.
Create Batch — Enabled only if the user has the right to create Batch nodes (defined in Node Permissions).
Download — Currently enabled for all users.

Miscellaneous Search Page Improvements

Case-insensitive string comparisons — String comparisons in Filters are now case-insensitive by default (the previous case-sensitive mode was difficult to use in practice). This is implemented by adding a lowercase normalizer for string fields. Requires API version 2025-09-01 or higher.
Remembered column widths — The Search page now remembers adjusted column widths in the results list, persisted in the user's browser cache across searches and page visits.

Search Indexing Improvements

The following improvements were made to how documents are added to a search index and how search indexes are managed throughout version 2025's development.

Indexes Tab

A new Indexes tab has been added to the Grooper Root node in the Design page. It is visible when the Grooper Repository has an AI Search option enabled.

This tab provides a centralized view of all search indexes with the following capabilities:

Navigate directly to related Content Types.
Inspect the full index definition.
Monitor usage metrics and limits for the search service.
Delete orphaned search indexes.
If an Index Name Prefix is in use, toggle the view to see all indexes displayed with full names.

Index Name Prefix

AI Search has a new Index Name Prefix property. This prevents name collisions when multiple Grooper repositories share a single AI Search service.

A prefix of prod- with an index named invoices results in a full index name of prod-invoices.
A prefix of hr- with an index named tax-documents results in hr-tax-documents.

This enables more efficient use of Azure resources by allowing multiple environments or departments to share a single AI Search service.

Large Document Search Indexing

Large documents would occasionally fail during search indexing due to how embeddings values were collected. Several improvements have been made:

Updated the internal tokenizer with a newer model to produce accurate token counts.
Added logic to enhance capabilities when requesting embeddings in bulk.
Fixed additional internal issues.

Chunking Method

A new Chunking Method configuration is available under Vector Search Options in an Indexing Behavior. This controls how chunks are created when collecting vector embeddings from large documents.

Currently one Chunking Method is available: Fixed Chunker, which allows control over the size of fixed chunks.
If no Chunking Method is set, a non-chunked index is created.
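Fixed-size chunking can be sketched in a few lines. This is an illustration of the general technique, not Grooper's Fixed Chunker: whitespace-separated words stand in for model tokens, and whether Grooper overlaps adjacent chunks is not stated here, so none is applied.

```python
def fixed_chunks(tokens: list[str], chunk_size: int) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```

Each chunk would then be sent for embedding on its own, so no single request exceeds the embedding model's input limit.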

AI-Enabled Separation and Classification

New Separation Provider: AI Separate

AI Separate is a new LLM-based document separation method that requires significantly less configuration than traditional approaches.

How it works:

The LLM is presented with the current page plus N adjacent pages.
It determines whether the middle page is the start of a new document.
If so, a Batch Folder is inserted at that page.

Key properties:

Instructions — Guides the LLM's decisioning when evaluating whether a page begins a new document.
Window Extent — Controls how many adjacent pages are analyzed alongside the evaluated page. A value of "1" means three total pages are presented to the LLM (one before, the evaluated page, and one after).
Document Types — When configured, allows AI Separate to classify documents during separation. The LLM selects the most appropriate Document Type from the list.
Include Reason — When enabled, the LLM provides a reason for its separation decision on each page. This is valuable for troubleshooting and refining instructions.
Include Page Metadata — Enables instruction types based on page metadata, including page dimensions, description, values, and barcodes.
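The sliding window behind AI Separate can be sketched as below. The looks_like_first_page function is a hypothetical stand-in for the LLM decision, and pages are plain strings for simplicity.

```python
def page_window(pages, index, extent):
    """Return the evaluated page plus up to `extent` pages on each side."""
    start = max(0, index - extent)
    end = min(len(pages), index + extent + 1)
    return pages[start:end]


def separate(pages, extent, starts_new_document):
    """Return the page indexes where a new document (Batch Folder) would begin."""
    starts = []
    for i, page in enumerate(pages):
        window = page_window(pages, i, extent)
        if starts_new_document(page, window):
            starts.append(i)
    return starts


def looks_like_first_page(page, window):
    # Stand-in for the LLM: a real decision would weigh the whole window.
    return page.startswith("INVOICE")
```

With a Window Extent of 1, interior pages are evaluated with three pages of context, while the first and last pages of the Batch see only two.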

⚠	Separation decisioning using AI Separate is expected to improve over time as LLM models improve.

New Activities: Mark Attachments and Attach

Attachments are whole documents that are part of another (host) document — for example, an exhibit attached to a legal document, or a check attached to an EOB form. Because AIs can make mistakes during separation, sometimes a document that should be treated as an attachment is separated as a standalone document instead. The Mark Attachments and Attach activities work together to resolve this.

Mark Attachments defines Attachment Rules that specify which Document Types should be considered attachments, which Document Types they should be attached to (their hosts), and whether the host comes before or after the attachment.

When the Model property is configured, Mark Attachments operates in AI-enhanced mode: an LLM compares each document to its neighbors and determines whether it should be attached to the previous document, the next document, or not at all.
When the Model property is unconfigured, Mark Attachments operates in rule-based mode: documents are attached exactly as described by the Attachment Type, Host Content Type, and Direction configuration.
Mark Attachments only sets markers; it does not physically move documents. The Attach activity must follow it.

Attach performs the actual document attachment based on the markers set by Mark Attachments. Documents can be attached in one of two ways:

Nested under the host document, creating a parent-child document relationship.
Pages appended to the host document.
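The two-step design (mark first, attach second) can be sketched for the rule-based mode with nested attachment. The Doc and AttachmentRule classes are hypothetical, and the sketch assumes the host must be the immediately adjacent document, which this article does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    doc_type: str
    attach_to: "Doc | None" = None  # marker set by mark_attachments
    children: list = field(default_factory=list)


@dataclass
class AttachmentRule:
    attachment_type: str
    host_type: str
    host_is_before: bool  # direction: host precedes or follows the attachment


def mark_attachments(docs, rules):
    """Set markers only. No document is moved in this step."""
    for i, doc in enumerate(docs):
        for rule in rules:
            if doc.doc_type != rule.attachment_type:
                continue
            j = i - 1 if rule.host_is_before else i + 1
            if 0 <= j < len(docs) and docs[j].doc_type == rule.host_type:
                doc.attach_to = docs[j]


def attach(docs):
    """Nest marked documents under their hosts and return the remaining top-level list."""
    remaining = []
    for doc in docs:
        if doc.attach_to is not None:
            doc.attach_to.children.append(doc)
        else:
            remaining.append(doc)
    return remaining
```

Separating the steps means the markers can be reviewed or corrected before any document is actually moved.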

New Classification Method: LLM Classifier

LLM Classifier is a new AI-powered classification method that classifies documents by asking a large language model to select the Document Type from a list.

Configuration steps:

Set the Content Model's Classification Method to LLM Classifier.
Add descriptions to individual Document Types as needed to guide the model's decision.

In simple cases, the Document Type name alone may be sufficient. In more nuanced cases, meaningful and distinct descriptions will help the LLM make the correct choice.
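To show why descriptions matter, here is a sketch of how a classification prompt could be assembled from Document Type names and optional descriptions. The prompt wording and both helper functions are hypothetical. Grooper's actual prompt is not documented in this article.

```python
def build_classification_prompt(document_text, doc_types):
    """doc_types maps each Document Type name to a description (may be empty)."""
    lines = [
        "Select the Document Type that best matches the document.",
        "Document Types:",
    ]
    for name, description in doc_types.items():
        lines.append(f"- {name}: {description}" if description else f"- {name}")
    lines += ["", "Document:", document_text, "", "Answer with the Document Type name only."]
    return "\n".join(lines)


def parse_choice(response, doc_types):
    """Accept the model's answer only if it names a known Document Type."""
    answer = response.strip()
    return answer if answer in doc_types else None
```

A type listed with no description gives the model only its name to go on, which is why similar-sounding types benefit from distinct descriptions.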

FYI

Classification decisioning using LLM Classifier is expected to improve over time as LLM models improve.

New Classification Method: Search Classifier (Experimental)

The Search Classifier classifies documents using a search index. Document Types are assigned by finding similar documents in the index using vector search.

Requires an Indexing Behavior configured for the Content Model with Vector Search enabled.
Requires documents to be present in the search index before the Classify activity can run. Some manual effort is required to seed the index with examples of each Document Type.
Classification is expected to improve over time as corrected examples accumulate in the search index.
For more complex cases, the Search Classifier will likely need supplemental LLM pre-processing (to remove entity names, addresses, etc.) and post-processing (to verify results and break ties).
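Classification by vector similarity can be sketched as a nearest-neighbor vote. This is a generic illustration, not the Search Classifier's algorithm: the tiny two-dimensional vectors, the value of k, and the majority vote are all assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(vector, index, k=3):
    """index is a list of (embedding, document_type) pairs already in the search index."""
    ranked = sorted(index, key=lambda item: cosine(vector, item[0]), reverse=True)
    votes = Counter(doc_type for _, doc_type in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy seeded index: two examples of each Document Type.
seeded = [
    ([1.0, 0.0], "Invoice"),
    ([0.9, 0.1], "Invoice"),
    ([0.0, 1.0], "W-2"),
    ([0.1, 0.9], "W-2"),
]
```

The sketch also shows why the index must be seeded first: with no examples of a Document Type, that type can never win the vote.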

⚠	Search Classifier has not been tested against real-world document sets. Its efficacy has not been proven in production scenarios. It is largely an untested prototype at this point.

AI-Enabled Data Section Extraction

Grooper 2025 introduces several new AI-based Section Extract Methods for extracting Data Sections. These complement the existing AI Extract fill method.

AI Section Reader

AI Section Reader is a generative AI-based extract method for single-instance Data Sections (a single record on the document). It works nearly identically to AI Extract in terms of configuration (Model, quoting strategy, instructions, data element filter, etc.).

The key difference is timing: AI Section Reader executes at extract time (as a Section Extract Method), whereas AI Extract (as a Fill Method) executes after extraction completes. AI Section Reader also produces detailed diagnostics including schema, chat logs, operation logs, and performance metrics.

FYI

Design users may find it useful to temporarily configure AI Section Reader on a Data Section during testing, since its diagnostics allow unit testing of Data Section extraction without running AI Extract on the full Data Model.

AI Collection Reader

AI Collection Reader is a generative AI-based extract method for multi-instance Data Sections (repeating records on a document). It extends AI Section Reader's capabilities to handle documents of any length by dividing the document into chunks of N pages and processing each chunk independently in parallel.

Includes options for chunk size and maximum threads.
Produces detailed diagnostics.
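The chunk-and-parallelize approach can be sketched with a thread pool. The extract_records function is a hypothetical stand-in for the per-chunk LLM call, and the "Employee:" line format is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_pages(pages, chunk_size):
    """Divide the document into consecutive chunks of chunk_size pages."""
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


def extract_records(chunk):
    # Stand-in for the LLM call made for each chunk.
    return [
        line
        for page in chunk
        for line in page.splitlines()
        if line.startswith("Employee:")
    ]


def read_collection(pages, chunk_size, max_threads):
    chunks = chunk_pages(pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map returns results in chunk order even when chunks finish out of order.
        results = list(pool.map(extract_records, chunks))
    return [record for chunk_result in results for record in chunk_result]
```

Because each chunk is processed independently, a record that straddles two chunks is seen only in pieces by each call.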

⚠	Issues can occur when sections break across a page or chunk boundary. The Concat Data Action (see below) helps resolve this.

AI Transaction Detection

AI Transaction Detection is a custom-built Section Extract Method designed for repeating transaction-based sections, such as employee records in payroll reports or claims on an EOB form. It differs from AI Collection Reader in that it detects transaction boundaries first, then runs extraction on each individual transaction in parallel.

This approach handles transactions that span multiple pages better than AI Collection Reader, which splits the document into fixed-size page chunks.

Configuration involves three areas:

Generator — LLM model and generative AI settings.
Boundary Detector — Instructions for detecting transaction anchors (static text labels or regex patterns that identify the start of each record).
Data Extraction — Extraction settings for each detected transaction, including two quoting methods:
- Transaction Quoting — Preprocesses the transaction content.
- Document Quoting — Selects content outside of the transaction (such as a page-level table header) to be included in the extraction context.
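Boundary detection with a regex anchor can be sketched as follows. This only illustrates splitting on anchors. The split_transactions helper and the "Employee ID:" label are hypothetical, and the LLM-assisted parts of the method are not modeled.

```python
import re


def split_transactions(text, anchor_pattern):
    """Split text at each anchor match. Text before the first anchor is ignored."""
    starts = [m.start() for m in re.finditer(anchor_pattern, text, flags=re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[start:end].strip() for start, end in zip(starts, ends)]
```

Since the split follows anchors rather than page breaks, a transaction that continues onto the next page stays in one piece.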

Choosing the Right AI Section Extract Method

Grooper now has three AI-enabled Section Extract Methods (plus AI Extract as a Fill Method):

AI Extract — Still the best option for small documents.
AI Section Reader — Use this only when the Data Section must be extracted at extract time rather than fill time. Also useful during testing to view diagnostics and unit test Data Section extraction.
AI Collection Reader — A general-purpose multi-instance Section Extract Method. Use for larger documents with multiple repeating sections.
AI Transaction Detection — Tailor-made for transaction-based sections such as employee records in payroll reports or claims on an EOB.

New Fill Method: Fill Descendants

The Fill Descendants method was created to increase Extract efficiency when using AI Extract on multiple Data Sections in a Data Model. Fill Descendants executes fill methods (such as AI Extract) on descendant Data Elements in parallel, using multiple threads to perform prefetch operations instead of just one.

In one tested scenario with eleven AI Extract Data Sections, extraction time dropped from approximately 5 minutes to 25 seconds, roughly a 12x speedup.

How to use it:

Create two or more Data Sections configured with AI Extract.
Set the Trigger property to False on each Data Section so they do not run individually.
Add Fill Descendants at the Data Model level.

Azure Document Intelligence

About Azure Document Intelligence

Azure Document Intelligence (formerly Form Recognizer) is an AI-powered document processing service in Microsoft Azure. It uses the Azure Read OCR engine for base OCR results and uses prebuilt or custom models to extract text content, layout elements (tables, columns, sections), style information, and semantic elements (key-value pairs, labeled fields).

Grooper's current integration focuses on two prebuilt models:

prebuilt-read — High-accuracy machine print and handwritten OCR.
prebuilt-layout — Extracts document structure in addition to text: identifies tables, paragraphs, headings and sections; preserves reading order and spatial relationships; detects lines and OMR checkboxes.

All models in your Document Intelligence service will be available, but these two are the primary focus.

Connecting to Azure Document Intelligence

Create a Document Intelligence resource in the Azure portal and note the API key and resource name.
From Grooper Design, add the Azure Document Intelligence Repository Option to the Grooper Root and enter the API key and resource name.

FYI

Azure Document Intelligence also has a "Use GCS" option for installations that want to use Grooper Cloud Services for Document Intelligence features instead of their own Azure resource.

New OCR Engine: Azure DI OCR

Azure DI OCR is a new OCR Engine option available in OCR Profiles, powered by Azure Document Intelligence.

New Activity: DI Analyze

DI Analyze runs Azure Document Intelligence image analysis on a document or page and saves the JSON output for later use.

Primary use — AI-enabled extraction. The new DI Layout Quoting Method uses DI Analyze results to inject document structure into extraction operations (such as AI Extract).
Secondary use — OCR. If DI Analyze JSON output is present, the Azure DI OCR engine uses the saved results instead of making a second call to Azure.

⚠	If using both DI Analyze and Azure DI OCR in the same workflow, always run DI Analyze before Recognize to avoid duplicating calls (and costs) to Azure.
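
Why the order matters can be shown with a small Python sketch. Everything here is hypothetical (the `call_azure` stub, the `.di.json` file name, and the `run_workflow` helper are illustration only, not Grooper or Azure APIs). It only models the rule stated above: OCR reuses saved DI Analyze output when it exists, and otherwise makes its own call.

```python
import json
import tempfile
from pathlib import Path

calls = {"azure": 0}  # counts simulated (billable) service calls

def call_azure(page_id):
    """Hypothetical stand-in for a Document Intelligence request."""
    calls["azure"] += 1
    return {"page": page_id, "content": f"text of {page_id}"}

def di_analyze(page_id, store):
    # Always calls the service, then saves the JSON for later use.
    result = call_azure(page_id)
    (store / f"{page_id}.di.json").write_text(json.dumps(result))
    return result

def recognize(page_id, store):
    # Reuses saved DI Analyze output if present; otherwise calls the service.
    saved = store / f"{page_id}.di.json"
    if saved.exists():
        return json.loads(saved.read_text())["content"]
    return call_azure(page_id)["content"]

def run_workflow(steps):
    """Run the steps on one page and return how many service calls were made."""
    calls["azure"] = 0
    with tempfile.TemporaryDirectory() as tmp:
        for step in steps:
            step("page-1", Path(tmp))
    return calls["azure"]

analyze_first = run_workflow([di_analyze, recognize])
recognize_first = run_workflow([recognize, di_analyze])
print(analyze_first, recognize_first)
```

In this model, running the analyze step first results in one service call per page, while running the OCR step first results in two.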

DI Analyze also supports a Correct Orientation property that rotates pages based on detected layout orientation using the predominant angle of text lines on the page.

FYI

To use orientation correction, run Split Pages before DI Analyze. Orientation correction applies to Batch Pages only — pages in files attached to Batch Folders must be split first.
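
As a rough illustration of "predominant angle of text lines", the Python sketch below snaps each line angle to the nearest quarter turn and picks the most common one. The snapping rule, the function names, and the rotation direction convention are assumptions made for this example. The actual logic Grooper uses is not documented here.

```python
from collections import Counter

def predominant_rotation(line_angles_deg):
    """Return the most common line angle, snapped to 0, 90, 180, or 270 degrees."""
    snapped = [round(angle / 90) % 4 * 90 for angle in line_angles_deg]
    if not snapped:
        return 0  # no text lines detected: assume the page is upright
    return Counter(snapped).most_common(1)[0][0]

def correction_for(line_angles_deg):
    """Rotation (in degrees) that would bring the predominant angle back to 0."""
    return (360 - predominant_rotation(line_angles_deg)) % 360

# Hypothetical per-line angles for two pages.
sideways_page = [89.2, 90.5, 91.0, 0.3]   # mostly rotated a quarter turn
upright_page = [-0.4, 0.2, 1.1, 359.6]    # small skew only

print(correction_for(sideways_page), correction_for(upright_page))
```

A single stray line (such as a vertical stamp) does not change the outcome, because the decision is based on the angle most lines share.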

Page Level vs. Folder Level

When deciding whether to run DI Analyze at the page level or folder level:

Page level — Better for processing efficiency. A multithreaded Activity Processing service can hand each page to Document Intelligence concurrently, significantly speeding up large multipage documents.
Folder level — Better for page-spanning structure awareness. When tables or paragraphs span multiple pages, folder-level processing allows DI Layout-based operations (like AI Extract) to account for this.
Hard limitation — AI Separate is a page-level operation. To use DI Layout with AI Separate, DI Analyze must run at the page level.

If unsure, start at the page level. All DI Layout-based operations are available when DI data is present at the page level.

DI Layout Quoting Method

The new DI Layout Quoting Method injects DI Analyze results into extraction operations. It supports three output formats:

Markdown — Text layout only. Preferred for simpler scenarios. LLMs interpret markdown well, and table structures detected by DI Analyze are formatted as HTML within the markdown output.
JSON — Text layout and spatial location data.
HTML — Text layout and spatial location data. Preferred for complex scenarios requiring the most accurate spatial locations.

Accurate spatial location data is critical for Grooper's ability to align an LLM's response back to the document and highlight results in the Document Viewer — which in turn is critical for reviewers to verify AI Extract results.

The Include Row Bounds property (HTML output only) adds a page number and bounds for each table row, which can improve spatial grounding.

The Scope property allows injection of content from a specific region of the document (identified by a Data Element with a location) rather than the full document. This is useful for injecting document-level content — like a table header that appears only once — into transaction-level extraction operations.

Spatial Grounding

New Fill Method: Spatial Grounding

Spatial Grounding is an AI-powered Fill Method that assigns a page number and bounding location to each extracted field. It should be run after AI Extract in an extraction workflow.

How it works:

Injects the extracted data plus location data (via Layout Objects, DI Layout, or JSON File quoting methods).
Asks the LLM to output a page number and bounds for each field.
Optionally infers bounds for sections and table rows from field locations.
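
The last step, inferring a section or row location from its field locations, can be sketched as a bounding-box union. This is a simplified Python illustration under stated assumptions: the `Box` type, its inch-based coordinates, and the one-box-per-page rule are hypothetical, not Grooper's data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    page: int
    left: float
    top: float
    right: float
    bottom: float

def infer_bounds(field_boxes):
    """Union the field boxes on each page into one enclosing box per page."""
    by_page = {}
    for box in field_boxes:
        current = by_page.get(box.page)
        if current is None:
            by_page[box.page] = box
        else:
            by_page[box.page] = Box(
                box.page,
                min(current.left, box.left),
                min(current.top, box.top),
                max(current.right, box.right),
                max(current.bottom, box.bottom),
            )
    return [by_page[page] for page in sorted(by_page)]

# Three fields located on the same table row (hypothetical coordinates, in inches).
row_fields = [
    Box(1, 0.5, 3.0, 2.0, 3.2),
    Box(1, 2.5, 3.0, 4.0, 3.25),
    Box(1, 6.0, 3.05, 7.5, 3.2),
]
row_bounds = infer_bounds(row_fields)
print(row_bounds)
```

A section that spans two pages would produce two boxes in this sketch, one for each page on which a field was located.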

Why use it? By dividing the work into two simpler tasks — AI Extract captures field values; Spatial Grounding locates them — both tasks achieve higher accuracy. Spatial Grounding further refines zones using OCR character positions.

⚠	Spatial Grounding consumes significantly more tokens than AI Extract. Consider using a less expensive model to control costs.

VLM Analyze

New Activity: VLM Analyze

VLM Analyze analyzes images using vision-language models (VLMs) via chat completion. It runs at the page level only and saves the resulting JSON for use in downstream data extraction.

Works with OpenAI models and open-source Qwen models.
A JSON schema and instructions define what gets extracted. The schema's root type must be "object".
AI-enabled features such as AI Extract can use VLM Analyze data via the JSON File quoting method.

FYI

Use the AI Helper button in the JSON Schema editor to have an AI generate the schema for you.

VLM Analyze is well-suited for scenarios where standard OCR-based text extraction is insufficient — for example, analyzing photos for damage assessment, detecting signatures or stamps, identifying watermarks, or any task requiring visual understanding of the image content rather than just its text.

Locating Results with Bounds

To enable highlighting of VLM Analyze results in the Grooper Document Viewer, include a bounds object in the JSON schema. VLM Analyze will return normalized bounding box coordinates (0–1, relative to the image dimensions), which Grooper converts to inches for use in the Document Viewer.

Use the Selector property to define where bounds specifications appear in the schema. The selector ..bounds will select all bounds objects in the result and works well in most cases.

⚠	Bounds coordinates produced by a VLM are not always pixel perfect. They will be close but may not perfectly overlap with the actual content.
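
The Python sketch below illustrates the pieces described in this section: a schema of type "object" that includes a bounds object, a recursive search that behaves like the ..bounds selector, and the conversion from normalized coordinates to inches. The property names (`signatures`, `signer`, `x`, `y`, `width`, `height`), the sample result, and the image size are assumptions chosen for illustration. The shape Grooper actually expects may differ.

```python
# Hypothetical schema: the root must be of type "object".
schema = {
    "type": "object",
    "properties": {
        "signatures": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "signer": {"type": "string"},
                    "bounds": {
                        "type": "object",
                        "properties": {
                            key: {"type": "number"}
                            for key in ("x", "y", "width", "height")
                        },
                    },
                },
            },
        }
    },
}

def find_bounds(node):
    """Collect every 'bounds' object at any depth, like a ..bounds selector."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "bounds" and isinstance(value, dict):
                found.append(value)
            else:
                found.extend(find_bounds(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_bounds(item))
    return found

def to_inches(bounds, width_px, height_px, dpi):
    """Convert normalized (0 to 1) bounds to inches for a given image size."""
    page_width_in = width_px / dpi
    page_height_in = height_px / dpi
    return {
        "left": bounds["x"] * page_width_in,
        "top": bounds["y"] * page_height_in,
        "width": bounds["width"] * page_width_in,
        "height": bounds["height"] * page_height_in,
    }

# Hypothetical VLM result for a letter-size page scanned at 300 DPI.
result = {
    "signatures": [
        {"signer": "J. Doe",
         "bounds": {"x": 0.5, "y": 0.25, "width": 0.25, "height": 0.125}}
    ]
}
zones = [to_inches(b, width_px=2550, height_px=3300, dpi=300) for b in find_bounds(result)]
print(zones)
```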

Review Improvements

New Command: AI Correct

A new AI Correct command is available in the Data Grid for Data Viewers. It arms Review operators with AI-powered data correction capabilities.

How it works:

An AI is presented with the current document data and the operator's natural language instructions for editing field values.
It generates JSON patch operations.
The patch operations are applied to the document data.
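
To make the patch step concrete, here is a minimal Python sketch of applying patch operations to document data. It assumes the operations follow the JSON Patch convention (RFC 6902) with JSON Pointer paths, and it supports only `replace` and `remove`. The document shape and field names are hypothetical, and Grooper's internal format may differ.

```python
import copy

def _resolve(doc, pointer):
    """Walk a JSON Pointer and return (parent container, final key or index)."""
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    node = doc
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    return node, (int(last) if isinstance(node, list) else last)

def apply_patch(doc, operations):
    """Return a patched copy of doc; the original is left untouched."""
    result = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve(result, op["path"])
        if op["op"] == "replace":
            if isinstance(parent, dict) and key not in parent:
                raise KeyError(f"cannot replace missing member: {op['path']}")
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported operation: {op['op']}")
    return result

# Hypothetical document data with two unwanted $0.00 adjustments.
document = {
    "service_lines": [
        {"code": "99213", "adjustment": "0.00"},
        {"code": "99214", "adjustment": "12.50"},
        {"code": "99215", "adjustment": "0.00"},
    ]
}

# Operations an AI might generate for "Remove all $0.00 adjustments".
patch = [
    {"op": "replace", "path": "/service_lines/0/adjustment", "value": ""},
    {"op": "replace", "path": "/service_lines/2/adjustment", "value": ""},
]
patched = apply_patch(document, patch)
print(patched)
```

Because the AI returns a list of small, explicit operations rather than a rewritten document, each change can be applied (or rejected) individually.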

AI Correct works at any level in the Data Model and can process all records in a collection (multi-instance Data Sections or Data Table rows) in parallel.

Example: In one scenario, Grooper had extracted the same error on 115 service lines. Using AI Correct with the instruction "Remove all $0.00 adjustments," 117 validation errors were reduced to 2 in 5–10 seconds — compared to 5–10 minutes if done manually.

New Command: Set All

A new Set All command is available for Data Field and Data Column cells in the Data Grid. It allows bulk editing of all instances of a multi-instance field at once — useful for clearing all instances or setting them to a static value.

Example: An EOB document contains an invalid NPI number on all 74 claims. Using Set All, the operator can clear all 74 instances instantly rather than visiting each claim individually.

Data Sections: Tabular View

Multi-instance Data Sections can now be viewed in a tabular view. A toolbar button toggles between standard and tabular view. As the selection moves from record to record in the tabular view, the paging control updates automatically.

Only Data Fields can be edited in tabular view. Data Tables and nested Data Sections are hidden.
Design users may optionally select a subset of Data Fields to display using the Data Section's Tabular View property.

Multiple Document Types: Multiple Tabs

Documents with multiple Document Types are now displayed in tabs in the Data Grid. Previously, data from each Content Type was listed sequentially, which required scrolling to find Secondary Type data. The tabbed view is a significant improvement over that linear layout.

Fixed Header

The document header in the Data Grid (where tabs and the error list appear) is now fixed. Operators can navigate from error to error using the error list without losing sight of the error count or the Search Box and its paging control.

Customizable Error List

The Error List can now be docked to any side of the Data Grid by clicking the error count to toggle its position. The Error List auto-syncs with the current field focus.

New Property: Edit Rule

Data Fields and Data Columns have a new Edit Rule property. This allows Grooper to execute a Data Rule whenever a user edits a field value in the Data Grid during Review.

This feature gives designers more control over field edit events, and is an alternative to using a Data Model's Lookups configuration or a Validate Rule for lookup operations. Using Edit Rule for lookups avoids the costly compute time associated with validation events that fire outside a user's direct control.

Button Commands in Data Viewer

Designers can now add clickable buttons for common commands in the Data Grid UI. The new Button Command Types property on Data Models, Data Sections, and Data Tables allows commands to be displayed as toolbar buttons.

Data Table commands appear on the caption bar.
Row commands appear at the end of the row.

Field Search

A new Field Search capability allows searching field values using substring or regex search directly in the Data Grid.

For regex syntax, enclose the pattern in slashes: /\d{4}-764/
Matching results are highlighted with an orange border.
Navigate between hits using the toolbar or hotkeys (Alt + <, Alt + >).
Supports fielded search: for example, Amount: 15.00
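
The query forms listed above can be sketched in Python as follows. This is an illustrative parser only: the field names and values are hypothetical, and the case-insensitive substring matching is an assumption, not documented Grooper behavior.

```python
import re

def parse_query(query):
    """Split a query into an optional field name and a compiled pattern."""
    field = None
    term = query.strip()
    # "Field: term" is a fielded search, unless the query itself is a /regex/.
    if not term.startswith("/") and ":" in term:
        name, _, rest = term.partition(":")
        field, term = name.strip(), rest.strip()
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        pattern = re.compile(term[1:-1])                       # regex search
    else:
        pattern = re.compile(re.escape(term), re.IGNORECASE)   # substring search
    return field, pattern

def search(fields, query):
    """Return the names of fields whose values match the query."""
    field, pattern = parse_query(query)
    return [
        name for name, value in fields.items()
        if (field is None or name == field) and pattern.search(value)
    ]

# Hypothetical field values from a Data Grid.
fields = {"Account": "2024-764", "Amount": "15.00", "Total": "115.00"}

print(search(fields, r"/\d{4}-764/"))    # regex search
print(search(fields, "15.00"))           # substring search across all fields
print(search(fields, "Amount: 15.00"))   # fielded search
```

Note that the plain substring search also matches "115.00", which is why a fielded search is useful for narrowing results.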

Multi-Choice Control for Array Fields

Array fields with drop-down lists now display a Multi-Choice Control, allowing quick selection of multiple values.

Data Grid Progress Spinner

A progress spinner now displays in the Data Grid while background operations run (such as validation events triggered by field edits or commands). The spinner is located next to the Data Grid options dropdown button.

Improved Custom Value Type

The Custom Value Type has two new properties:

Type Name — A name for the type (e.g., SSN, EIN_Number) that makes it easier to identify what data the field represents.
Hint — Instructions to help users and LLMs understand correct syntax for the field. This is presented to users in the Data Grid when reviewing validation errors, and is included in schemas when using Generate Schema with extended properties enabled.

Miscellaneous Review Improvements

Disallow Confirm — A new property on Data Fields and Data Columns. When set to True, operators cannot override validation logic for that field using the Confirm command.
Reviewer Field — A new property on Data Fields. When set, the field is automatically populated with the active username when Review opens. If an array field, all review usernames are recorded.
New Command: Split (Multi-Instance Data Sections) — A new Split command is available for multi-instance Data Sections in the Data Grid (accessed by right-clicking the Data Section's caption). It uses an extractor to find section instances during Review.

Grooper Web Services (GWS)

Grooper Web Services (GWS) is a new set of Grooper REST API endpoints. GWS is installed as a separate website by the Grooper Web Client installer. The installer now creates two IIS sites: /Grooper for the Grooper UI and /GWS for Grooper APIs.

FYI

Eventually GWS will fully replace the initial Grooper REST API offered by API Services. However, API Services will continue to function in this version. The GWS website exposes API documentation on its home page, covering all available endpoints.

GWS Endpoint Collections

AI Assistant related:

/assistants — Endpoints for development using AI Assistants. Use this API to implement a chat client that allows users to interact with Grooper's AI Assistants.
/bot — Endpoints that integrate AI Assistants with Microsoft Azure Bot Services. These endpoints are called by the Azure Bot service. Do not call these endpoints directly.

Search related:

/search — Endpoints for executing document searches. Use this API to query Grooper search indexes using natural language, full text, or metadata searches.

Document processing related:

/batches — Endpoints to access and manage Batches in Grooper.
/documents — Endpoints to access and manage documents in Grooper.
/processes — Endpoints to retrieve information about published Batch Processes and their steps.

Miscellaneous:

/nodes — Endpoints to manage nodes in the Grooper node tree. Provides low-level access to the Grooper Repository's tree structure. Use with caution.
/commands — Endpoints to execute commands on Grooper nodes, including Batches, documents (Batch Folders), or other node types.

Content Type Relationships

Content Type Relationships expose related Data Models to code expressions and other features. They are defined by three properties on a Content Model, Content Category, or Document Type:

Child Of — Specifies a parent Content Type that will always be assigned to a parent Batch Folder in the Batch. Makes all Data Elements from the parent document available in the child document's expression environment. Use for multi-level Batch structures where parent-child Batch Folder relationships are required.
- Example: A Benefits Change Form document configured as "Child Of" a Personnel File will have access to all Personnel File Data Model fields in its expressions — including using Employee_Name in its Export Mappings.
Sibling Of — Specifies one or more Content Types that are assigned alongside the same Batch Folder as Secondary Types. Makes the Data Elements of those sibling Content Types directly accessible in the expression environment using the syntax TypeName.FieldName.
- Note: "Sibling" here refers to sibling Content Types assigned to the same Batch Folder — not sibling Batch Folders within a Batch. There is no automatic event linkage between fields on different types; if a user edits sibling data, fields depending on it will not automatically recalculate.
Relative Of — A more flexible option that defines related Content Types whose Data Elements are exposed in the expression environment, regardless of whether they are parent documents or Secondary Types. Use when the related Content Type could be either a parent or a sibling.

Miscellaneous

New Activities

Fill Data

Fill Data executes one or more fill methods to populate or enrich data on a document. It loads existing document data, runs all fill methods with a specific name at any level in the Data Model, applies optional post-processing rules, and optionally flags the document if data is invalid.

Use case: When data elements are populated at import time, Extract cannot be used (it always overwrites existing data). Fill Data provides an alternative: set Run Child Extractors to False on the Data Model, add a fill method that only fills desired elements, and add a Fill Data activity to the process.

Pick

Pick uses AI to choose the "controlling version" of a Document Type in a Batch — for example, selecting the most authoritative copy from four versions of a loan application in a mortgage file. The AI considers document dates, completeness, presence of signatures, and official stamps or seals.

Tip: Use the Multi-Quote quoting method to inject document content, extracted data, or VLM Analyze output (capturing signatures, stamps, etc.) into the Pick operation.

Detect Language

A new and improved Detect Language activity uses large language models to determine the language of text on a document. Because modern LLMs excel at natural language processing across multiple languages, this activity reliably identifies a document's native language with little to no setup.

The detected language is stored as the document's (Batch Folder's) Culture property.

Note: The previous Detect Language activity still exists in Grooper 2025 under the name "Detect Language (Legacy)."

Route

See Route above.

VLM Analyze

See VLM Analyze above.

New Fill Methods

Run Child Extractors

Run Child Extractors is a new Fill Method that runs extraction logic for child elements. It supports filtering to selectively run extraction logic for specific child elements, which is useful when only a subset of fields needs to be extracted.

Fill Method Collection

Fill Method Collection conditionally executes a list of Fill Methods, enabling complex data extraction workflows. For example, a workflow could use different extraction approaches for small and large documents, with multiple fallback strategies and spatial grounding steps configured conditionally.

Fill Descendants

See AI-Enabled Data Section Extraction above.

Spatial Grounding

See Spatial Grounding above.

New Quoting Methods

Multi Quote

Multi Quote combines multiple quoting strategies, allowing the AI to be presented with content from multiple regions or multiple types of input simultaneously. This is ideal for complex extraction scenarios where a single quoting strategy does not provide sufficient context.

JSON File

Quotes using a JSON file attached to the Batch Folder or Batch Page. The primary use case is to hand data created by VLM Analyze to an LLM. See VLM Analyze above.

DI Layout

See DI Layout Quoting Method above.

New Data Action: Concat

The Concat Data Action combines adjacent records in a collection based on a configurable trigger expression. It is designed to resolve cases where a section instance was incorrectly extracted as two separate partial instances (for example, when an EOB claim spans two pages and the section header is repeated on page 2).

How it works:

Iterates the collection in reverse order, evaluating the Trigger expression for each pair of adjacent records.
If the Trigger returns true, the two records are merged: child fields from the second record are copied to the first (preserving non-blank values), child collections are merged, and the second record is removed.
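
The merge described above can be sketched in Python. This is a simplified model, not Grooper's implementation: records are plain dictionaries, child collections are lists, and "preserving non-blank values" is interpreted as "a value already present on the first record is never overwritten". The `claim`, `patient`, and `lines` names and the trigger are hypothetical.

```python
def concat_records(records, trigger):
    """Merge each record into its predecessor when trigger(first, second) is true."""
    merged = [dict(record) for record in records]
    # Walk backward so that removing a record never shifts an unvisited pair.
    for i in range(len(merged) - 1, 0, -1):
        first, second = merged[i - 1], merged[i]
        if trigger(first, second):
            for key, value in second.items():
                if isinstance(value, list):
                    # Child collections are merged.
                    first[key] = list(first.get(key, [])) + value
                elif first.get(key) in (None, ""):
                    # Child fields fill blanks only.
                    first[key] = value
            del merged[i]
    return merged

# Hypothetical claims: the second record is a partial continuation of the first.
claims = [
    {"claim": "A1", "patient": "Doe", "lines": [1, 2]},
    {"claim": "", "patient": "", "lines": [3]},
    {"claim": "B2", "patient": "Roe", "lines": [4]},
]

# Hypothetical trigger: a record with no claim number continues the previous one.
result = concat_records(claims, lambda first, second: second["claim"] == "")
print(result)
```

Iterating in reverse also handles a claim split into three or more parts: the last fragment folds into the middle one, which then folds into the first.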

FYI

Concat was designed to resolve issues with AI Collection Reader, but its uses extend beyond that. It will concatenate any section instances or table rows according to its Trigger condition.

New Commands

Batch Folder > Set Field Value

Sets a single-instance Data Field value on a document.
Supports adding and removing values from an array field.
- This was added as a prototype "tagging" mechanism in the Search page. Adding a "Tag" field with its Value Type set to String Array allows users to enter simple tags on documents while researching them in the Search page.

Text Document > Insert Page Breaks

Inserts page breaks into text documents before or after a line that matches a regex pattern.
Useful for paginating large text documents for formatting and usability purposes.
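
As an illustration of the idea, the Python sketch below inserts a break marker before or after each line that matches a pattern. The use of a form feed character as the page break, the `position` parameter, and the choice to skip a break before the very first line are assumptions for this example, not a description of the command's actual properties.

```python
import re

def insert_page_breaks(text, pattern, position="before", page_break="\f"):
    """Insert a page break line before or after every line matching pattern."""
    regex = re.compile(pattern)
    output = []
    for line in text.splitlines():
        matched = regex.search(line) is not None
        # Skip a leading break so the document does not start with an empty page.
        if matched and position == "before" and output:
            output.append(page_break)
        output.append(line)
        if matched and position == "after":
            output.append(page_break)
    return "\n".join(output)

# Hypothetical text document containing two statements.
report = "Statement 1\nline a\nStatement 2\nline b"

paginated = insert_page_breaks(report, r"^Statement \d+", position="before")
pages = paginated.split("\f")
print(len(pages))
```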

Grooper Root > Run Import

Allows Design page users to submit import jobs without leaving the Design page, useful for testing import configurations.
This command should only be used for testing. Large-scale production imports should still be managed from the Imports page or by Import Watcher schedules.

XML File > Split

Splits an XML file into new documents using XPath selectors.
A new child document is created for each selected node.
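
A minimal Python sketch of the same idea, using the standard library's `xml.etree.ElementTree` (which supports only a limited XPath subset). The sample XML and the `./claim` selector are hypothetical, and the sketch returns strings rather than creating Grooper documents.

```python
import xml.etree.ElementTree as ET

def split_xml(xml_text, xpath):
    """Return one serialized XML fragment per node selected by xpath."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(node, encoding="unicode").strip()
        for node in root.findall(xpath)
    ]

# Hypothetical source file containing two claims.
source = (
    "<claims>"
    "<claim id='1'><amount>10</amount></claim>"
    "<claim id='2'><amount>20</amount></claim>"
    "</claims>"
)

children = split_xml(source, "./claim")
print(len(children))
```

Each returned fragment is a standalone XML document rooted at the selected node.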

XML File > Condition XML

Conditions an XML file by stripping out unwanted XML elements selected by an XPath selector.
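
The stripping operation can be sketched in Python with the standard library's `xml.etree.ElementTree`. The sample XML and the `.//debug` selector are hypothetical. Because ElementTree elements do not track their parents, the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

def strip_elements(xml_text, xpath):
    """Remove every element selected by xpath and return the remaining XML."""
    root = ET.fromstring(xml_text)
    # Map each element to its parent so selected nodes can be detached.
    parents = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(xpath):
        parent = parents.get(node)
        if parent is not None:
            parent.remove(node)
    return ET.tostring(root, encoding="unicode")

# Hypothetical file with unwanted <debug> elements at two depths.
source = (
    "<claim>"
    "<amount>10</amount>"
    "<debug>trace</debug>"
    "<note><debug>x</debug></note>"
    "</claim>"
)

conditioned = strip_elements(source, ".//debug")
print(conditioned)
```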

HTTP Link > Load Content

Loads a document imported via HTTP Import.
Allows webpages to be imported sparsely and their content loaded afterward using multiple threads.

HTML Document > Condition HTML

Provides cleanup and normalization options for HTML documents.
See above for more information.

HTML Document > Convert to PDF

Converts an HTML page to a PDF document for standard Grooper processing.
This command uses a simple open-source toolkit to render the PDF. It may not render all styling depending on the HTML document.

HTML Document > Convert to Text

Converts an HTML page to a TXT document, str

@@ Line 657: / Line 657: @@
 ** Note: "Sibling" here refers to sibling Content Types assigned to the same Batch Folder — not sibling Batch Folders within a Batch. There is no automatic event linkage between fields on different types; if a user edits sibling data, fields depending on it will not automatically recalculate.
 * '''Relative Of''' — A more flexible option that defines related Content Types whose Data Elements are exposed in the expression environment, regardless of whether they are parent documents or Secondary Types. Use when the related Content Type could be either a parent or a sibling.
-== Route ==
-=== New Activity: Route ===
-The Route activity routes Batch Folders (or Batches) to new Batches based on their Content Type, enabling branching workflows based on initial classification.
-Routing rules are defined by '''Route Definitions''', each specifying:
-* A Content Type to match.
-* A destination Batch Process.
-* An optional Boolean trigger expression for conditional routing.
-Additional options:
-* Items can be moved or cloned into the target Batch. If no route matches, the item remains in its current Batch.
-* Routed Batches can be started in a paused state for review.
-* The '''Include Sibling Types''' option adds all sibling Content Types not already present in the Secondary Type list.
-* Data Actions can copy or transform Data Fields during routing to support scenarios where source and target Content Types have different Data Models.
-Batches have a new '''Pending To Step''' property that supports running Route at the Batch level. Since a new Batch Process cannot be applied while a step is running on the existing one, this property defers the process change: the next time the Batch is completed or resumed, the new process and current step are applied and the Pending To Step value is cleared.
-=== New Concept: Nested Batches (Experimental) ===
-Batches can now have other Batches as their children in specific circumstances. These child Batches are called '''nested Batches'''.
-* Currently, the only way to create a nested Batch is with the Route activity using a Route Definition with its Method set to '''Convert'''. This converts the document (Batch Folder) to a Batch and routes it to the target Batch Process. Currently, this only works at Folder Level 1 in the parent Batch.
-* The Batches Filter has a new '''Include Nested Batches''' option to show nested Batches in the Batch List.
-'''Use case:''' A ZIP file containing 1,000 documents is imported as a Batch. The unzipped documents are classified as Check, EOB, or Mail — each needing to route to a different Batch Process. Without nested Batches, routing separates documents from their parent, making it impossible to monitor the overall import progress or roll back to a previous step. With the Convert method, documents are converted to nested Batches and remain in place.
-{{attn-box|Nested Batches are still in the experimental stages. More work is needed to ensure operational rules are in place to prevent unsafe Batch deletion and improper task processing.}}
 == Miscellaneous ==