What's New in Grooper 2025: Difference between revisions

From Grooper Wiki
No edit summary
 
(42 intermediate revisions by the same user not shown)
Line 85: Line 85:
* '''Downside?''' It will increase the size of the index.
* '''Downside?''' It will increase the size of the index.


=== What are some benefits to AI Assistants? ===
AI Assistants provide users with a new way to interact with documents and other connected resources such as databases.
* Users can search for documents and their data using natural language.
* Provides on-demand access to data inside documents without requiring a Data Model and extraction logic to be configured in advance.
* Provides near-instant time-to-value. Minimal processing is required in Grooper before users can start chatting with a single document or across large document sets.
* Reduces the need to extract everything up front, allowing users to gain insights into documents without complicated extraction workflows.
* Extends access to external data sources, including databases and web services.
=== Additional AI Assistant and Chat developments ===
==== Chained Retrieval ====
==== Chained Retrieval ====


Line 127: Line 138:
* "N" is currently set to 3 (the last 3 user and assistant message pairs). This may be exposed as a configurable property in a future version.
* "N" is currently set to 3 (the last 3 user and assistant message pairs). This may be exposed as a configurable property in a future version.
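As a rough illustration of the "last N pairs" behavior, the following Python sketch trims a chat history to its three most recent user and assistant message pairs. The message format and the function name <code>last_n_pairs</code> are hypothetical and only serve the example; this is not Grooper's implementation.

```python
# Hypothetical sketch: keep only the last N user/assistant message pairs.
N = 3  # mirrors the currently fixed value described above


def last_n_pairs(history, n=N):
    """Return the trailing n (user, assistant) pairs from a flat message list.

    `history` is assumed to alternate user/assistant messages, oldest first.
    """
    pairs = [
        (history[i], history[i + 1])
        for i in range(0, len(history) - 1, 2)
        if history[i]["role"] == "user" and history[i + 1]["role"] == "assistant"
    ]
    return pairs[-n:]
```

Older pairs simply fall off the front of the list, so the amount of history carried forward stays bounded no matter how long the conversation runs.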


=== What are some benefits to AI Assistants? ===
AI Assistants provide users with a new way to interact with documents and other connected resources such as databases.
* Users can search for documents and their data using natural language.
* Provides on-demand access to data inside documents without requiring a Data Model and extraction logic to be configured in advance.
* Provides near-instant time-to-value. Minimal processing is required in Grooper before users can start chatting with a single document or across large document sets.
* Reduces the need to extract everything up front, allowing users to gain insights into documents without complicated extraction workflows.
* Extends access to external data sources, including databases and web services.


=== Chat Console Improvements ===
==== Chat Console Improvements ====


==== Streaming Chat Completions ====
'''Streaming Chat Completions'''


The Chat console now supports streaming chat completions, displaying responses one word at a time. This gets responses to users faster and makes it more apparent when the chatbot is processing a large response or if something has gone wrong. Footnote links are added at the end of the streamed response.
The Chat console now supports streaming chat completions, displaying responses one word at a time. This gets responses to users faster and makes it more apparent when the chatbot is processing a large response or if something has gone wrong. Footnote links are added at the end of the streamed response.


==== Markdown Support ====
'''Markdown Support'''
 
The Chat console now renders Markdown in chat responses, significantly improving the readability of LLM output.
The Chat console now renders Markdown in chat responses, significantly improving the readability of LLM output.


Line 202: Line 203:
** '''Instruction editors''' — Generate instructions for AI Extract.
** '''Instruction editors''' — Generate instructions for AI Extract.


== Search Page Improvements ==
== OAuth Support ==
 
[[OAuth Setup|See the OAuth Setup article for more information.]]
 
OAuth is an authentication method that gives third-party applications access to web resources without users sharing their passwords.


=== Saved Queries ===
{{fyi-box|Microsoft Entra ID (formerly Azure Active Directory) is the only supported OAuth provider at this time.}}
 
Benefits of OAuth:
* '''Security''' — Users do not share their passwords with third-party applications. OAuth safely encrypts transmission of data between servers, making document links secure when connecting AI Assistants to chat clients via Azure Bot Services.
* '''Simplified logins''' — Users can log into multiple applications with existing accounts. In Grooper's case, with a Microsoft Entra ID account.
* '''Integrations''' — OAuth is the security standard for app-to-app communication. Securing Grooper with OAuth enables new integration options, including using Azure Bot Services to extend AI Assistants to external chat channels.
 
Both the Grooper website and the GWS website can be configured with OAuth authentication. Both rely on settings in the <code>web.config</code> file, configured with values from the Azure Portal. If no <code>ida:ClientId</code> setting is present, authentication works the same as in previous versions of Grooper.
 
* '''Grooper and OAuth''' — When the Grooper website is configured to use OAuth, users log in using their Entra ID credentials.
** Previous login methods are still supported. Windows authentication remains the default.
** OAuth is ''required'' if you are extending an AI Assistant to an external channel like Teams via Azure Bot Services and want to provide users with document links in chat responses.
* '''GWS and OAuth''' — GWS uses OAuth client credentials to communicate with Azure Bot Services.
** ''Required'' for extending AI Assistants to external channels via Azure Bot Services.
** If providing document links in chat responses, ''both'' the Grooper website and GWS website must be secured with OAuth.
 
''Additional setup is required to configure OAuth authentication. You must register Grooper as an application in Microsoft Entra ID and configure each website's web.config file. Full instructions are coming soon.''
 
=== OAuth Service Login for Exchange ===
 
Exchange CMIS Connections now have an '''OAuth Service Login''' method for connecting Grooper to Exchange servers (Outlook inboxes).
 
* Implements the OAuth 2.0 Client Credentials flow for secure server-to-server authentication.
* Allows the application (Grooper) to authenticate using a client ID and client secret without requiring user interaction, then obtain an access token to interact with Exchange APIs on Grooper's behalf.
* The previous '''Exchange OAuth''' method (a user login method requiring a Microsoft account) is still available and is fine for testing, but OAuth Service Login is preferred for production scenarios.
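For readers unfamiliar with the OAuth 2.0 Client Credentials flow, the Python sketch below builds the token request an application would send to Microsoft Entra ID. It only assembles the URL and form body and sends nothing. The function name <code>build_token_request</code> is hypothetical, and the default scope shown is an assumption; the scope Grooper actually requests is not stated here.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://outlook.office365.com/.default"):
    """Build the URL and form body for a client credentials token request.

    The default scope is an assumption made for illustration.
    """
    # Microsoft identity platform v2.0 token endpoint for the given tenant.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",  # no user interaction involved
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body
```

The body is sent as an <code>application/x-www-form-urlencoded</code> POST, and the response contains the access token used for subsequent API calls.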
 
== Document indexing and search improvements ==
 
=== Search Page Improvements ===
 
The following improvements were made to the [[Search Page]] user interface throughout version 2025's development.
 
==== Saved Queries ====


Search queries are now stored in a Grooper database table ("Query") instead of the user's browser cache. Saved searches persist across browsers and machines.
Search queries are now stored in a Grooper database table ("Query") instead of the user's browser cache. Saved searches persist across browsers and machines.
Line 214: Line 251:
* The panel can be collapsed to save screen space.
* The panel can be collapsed to save screen space.


=== Search Parameter Editors ===
==== Search Parameter Editors ====


Editors have been added for the '''Filter''', '''Select''', and '''Order By''' parameters, making it easier for users unfamiliar with the search syntax to configure these parameters. Look for the "more" icon at the end of each property.
Editors have been added for the '''Filter''', '''Select''', and '''Order By''' parameters, making it easier for users unfamiliar with the search syntax to configure these parameters. Look for the "more" icon at the end of each property.
Line 229: Line 266:
A '''Show All''' toggle in the Filter Editor reveals all fields in the search index, giving users a simple way to browse available fields and construct basic queries.
A '''Show All''' toggle in the Filter Editor reveals all fields in the search index, giving users a simple way to browse available fields and construct basic queries.


=== AI Helper: Generate Filter ===
==== AI Helper: Generate Filter ====


The Search page has a new '''AI Helper''' button for the Filter parameter. This replaces the previous AI Query Helper button and works significantly better — it focuses solely on generating a filter rather than building the whole query, and it injects more information including values for drop-down list fields.
The Search page has a new '''AI Helper''' button for the Filter parameter. This replaces the previous AI Query Helper button and works significantly better — it focuses solely on generating a filter rather than building the whole query, and it injects more information including values for drop-down list fields.


=== Sort by Column Header ===
==== Sort by Column Header ====


Users can now click a column header in the search results list to quickly sort results by that field.
Users can now click a column header in the search results list to quickly sort results by that field.


=== Result Set Command Permissions ===
==== Result Set Command Permissions ====


Result Set commands can now be secured using Permission Sets configurations, allowing administrators to control which users have access to specific commands on the Search page:
Result Set commands can now be secured using Permission Sets configurations, allowing administrators to control which users have access to specific commands on the Search page:
Line 246: Line 283:
* '''Download''' — Currently enabled for all users.
* '''Download''' — Currently enabled for all users.


=== Miscellaneous Search Page Improvements ===
==== Miscellaneous Search Page Improvements ====


* '''Case-insensitive string comparisons''' — String comparisons in Filters are now case-insensitive by default (the previous case-sensitive mode was difficult to use in practice). This is implemented by adding a lowercase normalizer for string fields. Requires API version 2025-09-01 or higher.
* '''Case-insensitive string comparisons''' — String comparisons in Filters are now case-insensitive by default (the previous case-sensitive mode was difficult to use in practice). This is implemented by adding a lowercase normalizer for string fields. Requires API version 2025-09-01 or higher.
* '''Remembered column widths''' — The Search page now remembers adjusted column widths in the results list, persisted in the user's browser cache across searches and page visits.
* '''Remembered column widths''' — The Search page now remembers adjusted column widths in the results list, persisted in the user's browser cache across searches and page visits.
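The effect of a lowercase normalizer on string comparisons can be pictured with a small Python sketch. The function names are hypothetical; the sketch only illustrates the idea of normalizing both the stored value and the filter value before comparing them.

```python
def normalize(value: str) -> str:
    """Stand-in for a lowercase normalizer applied to string fields."""
    return value.lower()


def field_equals(indexed_value: str, filter_value: str) -> bool:
    # Both sides pass through the same normalizer, so case no longer matters.
    return normalize(indexed_value) == normalize(filter_value)
```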


== Search Indexing Improvements ==
=== Search Indexing Improvements ===


=== Indexes Tab ===
The following improvements were made to how documents are added to a search index and how search indexes are managed throughout version 2025's development.
 
==== Indexes Tab ====


A new '''Indexes''' tab has been added to the Grooper Root node in the Design page. It is visible when the Grooper Repository has an AI Search option enabled.
A new '''Indexes''' tab has been added to the Grooper Root node in the Design page. It is visible when the Grooper Repository has an AI Search option enabled.
Line 264: Line 303:
* If an Index Name Prefix is in use, toggle the view to see all indexes displayed with full names.
* If an Index Name Prefix is in use, toggle the view to see all indexes displayed with full names.


=== Index Name Prefix ===
==== Index Name Prefix ====


AI Search has a new '''Index Name Prefix''' property. This prevents name collisions when multiple Grooper repositories share a single AI Search service.
AI Search has a new '''Index Name Prefix''' property. This prevents name collisions when multiple Grooper repositories share a single AI Search service.
Line 273: Line 312:
This enables more efficient use of Azure resources by allowing multiple environments or departments to share a single AI Search service.
This enables more efficient use of Azure resources by allowing multiple environments or departments to share a single AI Search service.


=== Large Document Search Indexing ===
==== Large Document Search Indexing ====


Large documents would occasionally fail during search indexing due to how embedding values were collected. Several improvements have been made:
Large documents would occasionally fail during search indexing due to how embedding values were collected. Several improvements have been made:
Line 281: Line 320:
* Fixed additional internal issues.
* Fixed additional internal issues.


=== Chunking Method ===
==== Chunking Method ====


A new '''Chunking Method''' configuration is available under '''Vector Search Options''' in an Indexing Behavior. This controls how chunks are created when collecting vector embeddings from large documents.
A new '''Chunking Method''' configuration is available under '''Vector Search Options''' in an Indexing Behavior. This controls how chunks are created when collecting vector embeddings from large documents.
Line 333: Line 372:
In simple cases, the Document Type name alone may be sufficient. In more nuanced cases, meaningful and distinct descriptions will help the LLM make the correct choice.
In simple cases, the Document Type name alone may be sufficient. In more nuanced cases, meaningful and distinct descriptions will help the LLM make the correct choice.


{{attn-box|Classification decisioning using LLM Classifier is expected to improve over time as LLM models improve.}}
{{fyi-box|Classification decisioning using LLM Classifier is expected to improve over time as LLM models improve.}}


=== New Classification Method: Search Classifier (Experimental) ===
=== New Classification Method: Search Classifier (Experimental) ===
Line 381: Line 420:
** '''Transaction Quoting''' — Preprocesses the transaction content.
** '''Transaction Quoting''' — Preprocesses the transaction content.
** '''Document Quoting''' — Selects content outside of the transaction (such as a page-level table header) to be included in the extraction context.
** '''Document Quoting''' — Selects content outside of the transaction (such as a page-level table header) to be included in the extraction context.
=== New Data Action: Concat ===
The Concat Data Action combines adjacent records in a collection based on a configurable trigger expression. It is designed to resolve cases where a section instance was incorrectly extracted as two separate partial instances (for example, when an EOB claim spans across two pages and the section header is repeated on page 2).
'''How it works:'''
# Iterates the collection in reverse order, evaluating the Trigger expression for each pair of adjacent records.
# If the Trigger returns true, the two records are merged: child fields from the second record are copied to the first (preserving non-blank values), child collections are merged, and the second record is removed.
{{fyi-box|Concat was designed to resolve issues with AI Collection Reader, but its uses extend beyond that. It will concatenate any section instances or table rows according to its Trigger condition.}}
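The merge logic described for Concat can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Grooper's implementation: records are modeled as dictionaries with hypothetical <code>fields</code> and <code>collections</code> keys, the trigger is a plain function, and "preserving non-blank values" is interpreted as keeping any non-blank value already present on the first record.

```python
def concat(records, trigger):
    """Merge adjacent records when trigger(first, second) returns True.

    Each record is a dict with "fields" (name -> value) and
    "collections" (name -> list). Both keys are illustrative only.
    """
    # Walk backwards so removing a record never shifts unvisited indexes.
    for i in range(len(records) - 1, 0, -1):
        first, second = records[i - 1], records[i]
        if not trigger(first, second):
            continue
        for name, value in second["fields"].items():
            # Keep non-blank values already present on the first record.
            if not first["fields"].get(name):
                first["fields"][name] = value
        for name, items in second["collections"].items():
            first["collections"].setdefault(name, []).extend(items)
        del records[i]
    return records
```

Iterating in reverse also lets a chain of partial records collapse into one: the last fragment is folded into the one before it, which is then folded into its own predecessor.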


=== Choosing the Right AI Section Extract Method ===
=== Choosing the Right AI Section Extract Method ===
Line 414: Line 443:
# Set the Trigger property to <code>False</code> on each Data Section so they do not run individually.
# Set the Trigger property to <code>False</code> on each Data Section so they do not run individually.
# Add Fill Descendants at the Data Model level.
# Add Fill Descendants at the Data Model level.
=== New Fill Method: Fill Method Collection ===
Fill Method Collection conditionally executes a list of Fill Methods, enabling complex data extraction workflows. For example, a workflow could use different extraction approaches for small and large documents, with multiple fallback strategies and spatial grounding steps configured conditionally.


== Azure Document Intelligence ==
== Azure Document Intelligence ==
Line 494: Line 519:
{{attn-box|Spatial Grounding consumes significantly more tokens than AI Extract. Consider using a less expensive model to control costs.}}
{{attn-box|Spatial Grounding consumes significantly more tokens than AI Extract. Consider using a less expensive model to control costs.}}


== VLM Analyze ==
== Route Activity ==
 
=== New Activity: Route ===
 
The Route activity routes Batch Folders (or Batches) to new Batches based on their Content Type, enabling branching workflows based on initial classification.
 
Routing rules are defined by '''Route Definitions''', each specifying:
* A Content Type to match.
* A destination Batch Process.
* An optional Boolean trigger expression for conditional routing.
 
Additional options:
* Items can be moved or cloned into the target Batch. If no route matches, the item remains in its current Batch.
* Routed Batches can be started in a paused state for review.
* The '''Include Sibling Types''' option adds all sibling Content Types not already present in the Secondary Type list.
* Data Actions can copy or transform Data Fields during routing to support scenarios where source and target Content Types have different Data Models.
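How Route Definitions select a destination can be sketched in Python. The class and function names below are hypothetical, items are modeled as plain dictionaries, and evaluating definitions in order with the first match winning is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RouteDefinition:
    content_type: str    # Content Type to match
    batch_process: str   # destination Batch Process
    trigger: Optional[Callable[[dict], bool]] = None  # optional Boolean condition


def pick_route(item, definitions):
    """Return the destination Batch Process for an item, or None if no route matches."""
    for definition in definitions:
        if definition.content_type != item["content_type"]:
            continue
        if definition.trigger is None or definition.trigger(item):
            return definition.batch_process
    return None  # no match: the item stays in its current Batch
```

With this shape, a conditional route (for example, one that only applies to long documents) can sit ahead of an unconditional fallback for the same Content Type.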
 
Batches have a new '''Pending To Step''' property that supports running Route at the Batch level. Since a new Batch Process cannot be applied while a step is running on the existing one, this property defers the process change: the next time the Batch is completed or resumed, the new process and current step are applied and the Pending To Step value is cleared.
 
=== New Concept: Nested Batches (Experimental) ===
 
Batches can now have other Batches as their children in specific circumstances. These child Batches are called '''nested Batches'''.
 
* Currently, the only way to create a nested Batch is with the Route activity using a Route Definition with its Method set to '''Convert'''. This converts the document (Batch Folder) to a Batch and routes it to the target Batch Process. For now, this works only at Folder Level 1 in the parent Batch.
* The Batches Filter has a new '''Include Nested Batches''' option to show nested Batches in the Batch List.
 
'''Use case:''' A ZIP file containing 1,000 documents is imported as a Batch. The unzipped documents are classified as Check, EOB, or Mail — each needing to route to a different Batch Process. Without nested Batches, routing separates documents from their parent, making it impossible to monitor the overall import progress or roll back to a previous step. With the Convert method, documents are converted to nested Batches and remain in place.
 
{{attn-box|Nested Batches are still in the experimental stages. More work is needed to ensure operational rules are in place to prevent unsafe Batch deletion and improper task processing.}}
 
== Content Type Relationships ==
 
Content Type Relationships expose related Data Models to code expressions and other features. They are defined by three properties on a Content Model, Content Category, or Document Type:
 
* '''Child Of''' — Specifies a parent Content Type that will always be assigned to a parent Batch Folder in the Batch. Makes all Data Elements from the parent document available in the child document's expression environment. Use for multi-level Batch structures where parent-child Batch Folder relationships are required.
** ''Example:'' A Benefits Change Form document configured as "Child Of" a Personnel File will have access to all Personnel File Data Model fields in its expressions — including using <code>Employee_Name</code> in its Export Mappings.
* '''Sibling Of''' — Specifies one or more Content Types that are assigned alongside the same Batch Folder as Secondary Types. Makes the Data Elements of those sibling Content Types directly accessible in the expression environment using the syntax <code>TypeName.FieldName</code>.
** Note: "Sibling" here refers to sibling Content Types assigned to the same Batch Folder — not sibling Batch Folders within a Batch. There is no automatic event linkage between fields on different types; if a user edits sibling data, fields depending on it will not automatically recalculate.
* '''Relative Of''' — A more flexible option that defines related Content Types whose Data Elements are exposed in the expression environment, regardless of whether they are parent documents or Secondary Types. Use when the related Content Type could be either a parent or a sibling.
 
== VLM Analyze Activity ==


=== New Activity: VLM Analyze ===
=== New Activity: VLM Analyze ===
Line 518: Line 583:
== Review Improvements ==
== Review Improvements ==


=== AI Correct ===
=== New Command: AI Correct ===


A new '''AI Correct''' command is available in the Data Grid for Data Viewers. It arms Review operators with AI-powered data correction capabilities.
A new '''AI Correct''' command is available in the Data Grid for Data Viewers. It arms Review operators with AI-powered data correction capabilities.
Line 530: Line 595:


'''Example:''' In one scenario, Grooper had extracted the same error on 115 service lines. Using AI Correct with the instruction "Remove all $0.00 adjustments," 117 validation errors were reduced to 2 in 5–10 seconds — compared to 5–10 minutes if done manually.
'''Example:''' In one scenario, Grooper had extracted the same error on 115 service lines. Using AI Correct with the instruction "Remove all $0.00 adjustments," 117 validation errors were reduced to 2 in 5–10 seconds — compared to 5–10 minutes if done manually.
=== New Command: Set All ===
A new '''Set All''' command is available for Data Field and Data Column cells in the Data Grid. It allows bulk editing of all instances of a multi-instance field at once — useful for clearing all instances or setting them to a static value.
'''Example:''' An EOB document contains an invalid NPI number on all 74 claims. Using Set All, the operator can clear all 74 instances instantly rather than visiting each claim individually.


=== Data Sections: Tabular View ===
=== Data Sections: Tabular View ===
Line 549: Line 620:


The Error List can now be docked to any side of the Data Grid by clicking the error count to toggle its position. The Error List auto-syncs with the current field focus.
The Error List can now be docked to any side of the Data Grid by clicking the error count to toggle its position. The Error List auto-syncs with the current field focus.
=== New Command: Set ALL ===
A new '''Set ALL''' command is available for Data Field and Data Column cells in the Data Grid. It allows bulk editing of all instances of a multi-instance field at once — useful for clearing all instances or setting them to a static value.
'''Example:''' An EOB document contains an invalid NPI number on all 74 claims. Using Set ALL, the operator can clear all 74 instances instantly rather than visiting each claim individually.


=== New Property: Edit Rule ===
=== New Property: Edit Rule ===
Line 568: Line 633:
* Data Table commands appear on the caption bar.
* Data Table commands appear on the caption bar.
* Row commands appear at the end of the row.
* Row commands appear at the end of the row.
=== New Command: Split (Multi-Instance Data Sections) ===
A new '''Split''' command is available for multi-instance Data Sections in the Data Grid (accessed by right-clicking the Data Section's caption). It uses an extractor to find section instances during Review.


=== Field Search ===
=== Field Search ===
Line 601: Line 662:
* '''Disallow Confirm''' — A new property on Data Fields and Data Columns. When set to True, operators cannot override validation logic for that field using the Confirm command.
* '''Disallow Confirm''' — A new property on Data Fields and Data Columns. When set to True, operators cannot override validation logic for that field using the Confirm command.
* '''Reviewer Field''' — A new property on Data Fields. When set, the field is automatically populated with the active username when Review opens. If an array field, all review usernames are recorded.
* '''Reviewer Field''' — A new property on Data Fields. When set, the field is automatically populated with the active username when Review opens. If an array field, all review usernames are recorded.
* '''New Command: Split (Multi-Instance Data Sections)''' — A new '''Split''' command is available for multi-instance Data Sections in the Data Grid (accessed by right-clicking the Data Section's caption). It uses an extractor to find section instances during Review.


== Grooper Web Services (GWS) ==
== Grooper Web Services (GWS) ==
Line 626: Line 688:
* '''/commands''' — Endpoints to execute commands on Grooper nodes, including Batches, documents (Batch Folders), or other node types.
* '''/commands''' — Endpoints to execute commands on Grooper nodes, including Batches, documents (Batch Folders), or other node types.


== OAuth Support ==
== Improved Security ==
 
=== HTTPS Now Required ===
 
HTTPS is now mandatory. A trusted TLS/SSL certificate must be configured before installing or upgrading Grooper.
 
* Opening the Grooper web app over HTTP will fail or leave the application unnavigable.
* Existing deployments using HTTP must be updated to HTTPS.
* '''Exception:''' When accessed via localhost addresses, Grooper runs in "dev mode" and is allowed to operate over HTTP.
 
Supported certificate options:


OAuth is an authentication method that gives third-party applications access to web resources without users sharing their passwords.
* '''Self-Signed Certificate''' — Suitable for dev/test environments only. Requires manual trust installation on each client machine.
* '''Internal Certificate Authority''' (most typical) — Issued by your organization's internal PKI (e.g., Active Directory Certificate Services). Automatically trusted on domain-joined or managed devices. Ensure the internal root CA certificate is deployed on all client machines via Group Policy or device management tools.
* '''Public Certificate Authority''' — Trusted by all browsers and devices. Necessary only for internet-facing deployments. Examples: DigiCert, GlobalSign, Let's Encrypt.


{{fyi-box|Microsoft Entra ID (formerly Azure Active Directory) is the only supported OAuth provider at this time.}}
=== Additional Security HTTP Headers ===


Benefits of OAuth:
Additional security HTTP headers have been enabled. System administrators can inspect these in the <code>httpProtocol > customHeaders</code> section of the Grooper web.config file.
* '''Security''' — Users do not share their passwords with third-party applications. OAuth safely encrypts transmission of data between servers, making document links secure when connecting AI Assistants to chat clients via Azure Bot Services.
* '''Simplified logins''' — Users can log into multiple applications with existing accounts. In Grooper's case, with a Microsoft Entra ID account.
* '''Integrations''' — OAuth is the security standard for app-to-app communication. Securing Grooper with OAuth enables new integration options, including using Azure Bot Services to extend AI Assistants to external chat channels.


Both the Grooper website and the GWS website can be configured with OAuth authentication. Both rely on settings in the <code>web.config</code> file, configured with values from the Azure Portal. If no <code>ida:ClientId</code> setting is present, authentication works the same as in previous versions of Grooper.
A new <code>contentSecurityLevel</code> key controls the strictness of the Content-Security-Policy header:


* '''Grooper and OAuth''' — When the Grooper website is configured to use OAuth, users log in using their Entra ID credentials.
{|class="wikitable"
** Previous login methods are still supported. Windows authentication remains the default.
!Level!!Behavior
** OAuth is ''required'' if you are extending an AI Assistant to an external channel like Teams via Azure Bot Services and want to provide users with document links in chat responses.
|-
* '''GWS and OAuth''' — GWS uses OAuth client credentials to communicate with Azure Bot Services.
|'''Off'''||No CSP applied. Not recommended: allows HTMLViewer to execute external scripts.
** ''Required'' for extending AI Assistants to external channels via Azure Bot Services.
|-
** If providing document links in chat responses, ''both'' the Grooper website and GWS website must be secured with OAuth.
|'''Low'''||Least restrictive. HTMLViewer works fully, including external images, styles, and fonts.
|-
|'''Medium'''||Balanced. HTMLViewer works, but external images, styles, and fonts are blocked.
|-
|'''High'''||Most restrictive. HTMLViewer is fully blocked. Maximizes security but may break features.
|}


''Additional setup is required to configure OAuth authentication. You must register Grooper as an application in Microsoft Entra ID and configure each website's web.config file. Full instructions are coming soon.''
These presets can be overridden with a custom Content-Security-Policy using the <code>contentSecurityPolicy</code> key.


=== OAuth Service Login for Exchange ===
=== Import Filter ===


Exchange CMIS Connections now have an '''OAuth Service Login''' method for connecting Grooper to Exchange servers (Outlook inboxes).
A new '''Import Filter''' configuration on the Grooper Root specifies a comma-separated list of permitted file types for import into the Grooper Repository. The filter is enforced during file upload operations.


* Implements the OAuth 2.0 Client Credentials flow for secure server-to-server authentication.
Default permitted types: <code>.tif, .tiff, .jpg, .jpeg, .jp2, .png, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .pdf, .msg, .eml, .txt, .log, .json, .zip</code>
* Allows the application (Grooper) to authenticate using a client ID and client secret without requiring user interaction, then obtain an access token to interact with Exchange APIs on Grooper's behalf.
* The previous '''Exchange OAuth''' method (a user login method requiring a Microsoft account) is still available and is fine for testing, but OAuth Service Login is preferred for production scenarios.
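The '''Import Filter''' check on uploaded files can be pictured with a short Python sketch that parses a comma-separated list of permitted extensions and tests a file name against it. The function name <code>is_permitted</code> is hypothetical, and treating extensions as case-insensitive is an assumption rather than documented behavior.

```python
import os

# Default permitted types as listed for the Import Filter.
DEFAULT_IMPORT_FILTER = (
    ".tif, .tiff, .jpg, .jpeg, .jp2, .png, .doc, .docx, .xls, .xlsx, "
    ".ppt, .pptx, .pdf, .msg, .eml, .txt, .log, .json, .zip"
)


def is_permitted(filename, import_filter=DEFAULT_IMPORT_FILTER):
    """Return True if the file's extension appears in the comma-separated filter."""
    allowed = {ext.strip().lower() for ext in import_filter.split(",") if ext.strip()}
    # Case-insensitive matching is an assumption made for this sketch.
    return os.path.splitext(filename)[1].lower() in allowed
```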


== Content Type Relationships ==
=== Design Page Download Security ===


Content Type Relationships expose related Data Models to code expressions and other features. They are defined by three properties on a Content Model, Content Category, or Document Type:
Node downloads from the Design page now require an explicit user click via a displayed download link. This reduces the risk of unintended or automated downloads and aligns with modern browser security best practices.


* '''Child Of''' — Specifies a parent Content Type that will always be assigned to a parent Batch Folder in the Batch. Makes all Data Elements from the parent document available in the child document's expression environment. Use for multi-level Batch structures where parent-child Batch Folder relationships are required.
== Miscellaneous ==
** ''Example:'' A Benefits Change Form document configured as "Child Of" a Personnel File will have access to all Personnel File Data Model fields in its expressions — including using <code>Employee_Name</code> in its Export Mappings.
* '''Sibling Of''' — Specifies one or more Content Types that are assigned alongside the same Batch Folder as Secondary Types. Makes the Data Elements of those sibling Content Types directly accessible in the expression environment using the syntax <code>TypeName.FieldName</code>.
** Note: "Sibling" here refers to sibling Content Types assigned to the same Batch Folder — not sibling Batch Folders within a Batch. There is no automatic event linkage between fields on different types; if a user edits sibling data, fields depending on it will not automatically recalculate.
* '''Relative Of''' — A more flexible option that defines related Content Types whose Data Elements are exposed in the expression environment, regardless of whether they are parent documents or Secondary Types. Use when the related Content Type could be either a parent or a sibling.


== Route ==
=== New Activities ===


=== New Activity: Route ===
==== Fill Data ====


The Route activity routes Batch Folders (or Batches) to new Batches based on their Content Type, enabling branching workflows based on initial classification.
Fill Data executes one or more fill methods to populate or enrich data on a document. It loads existing document data, runs all fill methods with a specific name at any level in the Data Model, applies optional post-processing rules, and optionally flags the document if data is invalid.


Routing rules are defined by '''Route Definitions''', each specifying:
'''Use case:''' When data elements are populated at import time, Extract cannot be used (it always overwrites existing data). Fill Data provides an alternative: set ''Run Child Extractors'' to False on the Data Model, add a fill method that only fills desired elements, and add a Fill Data activity to the process.
* A Content Type to match.
* A destination Batch Process.
* An optional Boolean trigger expression for conditional routing.


Additional options:
===== Pick =====
* Items can be moved or cloned into the target Batch. If no route matches, the item remains in its current Batch.
* Routed Batches can be started in a paused state for review.
* The '''Include Sibling Types''' option adds all sibling Content Types not already present in the Secondary Type list.
* Data Actions can copy or transform Data Fields during routing to support scenarios where source and target Content Types have different Data Models.


Batches have a new '''Pending To Step''' property that supports running Route at the Batch level. Since a new Batch Process cannot be applied while a step is running on the existing one, this property defers the process change: the next time the Batch is completed or resumed, the new process and current step are applied and the Pending To Step value is cleared.
Pick uses AI to choose the "controlling version" of a Document Type in a Batch — for example, selecting the most authoritative copy from four versions of a loan application in a mortgage file. The AI considers document dates, completeness, presence of signatures, and official stamps or seals.


=== New Concept: Nested Batches (Experimental) ===
'''Tip:''' Use the Multi-Quote quoting method to inject document content, extracted data, or VLM Analyze output (capturing signatures, stamps, etc.) into the Pick operation.


Batches can now have other Batches as their children in specific circumstances. These child Batches are called '''nested Batches'''.
===== Detect Language =====


* Currently, the only way to create a nested Batch is with the Route activity using a Route Definition with its Method set to '''Convert'''. This converts the document (Batch Folder) to a Batch and routes it to the target Batch Process. Currently, this only works at Folder Level 1 in the parent Batch.
A new and improved Detect Language activity uses large language models to determine the language of text on a document. Because modern LLMs excel at natural language processing across multiple languages, this activity reliably identifies a document's native language with little to no setup.
* The Batches Filter has a new '''Include Nested Batches''' option to show nested Batches in the Batch List.


'''Use case:''' A ZIP file containing 1,000 documents is imported as a Batch. The unzipped documents are classified as Check, EOB, or Mail — each needing to route to a different Batch Process. Without nested Batches, routing separates documents from their parent, making it impossible to monitor the overall import progress or roll back to a previous step. With the Convert method, documents are converted to nested Batches and remain in place.
*<li class="fyi-bullet">The detected language is stored as the document's (Batch Folder's) Culture property.


{{attn-box|Nested Batches are still in the experimental stages. More work is needed to ensure operational rules are in place to prevent unsafe Batch deletion and improper task processing.}}
''Note: The previous Detect Language activity still exists in Grooper 2025 under the name "Detect Language (Legacy)."''


== Miscellaneous ==
===== Route =====


=== New Features ===
See [[#Route|Route]] above.


==== New Activity: Fill Data ====
===== VLM Analyze =====


Fill Data executes one or more fill methods to populate or enrich data on a document. It loads existing document data, runs all fill methods with a specific name at any level in the Data Model, applies optional post-processing rules, and optionally flags the document if data is invalid.
See [[#VLM Analyze|VLM Analyze]] above.


'''Use case:''' When data elements are populated at import time, Extract cannot be used (it always overwrites existing data). Fill Data provides an alternative: set ''Run Child Extractors'' to False on the Data Model, add a fill method that only fills desired elements, and add a Fill Data activity to the process.
=== New Fill Methods ===


==== New Fill Method: Run Child Extractors ====
===== Run Child Extractors =====


Run Child Extractors is a new Fill Method that runs extraction logic for child elements. It supports filtering to selectively run extraction logic for specific child elements, which is useful when only a subset of fields needs to be extracted.
Run Child Extractors is a new Fill Method that runs extraction logic for child elements. It supports filtering to selectively run extraction logic for specific child elements, which is useful when only a subset of fields needs to be extracted.


==== New Activity: Pick ====
==== Fill Method Collection ====


Pick uses AI to choose the "controlling version" of a Document Type in a Batch — for example, selecting the most authoritative copy from four versions of a loan application in a mortgage file. The AI considers document dates, completeness, presence of signatures, and official stamps or seals.
Fill Method Collection conditionally executes a list of Fill Methods, enabling complex data extraction workflows. For example, a workflow could use different extraction approaches for small and large documents, with multiple fallback strategies and spatial grounding steps configured conditionally.


Use the Multi-Quote quoting method to inject document content, extracted data, or VLM Analyze output (capturing signatures, stamps, etc.) into the Pick operation.
===== Fill Descendants =====


==== New Quoting Method: Multi Quote ====
See [[#New Fill Method: Fill Descendants|AI-Enabled Data Section Extraction]] above.


Multi Quote combines multiple quoting strategies, allowing the AI to be presented with content from multiple regions or multiple types of input simultaneously. This is ideal for complex extraction scenarios where a single quoting strategy does not provide sufficient context.
===== Spatial Grounding =====


==== New Classification Method: Search Classifier ====
See [[#Spatial Grounding|Spatial Grounding]] above.


See [[#New Classification Method: Search Classifier (Experimental)|AI-Enabled Separation and Classification]] above.
=== New Quoting Methods ===


==== New Fill Method: Fill Descendants ====
===== Multi Quote =====


See [[#New Fill Method: Fill Descendants|AI-Enabled Data Section Extraction]] above.
Multi Quote combines multiple quoting strategies, allowing the AI to be presented with content from multiple regions or multiple types of input simultaneously. This is ideal for complex extraction scenarios where a single quoting strategy does not provide sufficient context.


==== New Activity: Detect Language ====
===== JSON File =====


A new and improved Detect Language activity uses large language models to determine the language of text on a document. Because modern LLMs excel at natural language processing across multiple languages, this activity reliably identifies a document's native language with little to no setup.
Quotes content from a JSON file attached to the Batch Folder or Batch Page. The primary use case is handing data created by VLM Analyze to an LLM. See [[#VLM Analyze|VLM Analyze]] above.


{{fyi-box|The detected language is stored as the document's (Batch Folder's) Culture property.}}
===== DI Layout =====


''Note: The previous Detect Language activity still exists in Grooper 2025 under the name "Detect Language (Legacy)."''
See [[#DI Layout Quoting Method|DI Layout Quoting Method]] above.


==== New Commands ====
=== New Commands ===


'''Batch Folder > Set Field Value'''
'''Batch Folder > Set Field Value'''
Line 771: Line 833:


'''HTML Document > Convert to Text'''
'''HTML Document > Convert to Text'''
* Converts an HTML page to a TXT document, stripping unnecessary HTML elements.
* Converts an HTML page to a TXT document, stripping unnecessary HTML elements.
* Useful for webpages that present as plain text files, such as [https://www.govinfo.gov/content/pkg/CFR-2024-title2-vol1/html/CFR-2024-title2-vol1.htm this page from the US Code of Federal Regulations on govinfo.gov].
 
'''Data Container > Generate Schema'''
* Generates a JSON schema file representing the Data Element structure of a Data Model, Data Section, or Data Table and its descendants.
* The schema is saved as a JSON file in the Local Resources Folder associated with the Data Model.
* Optionally includes extended field properties: computed value formulas, expected value formulas, validation formulas, required conditions, and read-only flags.
* Takes the JSON Data Mapping behavior into account when generating the schema.
 
'''Data Container > Generate Descriptions'''
* Uses generative AI to create or update human-readable descriptions for Data Elements in a Data Model.
* The Overwrite option determines whether existing descriptions are replaced or only missing ones are filled in.
* Generated descriptions enhance tooltips in the Data Grid, enrich exported JSON schemas, and provide clearer instructions for LLM-based extraction.
 
'''Data Container > Import Descriptions'''
* Populates or updates the Description for each Data Element by importing from a JSON schema.
* Flexible Overwrite options allow replacing all descriptions or only filling in missing ones.
* Designed to work in conjunction with Generate Schema: export the schema, enrich descriptions using external AI tools, then import the descriptions back into Grooper.
 
'''Data Action > Run Command'''
* Executes a command on every instance of a Data Element using Data Rules.
* Useful for automating operations that would normally need to be performed manually from the Data Grid UI.
* ''Example use case:'' Create a custom command in an Object Library that performs a long-running lookup, then execute it automatically after Extract using a Data Rule configured with Run Command.
 
'''Content Type > Submit Job'''
* Runs an Activity on all documents with the Content Type assigned.
* Similar to the Submit Job command on the Search page.
* ''Example:'' Run Extract on all documents with an "Invoice" Document Type.
 
==== Project Archiving Commands ====
 
Two new Project archiving commands are available by right-clicking a Project and opening the Archive flyout:
 
* '''Archive Project''' — Saves a copy of the Project's current contents as a ZIP stored locally on the Project node. Archives are automatically named with a timestamp; an optional Prefix property can be prepended.
* '''Restore from Archive''' — Restores a Project from a previously saved archive, fully replacing its contents. An '''Archive Current''' option saves the Project's current state before restoring.
 
==== Save/Load Preset Commands ====
 
New Save and Load Preset commands allow property settings to be saved and loaded for any Grooper object. Access them by right-clicking the object's property grid.
 
=== JSON Functionality ===
 
Several new JSON features have been integrated into Grooper 2025.
 
'''JSON Metadata Export (Full and Simple)'''
 
The JSON Metadata Merge/Export Format now supports two layouts:
* '''Full''' — The legacy export, including field values, location information, confidence scores, and more.
* '''Simple''' — A compact JSON file containing extracted values only.
 
'''New Attachment Type: JSON File'''
 
A new JSON File Attachment Type enables JSON-specific commands and features. JSON files imported into Grooper are attached to Batch Folders using this type.
 
'''New Commands: Load Data and Split'''
 
* '''JSON File > Load Data''' — Loads data from a JSON file into a compatible Data Model. Maps JSON nodes and values to corresponding Data Sections and Data Fields using a JPath expression set on their Import Source property. Data Section collections are populated with one instance per matching JSON node.
* '''JSON File > Split''' — Splits a JSON file into child documents (Batch Folders) based on a JPath selector. One child document is created for each matching item.
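
As a rough illustration of the Split behavior (one child document per matching item), the sketch below uses a hypothetical <code>select</code> helper that understands only dotted paths with a trailing <code>[*]</code>. That is a small subset of what a real JPath expression supports, and the <code>claims</code> sample data is invented. It is not Grooper's implementation.

```python
import json


def select(node, path):
    """Resolve a simplified selector such as '$.batch.claims[*]'.

    Hypothetical helper: handles only dotted keys and a trailing [*].
    """
    matches = [node]
    for part in path.lstrip("$").strip(".").split("."):
        if not part:
            continue
        wildcard = part.endswith("[*]")
        key = part[:-3] if wildcard else part
        next_matches = []
        for match in matches:
            value = match[key]
            if wildcard:
                next_matches.extend(value)  # one match per array item
            else:
                next_matches.append(value)
        matches = next_matches
    return matches


def split_json(text, selector):
    """Return one JSON string per item matched by the selector."""
    return [json.dumps(item) for item in select(json.loads(text), selector)]


source = '{"batch": {"claims": [{"id": 1}, {"id": 2}, {"id": 3}]}}'
children = split_json(source, "$.batch.claims[*]")
```

Here <code>children</code> holds three JSON strings, one per claim, which mirrors the "one child document per matching item" rule described above.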
 
'''New Behavior: JSON Data Mapping'''
 
JSON Data Mapping defines JSON generation options for the JSON Metadata Export Format, allowing customization of which Data Elements are included in the output. Elements can be selected from inherited Data Models, Secondary Type Data Models, or Data Models from parent documents in a Batch hierarchy.
 
Multiple Content Types' Data Model outputs can be combined using three merge modes:
* '''Combine''' — Merges properties from all Data Models into a single JSON object.
* '''Nested''' — Creates an array of JSON objects, nesting each Data Model's output under a property named for its code name.
* '''Nested Secondary''' — Adds primary model properties at the root level, with secondary model outputs nested under their code names.
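
The three merge modes can be pictured with plain dictionaries. This is a sketch of one reading of the descriptions above, not Grooper's actual output: the function names, the sample Data Models, and the choice that later models win on name clashes in Combine are all assumptions.

```python
def combine(models):
    """Merge every model's properties into a single object."""
    merged = {}
    for _code_name, data in models:
        merged.update(data)  # assumption: later models win on name clashes
    return merged


def nested(models):
    """Build an array with each model nested under its code name."""
    return [{code_name: data} for code_name, data in models]


def nested_secondary(models):
    """Keep primary properties at the root; nest secondary models by code name."""
    (_primary_name, primary), *secondary = models
    output = dict(primary)
    for code_name, data in secondary:
        output[code_name] = data
    return output


# Hypothetical primary and secondary Data Model outputs.
models = [
    ("Invoice", {"Total": 100}),
    ("Remittance", {"CheckNo": "4471"}),
]

combined = combine(models)
as_array = nested(models)
primary_root = nested_secondary(models)
```

With this sample, <code>combined</code> is a flat object, <code>as_array</code> has one entry per Data Model, and <code>primary_root</code> keeps <code>Total</code> at the root with the Remittance data nested beneath it.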
 
'''New Schema Importer: JSON Schema Importer'''
 
Generates a Data Model corresponding to the properties and structure defined in a JSON Schema file. Properties become Data Fields, Data Sections, or Data Tables depending on their type. Arrays of objects are imported as Data Tables or multi-instance sections. Enum and oneOf definitions are imported as field choice lists. Grooper-specific extended properties (computed values, required conditions, etc.) are imported if enabled.
 
=== New Data Action: Concat ===
 
The Concat Data Action combines adjacent records in a collection based on a configurable trigger expression. It is designed to resolve cases where a section instance was incorrectly extracted as two separate partial instances (for example, when an EOB claim spans two pages and the section header is repeated on page 2).
 
'''How it works:'''
# Iterates the collection in reverse order, evaluating the Trigger expression for each pair of adjacent records.
# If the Trigger returns true, the two records are merged: child fields from the second record are copied to the first (preserving non-blank values), child collections are merged, and the second record is removed.
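
The two steps above can be sketched in a few lines. The record layout (a <code>fields</code> dictionary plus a <code>rows</code> list) and the sample trigger are assumptions made for illustration, and "preserving non-blank values" is read here as "a non-blank value already on the first record is kept."

```python
def concat(records, trigger):
    """Merge adjacent records wherever trigger(first, second) is true."""
    # Walk from the end so removing a record never shifts unvisited indexes.
    for i in range(len(records) - 1, 0, -1):
        first, second = records[i - 1], records[i]
        if not trigger(first, second):
            continue
        # Copy child fields, keeping any non-blank value already on the first record.
        for name, value in second["fields"].items():
            if first["fields"].get(name) in (None, ""):
                first["fields"][name] = value
        # Merge child collections, then drop the second record.
        first["rows"].extend(second["rows"])
        del records[i]
    return records


# Hypothetical EOB claims: the second record is the continuation of the first.
claims = [
    {"fields": {"ClaimNo": "A1", "Total": ""}, "rows": [1, 2]},
    {"fields": {"ClaimNo": "", "Total": "50.00"}, "rows": [3]},
    {"fields": {"ClaimNo": "B2", "Total": "10.00"}, "rows": [4]},
]

# Sample trigger: a record with no claim number continues the previous one.
merged = concat(claims, lambda first, second: second["fields"]["ClaimNo"] == "")
```

Iterating in reverse also lets a claim split across three or more partial records collapse in a single pass, because each merge result becomes the "second" record of the next comparison.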
 
{{fyi-box|Concat was designed to resolve issues with AI Collection Reader, but its uses extend beyond that. It will concatenate any section instances or table rows according to its Trigger condition.}}
 
=== New AI Configuration Object: Data Generator ===
 
Data Generator is a new AI configuration object that defines generative AI settings for LLM-powered features. It is set through the Generator property of each feature that uses it. Key settings include:
 
* '''Model''' — Determines which LLM model is used.
* '''Temperature, TopP, Presence Penalty, Frequency Penalty''' — Standard LLM parameter controls.
* '''Reasoning Effort''' — Specifies computational effort and depth of reasoning for reasoning-enabled models.
* '''Service Tier''' — Specifies quality-of-service level for APIs that support it.
* '''Verbosity''' — Controls how verbose the output should be.
* '''Use Structured Output''' — Enables structured output mode for JSON responses. Generally more reliable, enforcing how the LLM responds and how Grooper parses its response.
* '''Always Inject Schema''' — When enabled alongside Structured Output, always includes the schema in every prompt for improved consistency and accuracy (at the cost of higher token usage).
* '''Max Tries''' — How many times Grooper will attempt to re-issue a request if an error occurs.
 
===== GPT-5 Model Parameter Support =====
 
Support for gpt-5 model parameters has been added to the Parameters settings of LLM-enabled features: Reasoning Effort, Service Tier, and Verbosity. These settings apply to gpt-5 models only.
 
=== User Experience Improvements ===
 
==== Upload Documents from the Batches Page ====
 
[[File:WhatsNew-04-Upload.png|right|class=simpleborder simpleshadow|300px]]
 
A new '''Upload''' button is available in the context toolbar at the top of the Batches page.
 
[[File:WhatsNew-04-Upload-2.png|right|class=simpleborder simpleshadow|300px]]
 
This button allows users to upload one or more files and select a Batch Process. Grooper will create a paused Batch with each file attached.
 
* This is the easiest way to perform transactional document processing in Grooper.
* Users processing only a few local documents no longer need to configure an Import Provider from the Imports page.
 
==== Search Text in Any Document Viewer ====
 
[[File:WhatsNew-04-Document-Search.png|right|class=simpleborder simpleshadow|500px]]
 
Text can now be searched in any document open in the Document Viewer using the '''Document Searcher'''. Press the '''Search Document''' button to open a search box at the bottom of the image. Type a search term and Grooper will highlight all matches. Multiple results can be navigated using the arrow controls.
 
{{fyi-box|Regex searches are supported. Enter a pattern between two forward slashes, e.g. <code>/regex pattern/</code>}}
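
To make the slash convention concrete, here is a minimal sketch of how a search term could be interpreted. It is not the Document Searcher's code. Treating plain terms as case-insensitive literals is an assumption, and <code>find_matches</code> is a hypothetical name.

```python
import re


def find_matches(text, term):
    """Return (start, end) spans for a search term.

    A term wrapped in forward slashes is treated as a regex pattern;
    anything else is matched literally (case-insensitive by assumption).
    """
    if len(term) > 2 and term.startswith("/") and term.endswith("/"):
        pattern = re.compile(term[1:-1])
    else:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


page_text = "Invoice 1001 total 250.00; invoice 1002 total 99.50"

literal_hits = find_matches(page_text, "invoice")
regex_hits = find_matches(page_text, r"/\d+\.\d{2}/")
amounts = [page_text[start:end] for start, end in regex_hits]
```

The literal search finds both occurrences of "invoice", while the regex search picks out the two currency amounts.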
 
==== "Reports" Tab Replaces Content Type "Summary" Tab ====
 
[[File:WhatsNew-04-Reports.png|right|class=simpleborder simpleshadow|900px]]
 
The Summary tab on Content Types has been replaced with a Reports tab. The Summary tab had become cluttered and limited the ability to add new information. The Reports tab allows users to select exactly what details they want to review.
 
Available reports:
* Circular Expressions
* Data Elements
* Descendants
* Expressions
* Property Overrides
* Validation Rules
 
{{fyi-box|Data Model Variables have been added to the Expressions report.}}
 
==== New Generative AI Commands for Convert Data Activity ====
 
Three new AI commands simplify the process of configuring Data Model conversions in the Convert Data activity:
 
* '''Create Convert Actions''' — Creates a set of Data Actions based on a natural language prompt.
* '''Create Copy Actions''' — Adds a set of Copy actions to an Action List based on a prompt.
* '''Create Child Actions''' — Adds Data Actions to a Copy or Append action based on a prompt.
 
{{attn-box|Source Type and Target Type must be configured for these commands to function.}}
 
=== Efficiency Improvements ===
 
==== Activity Processing Improvements ====
 
Several changes have been made to Grooper Activity Processing services to improve efficiency:
 
* Reduced overhead for idle Activity Processing services.
* New '''Idle Sleep Time''' property controls the time to wait between each polling cycle.
* Improved internal query behavior when services are idle and polling for tasks.
 
==== Dispose Batch Improvements ====
 
The Dispose Batch activity now includes a '''Remove Job History''' property.
 
* Increases performance when archiving Batches for long-term storage.
* Keeps the Grooper database free of unnecessary processing history clutter.
 
=== Other Changes ===
 
==== Export Improvements ====
 
* '''Filename property''' — An optional expression that generates the filename for an exported file.
* '''Trigger property''' — All Export Formats now have a Trigger property, allowing selective generation of output files (for example, conditionally including PDF and/or TIFF versions based on client criteria).
* '''ZIP Format changes''' — Now supports two sources for files in a ZIP: child attachments (via '''Include Child Attachments''') and Export Formats (via '''Included Export Formats'''). A new '''Custom Filename''' expression allows individual files in the ZIP to be named precisely.
* '''Max Retries''' — When configured, Grooper will reattempt an export if the Export activity encounters an error. Useful for recovering from transient network interruptions. Default is 0 (no retries).
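
The Max Retries behavior follows a familiar retry pattern, sketched below under stated assumptions: <code>export_with_retries</code> and <code>flaky_export</code> are hypothetical names, and only <code>OSError</code> (which includes Python's connection errors) is treated as retryable. Which errors Grooper actually retries is not specified here.

```python
def export_with_retries(export, max_retries=0):
    """Call export(); on a retryable error, try again up to max_retries times."""
    attempts = 0
    while True:
        try:
            return export()
        except OSError:
            if attempts >= max_retries:
                raise  # retries exhausted (or disabled when max_retries is 0)
            attempts += 1


calls = []


def flaky_export():
    """Simulated export that fails twice before succeeding."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient network interruption")
    return "exported"


result = export_with_retries(flaky_export, max_retries=2)
```

With the default of 0, the first error is raised immediately, which matches the "no retries" default described above.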
 
==== XML Processing and Transform Improvements ====
 
Several improvements have been made to XML processing in Grooper 2025.
 
'''XML Transform activity:'''
* Now supports XML namespaces.
* A new AI transform helper uses an LLM to generate an XSLT transform. Access it by adding XML Transform to a Batch Process, navigating to the step's XSLT Editor tab, and clicking the AI helper button.
 
'''New XML file commands:'''
* '''XML File > Split''' — Splits an XML file into new documents using XPath selectors. One child document is created per selected node.
* '''XML File > Condition XML''' — Strips unwanted XML elements from an XML file.
 
'''XML Schema Importer improvements:'''
* Added support for references to other schemas. All XSD resource files must be in the same folder in Grooper.
* Improved support for non-attribute fields.
* Now imports comments for Data Elements and enumeration values (List Values).
 
==== New Reclassify Mode: Primary ====
 
A new '''Primary''' reclassify mode for the Classify activity sets the primary Content Type while leaving Secondary Types intact. The default '''Overwrite''' mode clears Secondary Types.
 
==== Data Conversion for Related Content Types ====
 
When changing a document's type to a related Content Type (one that shares a Data Model in the same Content Model), Grooper can now preserve the data the two types have in common. Enable this using the '''Convert Existing Data''' property found on the Classify activity, the Assign Document Type command, and the Edit Type Assignment command.
 
==== New Value Type: GeoPoint ====
 
GeoPoint is a new Value Type for representing geographic fields, storing a pair of latitude/longitude coordinates (latitude: -90 to 90; longitude: -180 to 180). The Data Grid displays a maps icon for fields of this type, launching Google Maps for single points or geojson.io for arrays of GeoPoints.
 
GeoPoint enables the Search page's '''Geo Distance''' and '''Geo Intersects''' filter types for location-based search queries.
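
The coordinate ranges and the idea behind a distance filter can be sketched as follows. This is illustrative only: the <code>GeoPoint</code> class here is a stand-in rather than Grooper's Value Type, and the haversine formula is a common way to compute great-circle distance, not necessarily what the search service uses for Geo Distance.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class GeoPoint:
    latitude: float
    longitude: float

    def __post_init__(self):
        # Enforce the ranges given above.
        if not -90 <= self.latitude <= 90:
            raise ValueError("latitude must be between -90 and 90")
        if not -180 <= self.longitude <= 180:
            raise ValueError("longitude must be between -180 and 180")


def distance_km(a, b):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lat2 = math.radians(a.latitude), math.radians(b.latitude)
    dlat = lat2 - lat1
    dlon = math.radians(b.longitude - a.longitude)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


origin = GeoPoint(0.0, 0.0)
antipode = GeoPoint(0.0, 180.0)
half_circumference = distance_km(origin, antipode)
```

A distance filter then amounts to keeping only the documents whose point lies within a chosen radius of a reference point.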
 
==== Token Usage Statistics ====
 
LLM-enabled features now log token consumption (input tokens, output tokens, and total tokens) for every activity that uses an LLM, including Classify (LLM Classifier), Separate (AI Separate), Extract (AI Extract), and Fill Data (AI Extract).
 
==== Batch-Level Data Access for Next Step Expressions ====
 
The Next Step Expression on Batch Process Steps can now access field values. If the Batch Process has its Content Type property set, that Content Type's data is exposed in a property named <code>Data</code>.
 
''Example:'' <code>If(Data.InvoiceAmount > 5000, ReviewL2, ReviewL1)</code> — if the Batch's InvoiceAmount field is greater than 5000, the document routes to ReviewL2; otherwise to ReviewL1.
 
{{attn-box|The Data property can only access Batch-level fields. It cannot access child Batch Folder data.}}
 
==== Data Lookup: Populate Collections ====
 
The Data Lookup action can now populate Data Tables and multi-instance Data Sections. New properties include '''Target Collection''' and '''Clear Existing'''. When Target Collection is set, lookup results populate the collection with one record per row in the result set. Disable Clear Existing to append records rather than replacing them.
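
The difference between replacing and appending can be shown with a short sketch. The function name, the list-of-dictionaries record shape, and the sample rows are hypothetical. The default of <code>clear_existing=True</code> is an assumption inferred from the wording "Disable Clear Existing to append."

```python
def populate_collection(collection, result_rows, clear_existing=True):
    """Add one record per lookup result row to a collection."""
    if clear_existing:
        collection.clear()  # replace whatever was there before
    collection.extend(dict(row) for row in result_rows)
    return collection


lookup_rows = [
    {"PartNo": "W-100", "Price": "4.50"},
    {"PartNo": "G-200", "Price": "0.75"},
]

replaced = populate_collection([{"PartNo": "OLD"}], lookup_rows)
appended = populate_collection([{"PartNo": "OLD"}], lookup_rows, clear_existing=False)
```

In the first call the existing record is discarded. In the second call it is kept and the two lookup rows are added after it.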
 
==== Ask AI and AI Schema Extractor ====
 
The Ask AI extractor has been split into two separate Value Extractors:
 
* '''Ask AI''' — Dedicated to generating natural language responses from an LLM. Use for summarization, evaluation, and other responses best expressed in human-readable text.
* '''AI Schema Extractor''' — Dedicated to generating responses that conform to a JSON schema. Use for highly structured output such as extracting tables, lists, or records with multiple fields. Enables Structured Output for OpenAI models that support it. A new '''Selector''' property allows an array to be selected as the output.
 
==== System Maintenance Commands Split ====
 
The System Maintenance command has been split into two separate commands:
 
* '''Rebuild Indexes''' — Rebuilds database indexes that exceed a configurable fragmentation threshold.
* '''Database Cleanup''' — Purges old statistics, event logs, jobs, tasks, machines, and similar data.
 
The System Maintenance Service has been redesigned accordingly.
 
{{attn-box|Run the System Maintenance commands or System Maintenance Service regularly. Without regular maintenance, performance will degrade due to index fragmentation and historical data buildup.}}
 
==== AI Search Usage Data ====
 
AI Search usage data is now visible in Grooper. Index data (usage metrics and limits) is visible for each Indexing Behavior.
 
==== LLM Fine-Tuning Enhancements ====
 
* The JSONL viewer now provides a UI for editing messages.
* The Build Fine Tuning File command now saves generated JSONL files in the Local Resources Folder.
* The Submit Fine Tuning Job and Delete Fine Tuned Model commands can now be executed on Resource Files with a <code>.jsonl</code> extension.
 
==== New Hide Numeric Suffix Property for Content Types ====
 
A new '''Hide Numeric Suffix''' property for Content Types hides the numbers added at the end of a document's title (e.g., "Invoice (1)" becomes "Invoice"). This is useful for footnote links generated by AI Assistants when generating a custom title using a Content Type's Caption property.
 
==== Keep Low Confidence Fields Property ====
 
A new '''Keep Low Confidence Fields''' property for Data Fields and Data Columns retains extraction results that fall below the Minimum Confidence threshold rather than discarding them. The field is flagged for human verification, allowing reviewers to see all extraction results — even low-confidence ones.
 
==== New Labeled OMR Direction Filter Properties ====
 
Two new direction filter properties are available for Labeled OMR:
 
* '''Box Label Direction''' — Specifies where checkboxes are located relative to their label (North, South, East, or West). ''Example:'' If checkboxes are always to the left of their label, set Box Label Direction to West.
* '''Group Label Direction''' — Specifies where checkbox groups are located relative to a header label. ''Example:'' If a group of Yes/No checkboxes always appears below a header label, set Group Label Direction to South.
 
==== Properties Can Be Copied and Pasted Across Object Types ====
 
Properties may now be copied and pasted across different object types in Grooper.
 
==== Improved Table Schema Format Options for AI Extraction ====
 
Data Tables now have a '''Schema Format''' configuration in their AI Extract Table Options, controlling how the table is represented in the schema sent to an LLM:
 
* '''ObjectArray''' — Default. Behavior matches previous versions.
* '''Tabular''' — Reduces token usage compared to ObjectArray. A middle ground between ObjectArray and CSV.
* '''CSV''' — Most compact. Maximally reduces token usage for large tables.
 
Structured output is strongly recommended when using Tabular or CSV formats.
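
To show why the more compact formats save tokens, the sketch below renders the same four rows three ways and compares character counts as a rough stand-in for token usage. The exact shapes Grooper sends to the LLM are not documented here. In particular, the <code>columns</code>/<code>rows</code> layout used for Tabular is an assumption, and the sample table is invented.

```python
import csv
import io
import json

rows = [
    {"Qty": 2, "Item": "Widget", "Price": "4.50"},
    {"Qty": 1, "Item": "Gasket", "Price": "0.75"},
    {"Qty": 5, "Item": "Washer", "Price": "0.10"},
    {"Qty": 3, "Item": "Spacer", "Price": "1.25"},
]


def as_object_array(table):
    """One JSON object per row; column names repeat on every row."""
    return json.dumps(table)


def as_tabular(table):
    """Column names listed once, followed by arrays of values (assumed layout)."""
    columns = list(table[0])
    return json.dumps({"columns": columns, "rows": [[r[c] for c in columns] for r in table]})


def as_csv(table):
    """Header line plus one comma-separated line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(table[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


sizes = {
    "ObjectArray": len(as_object_array(rows)),
    "Tabular": len(as_tabular(rows)),
    "CSV": len(as_csv(rows)),
}
```

For this sample the sizes fall in the order ObjectArray, Tabular, CSV from largest to smallest, and the gap widens as rows are added because ObjectArray repeats every column name on every row.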
 
{{fyi-box|AI Extract Table Options appear only when the Data Table's Extract Method is set to "AI Table Reader" or its parent Data Model or Data Section has an "AI Extract" fill method configured.}}
 
==== Document Collapse Functions Now Support Azure DI Data ====
 
Document collapse functions (the Batch Folder > Collapse command and the Merge activity when Clear On Completion is enabled) now preserve DI JSON files for each page when child pages are removed, merging them to the parent Batch Folder alongside other Grooper-generated artifacts.

Latest revision as of 14:00, 10 June 2026

Grooper version 2025 is here!

  • Learn about new and improved features below.
  • When available, follow any links to extended articles on a topic.
  • Need help installing Grooper? Check out our Install and Setup article.

AI Assistants and the Chat Page

Full AI Assistant article

What is an AI Assistant?

AI Assistants are Grooper's conversational AI personas. They define a role to be used in Grooper Chat sessions. Each AI Assistant has access to a collection of user-defined resources.

Normally, conversational AIs ("chatbots") only have access to whatever they were trained on. User-defined resources extend the AI Assistant's ability to answer questions on domain-specific information contained in documents, databases, or retrieved from a web service.

AI Assistants use retrieval-augmented generation (RAG) — a technique that queries a knowledge base outside the LLM's core training data before generating a response. This extends the LLM's capabilities to a specific domain (like a corpus of documents) without the need to retrain the model.

FYI

AI Assistants are a replacement for the "AI Analyst" object. AI Analysts were Grooper's first attempt at a conversational AI. AI Assistants are a substantial improvement. They are able to access document content and data more quickly, answer questions across larger document sets (even an entire Grooper Repository), and have access to more knowledge resources, such as information obtained from a database.

How does a user interact with an AI Assistant?

In Grooper

Users access AI Assistants using the Grooper Chat page. From here, users can select an AI Assistant previously configured in Grooper Design, start new conversations, or continue conversations they have previously started.

Visit this article for more.

Outside of Grooper

Users can also extend AI Assistants to external applications, including Teams, Slack, or custom-built applications. This allows Grooper assistants to be used in multiple channels.

There are two ways to extend a Grooper AI Assistant:

  1. Azure Bot Services
    • Microsoft's Azure Bot Framework allows AI Assistants to be exposed to a multitude of applications called "channels."
      • Channels include Teams, Slack, email, SMS, and more.
      • More information on Azure Bot Service channel support can be found here.
    • Communication is secured with OAuth client credentials. Users have further control over whether and how documents are linked in AI responses.
    • Integrating Grooper with Azure Bot Services requires setup in Grooper, in Azure, and in your own server infrastructure. For a quick reference, visit the Azure Bot Service article.
    • The AI Assistant's Bot Connector settings configure the Azure Bot integration. This includes a Bot Id (the client ID for OAuth Client Credentials authentication) and a Bot Password (the client secret).
    • Document Links settings control how documents are referenced in AI responses: None (no hyperlinks), Direct (public download links), or InApp (opens the document in the Grooper UI).
  2. Grooper Web Services (GWS) REST API
    • GWS is a new API set in Grooper.
    • The /assistants endpoints were specifically created for developers who want to interact with AI Assistants via web calls.
    • This allows developers to use AI Assistants in their own applications.
    • See below for more information on GWS.

What resources can AI Assistants connect to?

AI Assistants can connect to the following resources:

  • Search Index References — Allows the AI Assistant to retrieve document text content from an Azure AI Search index. Both metadata search and vector searches are supported. Vector queries enable chatting across hundreds or thousands of documents in a large Grooper Repository.
  • Table References — Allows the AI Assistant to retrieve data from database tables using SQL queries. Because the retrieval plan executes SQL statements, if Grooper has appropriate rights in the connected database, the AI Assistant may also write data to the database.
  • Web Service References — Allows the AI Assistant to retrieve data from APIs using web service calls. The web service must be described by a RAML (RESTful API Modeling Language) definition in the Web Service Reference configuration.

The AI Assistant's retrieval plan determines which of these resources to use when responding to a chat. When a user message is received, an intermediate LLM operation generates the retrieval plan — the LLM analyzes the conversation and chooses one or more retrieval actions. This allows users to query vast amounts of document text (using vector searches), extracted data (stored as metadata in a Search Index Reference), and information from external sources (SQL tables and web services) — all with a natural language prompt. No complex syntax required.

Built-in Retrieval Tools

In addition to user-defined resources, AI Assistants include several built-in retrieval tools (which can be optionally disabled):

  • Ask User — Gets more information from the user when needed to complete the retrieval plan.
  • Help Search — Performs a vector search against Grooper Help topics.
  • Wiki Search — Performs a vector search against Grooper Wiki articles.
  • Load Schema — Retrieves schemas for configured resources.
  • Load Web Page — Fetches the content of a web page. Issues an HTTP GET request and injects the response into the context. If HTML is received, the content is cleaned significantly: script, style, meta, link, and comment elements are removed, most attributes are stripped, and browser-like HTTP headers are included to help avoid captchas and robot detection.

Search Index Subsets

Subsets allow a single search index to be logically divided into multiple sub-indexes using an OData filter. This is useful because every Azure AI Search service has a limit on the number of search indexes.

  • AI Assistants can reference any number of subsets to control what knowledge they have access to.
  • Saves AI Search resources when a document set can be divided by a meaningful attribute (such as Document Type) where all subsets share the same field schema.
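
As a rough illustration, a subset can be thought of as a named OData filter applied over a shared index. The sketch below is not Grooper's implementation: the `Subset` class, the `DocumentType` field name, and the way a subset filter is combined with a user filter are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """Hypothetical model of a search index subset: a name plus an OData filter."""
    name: str
    odata_filter: str


def odata_string(value: str) -> str:
    # OData string literals use single quotes; embedded quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def subset_for_document_type(doc_type: str) -> Subset:
    # Assumes the index has a filterable field named DocumentType.
    return Subset(doc_type, f"DocumentType eq {odata_string(doc_type)}")


def scoped_filter(subset: Subset, user_filter: str = "") -> str:
    # Restrict any user-supplied filter to the subset's slice of the index.
    if not user_filter:
        return subset.odata_filter
    return f"({subset.odata_filter}) and ({user_filter})"
```

For example, `scoped_filter(subset_for_document_type("Invoice"), "Total gt 100")` yields a filter that only ever sees invoice documents, even though all Document Types live in one physical index.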

Vector Index Metadata

Document metadata (such as Data Fields in a Data Model) can now be vector indexed alongside the document's full text.

  • Why do it? It improves the accuracy of chunked indexes.
  • Downside? It will increase the size of the index.

What are some benefits to AI Assistants?

AI Assistants provide users with a new way to interact with documents and other connected resources such as databases.

  • Users can search for documents and their data using natural language.
  • Provides on-demand access to data inside documents without requiring a Data Model and extraction logic to be configured in advance.
  • Provides near-instant time-to-value. Minimal processing is required in Grooper before users can start chatting with a single document or across large document sets.
  • Reduces the need to extract everything up front, allowing users to gain insights into documents without complicated extraction workflows.
  • Extends access to external data sources, including databases and web services.

Additional AI Assistant and Chat developments

Chained Retrieval

Chained retrieval enables an AI Assistant to execute multi-step retrieval plans. Sometimes the answer to a user's question requires first retrieving content from one resource, then using that content to determine what to retrieve next.

  • Controlled using the Maximum Retrieval Depth property on the AI Assistant.
  • Example: "Show me an invoice for ACME Parts and include their line items from the accounting database." — The first retrieval step returns the invoice document and its invoice number. The second step uses that invoice number to retrieve the corresponding line items from the accounting database.
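
The control flow of a multi-step plan can be sketched as a loop bounded by a maximum depth. This is a conceptual sketch only: `chained_retrieval`, the `Planner` type, and `demo_planner` are hypothetical names, and the real planner is an LLM operation rather than a hand-written function.

```python
from typing import Callable, Optional

# A planner inspects what has been retrieved so far and returns the next
# retrieval action, or None when nothing more is needed. (Hypothetical type.)
Planner = Callable[[list[str]], Optional[Callable[[], str]]]


def chained_retrieval(plan_next: Planner, max_depth: int) -> list[str]:
    """Run retrieval steps until the planner stops or max_depth is reached."""
    retrieved: list[str] = []
    for _ in range(max_depth):
        action = plan_next(retrieved)
        if action is None:
            break
        retrieved.append(action())
    return retrieved


def demo_planner(retrieved: list[str]):
    # Step 1: find the invoice. Step 2: use its number to query line items.
    if not retrieved:
        return lambda: "invoice_number=INV-1001"
    if len(retrieved) == 1:
        number = retrieved[0].split("=")[1]
        return lambda: f"line_items_for={number}"
    return None
```

With a depth of 1 the loop stops after the invoice lookup; a depth of 2 or more allows the second step to use the invoice number returned by the first.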

Footnotes and Hyperlinks

To be effective, AI Assistants must refer users to the sources of their information. Grooper uses a custom URL scheme to reference in-app resources:

  • grooper://documents/docId — Refers to a document.
  • grooper://rel/Help/TopicName — Refers to a help topic.
  • grooper://sources/id — Refers to an injected source (system message).

Footnotes are generated in a list at the bottom of each response. The Grooper Help index has been rebuilt; topics are now HTML instead of JSON, and all hyperlinks in the HTML use the grooper:// scheme.
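
For developers consuming chat responses, grooper:// links can be taken apart with a standard URL parser. The sketch below assumes only the three link shapes listed above; `parse_grooper_link` is a hypothetical helper, not part of any Grooper API.

```python
from urllib.parse import urlparse


def parse_grooper_link(url: str) -> tuple[str, str]:
    """Split a grooper:// link into (kind, target)."""
    parts = urlparse(url)
    if parts.scheme != "grooper":
        raise ValueError(f"not a grooper link: {url}")
    # urlparse treats the first segment after '//' as the network location.
    return parts.netloc, parts.path.lstrip("/")
```

A client could use the returned kind ("documents", "rel", or "sources") to decide whether to open a document, a help topic, or an injected source.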

Chat History Tab

AI Assistants have a Chat History tab in Grooper Design that lets Design users view all messages.

  • Messages can be filtered by user and date.
  • Messages can be viewed in standard view or JSON view.
  • System messages can be hidden or shown using the Show System Messages toggle.
  • Messages can be deleted from this interface.

Usage Data and Access Control

  • Usage data — Shows total tokens consumed for a conversation. Individual messages from the assistant and system have tooltips showing input and output token usage. (Embeddings tokens consumed during vector search are not included.)
  • Result set — Now displays documents referenced in the conversation.
  • Retrieval plan — Is now included as a system message. The retrieval plan generator also knows the current date and time, which allows simpler queries and eliminates the need for the LLM to know platform-specific date/time functions.
  • Access Control — AI Assistant access can be controlled through Access Control Lists. Each AI Assistant has an Access List property to define Windows users and groups. Only AI Assistants a user has access to will appear on the Chat page.
  • Enabled property — AI Assistants can be enabled or disabled with the Enabled property.

Context Window Management

As a conversation grows, it becomes impractical to keep the entire conversation in the context window. Grooper currently applies the following rules:

  • The last N user and assistant message pairs are kept in scope.
  • All system messages referenced by those messages are kept in scope.
  • "N" is currently set to 3 (the last 3 user and assistant message pairs). This may be exposed as a configurable property in a future version.
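
The trimming rules above can be sketched as follows. The message shape (dicts with `id`, `role`, and an optional `refs` list of system message ids) is an assumption for illustration, as is the simplification that user and assistant messages strictly alternate.

```python
def trim_context(messages: list[dict], n_pairs: int = 3) -> list[dict]:
    """Keep the last n_pairs user/assistant pairs plus system messages they reference.

    Assumes user and assistant messages alternate, so one pair is two messages.
    """
    chat = [m for m in messages if m["role"] in ("user", "assistant")]
    kept = chat[-2 * n_pairs:] if n_pairs > 0 else []
    kept_ids = {m["id"] for m in kept}
    # System messages stay in scope only if a kept message refers to them.
    referenced = {ref for m in kept for ref in m.get("refs", [])}
    return [
        m for m in messages
        if m["id"] in kept_ids or (m["role"] == "system" and m["id"] in referenced)
    ]
```

The original message order is preserved, so system messages remain next to the turns that referenced them.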


Chat Console Improvements

Streaming Chat Completions

The Chat console now supports streaming chat completions, displaying responses one word at a time. This gets responses to users faster and makes it more apparent when the chatbot is processing a large response or if something has gone wrong. Footnote links are added at the end of the streamed response.

Markdown Support

The Chat console now renders Markdown in chat responses, significantly improving the readability of LLM output.

HTTP Import

HTTP Import is a new Import Provider in Grooper 2025. It allows users to import website content into Grooper Batches. HTTP Import can be used to import:

  • Individual webpages
  • Documents hosted on a website and accessible from a URL
  • Entire websites

The HTTP Import configuration includes mechanisms to select links using CSS selectors and to filter pages using regular expressions. A new HTTP Link document type supports sparse ingestion: webpages are first imported as links, and their content is then loaded afterward using multiple threads.
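
To make the idea of link selection plus regular-expression filtering concrete, here is a minimal standard-library sketch. It is not how HTTP Import is implemented: it approximates the CSS selector `a[href]` with a simple HTML parser, and `LinkCollector` and `select_links` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (roughly the CSS selector a[href])."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def select_links(html: str, base_url: str, include_pattern: str) -> list[str]:
    """Resolve links against base_url and keep those matching include_pattern."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(base_url, h) for h in collector.hrefs]
    return [u for u in absolute if re.search(include_pattern, u)]
```

Resolving relative links before filtering means the regular expression always sees full URLs, which keeps patterns such as `/title29/` unambiguous.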

Websites are a great resource for AI Assistants, serving as one of many knowledge resources that can be used to answer questions from the Chat page.

FYI

Use case example: The HTTP Import provider was used internally to index publicly available legal and regulatory content — including the California Code of Regulations, Texas Administrative Code, Code of Federal Regulations (Title 29), Oklahoma Statutes, and the DOL Wage & Hour Division website. These were indexed and connected to an AI Assistant designed to answer questions about payroll laws and regulations.

HTML Conditioning Commands

Several new HTTP and HTML commands are available in Grooper 2025 for conditioning HTML documents for further processing. These commands are particularly useful for preparing HTML documents for use with an AI Assistant.

  • HTTP Link > Load Content — Allows webpages to be imported into Grooper sparsely and then loaded multithreaded.
  • HTML Document > Condition HTML — Provides several cleanup and normalization options for webpages.
    • The Body Selector uses CSS selectors to match an element to replace the HTML body, removing unnecessary text content before feeding webpages to an AI Assistant. For example, a <body> containing <header>, <main>, and <footer> elements can be reduced to just the content of <main>, discarding the header and footer.
    • The Removal Selector uses CSS selectors to remove specific HTML elements — useful for stripping marketing sidebars (<aside>), navigation bars (<nav>), and other repetitive content.
    • The Site URL can be prepended to relative links in the HTML page for a better viewing experience in the Document Viewer. This ensures the browser can resolve CSS, inline images, and other linked resources.
    • Attribute Rules and Wrap Rules assist in styling HTML elements.
      • These rules were developed for use cases involving XML documents converted to HTML using the XML Transform activity.
      • Attribute Rules add attributes to existing HTML elements.
      • Wrap Rules wrap text in an HTML element. Text is matched with regular expressions, then wrapped in an HTML element of your choosing.
  • HTML Document > Convert to PDF — Converts the HTML page to a PDF document, which Grooper can then process like any other PDF.
  • HTML Document > Convert to Text — Converts the HTML page to a TXT document. Useful for webpages that present as text files (for example, pages from the US Code of Federal Regulations on govinfo.gov). Removes unnecessary HTML elements and leaves plain text.
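
The Wrap Rules idea (match text with a regular expression, then wrap each match in an element) can be illustrated with a few lines of Python. This is only a sketch of the concept on plain text; `apply_wrap_rule` is a hypothetical helper and says nothing about how Condition HTML handles existing markup.

```python
import re


def apply_wrap_rule(text: str, pattern: str, tag: str) -> str:
    """Wrap every regex match in the given HTML element."""
    return re.sub(pattern, lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)
```

For instance, `apply_wrap_rule("See Sec. 541.100.", r"Sec\. \d+\.\d+", "strong")` wraps the section reference in a `<strong>` element so it stands out in the Document Viewer.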

Improved HTML Viewer

Highlighting in the Document Viewer's HTML Viewer has been improved. This enhances the user experience when reviewing footnote sources on the Chat page.

AI Productivity Helpers

Full article on AI productivity helpers

Grooper 2024 introduced a set of "AI productivity helpers." These features use a large language model (LLM) to assist Grooper Design users in building Grooper assets. They can assist with regular expressions, SQL queries for Database Lookups, and even creating full Data Models.

You must enable the LLM Connector option in your Grooper Repository to use these tools.

List of AI Productivity Helpers

  • AI Generated Schema Importer — Helps create Data Models quickly by generating Data Elements from a natural language prompt. For example, entering "Create a Data Model for invoice processing" will create unconfigured Data Sections, Data Fields, and Data Tables related to invoice processing.
  • AI Expression Helper — Helps users craft regular expressions for the Pattern Match extractor.
  • Db Lookup Helper — Helps users craft SQL queries for Database Lookups.
  • XSLT Helper — Found in the XML Transform activity's XSLT Tester. Generates an XSLT transform from a natural language prompt.
  • AI Helper — Appears throughout Grooper wherever there is a text editor. Potential uses include:
    • Lexicon and Local List editors — Generate lists for List Match extractors.
    • Description editors — Generate field descriptions to assist AI Extract, or descriptions for any other Grooper node.
    • Code expression editors (Calculated Value, Default Value, Should Submit, etc.) — Generate expressions from natural language prompts.
    • List editors — Generate list entries.
    • Instruction editors — Generate instructions for AI Extract.

OAuth Support

See the OAuth Setup article for more information.

OAuth is an authorization standard that lets third-party applications access web resources without users sharing their passwords.

FYI

Microsoft Entra ID (formerly Azure Active Directory) is the only supported OAuth provider at this time.

Benefits of OAuth:

  • Security — Users do not share their passwords with third-party applications. OAuth safely encrypts transmission of data between servers, making document links secure when connecting AI Assistants to chat clients via Azure Bot Services.
  • Simplified logins — Users can log into multiple applications with existing accounts. In Grooper's case, with a Microsoft Entra ID account.
  • Integrations — OAuth is the security standard for app-to-app communication. Securing Grooper with OAuth enables new integration options, including using Azure Bot Services to extend AI Assistants to external chat channels.

Both the Grooper website and the GWS website can be configured with OAuth authentication. Both rely on settings in the web.config file, configured with values from the Azure Portal. If no ida:ClientId setting is present, authentication works the same as in previous versions of Grooper.

  • Grooper and OAuth — When the Grooper website is configured to use OAuth, users log in using their Entra ID credentials.
    • Previous login methods are still supported. Windows authentication remains the default.
    • OAuth is required if you are extending an AI Assistant to an external channel like Teams via Azure Bot Services and want to provide users with document links in chat responses.
  • GWS and OAuth — GWS uses OAuth client credentials to communicate with Azure Bot Services.
    • Required for extending AI Assistants to external channels via Azure Bot Services.
    • If providing document links in chat responses, both the Grooper website and GWS website must be secured with OAuth.

Additional setup is required to configure OAuth authentication. You must register Grooper as an application in Microsoft Entra ID and configure each website's web.config file. Full instructions are coming soon.

OAuth Service Login for Exchange

Exchange CMIS Connections now have an OAuth Service Login method for connecting Grooper to Exchange servers (Outlook inboxes).

  • Implements the OAuth 2.0 Client Credentials flow for secure server-to-server authentication.
  • Allows the application (Grooper) to authenticate using a client ID and client secret without requiring user interaction, then obtain an access token to interact with Exchange APIs on Grooper's behalf.
  • The previous Exchange OAuth method (a user login method requiring a Microsoft account) is still available and is fine for testing, but OAuth Service Login is preferred for production scenarios.
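
For readers unfamiliar with the Client Credentials flow, the sketch below builds (but does not send) the token request that such a flow makes against the Microsoft identity platform. Grooper performs this internally; the function name and the default scope are assumptions, and the correct scope depends on which Exchange API is being targeted.

```python
from urllib.parse import urlencode


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scope: str = "https://graph.microsoft.com/.default"):
    """Return (url, form_body) for an OAuth 2.0 client credentials token request.

    The default scope is a placeholder assumption, not a Grooper setting.
    """
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return url, body
```

The body is sent as an `application/x-www-form-urlencoded` POST; the JSON response contains an `access_token` that is then presented as a bearer token on later API calls. No user sign-in is involved, which is why this flow suits unattended services.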

Document indexing and search improvements

Search Page Improvements

The following improvements were made to the Search Page user interface throughout version 2025's development.

Saved Queries

Search queries are now stored in a Grooper database table ("Query") instead of the user's browser cache. Saved searches persist across browsers and machines.

A new Saved Queries side panel has been added to the left side of the Search page:

  • Saved queries appear in the list after pressing the save button.
  • Queries can be selected from the panel and run using the search button.
  • Queries can be renamed and deleted from the panel.
  • The panel can be collapsed to save screen space.

Search Parameter Editors

Editors have been added for the Filter, Select, and Order By parameters, making it easier for users unfamiliar with the search syntax to configure these parameters. Look for the "more" icon at the end of each property.

The Filter Editor supports the following component types:

  • Comparison — Compare a field against a value or list of values.
  • Group — Combine multiple conditions using AND / OR.
  • Lambda — Match field values inside collections.
  • IsMatch — Text search inside the document or a field.
  • GeoDistance — Match GeoPoint fields based on distance from a reference point.
  • GeoIntersects — Match GeoPoint[] fields based on intersection with a reference polygon.

A Show All toggle in the Filter Editor reveals all fields in the search index, giving users a simple way to browse available fields and construct basic queries.
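
As a rough guide to what each component type corresponds to, the sketch below lists example filters written in Azure AI Search's OData filter syntax. All field names (`InvoiceTotal`, `Status`, `Tags`, `Content`, `Location`) are hypothetical, and the exact text the Filter Editor generates may differ.

```python
def group(operator: str, conditions: list[str]) -> str:
    """Combine filter conditions with 'and' or 'or', parenthesizing each one."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    return f" {operator} ".join(f"({c})" for c in conditions)


# One illustrative filter per component type (field names are made up).
EXAMPLES = {
    "Comparison": "InvoiceTotal gt 1000",
    "Comparison (list)": "search.in(Status, 'Open,Pending', ',')",
    "Lambda": "Tags/any(t: t eq 'urgent')",
    "IsMatch": "search.ismatch('overtime', 'Content')",
    "GeoDistance": "geo.distance(Location, geography'POINT(-97.5 35.5)') le 10",
    "GeoIntersects": (
        "geo.intersects(Location, "
        "geography'POLYGON((-98 35, -97 35, -97 36, -98 36, -98 35))')"
    ),
}
EXAMPLES["Group"] = group("and", [EXAMPLES["Comparison"], EXAMPLES["Lambda"]])
```

Note that geography literals list longitude before latitude, and polygon points must form a closed ring.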

AI Helper: Generate Filter

The Search page has a new AI Helper button for the Filter parameter. This replaces the previous AI Query Helper button and works significantly better — it focuses solely on generating a filter rather than building the whole query, and it injects more information including values for drop-down list fields.

Sort by Column Header

Users can now click a column header in the search results list to quickly sort results by that field.

Result Set Command Permissions

Result Set commands can now be secured using Permission Sets configurations, allowing administrators to control which users have access to specific commands on the Search page:

  • Submit Job — Enabled only if the user has access to the Jobs page.
  • Assistant Chat — Enabled only if the user has access to the Chat page.
  • Create Batch — Enabled only if the user has the right to create Batch nodes (defined in Node Permissions).
  • Download — Currently enabled for all users.

Miscellaneous Search Page Improvements

  • Case-insensitive string comparisons — String comparisons in Filters are now case-insensitive by default (the previous case-sensitive mode was difficult to use in practice). This is implemented by adding a lowercase normalizer for string fields. Requires API version 2025-09-01 or higher.
  • Remembered column widths — The Search page now remembers adjusted column widths in the results list, persisted in the user's browser cache across searches and page visits.

Search Indexing Improvements

The following improvements were made to how documents are added to a search index and how search indexes are managed throughout version 2025's development.

Indexes Tab

A new Indexes tab has been added to the Grooper Root node in the Design page. It is visible when the Grooper Repository has an AI Search option enabled.

This tab provides a centralized view of all search indexes with the following capabilities:

  • Navigate directly to related Content Types.
  • Inspect the full index definition.
  • Monitor usage metrics and limits for the search service.
  • Delete orphaned search indexes.
  • If an Index Name Prefix is in use, toggle the view to see all indexes displayed with full names.

Index Name Prefix

AI Search has a new Index Name Prefix property. This prevents name collisions when multiple Grooper repositories share a single AI Search service.

  • A prefix of prod- with an index named invoices results in a full index name of prod-invoices.
  • A prefix of hr- with an index named tax-documents results in hr-tax-documents.

This enables more efficient use of Azure resources by allowing multiple environments or departments to share a single AI Search service.
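
The naming behavior described above amounts to simple string concatenation. The helpers below are hypothetical and only mirror the two examples given; they are a sketch, not Grooper code.

```python
def full_index_name(prefix: str, index_name: str) -> str:
    """Join an Index Name Prefix and an index name into the full index name."""
    return f"{prefix}{index_name}"


def display_name(full_name: str, prefix: str) -> str:
    """Strip the prefix for display when the full name belongs to this repository."""
    return full_name[len(prefix):] if prefix and full_name.startswith(prefix) else full_name
```

Two repositories that each define an index named invoices can therefore coexist on one service as long as their prefixes differ.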

Large Document Search Indexing

Large documents would occasionally fail during search indexing due to how embeddings values were collected. Several improvements have been made:

  • Updated the internal tokenizer with a newer model to produce accurate token counts.
  • Added logic to enhance capabilities when requesting embeddings in bulk.
  • Fixed additional internal issues.

Chunking Method

A new Chunking Method configuration is available under Vector Search Options in an Indexing Behavior. This controls how chunks are created when collecting vector embeddings from large documents.

  • Currently one Chunking Method is available: Fixed Chunker, which allows control over the size of fixed chunks.
  • If no Chunking Method is set, a non-chunked index is created.
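
Fixed-size chunking in general can be sketched in a few lines. The example below uses whitespace-separated words as a stand-in for tokens; the actual Fixed Chunker's unit of measure and any overlap behavior are not described here, so treat those details as assumptions.

```python
def fixed_chunks(tokens: list[str], chunk_size: int) -> list[list[str]]:
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```

Each chunk would then be embedded separately, so a large document contributes several vectors to the index instead of one.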

AI-Enabled Separation and Classification

New Separation Provider: AI Separate

AI Separate is a new LLM-based document separation method that requires significantly less configuration than traditional approaches.

How it works:

  1. The LLM is presented with the current page plus N adjacent pages.
  2. It determines whether the middle page is the start of a new document.
  3. If so, a Batch Folder is inserted at that page.

Key properties:

  • Instructions — Guides the LLM's decisioning when evaluating whether a page begins a new document.
  • Window Extent — Controls how many adjacent pages are analyzed alongside the evaluated page. A value of "1" means three total pages are presented to the LLM (one before, the evaluated page, and one after).
  • Document Types — When configured, allows AI Separate to classify documents during separation. The LLM selects the most appropriate Document Type from the list.
  • Include Reason — When enabled, the LLM provides a reason for its separation decision on each page. This is valuable for troubleshooting and refining instructions.
  • Include Page Metadata — Enables instruction types based on page metadata, including page dimensions, description, values, and barcodes.
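
The relationship between Window Extent and the pages shown to the LLM can be sketched as a sliding window. `page_windows` is a hypothetical helper, and clamping the window at the first and last page of the Batch is an assumption about edge behavior.

```python
def page_windows(page_count: int, window_extent: int) -> list[tuple[int, list[int]]]:
    """For each evaluated page index, list the page indexes presented together."""
    windows = []
    for page in range(page_count):
        first = max(0, page - window_extent)
        last = min(page_count - 1, page + window_extent)
        windows.append((page, list(range(first, last + 1))))
    return windows
```

With a Window Extent of 1, interior pages are evaluated in a three-page window, while the first and last pages see only two pages under this clamping assumption.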

Separation decisioning using AI Separate is expected to improve over time as LLMs improve.

New Activities: Mark Attachments and Attach

Attachments are whole documents that are part of another (host) document — for example, an exhibit attached to a legal document, or a check attached to an EOB form. Because AIs can make mistakes during separation, sometimes a document that should be treated as an attachment is separated as a standalone document instead. The Mark Attachments and Attach activities work together to resolve this.

Mark Attachments defines Attachment Rules that specify which Document Types should be considered attachments, which Document Types they should be attached to (their hosts), and whether the host comes before or after the attachment.

  • When the Model property is configured, Mark Attachments operates in AI-enhanced mode: an LLM compares each document to its neighbors and determines whether it should be attached to the previous document, the next document, or not at all.
  • When the Model property is unconfigured, Mark Attachments operates in rule-based mode: documents are attached exactly as described by the Attachment Type, Host Content Type, and Direction configuration.
  • Mark Attachments only sets markers; it does not physically move documents. The Attach activity must follow it.

Attach performs the actual document attachment based on the markers set by Mark Attachments. Documents can be attached in one of two ways:

  • Nested under the host document, creating a parent-child document relationship.
  • Pages appended to the host document.

New Classification Method: LLM Classifier

LLM Classifier is a new AI-powered classification method that classifies documents by asking a large language model to select the Document Type from a list.

Configuration steps:

  1. Set the Content Model's Classification Method to LLM Classifier.
  2. Add descriptions to individual Document Types as needed to guide the model's decision.

In simple cases, the Document Type name alone may be sufficient. In more nuanced cases, meaningful and distinct descriptions will help the LLM make the correct choice.

FYI

Classification decisioning using LLM Classifier is expected to improve over time as LLMs improve.

New Classification Method: Search Classifier (Experimental)

The Search Classifier classifies documents using a search index. Document Types are assigned by finding similar documents in the index using vector search.

  • Requires an Indexing Behavior configured for the Content Model with Vector Search enabled.
  • Requires documents to be present in the search index before the Classify activity can run. Some manual effort is required to seed the index with examples of each Document Type.
  • Classification is expected to improve over time as corrected examples accumulate in the search index.
  • For more complex cases, the Search Classifier will likely need supplemental LLM pre-processing (to remove entity names, addresses, etc.) and post-processing (to verify results and break ties).

Search Classifier has not been tested against real-world document sets. Its efficacy has not been proven in production scenarios. It is largely an untested prototype at this point.

AI-Enabled Data Section Extraction

Grooper 2025 introduces several new AI-based Section Extract Methods for extracting Data Sections. These complement the existing AI Extract fill method.

AI Section Reader

AI Section Reader is a generative AI-based extract method for single-instance Data Sections (a single record on the document). It works nearly identically to AI Extract in terms of configuration (Model, quoting strategy, instructions, data element filter, etc.).

The key difference is timing: AI Section Reader executes at extract time (as a Section Extract Method), whereas AI Extract (as a Fill Method) executes after extraction completes. AI Section Reader also produces detailed diagnostics including schema, chat logs, operation logs, and performance metrics.

FYI

Design users may find it useful to temporarily configure AI Section Reader on a Data Section during testing, since its diagnostics allow unit testing of Data Section extraction without running AI Extract on the full Data Model.

AI Collection Reader

AI Collection Reader is a generative AI-based extract method for multi-instance Data Sections (repeating records on a document). It extends AI Section Reader's capabilities to handle documents of any length by dividing the document into chunks of N pages and processing each chunk independently in parallel.

  • Includes options for chunk size and maximum threads.
  • Produces detailed diagnostics.

Issues can occur when sections break across a page or chunk boundary. The Concat Data Action (see below) helps resolve this.
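
The chunk-and-parallelize pattern can be sketched with a thread pool. This is a conceptual illustration only: `read_collection` and `extract_chunk` are hypothetical, and in practice `extract_chunk` would be an LLM call rather than a local function.

```python
from concurrent.futures import ThreadPoolExecutor


def page_chunks(pages: list[str], chunk_size: int) -> list[list[str]]:
    """Divide pages into consecutive chunks of at most chunk_size pages."""
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]


def read_collection(pages, chunk_size, max_threads, extract_chunk):
    """Extract records from each chunk in parallel and merge them in page order."""
    chunks = page_chunks(pages, chunk_size)
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() preserves chunk order even though work runs concurrently.
        results = list(pool.map(extract_chunk, chunks))
    return [record for chunk_records in results for record in chunk_records]
```

Because each chunk is processed independently, a record that straddles two chunks would surface as two partial records, which is the boundary problem noted above.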

AI Transaction Detection

AI Transaction Detection is a custom-built Section Extract Method designed for repeating transaction-based sections, such as employee records in payroll reports or claims on an EOB form. It differs from AI Collection Reader in that it detects transaction boundaries first, then runs extraction on each individual transaction in parallel.

This approach handles transactions that span multiple pages better than AI Collection Reader, which splits the document into fixed-size page chunks.

Configuration involves three areas:

  • Generator — LLM model and generative AI settings.
  • Boundary Detector — Instructions for detecting transaction anchors (static text labels or regex patterns that identify the start of each record).
  • Data Extraction — Extraction settings for each detected transaction, including two quoting methods:
    • Transaction Quoting — Preprocesses the transaction content.
    • Document Quoting — Selects content outside of the transaction (such as a page-level table header) to be included in the extraction context.
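
Boundary detection with a regex anchor can be illustrated on plain text. The sketch assumes each transaction starts at an anchor match and runs until the next one; `split_transactions` is a hypothetical helper, and the real Boundary Detector is driven by LLM instructions rather than a single hard-coded pattern.

```python
import re


def split_transactions(text: str, anchor_pattern: str) -> list[str]:
    """Split text into transactions, each starting at a regex anchor match."""
    starts = [m.start() for m in re.finditer(anchor_pattern, text, flags=re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]
```

Content before the first anchor (such as a report header) is left out of every transaction, which is the kind of document-level content Document Quoting is meant to bring back into context.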

Choosing the Right AI Section Extract Method

Grooper now has four AI-enabled options for extracting Data Sections (three Section Extract Methods plus AI Extract, a Fill Method):

  • AI Extract — Still the best option for small documents.
  • AI Section Reader — Use this only when the Data Section must be extracted at extract time rather than fill time. Also useful during testing to view diagnostics and unit test Data Section extraction.
  • AI Collection Reader — A general-purpose multi-instance Section Extract Method. Use for larger documents with multiple repeating sections.
  • AI Transaction Detection — Tailor-made for transaction-based sections such as employee records in payroll reports or claims on an EOB.

New Fill Method: Fill Descendants

The Fill Descendants method was created to increase Extract efficiency when using AI Extract on multiple Data Sections in a Data Model. Fill Descendants executes fill methods (such as AI Extract) on descendant Data Elements in parallel, using multiple threads to perform prefetch operations instead of just one.

In one tested scenario with eleven AI Extract Data Sections, extraction time dropped from approximately 5 minutes to 25 seconds, roughly a 12x speedup.

How to use it:

  1. Create two or more Data Sections configured with AI Extract.
  2. Set the Trigger property to False on each Data Section so they do not run individually.
  3. Add Fill Descendants at the Data Model level.

Azure Document Intelligence

About Azure Document Intelligence

Azure Document Intelligence (formerly Form Recognizer) is an AI-powered document processing service in Microsoft Azure. It uses the Azure Read OCR engine for base OCR results and uses prebuilt or custom models to extract text content, layout elements (tables, columns, sections), style information, and semantic elements (key-value pairs, labeled fields).

Grooper's current integration focuses on two prebuilt models:

  • prebuilt-read — High-accuracy machine print and handwritten OCR.
  • prebuilt-layout — Extracts document structure in addition to text: identifies tables, paragraphs, headings and sections; preserves reading order and spatial relationships; detects lines and OMR checkboxes.

All models in your Document Intelligence service will be available, but these two are the primary focus.

Connecting to Azure Document Intelligence

  1. Create a Document Intelligence resource in the Azure portal and note the API key and resource name.
  2. From Grooper Design, add the Azure Document Intelligence Repository Option to the Grooper Root and enter the API key and resource name.

FYI

Azure Document Intelligence also has a "Use GCS" option for installations that want to use Grooper Cloud Services for Document Intelligence features instead of their own Azure resource.

New OCR Engine: Azure DI OCR

Azure DI OCR is a new OCR Engine option available in OCR Profiles, powered by Azure Document Intelligence.

New Activity: DI Analyze

DI Analyze runs Azure Document Intelligence image analysis on a document or page and saves the JSON output for later use.

  • Primary use — AI-enabled extraction. The new DI Layout Quoting Method uses DI Analyze results to inject document structure into extraction operations (such as AI Extract).
  • Secondary use — OCR. If DI Analyze JSON output is present, the Azure DI OCR engine uses the saved results instead of making a second call to Azure.

If using both DI Analyze and Azure DI OCR in the same workflow, always run DI Analyze before Recognize to avoid duplicating calls (and costs) to Azure.

DI Analyze also supports a Correct Orientation property that rotates pages based on detected layout orientation, using the predominant angle of text lines on the page.

FYI

To use orientation correction, run Split Pages before DI Analyze. Orientation correction applies to Batch Pages only — pages in files attached to Batch Folders must be split first.

Page Level vs. Folder Level

When deciding whether to run DI Analyze at the page level or folder level:

  • Page level — Better for processing efficiency. A multithreaded Activity Processing service can hand each page to Document Intelligence concurrently, significantly speeding up large multipage documents.
  • Folder level — Better for page-spanning structure awareness. When tables or paragraphs span multiple pages, folder-level processing allows DI Layout-based operations (like AI Extract) to account for this.
  • Hard limitation — AI Separate is a page-level operation. To use DI Layout with AI Separate, DI Analyze must run at the page level.

If unsure, start at the page level. All DI Layout-based operations are available when DI data is present at the page level.

DI Layout Quoting Method

The new DI Layout Quoting Method injects DI Analyze results into extraction operations. It supports three output formats:

  • Markdown — Text layout only. Preferred for simpler scenarios. LLMs interpret markdown well, and table structures detected by DI Analyze are formatted as HTML within the markdown output.
  • JSON — Text layout and spatial location data.
  • HTML — Text layout and spatial location data. Preferred for complex scenarios requiring the most accurate spatial locations.

Accurate spatial location data is critical for Grooper's ability to align an LLM's response back to the document and highlight results in the Document Viewer — which in turn is critical for reviewers to verify AI Extract results.

The Include Row Bounds property (HTML output only) adds a page number and bounds for each table row, which can improve spatial grounding.

The Scope property allows injection of content from a specific region of the document (identified by a Data Element with a location) rather than the full document. This is useful for injecting document-level content — like a table header that appears only once — into transaction-level extraction operations.

Spatial Grounding

New Fill Method: Spatial Grounding

Spatial Grounding is an AI-powered Fill Method that assigns a page number and bounding location to each extracted field. It should be run after AI Extract in an extraction workflow.

How it works:

  1. Injects the extracted data plus location data (via Layout Objects, DI Layout, or JSON File quoting methods).
  2. Asks the LLM to output a page number and bounds for each field.
  3. Optionally infers bounds for sections and table rows from field locations.

Why use it? By dividing the work into two simpler tasks — AI Extract captures field values; Spatial Grounding locates them — both tasks achieve higher accuracy. Spatial Grounding further refines zones using OCR character positions.

Spatial Grounding consumes significantly more tokens than AI Extract. Consider using a less expensive model to control costs.

Route Activity

New Activity: Route

The Route activity routes Batch Folders (or Batches) to new Batches based on their Content Type, enabling branching workflows based on initial classification.

Routing rules are defined by Route Definitions, each specifying:

  • A Content Type to match.
  • A destination Batch Process.
  • An optional Boolean trigger expression for conditional routing.

Additional options:

  • Items can be moved or cloned into the target Batch. If no route matches, the item remains in its current Batch.
  • Routed Batches can be started in a paused state for review.
  • The Include Sibling Types option adds all sibling Content Types not already present in the Secondary Type list.
  • Data Actions can copy or transform Data Fields during routing to support scenarios where source and target Content Types have different Data Models.
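
The matching logic described above can be sketched as follows. This is a simplified model rather than Grooper code: the RouteDefinition class and pick_route function are hypothetical names, the trigger is modeled as a Python callable instead of a Grooper expression, and first-match-wins ordering is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class RouteDefinition:
    """Hypothetical stand-in for a Route Definition."""
    content_type: str
    target_process: str
    trigger: Optional[Callable[[dict], bool]] = None  # optional Boolean condition


def pick_route(content_type: str, data: dict,
               routes: Sequence[RouteDefinition]) -> Optional[str]:
    """Return the target Batch Process name, or None to leave the item in place."""
    for route in routes:
        if route.content_type != content_type:
            continue
        # A route with no trigger always applies once the Content Type matches.
        if route.trigger is None or route.trigger(data):
            return route.target_process
    return None  # no route matched: the item stays in its current Batch
```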

Batches have a new Pending To Step property that supports running Route at the Batch level. Since a new Batch Process cannot be applied while a step is running on the existing one, this property defers the process change: the next time the Batch is completed or resumed, the new process and current step are applied and the Pending To Step value is cleared.

New Concept: Nested Batches (Experimental)

Batches can now have other Batches as their children in specific circumstances. These child Batches are called nested Batches.

  • At present, the only way to create a nested Batch is with the Route activity, using a Route Definition with its Method set to Convert. This converts the document (Batch Folder) to a Batch and routes it to the target Batch Process. For now, this works only at Folder Level 1 in the parent Batch.
  • The Batches Filter has a new Include Nested Batches option to show nested Batches in the Batch List.

Use case: A ZIP file containing 1,000 documents is imported as a Batch. The unzipped documents are classified as Check, EOB, or Mail — each needing to route to a different Batch Process. Without nested Batches, routing separates documents from their parent, making it impossible to monitor the overall import progress or roll back to a previous step. With the Convert method, documents are converted to nested Batches and remain in place.

Nested Batches are still in the experimental stages. More work is needed to ensure operational rules are in place to prevent unsafe Batch deletion and improper task processing.

Content Type Relationships

Content Type Relationships expose related Data Models to code expressions and other features. They are defined by three properties on a Content Model, Content Category, or Document Type:

  • Child Of — Specifies a parent Content Type that will always be assigned to a parent Batch Folder in the Batch. Makes all Data Elements from the parent document available in the child document's expression environment. Use for multi-level Batch structures where parent-child Batch Folder relationships are required.
    • Example: A Benefits Change Form document configured as "Child Of" a Personnel File will have access to all Personnel File Data Model fields in its expressions — including using Employee_Name in its Export Mappings.
  • Sibling Of — Specifies one or more Content Types that are assigned alongside the same Batch Folder as Secondary Types. Makes the Data Elements of those sibling Content Types directly accessible in the expression environment using the syntax TypeName.FieldName.
    • Note: "Sibling" here refers to sibling Content Types assigned to the same Batch Folder — not sibling Batch Folders within a Batch. There is no automatic event linkage between fields on different types; if a user edits sibling data, fields depending on it will not automatically recalculate.
  • Relative Of — A more flexible option that defines related Content Types whose Data Elements are exposed in the expression environment, regardless of whether they are parent documents or Secondary Types. Use when the related Content Type could be either a parent or a sibling.

VLM Analyze Activity

New Activity: VLM Analyze

VLM Analyze analyzes images using vision-language models (VLMs) via chat completion. It runs at the page level only and saves the resulting JSON for use in downstream data extraction.

  • Works with OpenAI models and open-source Qwen models.
  • A JSON schema and instructions define what gets extracted. The schema must be of type "object."
  • AI-enabled features such as AI Extract can use VLM Analyze data via the JSON File quoting method.

FYI

Use the AI Helper button in the JSON Schema editor to have an AI generate the schema for you.

VLM Analyze is well-suited for scenarios where standard OCR-based text extraction is insufficient — for example, analyzing photos for damage assessment, detecting signatures or stamps, identifying watermarks, or any task requiring visual understanding of the image content rather than just its text.

Locating Results with Bounds

To enable highlighting of VLM Analyze results in the Grooper Document Viewer, include a bounds object in the JSON schema. VLM Analyze will return normalized bounding box coordinates (0–1, relative to the image dimensions), which Grooper converts to inches for use in the Document Viewer.
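
The normalized-to-inches conversion is simple arithmetic once the image's pixel size and resolution are known. The sketch below shows one way to do it; the x, y, width, and height key names and the bounds_to_inches function are assumptions made for illustration and may not match the bounds object Grooper expects.

```python
def bounds_to_inches(bounds: dict, width_px: int, height_px: int, dpi: float) -> dict:
    """Scale 0-1 normalized bounds to inches for an image of the given size.

    The key names (x, y, width, height) are hypothetical.
    """
    page_width_in = width_px / dpi
    page_height_in = height_px / dpi
    return {
        "left": bounds["x"] * page_width_in,
        "top": bounds["y"] * page_height_in,
        "width": bounds["width"] * page_width_in,
        "height": bounds["height"] * page_height_in,
    }
```

For a 2550 x 3300 pixel image scanned at 300 DPI, the page measures 8.5 x 11 inches, so a normalized x of 0.5 lands 4.25 inches from the left edge.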

Use the Selector property to define where bounds specifications appear in the schema. The selector ..bounds will select all bounds objects in the result and works well in most cases.

Bounds coordinates produced by a VLM are not always pixel perfect. They will be close but may not perfectly overlap with the actual content.

Review Improvements

New Command: AI Correct

A new AI Correct command is available in the Data Grid for Data Viewers. It arms Review operators with AI-powered data correction capabilities.

How it works:

  1. An AI is presented with the current document data and the operator's natural language instructions for editing field values.
  2. It generates JSON patch operations.
  3. The patch operations are applied to the document data.
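
To make steps 2 and 3 concrete, the sketch below applies a small list of patch operations to document data held as nested dictionaries and lists. It assumes the operations follow the RFC 6902 JSON Patch shape (op, path, value) and supports only replace and remove; the exact patch dialect AI Correct uses is not stated here, and apply_patch is a hypothetical name.

```python
import copy


def _resolve_parent(doc, pointer: str):
    """Walk a JSON Pointer and return (parent container, final key or index)."""
    parts = [p.replace("~1", "/").replace("~0", "~") for p in pointer.split("/")[1:]]
    node = doc
    for part in parts[:-1]:
        node = node[int(part)] if isinstance(node, list) else node[part]
    last = parts[-1]
    return node, (int(last) if isinstance(node, list) else last)


def apply_patch(doc, operations):
    """Return a patched copy of doc; the input is left untouched."""
    result = copy.deepcopy(doc)
    for op in operations:
        parent, key = _resolve_parent(result, op["path"])
        if op["op"] == "replace":
            parent[key] = op["value"]
        elif op["op"] == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result
```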

AI Correct works at any level in the Data Model and can process all records in a collection (multi-instance Data Sections or Data Table rows) in parallel.

Example: In one scenario, Grooper had extracted the same error on 115 service lines. Using AI Correct with the instruction "Remove all $0.00 adjustments," 117 validation errors were reduced to 2 in 5–10 seconds — compared to 5–10 minutes if done manually.

New Command: Set All

A new Set All command is available for Data Field and Data Column cells in the Data Grid. It allows bulk editing of all instances of a multi-instance field at once — useful for clearing all instances or setting them to a static value.

Example: An EOB document contains an invalid NPI number on all 74 claims. Using Set All, the operator can clear all 74 instances instantly rather than visiting each claim individually.

Data Sections: Tabular View

Multi-instance Data Sections can now be viewed in a tabular view. A toolbar button toggles between standard and tabular view. As the selection moves from record to record in the tabular view, the paging control updates automatically.

  • Only Data Fields can be edited in tabular view. Data Tables and nested Data Sections are hidden.
  • Design users may optionally select a subset of Data Fields to display using the Data Section's Tabular View property.

Multiple Document Types: Multiple Tabs

Documents with multiple Document Types are now displayed in tabs in the Data Grid. Previously, data from each Content Type was listed sequentially, which required scrolling to find Secondary Type data. The tabbed view is a significant improvement over that linear layout.

Fixed Header

The document header in the Data Grid (where tabs and the error list appear) is now fixed. Operators can navigate from error to error using the error list without losing sight of the error count or the Search Box and its paging control.

Customizable Error List

The Error List can now be docked to any side of the Data Grid by clicking the error count to toggle its position. The Error List auto-syncs with the current field focus.

New Property: Edit Rule

Data Fields and Data Columns have a new Edit Rule property. This allows Grooper to execute a Data Rule whenever a user edits a field value in the Data Grid during Review.

This feature gives designers more control over field edit events, and is an alternative to using a Data Model's Lookups configuration or a Validate Rule for lookup operations. Using Edit Rule for lookups avoids the costly compute time associated with validation events that fire outside a user's direct control.

Button Commands in Data Viewer

Designers can now add clickable buttons for common commands in the Data Grid UI. The new Button Command Types property on Data Models, Data Sections, and Data Tables allows commands to be displayed as toolbar buttons.

  • Data Table commands appear on the caption bar.
  • Row commands appear at the end of the row.

Field Search

A new Field Search capability allows searching field values using substring or regex search directly in the Data Grid.

  • For regex syntax, enclose the pattern in slashes: /\d{4}-764/
  • Matching results are highlighted with an orange border.
  • Navigate between hits using the toolbar or hotkeys (Alt + <, Alt + >).
  • Supports fielded search: for example, Amount: 15.00
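
The three query forms above (plain substring, /regex/, and Field: term) can be told apart with a few lines of parsing. The following is a rough sketch of that idea, not the Data Grid's actual parser: parse_query and search are hypothetical names, and case-insensitive substring matching is an assumption.

```python
import re
from typing import Callable, Optional, Tuple


def parse_query(query: str) -> Tuple[Optional[str], Callable[[str], bool]]:
    """Split an optional 'Field: term' prefix and build a matcher for the term."""
    field = None
    term = query.strip()
    # Simplification: any colon outside a /regex/ is read as a field prefix.
    if not term.startswith("/") and ":" in term:
        field, term = (part.strip() for part in term.split(":", 1))
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        pattern = re.compile(term[1:-1])
        return field, lambda value: pattern.search(value) is not None
    needle = term.lower()
    return field, lambda value: needle in value.lower()


def search(fields: dict, query: str) -> list:
    """Return the names of the fields whose values match the query."""
    field, matches = parse_query(query)
    return [name for name, value in fields.items()
            if (field is None or name == field) and matches(value)]
```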

Multi-Choice Control for Array Fields

Array fields with drop-down lists now display a Multi-Choice Control, allowing quick selection of multiple values.

Data Grid Progress Spinner

A progress spinner now displays in the Data Grid while background operations run (such as validation events triggered by field edits or commands). The spinner is located next to the Data Grid options dropdown button.

Improved Custom Value Type

The Custom Value Type has two new properties:

  • Type Name — A name for the type (e.g., SSN, EIN_Number) that makes it easier to identify what data the field represents.
  • Hint — Instructions to help users and LLMs understand correct syntax for the field. This is presented to users in the Data Grid when reviewing validation errors, and is included in schemas when using Generate Schema with extended properties enabled.

Miscellaneous Review Improvements

  • Disallow Confirm — A new property on Data Fields and Data Columns. When set to True, operators cannot override validation logic for that field using the Confirm command.
  • Reviewer Field — A new property on Data Fields. When set, the field is automatically populated with the active username when Review opens. If an array field, all review usernames are recorded.
  • New Command: Split (Multi-Instance Data Sections) — A new Split command is available for multi-instance Data Sections in the Data Grid (accessed by right-clicking the Data Section's caption). It uses an extractor to find section instances during Review.

Grooper Web Services (GWS)

Grooper Web Services (GWS) is a new set of Grooper REST API endpoints. GWS is installed as a separate website by the Grooper Web Client installer. The installer now creates two IIS sites: /Grooper for the Grooper UI and /GWS for Grooper APIs.

FYI

Eventually GWS will fully replace the initial Grooper REST API offered by API Services. However, API Services will continue to function in this version. The GWS website exposes API documentation on its home page, covering all available endpoints.

GWS Endpoint Collections

AI Assistant related:

  • /assistants — Endpoints for development using AI Assistants. Use this API to implement a chat client that allows users to interact with Grooper's AI Assistants.
  • /bot — Endpoints that integrate AI Assistants with Microsoft Azure Bot Services. These endpoints are called by the Azure Bot service. Do not call these endpoints directly.

Search related:

  • /search — Endpoints for executing document searches. Use this API to query Grooper search indexes using natural language, full text, or metadata searches.

Document processing related:

  • /batches — Endpoints to access and manage Batches in Grooper.
  • /documents — Endpoints to access and manage documents in Grooper.
  • /processes — Endpoints to retrieve information about published Batch Processes and their steps.

Miscellaneous:

  • /nodes — Endpoints to manage nodes in the Grooper node tree. Provides low-level access to the Grooper Repository's tree structure. Use with caution.
  • /commands — Endpoints to execute commands on Grooper nodes, including Batches, documents (Batch Folders), or other node types.

Improved Security

HTTPS Now Required

HTTPS is now mandatory. A trusted TLS/SSL certificate must be configured before installing or upgrading Grooper.

  • Opening the Grooper web app over HTTP will fail or leave the application unnavigable.
  • Existing deployments using HTTP must be updated to HTTPS.
  • Exception: When accessed via localhost addresses, Grooper runs in "dev mode" and is allowed to operate over HTTP.

Supported certificate options:

  • Self-Signed Certificate — Suitable for dev/test environments only. Requires manual trust installation on each client machine.
  • Internal Certificate Authority (most typical) — Issued by your organization's internal PKI (e.g., Active Directory Certificate Services). Automatically trusted on domain-joined or managed devices. Ensure the internal root CA certificate is deployed on all client machines via Group Policy or device management tools.
  • Public Certificate Authority — Trusted by all browsers and devices. Necessary only for internet-facing deployments. Examples: DigiCert, GlobalSign, Let's Encrypt.

Additional Security HTTP Headers

Additional security HTTP headers have been enabled. System administrators can inspect these in the httpProtocol > customHeaders section of the Grooper web.config file.

A new contentSecurityLevel key controls the strictness of the Content-Security-Policy header:

Level    Behavior
Off      No CSP applied. Not recommended: allows HTMLViewer to execute external scripts.
Low      Least restrictive. HTMLViewer works fully, including external images, styles, and fonts.
Medium   Balanced. HTMLViewer works, but external images, styles, and fonts are blocked.
High     Most restrictive. HTMLViewer is fully blocked. Maximizes security but may break features.

These presets can be overridden with a custom Content-Security-Policy using the contentSecurityPolicy key.

Import Filter

A new Import Filter configuration on the Grooper Root specifies a comma-separated list of permitted file types for import into the Grooper Repository. The filter is enforced during file upload operations.

Default permitted types: .tif, .tiff, .jpg, .jpeg, .jp2, .png, .doc, .docx, .xls, .xlsx, .ppt, .pptx, .pdf, .msg, .eml, .txt, .log, .json, .zip
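
An extension check against a comma-separated list like this one might look like the sketch below. It is illustrative only: is_permitted is a hypothetical helper, and case-insensitive comparison on the final extension is an assumption about how the filter behaves.

```python
from pathlib import PurePath

DEFAULT_FILTER = (".tif, .tiff, .jpg, .jpeg, .jp2, .png, .doc, .docx, .xls, .xlsx, "
                  ".ppt, .pptx, .pdf, .msg, .eml, .txt, .log, .json, .zip")


def is_permitted(filename: str, filter_list: str = DEFAULT_FILTER) -> bool:
    """Return True when the file's final extension appears in the filter list."""
    allowed = {ext.strip().lower() for ext in filter_list.split(",") if ext.strip()}
    return PurePath(filename).suffix.lower() in allowed
```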

Design Page Download Security

Node downloads from the Design page now require an explicit user click via a displayed download link. This reduces the risk of unintended or automated downloads and aligns with modern browser security best practices.

Miscellaneous

New Activities

Fill Data

Fill Data executes one or more fill methods to populate or enrich data on a document. It loads existing document data, runs all fill methods with a specific name at any level in the Data Model, applies optional post-processing rules, and optionally flags the document if data is invalid.

Use case: When data elements are populated at import time, Extract cannot be used (it always overwrites existing data). Fill Data provides an alternative: set Run Child Extractors to False on the Data Model, add a fill method that only fills desired elements, and add a Fill Data activity to the process.

Pick

Pick uses AI to choose the "controlling version" of a Document Type in a Batch — for example, selecting the most authoritative copy from four versions of a loan application in a mortgage file. The AI considers document dates, completeness, presence of signatures, and official stamps or seals.

Tip: Use the Multi-Quote quoting method to inject document content, extracted data, or VLM Analyze output (capturing signatures, stamps, etc.) into the Pick operation.

Detect Language

A new and improved Detect Language activity uses large language models to determine the language of text on a document. Because modern LLMs excel at natural language processing across multiple languages, this activity reliably identifies a document's native language with little to no setup.

  • The detected language is stored as the document's (Batch Folder's) Culture property.

Note: The previous Detect Language activity still exists in Grooper 2025 under the name "Detect Language (Legacy)."

Route

See Route above.

VLM Analyze

See VLM Analyze above.

New Fill Methods

Run Child Extractors

Run Child Extractors is a new Fill Method that runs extraction logic for child elements. It supports filtering to selectively run extraction logic for specific child elements, which is useful when only a subset of fields needs to be extracted.

Fill Method Collection

Fill Method Collection conditionally executes a list of Fill Methods, enabling complex data extraction workflows. For example, a workflow could use different extraction approaches for small and large documents, with multiple fallback strategies and spatial grounding steps configured conditionally.

Fill Descendants

See AI-Enabled Data Section Extraction above.

Spatial Grounding

See Spatial Grounding above.

New Quoting Methods

Multi Quote

Multi Quote combines multiple quoting strategies, allowing the AI to be presented with content from multiple regions or multiple types of input simultaneously. This is ideal for complex extraction scenarios where a single quoting strategy does not provide sufficient context.

JSON File

JSON File quotes content from a JSON file attached to the Batch Folder or Batch Page. Its primary use case is handing data created by VLM Analyze to an LLM. See VLM Analyze above.

DI Layout

See DI Layout Quoting Method above.

New Commands

Batch Folder > Set Field Value

  • Sets a single-instance Data Field value on a document.
  • Supports adding and removing values from an array field.

FYI

This was added as a prototype "tagging" mechanism in the Search page. Adding a "Tag" field with its Value Type set to String Array allows users to enter simple tags on documents while researching them in the Search page.

Text Document > Insert Page Breaks

  • Inserts page breaks into text documents before or after a line that matches a regex pattern.
  • Useful for paginating large text documents for formatting and usability purposes.
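
As a rough picture of the behavior, the sketch below walks a text document line by line and emits a page break before or after each line that matches the pattern. Representing a page break as a form feed character on its own line is an assumption, and insert_page_breaks is a hypothetical function name.

```python
import re

FORM_FEED = "\f"  # assumption: a form feed character marks the page break


def insert_page_breaks(text: str, pattern: str, before: bool = True) -> str:
    """Insert a page break before (or after) every line matching pattern."""
    regex = re.compile(pattern)
    out = []
    # Split on newlines only, so any existing form feeds are left alone.
    for line in text.split("\n"):
        hit = regex.search(line) is not None
        if hit and before:
            out.append(FORM_FEED)
        out.append(line)
        if hit and not before:
            out.append(FORM_FEED)
    return "\n".join(out)
```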

Grooper Root > Run Import

  • Allows Design page users to submit import jobs without leaving the Design page, useful for testing import configurations.

This command should only be used for testing. Large-scale production imports should still be managed from the Imports page or by Import Watcher schedules.

XML File > Split

  • Splits an XML file into new documents using XPath selectors.
  • A new child document is created for each selected node.

XML File > Condition XML

  • Conditions an XML file by stripping out unwanted XML elements selected by an XPath selector.
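
The two XML File commands above can be approximated with Python's standard library, which supports a limited subset of XPath. The sketch below is illustrative rather than a description of Grooper's implementation; split_xml and condition_xml are hypothetical names, and each "child document" is modeled as a serialized XML string.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text: str, selector: str) -> list:
    """Return one serialized XML fragment per node selected by the XPath."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(node, encoding="unicode") for node in root.findall(selector)]


def condition_xml(xml_text: str, selector: str) -> str:
    """Remove every element selected by the XPath and return the cleaned XML."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(selector):
        parents[node].remove(node)
    return ET.tostring(root, encoding="unicode")
```

The selector passed to condition_xml must not select the root element itself, since the root has no parent to remove it from.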

HTTP Link > Load Content

  • Loads a document imported via HTTP Import.
  • Allows webpages to be imported sparsely, with their content loaded afterward across multiple threads.

HTML Document > Condition HTML

HTML Document > Convert to PDF

  • Converts an HTML page to a PDF document for standard Grooper processing.

This command uses a simple open-source toolkit to render the PDF. It may not render all styling depending on the HTML document.

HTML Document > Convert to Text

Data Container > Generate Schema

  • Generates a JSON schema file representing the Data Element structure of a Data Model, Data Section, or Data Table and its descendants.
  • The schema is saved as a JSON file in the Local Resources Folder associated with the Data Model.
  • Optionally includes extended field properties: computed value formulas, expected value formulas, validation formulas, required conditions, and read-only flags.
  • Aware of the JSON Data Mapping behavior.

Data Container > Generate Descriptions

  • Uses generative AI to create or update human-readable descriptions for Data Elements in a Data Model.
  • The Overwrite option determines whether existing descriptions are replaced or only missing ones are filled in.
  • Generated descriptions enhance tooltips in the Data Grid, enrich exported JSON schemas, and provide clearer instructions for LLM-based extraction.

Data Container > Import Descriptions

  • Populates or updates the Description for each Data Element by importing from a JSON schema.
  • Flexible Overwrite options allow replacing all descriptions or only filling in missing ones.
  • Designed to work in conjunction with Generate Schema: export the schema, enrich descriptions using external AI tools, then import the descriptions back into Grooper.

Data Action > Run Command

  • Executes a command on every instance of a Data Element using Data Rules.
  • Useful for automating operations that would normally need to be performed manually from the Data Grid UI.
  • Example use case: Create a custom command in an Object Library that performs a long-running lookup, then execute it automatically after Extract using a Data Rule configured with Run Command.

Content Type > Submit Job

  • Runs an Activity on all documents with the Content Type assigned.
  • Similar to the Submit Job command on the Search page.
  • Example: Run Extract on all documents with an "Invoice" Document Type.

Project Archiving Commands

Two new Project archiving commands are available by right-clicking a Project and opening the Archive flyout:

  • Archive Project — Saves a copy of the Project's current contents as a ZIP stored locally on the Project node. Archives are automatically named with a timestamp; an optional Prefix property can be prepended.
  • Restore from Archive — Restores a Project from a previously saved archive, fully replacing its contents. An Archive Current option saves the Project's current state before restoring.

Save/Load Preset Commands

New Save and Load Preset commands allow property settings to be saved and loaded for any Grooper object. They are accessed by right-clicking the object's property grid.

JSON Functionality

Several new JSON features have been integrated into Grooper 2025.

JSON Metadata Export (Full and Simple)

The JSON Metadata Merge/Export Format now supports two layouts:

  • Full — The legacy export, including field values, location information, confidence scores, and more.
  • Simple — A compact JSON file containing extracted values only.

New Attachment Type: JSON File

A new JSON File Attachment Type enables JSON-specific commands and features. JSON files imported into Grooper are attached to Batch Folders using this type.

New Commands: Load Data and Split

  • JSON File > Load Data — Loads data from a JSON file into a compatible Data Model. Maps JSON nodes and values to corresponding Data Sections and Data Fields using a JPath expression set on their Import Source property. Data Section collections are populated with one instance per matching JSON node.
  • JSON File > Split — Splits a JSON file into child documents (Batch Folders) based on a JPath selector. One child document is created for each matching item.
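
The Split behavior can be sketched with a tiny path resolver. The example handles only a small subset of path syntax ($, .name, and [*]), which is far less than a real JPath engine supports; select and split_json are hypothetical names, and each child document is modeled as a JSON string.

```python
import json


def select(document, path: str) -> list:
    """Resolve a tiny path subset: '$', '.name', and '[*]'."""
    nodes = [document]
    for token in path.lstrip("$").replace("[*]", ".*").split("."):
        if not token:
            continue
        if token == "*":
            # Expand every list into its items.
            nodes = [item for node in nodes if isinstance(node, list) for item in node]
        else:
            nodes = [node[token] for node in nodes
                     if isinstance(node, dict) and token in node]
    return nodes


def split_json(json_text: str, selector: str) -> list:
    """Return one JSON string per item matched by the selector."""
    return [json.dumps(item) for item in select(json.loads(json_text), selector)]
```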

New Behavior: JSON Data Mapping

JSON Data Mapping defines JSON generation options for the JSON Metadata Export Format, allowing customization of which Data Elements are included in the output. Elements can be selected from inherited Data Models, Secondary Type Data Models, or Data Models from parent documents in a Batch hierarchy.

Multiple Content Types' Data Model outputs can be combined using three merge modes:

  • Combine — Merges properties from all Data Models into a single JSON object.
  • Nested — Creates an array of JSON objects, nesting each Data Model's output under a property named for its code name.
  • Nested Secondary — Adds primary model properties at the root level, with secondary model outputs nested under their code names.
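
One reading of the three merge modes is sketched below, using plain dictionaries for each Data Model's output. The merge_outputs function is hypothetical, the key-collision rule in Combine (later models win) is an assumption, and the Nested shape (a list with one single-property object per model) is an interpretation of the description above rather than confirmed output.

```python
def merge_outputs(models: list, mode: str):
    """Merge (code_name, output) pairs; the primary model comes first."""
    if mode == "Combine":
        merged = {}
        for _, output in models:
            merged.update(output)  # assumption: later models win on key collisions
        return merged
    if mode == "Nested":
        return [{code_name: output} for code_name, output in models]
    if mode == "Nested Secondary":
        (_, primary), *secondary = models
        merged = dict(primary)
        for code_name, output in secondary:
            merged[code_name] = output
        return merged
    raise ValueError(f"unknown merge mode: {mode}")
```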

New Schema Importer: JSON Schema Importer

The JSON Schema Importer generates a Data Model corresponding to the properties and structure defined in a JSON Schema file. Properties become Data Fields, Data Sections, or Data Tables depending on their type. Arrays of objects are imported as Data Tables or multi-instance sections. Enum and oneOf definitions are imported as field choice lists. Grooper-specific extended properties (computed values, required conditions, etc.) are imported if enabled.

New Data Action: Concat

The Concat Data Action combines adjacent records in a collection based on a configurable trigger expression. It is designed to resolve cases where a section instance was incorrectly extracted as two separate partial instances (for example, when an EOB claim spans across two pages and the section header is repeated on page 2).

How it works:

  1. Iterates the collection in reverse order, evaluating the Trigger expression for each pair of adjacent records.
  2. If the Trigger returns true, the two records are merged: child fields from the second record are copied to the first (preserving non-blank values), child collections are merged, and the second record is removed.
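
The two steps above can be sketched over a list of dictionaries. This is a simplified model, not Grooper's code: concat_records is a hypothetical name, the trigger is a Python callable taking the two adjacent records, child collections are modeled as lists, and "preserving non-blank values" is read as keeping the first record's value unless it is blank.

```python
def concat_records(records, trigger):
    """Merge each record into its predecessor when trigger(first, second) is true."""
    result = [dict(r) for r in records]
    # Walking backwards lets a chain of partial records collapse into one.
    for i in range(len(result) - 1, 0, -1):
        first, second = result[i - 1], result[i]
        if trigger(first, second):
            for name, value in second.items():
                if isinstance(value, list):
                    # Child collections are merged.
                    first[name] = list(first.get(name, [])) + value
                elif first.get(name) in (None, ""):
                    # Only blank fields on the first record are filled.
                    first[name] = value
            del result[i]
    return result
```

Iterating in reverse matters when three or more partial records are adjacent: the last one is folded into the middle one before the middle one is folded into the first.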

FYI

Concat was designed to resolve issues with AI Collection Reader, but its uses extend beyond that. It will concatenate any section instances or table rows according to its Trigger condition.

New AI Configuration Object: Data Generator

Data Generator is a new AI configuration object that defines generative AI settings for LLM-powered features. It is assigned through the Generator property on the features that support it. Key settings include:

  • Model — Determines which LLM model is used.
  • Temperature, TopP, Presence Penalty, Frequency Penalty — Standard LLM parameter controls.
  • Reasoning Effort — Specifies computational effort and depth of reasoning for reasoning-enabled models.
  • Service Tier — Specifies quality-of-service level for APIs that support it.
  • Verbosity — Controls how verbose the output should be.
  • Use Structured Output — Enables structured output mode for JSON responses. Generally more reliable, enforcing how the LLM responds and how Grooper parses its response.
  • Always Inject Schema — When enabled alongside Structured Output, always includes the schema in every prompt for improved consistency and accuracy (at the cost of higher token usage).
  • Max Tries — How many times Grooper will attempt to re-issue a request if an error occurs.

GPT-5 Model Parameter Support

Support for gpt-5 model parameters has been added to the Parameters settings of LLM-enabled features: Reasoning Effort, Service Tier, and Verbosity. These settings apply to gpt-5 models only.

User Experience Improvements

Upload Documents from the Batches Page

A new Upload button is available in the context toolbar at the top of the Batches page.

This button allows users to upload one or more files and select a Batch Process. Grooper will create a paused Batch with each file attached.

  • This is the easiest way to perform transactional document processing in Grooper.
  • Users processing only a few local documents no longer need to configure an Import Provider from the Imports page.

Search Text in Any Document Viewer

Text can now be searched in any document open in the Document Viewer using the Document Searcher. Press the Search Document button to open a search box at the bottom of the image. Type a search term and Grooper will highlight all matches. Multiple results can be navigated using the arrow controls.

FYI

Regex searches are supported. Enter a pattern between two forward slashes, e.g. /regex pattern/

"Reports" Tab Replaces Content Type "Summary" Tab

The Summary tab on Content Types has been replaced with a Reports tab. The Summary tab had become cluttered and limited the ability to add new information. The Reports tab allows users to select exactly what details they want to review.

Available reports:

  • Circular Expressions
  • Data Elements
  • Descendants
  • Expressions
  • Property Overrides
  • Validation Rules

FYI

Data Model Variables have been added to the Expressions report.

New Generative AI Commands for Convert Data Activity

Three new AI commands simplify the process of configuring Data Model conversions in the Convert Data activity:

  • Create Convert Actions — Creates a set of Data Actions based on a natural language prompt.
  • Create Copy Actions — Adds a set of Copy actions to an Action List based on a prompt.
  • Create Child Actions — Adds Data Actions to a Copy or Append action based on a prompt.

Source Type and Target Type must be configured for these commands to function.

Efficiency Improvements

Activity Processing Improvements

Several changes have been made to Grooper Activity Processing services to improve efficiency:

  • Reduced overhead for idle Activity Processing services.
  • New Idle Sleep Time property controls the time to wait between each polling cycle.
  • Improved internal query behavior when services are idle and polling for tasks.

Dispose Batch Improvements

The Dispose Batch activity now includes a Remove Job History property.

  • Increases performance when archiving Batches for long-term storage.
  • Keeps the Grooper database free of unnecessary processing history clutter.

Other Changes

Export Improvements

  • Filename property — An optional expression that generates the filename for an exported file.
  • Trigger property — All Export Formats now have a Trigger property, allowing selective generation of output files (for example, conditionally including PDF and/or TIFF versions based on client criteria).
  • ZIP Format changes — Now supports two sources for files in a ZIP: child attachments (via Include Child Attachments) and Export Formats (via Included Export Formats). A new Custom Filename expression allows individual files in the ZIP to be named precisely.
  • Max Retries — When configured, Grooper will reattempt an export if the Export activity encounters an error. Useful for recovering from transient network interruptions. Default is 0 (no retries).
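
The Max Retries behavior follows a common retry pattern, sketched below. The export_with_retries function is a hypothetical illustration: which error types Grooper retries and whether it waits between attempts are not stated above, so the OSError filter and the delay parameter are assumptions.

```python
import time


def export_with_retries(export, max_retries: int = 0, delay_seconds: float = 0.0):
    """Call export(), retrying up to max_retries times on I/O-style errors."""
    attempt = 0
    while True:
        try:
            return export()
        except OSError:
            # With the default of 0, the first failure is raised immediately.
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay_seconds)
```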

XML Processing and Transform Improvements

Several improvements have been made to XML processing in Grooper 2025.

XML Transform activity:

  • Now supports XML namespaces.
  • A new AI transform helper uses an LLM to generate an XSLT transform. Access it by adding XML Transform to a Batch Process, navigating to the step's XSLT Editor tab, and clicking the AI helper button.

New XML file commands:

  • XML File > Split — Splits an XML file into new documents using XPath selectors. One child document is created per selected node.
  • XML File > Condition XML — Strips unwanted XML elements from an XML file.

XML Schema Importer improvements:

  • Added support for references to other schemas. All XSD resource files must be in the same folder in Grooper.
  • Improved support for non-attribute fields.
  • Now imports comments for Data Elements and enumeration values (List Values).

New Reclassify Mode: Primary

A new Primary reclassify mode for the Classify activity sets the primary Content Type while leaving Secondary Types intact. The default Overwrite mode clears Secondary Types.

Data Conversion for Related Content Types

When changing a document's type to a related Content Type (one that shares a Data Model in the same Content Model), Grooper can now preserve the data the two types have in common. Enable this using the Convert Existing Data property found on the Classify activity, the Assign Document Type command, and the Edit Type Assignment command.

New Value Type: GeoPoint

GeoPoint is a new Value Type for representing geographic fields, storing a pair of latitude/longitude coordinates (latitude: -90 to 90; longitude: -180 to 180). The Data Grid displays a maps icon for fields of this type, launching Google Maps for single points or geojson.io for arrays of GeoPoints.

GeoPoint enables the Search page's Geo Distance and Geo Intersects filter types for location-based search queries.
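
As a rough picture of what a distance-based filter does, the sketch below keeps only points within a radius of a center point. How the search index actually computes distance is not described here; the haversine great-circle formula and the helper names `haversine_km` and `within_distance` are assumptions used for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(points, center, max_km):
    """Keep (lat, lon) points no farther than max_km from center."""
    return [p for p in points
            if haversine_km(p[0], p[1], center[0], center[1]) <= max_km]


# Hypothetical field values: Oklahoma City, Tulsa, Dallas.
sites = [(35.4676, -97.5164), (36.1540, -95.9928), (32.7767, -96.7970)]
nearby = within_distance(sites, center=(35.4676, -97.5164), max_km=200)
```

Here `nearby` keeps Oklahoma City and Tulsa and drops Dallas, which lies roughly 300 km from the center.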

Token Usage Statistics

Grooper now logs token consumption (input tokens, output tokens, and total tokens) for all activities that use an LLM, including Classify (LLM Classifier), Separate (AI Separate), Extract (AI Extract), and Fill Data (AI Extract).

Batch-Level Data Access for Next Step Expressions

The Next Step Expression on Batch Process Steps can now access field values. If the Batch Process has its Content Type property set, that Content Type's data is exposed in a property named Data.

Example: If(Data.InvoiceAmount > 5000, ReviewL2, ReviewL1) — if the Batch's InvoiceAmount field is greater than 5000, the document routes to ReviewL2; otherwise to ReviewL1.

The Data property can only access Batch-level fields. It cannot access child Batch Folder data.
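
For readers more used to general-purpose languages, the example expression behaves like the Python conditional below. The `next_step` function and the dictionary standing in for the Data property are illustrative only; step names are returned as strings here purely for demonstration.

```python
def next_step(data):
    """Python analog of If(Data.InvoiceAmount > 5000, ReviewL2, ReviewL1)."""
    return "ReviewL2" if data["InvoiceAmount"] > 5000 else "ReviewL1"


high = next_step({"InvoiceAmount": 7500})
boundary = next_step({"InvoiceAmount": 5000})
low = next_step({"InvoiceAmount": 120})
```

Because the comparison is strictly greater than, an amount of exactly 5000 routes to ReviewL1.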

Data Lookup: Populate Collections

The Data Lookup action can now populate Data Tables and multi-instance Data Sections. New properties include Target Collection and Clear Existing. When Target Collection is set, lookup results populate the collection with one record per row in the result set. Disable Clear Existing to append records rather than replacing them.
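
The effect of Clear Existing can be sketched in a few lines of Python. This is a conceptual model, not Grooper code: `populate_collection` is a hypothetical function, records are modeled as plain dictionaries, and the enabled default is inferred from the wording above.

```python
def populate_collection(existing, lookup_rows, clear_existing=True):
    """Return the collection after a lookup: one record per result row.

    clear_existing=True replaces the current records; False appends.
    """
    records = [] if clear_existing else list(existing)
    records.extend(dict(row) for row in lookup_rows)
    return records


current = [{"Item": "Widget", "Price": 12.5}]
result_set = [{"Item": "Gadget", "Price": 30.0}, {"Item": "Gizmo", "Price": 7.25}]

replaced = populate_collection(current, result_set, clear_existing=True)
appended = populate_collection(current, result_set, clear_existing=False)
```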

Ask AI and AI Schema Extractor

The Ask AI extractor has been split into two separate Value Extractors:

  • Ask AI — Dedicated to generating natural language responses from an LLM. Use for summarization, evaluation, and other responses best expressed in human-readable text.
  • AI Schema Extractor — Dedicated to generating responses that conform to a JSON schema. Use for highly structured output such as extracting tables, lists, or records with multiple fields. Enables Structured Output for OpenAI models that support it. A new Selector property allows an array to be selected as the output.
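
To make the schema-plus-selector idea concrete, the sketch below pairs a small JSON schema with a response that conforms to it, then picks out the array as the output. Everything here is hypothetical: the schema, the sample response, and the dotted-path `select` helper are illustrations, and the actual syntax of the Selector property is not described in these notes.

```python
import json

# A hypothetical JSON schema describing invoice line items.
SCHEMA = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        }
    },
    "required": ["line_items"],
}

# A made-up model response that conforms to SCHEMA.
RESPONSE = (
    '{"line_items": ['
    '{"description": "Widget", "amount": 12.5}, '
    '{"description": "Gadget", "amount": 30}]}'
)


def select(document, path):
    """Follow a dotted key path (an assumed selector syntax) into parsed JSON."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


items = select(json.loads(RESPONSE), "line_items")
```

Selecting the `line_items` array yields one record per element, which is the kind of multi-record output the schema-based extractor is meant for.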

System Maintenance Commands Split

The System Maintenance command has been split into two separate commands:

  • Rebuild Indexes — Rebuilds database indexes that exceed a configurable fragmentation threshold.
  • Database Cleanup — Purges old statistics, event logs, jobs, tasks, machines, and similar data.

The System Maintenance Service has been redesigned accordingly.

Run the System Maintenance commands or System Maintenance Service regularly. Without regular maintenance, performance will degrade due to index fragmentation and historical data buildup.

AI Search Usage Data

AI Search usage data is now visible in Grooper. Index data (usage metrics and limits) is visible for each Indexing Behavior.

LLM Fine-Tuning Enhancements

  • The JSONL viewer now provides a UI for editing messages.
  • The Build Fine Tuning File command now saves generated JSONL files in the Local Resources Folder.
  • The Submit Fine Tuning Job and Delete Fine Tuned Model commands can now be executed on Resource Files with a .jsonl extension.

New Hide Numeric Suffix Property for Content Types

A new Hide Numeric Suffix property for Content Types hides the number appended to a document's title (e.g., "Invoice (1)" becomes "Invoice"). This is useful for the footnote links AI Assistants generate when a custom title is produced using a Content Type's Caption property.
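
The visible effect resembles stripping a trailing parenthesized number from a title. The Python sketch below mimics that effect with a regular expression. It is an approximation: `hide_numeric_suffix` is a hypothetical helper, and the suffix format is assumed from the single "Invoice (1)" example.

```python
import re

# Matches optional whitespace plus a parenthesized number at the end of a title.
_SUFFIX = re.compile(r"\s*\(\d+\)$")


def hide_numeric_suffix(title):
    """Remove a trailing numeric suffix such as ' (1)' from a title."""
    return _SUFFIX.sub("", title)


plain = hide_numeric_suffix("Invoice (1)")
untouched = hide_numeric_suffix("Report (2024) Q1")
```

Only a suffix at the very end of the title is removed, so a number in the middle of a title is left alone.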

Keep Low Confidence Fields Property

A new Keep Low Confidence Fields property for Data Fields and Data Columns retains extraction results that fall below the Minimum Confidence threshold rather than discarding them. The field is flagged for human verification, allowing reviewers to see all extraction results — even low-confidence ones.
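
The decision this property controls can be modeled as below. The function name, the 0-to-1 confidence scale, and the returned tuple are assumptions for illustration. The sketch does not cover whether a discarded field is flagged separately.

```python
def apply_confidence(value, confidence, minimum, keep_low_confidence=False):
    """Return (value, needs_review) for one extraction result.

    Hypothetical model: results below the minimum are discarded unless
    keep_low_confidence is set, in which case they are kept and flagged.
    """
    if confidence >= minimum:
        return value, False
    if keep_low_confidence:
        return value, True
    return None, False


confident = apply_confidence("INV-7", confidence=0.95, minimum=0.8)
discarded = apply_confidence("INV-7", confidence=0.60, minimum=0.8)
kept = apply_confidence("INV-7", confidence=0.60, minimum=0.8, keep_low_confidence=True)
```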

New Labeled OMR Direction Filter Properties

Two new direction filter properties are available for Labeled OMR:

  • Box Label Direction — Specifies where a checkbox may be located relative to its label (North/South/East/West). Example: If checkboxes are always to the left of their label, set Box Label Direction to West.
  • Group Label Direction — Specifies where a checkbox group may be located relative to its header label. Example: If a group of Yes/No checkboxes always appears below a header label, set Group Label Direction to South.

Properties Can Be Copied and Pasted Across Object Types

Properties may now be copied and pasted across different object types in Grooper.

Improved Table Schema Format Options for AI Extraction

Data Tables now have a Schema Format configuration in their AI Extract Table Options, controlling how the table is represented in the schema sent to an LLM:

  • ObjectArray — Default. Behavior matches previous versions.
  • Tabular — Reduces token usage compared to ObjectArray. A middle ground between ObjectArray and CSV.
  • CSV — Most compact. Maximally reduces token usage for large tables.

Structured output is strongly recommended when using Tabular or CSV formats.
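
To see why the more compact formats save tokens, the sketch below serializes one small table three ways and compares character counts as a rough proxy for token usage. The exact layouts Grooper sends to the LLM are not specified here. In particular, the columns-plus-rows shape used for "Tabular" is an assumption, and the sample rows are made up.

```python
import csv
import io
import json

# Hypothetical table data.
columns = ["Description", "Quantity", "Amount"]
rows = [["Widget", 2, 12.5], ["Gadget", 1, 30.0], ["Gizmo", 4, 7.25]]

# ObjectArray: every row repeats every column name.
object_array_text = json.dumps([dict(zip(columns, row)) for row in rows])

# Tabular (assumed shape): column names appear once, rows are plain arrays.
tabular_text = json.dumps({"columns": columns, "rows": rows})

# CSV: a header line followed by one line per row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
csv_text = buffer.getvalue()

sizes = {
    "ObjectArray": len(object_array_text),
    "Tabular": len(tabular_text),
    "CSV": len(csv_text),
}
```

For this sample, the sizes shrink from ObjectArray to Tabular to CSV, and the gap widens as rows are added because only ObjectArray repeats the column names on every row.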

FYI

AI Extract Table Options appear only when the Data Table's Extract Method is set to "AI Table Reader" or its parent Data Model or Data Section has an "AI Extract" fill method configured.

Document Collapse Functions Now Support Azure DI Data

Document collapse functions (the Batch Folder > Collapse command and the Merge activity when Clear On Completion is enabled) now preserve the DI JSON file for each page when child pages are removed, merging them into the parent Batch Folder alongside other Grooper-generated artifacts.