AI Search and the Search Page

This article is about the current version of Grooper.

Note that some content may still need to be updated.

AI Search is a Repository Option that enables Grooper's document search and retrieval features in the Search page. Once enabled, Indexing Behaviors can be added to Content Types (such as Content Models), which will allow users to submit documents to a search index. Once indexed, documents can be retrieved by full text and metadata searches in the Search Page.

The Search Page allows users to leverage AI Search indexes to query indexed documents. Both full text and metadata searches are supported, with feature-rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the Document Viewer, review documents' extracted data, create new Batches from the result set, submit processing jobs, start a conversation with an AI Assistant, and more.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

PLEASE NOTE: If you plan to follow along with the examples in this article using the provided Grooper ZIP files, be sure to upload the Project ZIP file into Grooper before uploading the Batch ZIP file.

About

AI Search is a Grooper Repository Option that enables document indexing for content stored in a Grooper Repository. Using Microsoft Azure AI Search, Grooper indexes documents and their data so users can search and retrieve them from the Grooper Search Page.

Once all the requisite components are configured, documents can be indexed according to their Content Type and searched using the Search Page’s query and filtering options.

With Grooper’s AI Search, users can quickly index documents and extracted data, making document collections easier to search, retrieve, and understand.

About Microsoft Azure AI Search

Grooper's AI Search functionality is built using Azure AI Search services. Azure AI Search is a cloud-based search-as-a-service solution provided by Microsoft Azure. It has allowed our developers to build a sophisticated search experience into Grooper. Here are some key features and capabilities:

Full-Text Search: Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
Customizable Indexing: Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
Scalability: The service can scale up or down based on the workload, making it suitable for applications of all sizes.
Security and Compliance: Azure AI Search ensures data security and compliance with industry standards, offering features like role-based access control (RBAC), data encryption, and integration with Active Directory.

Need to set up your own AI Search service in Azure? Check out our quickstart guide to get started.

External links

Create an AI Search service in the Azure portal
Azure AI Search full documentation
Pricing - Note: Grooper's AI Search integration does not currently use any of the "Premium Features" that incur additional costs beyond the hourly/monthly pricing models.
How to choose a service tier
Service limits - Note: This documents each tier's limits, such as number of indexes, partition storage, vector index storage and maximum number of documents.

The setup

Integrating Azure AI Search with Grooper will require a few setup steps:

Create an Azure AI Search Service
- This is the only step done outside of Grooper.
Configure the AI Search Repository Option
Configure an Indexing Behavior on a Content Type
Create the search index
Index documents and data from Grooper

Once these steps are complete, you will be able to leverage the Search Page, a powerful interface to allow you to retrieve documents and data from Grooper.

Create an Azure AI Search service

You will need the Azure AI Search service's "URL" and either "Primary admin key" or "Secondary admin key" to Configure the AI Search Repository Option. These values can be found by accessing the Azure Search service from the Azure portal.

Check the "Configure authentication" section of the "Create an Azure AI Search service in the portal" linked above.

Quickstart guide

If not done so already, create an Azure account.
Go to your Azure portal
If not done so already, create a "Subscription" and "Resource Group."
Click the "Create a resource" button
Search for "Azure AI Search"
Find "Azure AI Search" in the list and click the "Create" button.
Follow the prompts to create the Azure AI Search resource.
- The pricing tier defaults to "Standard". There is a "Free" option if you're just wanting to try this feature out.
Go to the resource.
In the "Essentials" panel, copy the "Url" value. You will need this in Grooper.
In the left-hand navigation panel, expand "Settings" and select "Keys"
Copy the "Primary admin key" or "Secondary admin key." You will need this in Grooper.

Additional info

The following article from Microsoft instructs users how to create a Search Service.
Microsoft's full Azure AI Search documentation is found here.
You will need the Azure AI Search service's "URL" and either the "Primary admin key" or "Secondary admin key" to connect Grooper to the Azure AI Search service. These values are entered when adding the AI Search Repository Option to the Grooper Repository, and can be found in Azure by locating the Search service in the Azure portal (portal.azure.com).
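Before entering the URL and admin key in Grooper, it can be useful to confirm they work. The following is a minimal sketch, not part of Grooper, that uses the Azure AI Search REST API's "list indexes" endpoint. The `api-version` value is an assumption (use one your service supports), and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: a generally available REST api-version supported by your service.
API_VERSION = "2023-11-01"


def build_list_indexes_request(service_url: str, admin_key: str) -> urllib.request.Request:
    """Build a GET request that lists the indexes on an Azure AI Search service."""
    endpoint = f"{service_url.rstrip('/')}/indexes?api-version={API_VERSION}"
    # The admin key is sent in the "api-key" request header.
    return urllib.request.Request(endpoint, headers={"api-key": admin_key}, method="GET")


def list_index_names(service_url: str, admin_key: str) -> list[str]:
    """Send the request and return the index names.

    Raises urllib.error.HTTPError if the service rejects the key or URL.
    """
    request = build_list_indexes_request(service_url, admin_key)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [index["name"] for index in payload["value"]]
```

Calling `list_index_names("https://<your-service>.search.windows.net", "<admin key>")` on a new service should return an empty list. An HTTP error at this point usually means the URL or key was copied incorrectly.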

Walkthrough

This walkthrough assumes the use of a free trial subscription.
- You may need to alter your choices depending on your needs. Click the "start a trial subscription" link.
On the subsequent page, click the "Try Azure for free" button.
Sign in to your Microsoft account.
Check the first check box to agree to the subscription agreement as it is required to move forward, then click the "Next" button.
Identity verification needs to be done by phone and credit or debit card.
- First, enter your telephone information, then click your preferred method of verification.
Once received, enter the provided "Verification code" in the appropriate field, then click the "Verify code" button.
Next, we need to provide credit or debit card information.
- As stated, this will not charge your card. Provide the appropriate card information, then click the "Sign up" button.
Once you have established a subscription, navigate to the Azure portal page, then click the "AI Search" button.
Within the "Azure AI service - AI Search" page, click the "Create search service" button.
We need to provide several key configuration elements on the "Create a search service" page.
- First, choose a "Location". You may want to double check your pricing tier, so click the "Change Pricing Tier" link.
On the "Select Pricing Tier" page, choose a Pricing Tier that is best for your application, then click the "Select" button.
Back on the "Create a search service" page, provide a unique name in the "Service name" field.
Next, click the "Create new" link and provide a unique name for the "Resource group" field.
- Once complete, click the "Review + create" button.
Verify the configured properties, then click the "Create" button.
You will be taken to an "Overview" of your newly created search service.
- Take note of the information given here, then click the "Go to resource" button.
You will be taken to an "Overview" of the Azure AI Search service.
- There are several key pieces of information given here, but the most critical for our purposes is the "Url". Copy this information as we will later need it in Grooper.
Click the drop-down arrow to the left of the "Settings" node to expand its contents, then click on the "Keys" node.
Within the "Keys" object, take note of the "API Access control" options.
- The default choice of "API keys" will suffice for our purposes.
- Copy the "Primary admin key" as we will later need it in Grooper. Under certain circumstances, however, you may wish to use the "Secondary admin key".

Configure the AI Search Repository Option

To search documents in Grooper, we use Azure's AI Search service. In order to connect to an Azure AI Search service, the AI Search option must be added to the list of Repository Options in Grooper. Here, users will enter the Azure AI Search URL endpoint and an admin's API key. Both of these can be obtained from the Microsoft Azure portal once you have added an Azure AI Search resource.

With AI Search added to your Grooper Repository, you will be able to add an Indexing Behavior to one or more Content Types, create search indexes, index documents and search them using the Search Page.

Select the Root node in the Node Tree, then click the ellipsis button for the Options property.
In the "Options" window, click the "Add" button, then select "AI Search" from the drop-down menu.
You'll notice an AI Search option is added to the collection.
The Search service resource from Azure should have displayed a URL.
- Put that URL into the URL property of the AI Search option. The same Search service resource from Azure should also have displayed an API key. Put that API key into the API Key property of the AI Search option.
Click the "OK" button in the "Options" window.
Click the "Save" button to save the changes made to the Root node.

Configure an Indexing Behavior on a Content Type

An Indexing Behavior allows documents (Batch Folders) to be indexed via AI Search. Once indexed, users can search for and retrieve documents from the Search Page.

Before indexing documents, you must add an Indexing Behavior to the Content Types you want to index. Most typically, this will be done on a Content Model. All child Document Types will inherit the Indexing Behavior and its configuration.

Expand the node tree and select the "AI-Search" Content Model from the supplied Project, then click the ellipsis button for the Behaviors property.
- Before indexing documents, you must add an Indexing Behavior to the Content Types you want to index.
  - Most typically, this will be done on a Content Model.
- All child Document Types will inherit the Indexing Behavior and its configuration.
- More complicated Content Models may require Indexing Behaviors configured on multiple Content Types.
In the "Behaviors" window click the "Add" button, then select "Indexing Behavior" from the drop-down list.
- An Indexing Behavior is a Content Type Behavior that allows the Content Type, and any Content Types that inherit from it, to be indexed via AI Search.
You will notice an Indexing Behavior is added to the collection.
Before configuring anything, let's step through the properties of an Indexing Behavior to get a sense of what they do. The Name property determines the name of the index in Azure.
- The default given is a normalization of the name of the selected Content Type.
- Be aware, Azure has some naming rules for its index names and metadata field names.
- Names must be all lower case.
- They can only contain letters, numbers, dashes or underscores.
- They cannot contain consecutive dashes or underscores.
- They must be between 2 and 128 characters long.
- Included Elements determines what Data Elements are added to the search index.
Built in Fields determines what Built in Fields are added to the search index.
- Note, if you leverage any of these built in fields and also want to use Included Elements there cannot be naming conflicts between the Included Elements and the Built in Fields.
  - Typically using "All" built in fields is fine.
Let's take a look at each built in field. First is the Content. This is the full text content of the Batch Folder.
- This would be the text generated by the Recognize activity.
- The Attachment Name is the document's attachment filename.
  - This would be the original name of the file as it existed before being acquired by Grooper.
Type Name is the name of the Batch Folder's classified Content Type.
- Page Count is simply the number of pages within the indexed Batch Folder.
Flag Message is the flag message associated with the document.
- This would include auto-generated messages like whether or not "required" fields were empty, a type of validation error, or even null.
- Path is the path in the "Batches" folder of the node tree where the Batch Folder exists.
Computed Fields is a collection of computed fields.
- In this collection you can leverage expressions to dynamically generate data to be indexed.
- Usage stats will only show data once the index is created and has indexed data.
  - It is a general statistical summary of the index's usage.
Vector Search enables or disables vector search, which facilitates retrieval operations during chat sessions on the Chat Page.
- If set, this property has sub-properties including an embeddings LLM model to choose.
- You will need an LLM Connector Repository Option on the Root node to leverage this property.
- Page Limit determines the maximum number of pages to include when indexing the full text content of a document.
Flatten specifies that the search index should be flattened.
- "Flattening" a search index generally refers to the process of transforming a hierarchical or nested data structure into a flat, non-hierarchical structure.
  - In the context of a search index, this could involve several different actions depending on the specific needs and the data structure being indexed.
- Auto Index requires some consideration.
  - If set to True, it specifies that the Indexing Service should automatically add new documents to this search index, remove deleted documents from the index and update "changed" documents present in the search index.
  - When set to False, the default setting, the Indexing Service will not add new documents to the search index.
    - However, it will still remove deleted documents and update any "changed" documents already in the search index.
    - A "changed" document is one whose index metadata changes.
      - If the data of any of the Included Elements or the Built in Fields change, the Indexing Service will update the documents index data. We will discuss this property and the associated Indexing Service later.
HTML Body Selector allows you to enter a CSS selector that selects sub-elements of the Batch Folder's content instead of the entirety of its text content.
Access List is a collection of users who can access the index.
- If none are set, any and all users in your Active Directory can access the index.
- Use this to limit who can access this index if it contains sensitive information.
- Generators is a collection of large language models that can generate electronic documents based on prompts.
With an understanding of the properties of the Indexing Behavior, let's configure this behavior for our purposes. The default given to the Name property will work for our needs.
- Feel free to change it, but follow the naming convention rules for an Azure index when doing so.
- When done naming the index, click the ellipsis button for the Included elements property.
In the "Included Elements" window, alt-click the Data Model check box to select it and all its child Data Elements.
- When finished, click the "OK" button to close the "Included Elements" window.
Click the "OK" button in the "Behaviors" window to close it.
Click the "Save" button to save the changes made to the "AI-Search" Content Model.

⚠ The Name property determines the index's name in Azure. This name:
- Must be all lower case.
- Can only contain letters, numbers, dashes (`-`) or underscores (`_`).
- Cannot contain consecutive dashes or underscores.
- Must be between 2 and 128 characters long.
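The naming rules above can be checked before an index is created. This is a minimal sketch that encodes only the rules listed in this article. The function name is hypothetical, and Azure may enforce additional rules not covered here.

```python
import re


def index_name_problems(name: str) -> list[str]:
    """Return the naming rules (as listed in this article) that `name` violates."""
    problems = []
    if not 2 <= len(name) <= 128:
        problems.append("must be between 2 and 128 characters long")
    if name != name.lower():
        problems.append("must be all lower case")
    # Case is checked separately above, so upper-case letters are allowed here.
    if not re.fullmatch(r"[A-Za-z0-9_-]*", name):
        problems.append("may only contain letters, numbers, dashes or underscores")
    if "--" in name or "__" in name:
        problems.append("must not contain consecutive dashes or underscores")
    return problems


if __name__ == "__main__":
    for candidate in ("ai-search", "AI-Search", "ai--search", "ai search"):
        print(candidate, "->", index_name_problems(candidate) or "ok")
```

An empty list means the name passes every rule listed here. Returning all violations at once, rather than stopping at the first, makes it easier to correct a name in one pass.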

Create the search index

This will create the search index in Azure. Without the search index created, documents can't be added to an index. This only needs to be done once per index.

Expand the Node Tree and select the "AI-Search" Content Model from the provided Project.
Right-click the "AI-Search" Content Model, and from the pop-out menu choose "Search" then "Create Search Index".
Click "Execute" in the "Create Search Index" window.
If you go to the Advanced tab of the Root node, you will notice an "AISearch.json" file.
- This file is created upon the first successful execution of the "Create Search Index" object command of a Content Type.
- Any other executions of that command will add to this file.
- Double click the file to view its contents in a new browser tab.
In the JSON you will see a key-value pair.
- The "Key" is the name of the search index that was established in the Indexing Behavior.
- The "Value" is the node ID of the Content Type with the Indexing Behavior that was used to create the named search index.
While the steps required to create the search index from Grooper are complete, it can be helpful to understand what this has done in Azure.
- To see the change reflected, navigate to the Azure portal page in your browser, then click on the Search service.
Expand "Search management", then select "Indexes".
You will see the search index we created from Grooper listed within Azure.

IMPORTANT: AISearch.json

The first time you execute the "Create Search Index" object command against a Content Type, not only will a search index be added to Azure, but a file called "AISearch.json" will also be created and attached to the Grooper Root node. It contains simple key-value-pairs of the name of the search index and the GUID of the Content Type it belongs to. You can view this file on the Advanced tab of the Grooper Root. The search indexes you can select from the Search page are the entries from this file. You will also see search indexes listed on the Indexes tab of the Grooper Root node.

Each subsequent execution of the "Create Search Index" command will add key-value-pair entries to the "AISearch.json" file as well as create a search index in Azure.

Anytime you use the "Delete Search Index" object command against a Content Type it will remove its corresponding entry from the "AISearch.json" file on the Grooper Root node as well as remove the search index from Azure.

WARNING
If you delete a Content Type that a search index was created against before using the "Delete Search Index" command it will "strand" its entry in the "AISearch.json" file as well as in Azure. You will have to open the "AISearch.json" file by going to the Advanced tab of the Grooper Root and double clicking the file. From there you can remove the key-value-pair entry for the Content Type that no longer exists then save the changes. You will also have to manually delete the search index from your Azure portal.

WARNING
If you delete a search index from Azure that was created from Grooper before using the "Delete Search Index" command it can cause issues.

The "Delete Search Index" object command of a Content Type will not function.
There will be a "stranded" entry in the "AISearch.json" file.
- This means a user could attempt to select and use the search index from the Search page of Grooper, which will cause an error.

You will have to manually remove the entry from the "AISearch.json" file on the Grooper Root node to rectify this issue.
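The two warnings above describe entries that become "stranded" when either side of a key-value pair disappears. The following sketch shows one way to spot them. It assumes the file is a flat JSON object mapping index name to Content Type ID, as described above. The function name and sample IDs are hypothetical, and the lists of existing Azure indexes and Content Type IDs must be gathered separately.

```python
import json


def find_stranded_entries(
    aisearch_json: str,
    azure_index_names: set[str],
    content_type_ids: set[str],
) -> dict[str, str]:
    """Return {index name: reason} for entries that no longer have both sides."""
    # Assumed shape: {"index-name": "content-type-id", ...}
    entries = json.loads(aisearch_json)
    stranded = {}
    for index_name, content_type_id in entries.items():
        if index_name not in azure_index_names:
            stranded[index_name] = "index missing in Azure"
        elif content_type_id not in content_type_ids:
            stranded[index_name] = "Content Type missing in Grooper"
    return stranded


if __name__ == "__main__":
    sample = '{"ai-search": "id-1", "old-invoices": "id-2", "contracts": "id-3"}'
    report = find_stranded_entries(
        sample,
        azure_index_names={"ai-search", "old-invoices"},
        content_type_ids={"id-1", "id-3"},
    )
    print(report)
```

Any entry reported here would still need to be removed from the file by hand, as described above.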

Add documents to a search index

With the search index created you can now add documents and their data to the search index. Documents must be classified in Grooper before they can be indexed. Only Document Types/Content Types inheriting an Indexing Behavior are eligible for indexing.

There are four ways to index documents:

"Add to Index" Batch Folder Object Command - This is a right-click command applied to a single document. It will index a single document. This method is best for one-off testing and submitting small numbers of documents to an index.
"Submit Indexing Job" Content Type Object Command - This is a right-click command executed from the Content Model/Content Type configured with the Indexing Behavior. It will index all documents currently in the Grooper Repository that inherit the Content Type's Indexing Behavior. This method is useful to index a large number of documents that already exist somewhere in a Grooper Repository.
Execute Activity with "Add to Index" command - This gives us a way to index documents in a Batch Process. When configured with an "Add to Index" command, the Execute activity will apply that command to each document in a Batch (therefore indexing them). This is a great way to automate document indexing at a specific point within a Batch Process's flow.
Indexing Service - This is the "set it and forget it" method for document indexing. The Indexing Service is a Grooper service that runs in the background, periodically polling the Grooper Repository for new documents that need to be indexed, documents whose data has changed to update the index and documents that have been deleted to be removed from the index. This is a great way to automate document indexing in the background.

"Add to Index" Batch Folder Object Command

This is the most manual way of adding to the search index. This is an object command done on a per document basis (or via multi-selecting) in a Batch Viewer.

With the search index created you can now add data to the search index.
- There are four ways to index document data from Grooper.
- In this lesson we will focus on the first, which is using the "Add to Index" object command on a classified Batch Folder.
- Documents must be classified in Grooper before they can be indexed.
- Only Content Types inheriting an Indexing Behavior are eligible for indexing.
- Expand the Node Tree and select the provided Batch, then click the Viewer tab.
Notice the Batch Folders are classified as Document Types within the Content Model from the provided Project.
- These documents also have extracted data from the Data Model of the same Content Model.
- When we index this classified Batch Folder it will submit its metadata as well as its extracted document data.
Right-click a Batch Folder in the Batch Viewer, and from the pop-out menu select "Search" then "Add to Index".
Click the "Execute" button in the "Add to Index" window.
- Because there is no data in the search index yet, this will add all the selected Batch Folder's metadata and extracted document data to the search index.
- Were you to perform this exact same operation again after this data existed in the search index, nothing would happen as it would determine that there was no difference between what was being submitted and what already existed in the search index.
  - This is important to know because the only time data will be added or removed from the search index is when there is a difference between the source data in Grooper and what exists in the search index.
To see how this has affected Azure, go to your Azure portal and select your search index.
- Expand the "Search Management" node and click "Indexes", then click on the search index.
From within the search index we can perform searches.
- In the future we will be doing all of this from Grooper, but it can be useful to understand how the destination infrastructure is affected.
- Leave the search query blank, which is equivalent to a completely open search, then click the "Search" button.
Notice the indexed data in the Results panel, displayed in JSON format.
- Only one Batch Folder's data was added using the "Add to Index" object command, so only its data will be displayed.
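The open search performed in the portal above can also be issued against the Azure AI Search REST API ("Search Documents"). This is a sketch only: the `api-version` is an assumption, the function name is hypothetical, and the service URL, index name and key are placeholders you would supply.

```python
import json
import urllib.request

# Assumption: a generally available REST api-version supported by your service.
API_VERSION = "2023-11-01"


def build_search_request(
    service_url: str,
    index_name: str,
    api_key: str,
    search_text: str = "",
    top: int = 50,
) -> urllib.request.Request:
    """Build a POST request that searches an index; blank text matches everything."""
    body = {
        "search": search_text or "*",  # "*" is the explicit form of an open search
        "top": top,                    # maximum number of results to return
        "count": True,                 # ask the service for the total match count
    }
    endpoint = (
        f"{service_url.rstrip('/')}/indexes/{index_name}"
        f"/docs/search?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` and reading the JSON response should yield the same documents shown in the portal's Results panel, under the `value` key.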

"Submit Indexing Job" Content Type Object Command

This is another manual approach as it also involves an object command. Because this command is applied to a Content Type, however, it will index all documents that are classified as that Content Type or inherit from it.

Please visit the Grooper Command Console article for information on installing appropriate services.

Before we submit a job, we need to make sure an Activity Processing service is running for our Grooper repository.
- To verify, click the Machines folder in the Node Tree.
- With your machine selected, you should see an instance of a Grooper Activity Processing service is running.
- If this service is not installed and running, nothing will happen after we submit the indexing job.
With that check done, let's execute the "Submit Indexing Job" object command of a Content Type, which in our case will be a Content Model.
- Expand the Node Tree and select the "AI-Search" Content Model from the provided Project.
Right-click the "AI-Search" Content Model, and from the pop-out menu select "Search" then "Submit Indexing Job".
- Any and all Batch Folders that exist within this Grooper repository that are classified as Document Types that inherit from this Content Model will be considered for this job.
Notice the "Search Index Statistics" section in the "Submit Indexing Job" window.
- The provided Batch has 25 Batch Folders in it.
- One Batch Folder's document data has been submitted to the search index so the "Indexed Documents" property is showing "1".
- There are 25 Batch Folders from the provided Batch that are classified as Document Types that belong to this Content Model.
- Given that one has already been indexed, the "Added Documents" property will display "24".
- No Batch Folder's data that has already been indexed has changed, so "Updated Documents" will display "0".
- No indexed Batch Folders have been deleted, so the "Deleted Documents" property will display "0".
With an understanding of the "Search Index Statistics" we can now click the "Execute" button to submit an indexing job.
- The Activity Processing service will pick this job up and process all of its tasks in the background.
- Feel free to go to the Jobs page to verify its status.
Once the job is complete, if you go to your search index in Azure, you'll notice there is much more data in the search index now as twenty-four more documents' data has been added.
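The "Search Index Statistics" figures described above amount to a comparison between the documents in the repository and the documents in the index. The sketch below illustrates that comparison. It is not Grooper's implementation: the function name and the "fingerprint" used to detect changed data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexStats:
    indexed: int  # documents currently in the search index
    added: int    # eligible documents not yet in the index
    updated: int  # indexed documents whose data has changed
    deleted: int  # indexed documents that no longer exist in the repository


def index_statistics(repository: dict[str, str], index: dict[str, str]) -> IndexStats:
    """Compare {document id: data fingerprint} maps for the repository and index."""
    added = sum(1 for doc_id in repository if doc_id not in index)
    updated = sum(
        1
        for doc_id, fingerprint in repository.items()
        if doc_id in index and index[doc_id] != fingerprint
    )
    deleted = sum(1 for doc_id in index if doc_id not in repository)
    return IndexStats(len(index), added, updated, deleted)


if __name__ == "__main__":
    # 25 classified Batch Folders, one of which was already indexed.
    repository = {f"doc-{n}": "v1" for n in range(25)}
    index = {"doc-0": "v1"}
    print(index_statistics(repository, index))
```

With the sample Batch from this article (25 documents, 1 already indexed), the sketch reports 1 indexed, 24 added, 0 updated and 0 deleted, matching the values shown in the "Submit Indexing Job" window.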

Execute Activity with "Add to Index" command

This is an automated approach as it will create an Indexing Job as part of a Batch Process. This will perform the exact same command as the "Add to Index" object command explained earlier. When an Execute step is reached in a Batch Process a job will be created with a task for each document in scope.

If the document in scope does not need to be added, updated, or deleted, no task will be created. If that is true for all documents in scope, no job will be created.
An Activity Processing Service will need to be installed and running for the given repository in order for the job to be picked up and worked.

Expand the Node Tree and select the "Execute" Batch Process Step. Notice the Activity property is set to "Execute".
Click the ellipsis button on the Steps property to understand its configuration a little more.
In the "Steps" window you will notice an "Execute Command" action has been added to the collection.
Notice the "Object Type" is "Batch Folder".
Also notice the "Command" is "Add to Index".
- To summarize we're looking at a Batch Process Step using the Execute activity which leverages an Execute Command action pointing at Batch Folders using the "Add to Index" command.
- This will perform the exact same command as the "Add to Index" object command of a Batch Folder, but instead it will be automated as part of a Batch Process.
- When an Execute step is reached in a Batch Process a job will be created with a task for each document in scope.
- If the document in scope does not need to be added, updated, or deleted, no task will be created.
- If that is true for all documents in scope, no job will be created.

Indexing Service (Our best practice method)

This is the most automated way to index documents. The Indexing Service will periodically poll the Grooper database to determine if classified documents that inherit from a Content Type with an Indexing Behavior need to be added, updated, or deleted. If any do, it will submit an Indexing Job with tasks for each document that needs to be added, updated, or deleted.

Keep in mind how the Auto Index property of an Indexing Behavior affects this service.

If set: will add, update, and/or remove documents from the index
If not set: will only remove deleted documents or update changed documents already in the search index
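The effect of Auto Index on the Indexing Service can be summarized as a simple rule. The following sketch restates the two cases above in code. The function and operation names are hypothetical and only illustrate the described behavior.

```python
OPERATIONS = {"add", "update", "delete"}


def service_handles(operation: str, auto_index: bool) -> bool:
    """Return True if the Indexing Service would act on this kind of change."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    # Updates and deletions are always handled; additions require Auto Index.
    return auto_index or operation != "add"


if __name__ == "__main__":
    for auto_index in (True, False):
        handled = sorted(op for op in OPERATIONS if service_handles(op, auto_index))
        print(f"Auto Index = {auto_index}: {handled}")
```

With Auto Index disabled, new documents must be added by one of the other three indexing methods before the service will keep them up to date.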

Also keep in mind:

An Indexing Service will need to be installed and running for the given repository in order for the "Indexing Job" to be created.
An Activity Processing Service will need to be installed and running for the given repository in order for the job to be picked up and worked.

Please visit the Grooper Command Console article for information on installing the appropriate services.

Expand the Node Tree and select the "AI-Search" Content Model from the provided Project, then click the ellipsis button for the Behaviors property.
Let's review the Auto Index property.
- This is the most automated way to index documents.
- An Indexing Service will periodically poll the Grooper database to determine if classified documents that inherit from a Content Type with an Indexing Behavior need to be added, updated, or deleted.
  - If any do, it will submit an "Indexing Job" with tasks for each document that needs to be added, updated, or deleted.
- If Auto Index is enabled, or True, additions, updates, or deletions of document data will be recognized.
- If disabled, or false, only document deletions or changes will be recognized.
Enable the Auto Index property then click the "OK" button.
Click the "Save" button to save the changes made to the "AI-Search" Content Model.
Expand the Node Tree and select the Machines folder.
- Ensure a "Grooper Indexing Service" is installed and running.
- This is the service that will do the polling of the Grooper Database to check for additions, changes, or deletions to an established search index.
- This service will create a job with a task for each requisite Batch Folder.
You may want to give the properties of this service a review.
- They are straightforward, covering things like how often the service polls the database and whether jobs created by the Indexing Service are deleted upon completion.
You will also need to ensure a Grooper Activity Processing Service is installed and running.
- This service will do the processing of the job with indexing tasks created by the Indexing Service.

Recreate search index after Data Model change

If the Data Model of a Content Model with an Indexing Behavior has been changed, you will need to update the search index of that Content Model.

It's very common to want to add or remove Data Elements in your Data Model after you have created a search index and even indexed document data.
- This is not the same as changing the way currently existing and indexed Data Elements extract.
  - In that situation you can simply re-index newly extracted Batch Folders.
- A change in the data structure will require that you delete, recreate, and reindex the search index.
- To see this in action, let's make a change to our Data Model.
  - Expand the Node Tree and select the Data Model from the provided Project.
Let's add a Data Section for organizational purposes.
- Right-click the Data Model, and in the pop-out menu select "Add" then "Data Section".
Name the Data Section in the "Add" window, then click the "Execute" button.
Set the Scope property of the newly created Data Section to "SingleInstance", then set the Miss Disposition property to "DefaultToParent".
Click the "Save" button to save the changes made to the "Header Values" Data Section.
Select the four Data Fields within the Data Model, then left-click and drag them into the "Header Values" Data Section.
Click the "OK" button in the "Confirmation" window.
Right-click the "Header Values" Data Section, and in the pop-out menu select "Move Up" to put it above the "LineItems" Data Table.
Let's add a new Data Field to this new Data Section.
- Right-click the "Header Values" Data Section, and in the pop-out menu select "Add" then "Data Field".
Name the Data Field in the "Add" window, then click the "Execute" button.
Use the "Move Up" object command on the "PO Number" Data Field a few times to move it up beneath the "InvoiceNo" Data Field.
Set the Value Extractor property to "Labeled Value", then click the ellipsis button for the Value Extractor property set to "Labeled Value".
In the "Value Extractor" window set the Label Extractor property to "List Match", then click the ellipsis button for the Label Extractor property set to "List Match".
In the "Local Entries" field of the "Label Extractor" window enter the following:
- P.O. No.
- PO #
- When configuration is complete, click the "OK" button.
In the "Value Extractor" window set the Value Extractor property to "Reference" and expand its subproperties.
- Point the Extractor property at the "VAL - Generic Text Segment - No Labels" Data Type.
When configuration is complete, click the "OK" button.
Click the "Save" button to save the changes made to the "PO Number" Data Field.
We need to re-run Extraction on our Batch Folders after having made a change to the Data Model.
- Expand the Node Tree and select the provided "AI-Search" Batch, then click the Viewer tab.
Select all of the Batch Folders in the Batch Viewer of the Viewer tab.
Right-click the selected Batch Folders, and in the pop-out menu select "Run Activity" then "Document Processing" then "Extract".
Click the "Execute" button in the "Extract" window.
With our Data Model adjusted, and our Batch Folders newly extracted, we need to delete the search index and adjust our Content Model's Indexing Behavior.
- Expand the Node Tree and select the "AI-Search" Content Model.
Right-click the "AI-Search" Content Model, and in the pop-out menu select "Search" then "Delete Search Index".
Click the "Execute" button in the "Delete Search Index" window.
- This will permanently delete the search index from Azure.
Click the ellipsis button for the Behaviors property.
Click the ellipsis button for the Included Elements property in the "Behaviors" window.
The previous Data Elements should still be selected, but we need to add the newly created Data Section and Data Field.
- Confirm all the Data Elements are selected, then click the "OK" button in the "Included Elements" window.
Click the "OK" button in the "Behaviors" window.
Click the "Save" button to save the changes made to the "AI-Search" Content Model.
We can now recreate our search index.
- Right-click the "AI-Search" Content Model, and in the pop-out menu select "Search" then "Create Search Index".
Click the "Execute" button in the "Create Search Index" window.
We can now reindex the document data of all associated Batch Folders.
- Right-click the "AI-Search" Content Model, and in the pop-out menu select "Search" then "Submit Indexing Job".
Click the "Execute" button in the "Submit Indexing Job" window.
- Give the job a moment to be processed, or feel free to go to the Jobs page to watch the progress from there.
Once the indexing job is complete you can go to your index in the Azure portal and run an open search.
- You'll notice the "Header Values" hierarchy as well as the "PO Number" indexed field.

Search Page

The Search Page allows users to leverage AI Search indexes to query indexed documents. Both full text and metadata searches are supported, with feature-rich querying and filtering capabilities. Users can interact with search results in several ways. They can view documents in the Document Viewer, review documents' extracted data, create new Batches from the result set, submit processing jobs, start a conversation with an AI Assistant and more.

The Search page is Grooper's document search and retrieval interface.

Once documents are indexed, users can search for them from the Search page. Indexing documents makes searching across massive document sets lightning quick.
The Search page uses a powerful set of query and filtering capabilities to retrieve documents, using full text and metadata searching techniques.
A robust set of commands is available from the Search page. You can review document data, submit results for further processing, create Batches and more.

Search page UI overview

Understanding the layout of the Search Page is crucial to using it properly.

In the top right of the UI is a drop-down button which will allow you to choose which search index you'll be working with in the Search Page.
In the top-left of the UI is the Query Editor.
In the top-right of the Query Editor are several command buttons related to the written query.
Within the Query Editor are the different fields by which you can craft a query.
"Include hit highlights" will highlight query results in the Document Viewer.
The top-right portion of the UI is a Document Viewer.
- This will display documents from selected results.
At the bottom of the UI is the Result List.
- Documents returned by a query will be displayed here.
Step through pages from a selected result with the Page Navigator in the top-middle of the Result List.
In the top-right of the Result List are "hit" navigation buttons.
- Here you can step through returned results from a selection in the Result List.
To the right of the "hits" buttons is the command button.
- This gives several options for commands that can be applied to a result selection.
To the right of the command button is the view button.
- This will allow you to change how the Result List is displayed.
If queries are saved, they'll be displayed in the collapsible list on the left of the UI.

FYI

Pro Tip: In the Result List, a search field's default column width is determined by the Data Field/Data Column's "Display Width" value.

Querying

The Search Page allows you to build a search query using four components:

Search: This is the only required parameter. Here, you will enter your search terms, using the Lucene query syntax.
Filter: An optional filter to set inclusion/restriction criteria for documents returned, using the OData syntax.
- BE AWARE: When filtering date values, dates must be entered in the format yyyy-MM-dd.
Select: Optionally selects which fields you want displayed for each document.
Order By: Optionally orders the list of documents returned.

Please see the Anatomy of a search query section below for further details on building queries on the Search Page.

When using the Search page the first thing you'll want to do is select a search index from the drop-down in the top-right of the screen.
- With a search index selected, let's get comfortable with the interface by starting with an open query.
- Click the spyglass icon on the top-left to execute the search.
At the bottom of the screen you will see a list of results.
- Above this list to the right is a drop-down that lets you choose how the result list is displayed.
- The default is the "list" view. Select a result from the result list.
Above the result list and to the right of the query editor you will see your selection displayed in a Document Viewer.
Change to the "wide" view and notice the different style of the result list.
Change to "card" view and notice the different style of the result list.
The card view has the ability to expand container elements.
- This is quite useful to view extracted information, as the other views will display the contents of container elements in a serialized JSON structure, which is harder to read.
Let's get comfortable with the Lucene syntax of the Search box.
- For now, type a simple term like "Oklahoma", then execute the query.
  - This is known as a keyword search.
Notice above the result list there is a hit indicator.
- For this first result, there are three matches, and the first is highlighted.
- You can click the arrows of the hit indicator to step through the hits.
In the Document Viewer you will notice the current hit selection is highlighted green, while the other hits are highlighted gray.
Let's expand our keyword search.
- Type "Oklahoma City" into the Search box, and notice the results you get.
Typing multiple terms into the Search box will look for each term independently.
- It's equivalent to the "OR" boolean operator, which we'll see again later.
If you surround terms in double quotes it will look for the full term, and not the individual words.
- This is a quoted phrase.
We can also search for extracted values.
- For our purposes, the invoice number Data Field is within a Data Section.
- We first need to type out the parent Data Section, then follow it with a forward slash, then the name of the desired element.
- You will probably notice the IntelliSense pop up as you type, which can be quite useful.
- Once you have a Data Element listed you follow it with a colon, then a space, then the value you're wanting to find.
The asterisk is used for wildcard searches.
You can use an asterisk between elements in the search as well.
Regular expressions can be used in the Search box.
- You will need to surround the expression with a forward slash on both sides.
A tilde immediately after a term will invoke fuzzy matching.
- You must follow the tilde with a 1 or a 2.
  - This number after the tilde determines the character edit distance.
There are several boolean operators that can be used.
- First is the "AND" operator.
  - This is used when both terms are expected to be present.
Next is the "OR" boolean operator.
- This is used when either term is expected to be present, but not specifically both.
Finally is the "NOT" boolean operator.
- This is used to search for the presence of the first term, while excluding the second.
When a tilde is placed after a quoted phrase it will invoke what's known as "proximity searching".
- The number after the tilde is the number of terms allowed between the words of the phrase for a result to still be valid.
For searches with multiple terms, each term is weighted as equally important.
- However, if you follow a term with a caret and a number, it will affect the term's "weight".
  - A number above one will increase the "weight" of the term or phrase and push it to the top of the result list.
  - A decimal number between zero and one will reduce the weight of the term or phrase.
The filter box uses OData syntax to allow you to tighten your searches.
- If a Data Element is within a container like a Data Section you will need to use the lambda expression syntax shown to use comparisons.
Select also uses OData syntax.
- This will change the information displayed in the result list.
- It can be useful to eliminate "noise" by being specific about the data elements that are selected and displayed.
Order By also uses OData syntax.
- This is quite useful when you have multiple results and need the result list to be ordered in a specific fashion.

Search Page commands

There are several new commands users can execute from the Search page. These commands give users a new way of starting and continuing work in Grooper. They can be divided into three sets: "search query commands", "result set commands" and "document commands".

Search Query Commands

These commands are accessed above the query editor. They are used in conjunction with written queries.

Execute Query - This will execute the query written in the Search, Filter, Order By, and Select parameters.
Clear Query - This will clear all query parameters.
AI Generate - This will allow the user to leverage an AI to help them craft a query.
Save - This will allow users to store and retrieve queries they use frequently.
Copy - This will allow users to copy a link to the clipboard for the current query.

Result Set Commands

These commands can be accessed from a dropdown list in the Search page UI. They can be applied to the entire result set or a selection from the result set.

Create Batch - Creates a Batch from the result set and submits an Import Job to start processing it.
Submit Job - Submits a Processing Job for documents in the result set. This command is intended for "on demand" activity processing.
Analyst Chat - Select an AI Analyst to start a chat session with the result set.
Download - Download a document, generated from the result set. May be one of the following:
- Download PDF - Generates a single bookmarked PDF with optional search hit highlights.
- Download ZIP - Generates a ZIP file containing each document in the result set.
- Download CSV - Generates a CSV file from the result set's data fields.
- Download Custom - Generates a custom document using an "AI Generator".

Document Commands

These commands can be accessed from the Search page when right-clicking a document in the result list.

Go to Document - Navigates users with Design page permissions to that document in the Grooper node tree.
Review Document - Opens the document in a Review Viewer with a Data View, Folder View and Thumbnail View.
Copy Link - Creates a URL link to the document. When clicking the link users will be taken to a Review Viewer with a Data View, Folder View and Thumbnail View.

Several commands are available for the search query editor.
Click the hamburger menu icon just above the result list.
- A menu will expand with several commands available.
Analyst Chat lets you connect with an AI Analyst.
Create Batch lets you create a Batch from the result selection.
Download allows you to download the result selection.
Submit Job allows you to process the result selection through a defined Activity.
Right-click a result from the result list.
- Several object commands are available.
Copy Link will put the URL to the selected result into your clipboard.
Go to Document will jump to the Batch Folder in the Node Tree of your Design environment.
Review Document will open a Review page with the result selection with Data, Folder, and Thumbnail views available.

Saving search queries

Saving queries can be quite useful. With saved queries, you can have a list of queries you can reference later without having to rebuild them.

Write a query you think you'll use frequently, then click the "Save" button above the query editor.
In the "Save Query" window name the query, then click the "OK" button.
The side panel will pop out and you will see your saved query. You can click the query at any time to have it populate the query editor.
Click the arrow to close the side panel.
Click the arrow to open the side panel.

Anatomy of a search query

Documents are retrieved by executing a search query in the Search page. These queries will return one or more documents in the search index that match the query's parameters. Queries are formed by editing the following fields in the Search page.

Search - This is for text searching. Each document's full text and its fields' text will be searched. Simple keyword and phrase searches are supported as well as more advanced searches using the Lucene query syntax. Leave this blank to search all documents in the index.
Filter - This will filter out results from the Search query based on metadata field values using comparison operators ("greater than", "less than", "equal to", etc.).
Select - This will alter what field values are displayed in the result list.
Order By - This will allow users to re-order the result list based on a field value.
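
As a rough illustration of how these four fields fit together, the sketch below models a query as a small Python object and converts it to a request body shaped like the one Azure AI Search's Search Documents REST API accepts (search, queryType, filter, select, orderby). The SearchPageQuery class and its to_request_body method are hypothetical names used only for this example. How Grooper actually builds its requests is not covered here, so treat the mapping as an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchPageQuery:
    """Hypothetical container for the four Search page fields."""
    search: str = ""                 # Lucene query text; blank means "all documents"
    filter: Optional[str] = None     # OData filter expression
    select: Optional[str] = None     # comma-separated list of fields to display
    order_by: Optional[str] = None   # OData orderby expression

    def to_request_body(self) -> dict:
        """Build a body shaped like an Azure AI Search query request (assumed mapping)."""
        body = {
            # "*" matches every document, mirroring a blank Search field.
            "search": self.search.strip() or "*",
            # "full" selects the full Lucene query syntax.
            "queryType": "full",
        }
        # Optional fields are only included when they were filled in.
        if self.filter:
            body["filter"] = self.filter
        if self.select:
            body["select"] = self.select
        if self.order_by:
            body["orderby"] = self.order_by
        return body


query = SearchPageQuery(
    search='"Oklahoma City"',
    filter="totalAmount gt 1000",
    select="invoiceDate,totalAmount",
    order_by="invoiceDate desc",
)
print(query.to_request_body())
```

Only Search carries a default ("*"), which reflects that the other three fields are optional refinements of the result set.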

Search

The Search configuration searches the full text of each document in the index (including text in string fields). This uses the Lucene query syntax to return documents.

Basic Search features

The simplest types of searches are keyword searches and phrase searches.

Keyword search

Format example: searchTerm

Keyword searches search the full text of each document for a single search term. This includes the document full text content and any string fields in the index.

Be aware, keyword search terms must be full words in order to match. Full words are surrounded by a word boundary (a space, a punctuation mark or a dash -). To perform substring matching, use a wildcard search or a regex search.
- "Word boundaries" include spaces, punctuation marks and special characters like dashes -, dollar signs $, number signs #, and asterisks *.
Multiple search terms can be entered into the same query. Grooper will search for each term on the document.
- This is effectively the same as a logical OR operation. searchTerm1 searchTerm2 and searchTerm1 OR searchTerm2 are equivalent searches.
- When multiple keywords are searched in this way, results are ordered by how many terms are hit on each document by default. Documents at the top of the list will have all search terms matched. Documents at the bottom will only have one term matched. Configuring the Order By field will override this default.
Special characters will need to be escaped with a backslash (\). This includes the following:
+ - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
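
If search terms come from user input or another system, escaping can be automated. The following is a minimal sketch, assuming each character in the list above is escaped by prefixing it with a backslash (the escape character in Lucene query syntax). The escape_lucene helper is a hypothetical name, not a Grooper or Azure function.

```python
# Characters listed above as needing an escape in the Search box.
LUCENE_SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term: str) -> str:
    """Prefix every special character in `term` with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch for ch in term)


print(escape_lucene("INV-1001"))     # INV\-1001
print(escape_lucene("A/B (draft)"))  # A\/B \(draft\)
```

Only escape text that should be matched literally. Escaping an intentional wildcard or operator would turn it into a plain character.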

Phrase search

Format example: "searchTerm1 searchTerm2"

Because multiple search terms can be included in a single search, if you want to search a phrase (like "ACME Insurance") you must enclose the phrase in quotes. This will prevent matching documents which only contain one word in the phrase.

Field searching

Format example: fieldName:searchExpression

Search built-in fields and extracted Data Model values. For example, Invoice_No:8* would return any document whose extracted "Invoice No" field started with the number "8".

Field searching is only supported for string values.
- You will receive an error when searching numeric value types (decimal, int, etc) or datetime values.
- However, you can use a filter to find documents using numeric fields and date fields. See the Filter section below for more information on filters.

Advanced Search features

Lucene also supports several advanced querying features, including wildcards, fuzzy matching, regex matching, and more.

Wildcard searches

Format example: searchTerm? and searchTerm*

Use ? for a single wildcard character and * for multiple wildcard characters.

Be aware, wildcards can only be used for prefix matching (wildcard at the end of a term) and infix matching (wildcard in the middle of a term). Wildcards cannot be used for suffix matching (wildcard at the beginning of a term). However, regex can be used for suffix matching. Examples below:
- Prefix matching (use wildcards): alpha* returns alphanumeric or alphabetical
- Infix matching (use wildcards): non*al returns non-numerical or nonsensical
- Suffix matching (use regex): /.*numeric/ returns alphanumeric
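
The three matching styles can be tried out locally before running them against an index. The sketch below only emulates the behavior with Python's standard library (fnmatch for the ? and * wildcards, re.fullmatch for whole-term regex matching). It does not call Azure AI Search, and the word list is made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# Sample terms, assumed to already be split into single words.
terms = ["alphanumeric", "alphabetical", "non-numerical", "nonsensical", "numeric"]


def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match a ?/* wildcard pattern in full."""
    return [w for w in words if fnmatchcase(w, pattern)]


def regex_matches(pattern: str, words: list[str]) -> list[str]:
    """Return the words that a regex matches in full (whole-term match)."""
    return [w for w in words if re.fullmatch(pattern, w)]


print(wildcard_matches("alpha*", terms))   # prefix matching
print(wildcard_matches("non*al", terms))   # infix matching
print(regex_matches(r".*numeric", terms))  # suffix matching via regex
```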

Fuzzy matching

Format example: searchTerm~

Fuzzy search can only be applied to single words (not phrases in quotes). Terms are matched based on a character edit distance of one to two characters. Azure's full fuzzy search documentation can be found here.

Azure's implementation of "fuzzy matching" is not the same as Grooper's. Terms are matched based on a character edit distance of 1-2.
- grooper~ or grooper~2 would match any word that was up to two characters different. For example, "trouper", "looper", "groop" or "grooperey".
- grooper~1 would match any word that was up to one character different. For example, "trooper", "groopr" or "groopers".
Be aware, in Lucene full syntax, the tilde (~) is used for both fuzzy search and proximity search. When placed after a quoted phrase, ~ invokes proximity search. When placed at the end of a term, ~ invokes fuzzy search.
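
To get a feel for "character edit distance", the sketch below computes the classic Levenshtein distance (single-character inserts, deletes and substitutions) between two words. This is an approximation for illustration only. The search engine's own distance measure may differ in details, such as how swapped adjacent letters are counted, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term: str, word: str, max_edits: int = 2) -> bool:
    """Emulate term~N: True when `word` is within `max_edits` edits of `term`."""
    return edit_distance(term, word) <= max_edits


for word in ["trooper", "groopr", "groopers", "trouper", "looper", "groop", "grooperey"]:
    print(word, edit_distance("grooper", word))
```

The first three words are one edit away from "grooper" and the last four are two edits away, which lines up with the grooper~1 and grooper~2 examples above.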

Boolean operators

Format example: AND OR NOT

Boolean operators can help improve the precision of a search query.

Regular expression matching

Format example: /regex/

Enclose a regex pattern in forward slashes to incorporate it into the Lucene query. For example, /[0-9]{3}[a-z]/

Lucene regex searches are matched against single words/terms.
Lucene regex does not use the Perl Compatible Regular Expressions (PCRE) library. Most notably, this means it does not use single-letter character classes, such as \d to match a single digit. Instead, enter the full character class in brackets, such as [0-9] to match a single digit.

Proximity Search

Format example: "searchTerm1 searchTerm2"~2

Proximity searches are used to find terms that are "near" each other on a document. For example, "oil gas"~2 would find the terms "oil" and "gas" within two words of each other. So, it would return instances of "oil and gas" as well as "oil and natural gas".

Be aware, in Lucene full syntax, the tilde (~) is used for both fuzzy search and proximity search. When placed after a quoted phrase, ~ invokes proximity search. When placed at the end of a term, ~ invokes fuzzy search.
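
The idea behind proximity searching can be sketched in a few lines of Python. This simplified emulation splits the text on whitespace and only checks the terms in the order given, which is an assumption made to keep the example short. The search engine's own handling of term order and tokenization is more involved.

```python
def within_proximity(text: str, first: str, second: str, max_between: int) -> bool:
    """True when `second` follows `first` with at most `max_between` words in between."""
    words = text.lower().split()
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # A gap of N words in between means the positions differ by N + 1.
    return any(
        0 < j - i <= max_between + 1
        for i in first_positions
        for j in second_positions
    )


# Emulates "oil gas"~2 against three sample phrases.
print(within_proximity("oil and gas", "oil", "gas", 2))               # True
print(within_proximity("oil and natural gas", "oil", "gas", 2))       # True
print(within_proximity("oil and some natural gas", "oil", "gas", 2))  # False
```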

Boosted Search

Format example: searchTerm1^2 searchTerm2

Boosted search adjusts the default relevance scoring mechanism. By default, Grooper will return the "most relevant" results first in the results list. For searches with multiple terms, each term is weighted equally as important. For the search standard invoice the term "standard" would be weighted the same as "invoice". If you wanted the term "invoice" to carry more weight and have results with that term bubble to the top of the list, you could boost that term by a factor of two like this: standard invoice^2

You can also boost phrases.
The higher the boost value, the more relevant the term is relative to other terms.
You can dampen a term's relevance with a value less than one (for example, 0.50 would halve the term's weight).
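
The effect of a boost can be illustrated with a toy scoring function. The sketch below is not the relevance algorithm the search index uses. It simply adds up the weight of each query term found in a document, which is enough to show why a boosted term pushes its documents toward the top. It handles single terms only (not quoted phrases), and all names are hypothetical.

```python
def parse_boosted_terms(query: str) -> dict[str, float]:
    """Split a query like 'standard invoice^2' into {term: weight}."""
    weights = {}
    for token in query.split():
        term, _, boost = token.partition("^")
        # Terms without a caret keep the default weight of 1.
        weights[term.lower()] = float(boost) if boost else 1.0
    return weights


def toy_score(document: str, query: str) -> float:
    """Sum the weights of the query terms that appear in the document."""
    words = set(document.lower().split())
    return sum(w for term, w in parse_boosted_terms(query).items() if term in words)


docs = ["standard terms and conditions", "invoice for services"]
ranked = sorted(docs, key=lambda d: toy_score(d, "standard invoice^2"), reverse=True)
print(ranked)  # the document containing "invoice" is listed first
```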

Full Lucene documentation

Azure's full documentation of Lucene query syntax can be found here.

Filter

First you search, then you filter. The Filter parameter specifies criteria for documents to be included or excluded from the search results. This gives users an excellent mechanism to further fine tune their search query. Commonly, users will want to filter a search set based on the field values. Both built in index fields and/or values extracted from a Data Model can be incorporated into the filter criteria.

Azure AI Search uses the OData syntax to define filter expressions. Azure's full OData syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter

BE AWARE: When filtering date values, dates must be entered in the format yyyy-MM-dd.

For example, this Filter specification would only return search results whose "invoiceDate" Data Field value was between January 1st, 2022 and January 31st, 2024 (inclusive):

invoiceDate ge 2022-01-01 and invoiceDate le 2024-01-31

This query therefore filters by the "invoiceDate" field for results greater than or equal to 2022-01-01 and less than or equal to 2024-01-31.

BE AWARE: Comparison operators like "greater than" (gt) and "less than" (lt) rely on a field's value type. Make sure the Grooper Data Element's value type is set correctly before indexing documents. For example, if you expect to be able to filter a date value correctly, the Data Field's Value Type should be set to DateTime.
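
If filter text is generated by a script rather than typed by hand, the date format is easy to get right programmatically. The following is a small sketch, assuming an inclusive range built with the ge and le operators shown above. The date_range_filter helper is a hypothetical name, and "invoiceDate" is the field from the example.

```python
from datetime import date


def date_range_filter(field: str, start: date, end: date) -> str:
    """Build an inclusive range filter with dates written as yyyy-MM-dd."""
    fmt = "%Y-%m-%d"  # Python's equivalent of the yyyy-MM-dd pattern
    return f"{field} ge {start.strftime(fmt)} and {field} le {end.strftime(fmt)}"


print(date_range_filter("invoiceDate", date(2022, 1, 1), date(2024, 1, 31)))
# invoiceDate ge 2022-01-01 and invoiceDate le 2024-01-31
```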

FYI

The Search Filter supports the search.in() function as well. You can see the full search.in documentation here.

Filter within a collection

You can also use Filters within a collection (Data Section or Data Table). See the following example for the syntax:

Collection_Element/any(x: x/Field_Element gt 10000)

This iterates through records in a collection, searching for fields that match the filter criteria.

In this example, the filter matches a document when any record ("x") in the "Collection_Element" has a "Field_Element" greater than 10,000.
- For multi-instance Data Sections, the Data Section is the "Collection_Element". The "Field_Element" is a Data Field in the Data Section. The expression will iterate through field values in the section records.
- For Data Tables, the Data Table is the "Collection_Element". The "Field_Element" is a Data Column in the Data Table. The expression will iterate through column values in the table's rows.
"x" is just a variable. You can make it any letter or word you want (alpha characters only).
You can search for documents that have a collection where "any" record or "all" records match the filter expression.
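
The difference between "any" and "all" may be easier to see with plain data. The sketch below emulates the two lambda expressions locally using Python's built-in any() and all() over a list of records. The sample line items and helper names are made up for illustration, and nothing here calls the search index.

```python
def any_record(records: list[dict], field: str, threshold: float) -> bool:
    """Emulates Collection_Element/any(x: x/Field_Element gt threshold)."""
    return any(x[field] > threshold for x in records)


def all_records(records: list[dict], field: str, threshold: float) -> bool:
    """Emulates Collection_Element/all(x: x/Field_Element gt threshold)."""
    return all(x[field] > threshold for x in records)


# One document's collection, e.g. rows of a Data Table.
line_items = [{"Field_Element": 250.0}, {"Field_Element": 12500.0}]

print(any_record(line_items, "Field_Element", 10000))   # True: one row exceeds 10,000
print(all_records(line_items, "Field_Element", 10000))  # False: not every row does
```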

Full OData filter documentation

Azure's full OData filter syntax documentation can be found here.

Select

The Select parameter is an optional parameter to select which metadata fields are shown in the result list.

Select has nothing to do with what results are returned, only what fields are displayed.
This can be exceptionally helpful when navigating indexes with a large number of fields. If you have 30 fields but only want to view a handful of them, simply enter which fields you want visible in the Select editor.
You can select any of the built in fields or Data Elements in the index (defined in the Indexing Behavior).
Multiple fields can be selected using a comma separated list (e.g. Field1,Field2,Field3)

For example, to only display a "totalAmount" Data Field in the search results, you would enter totalAmount in the Select editor.

Full OData select documentation

Azure's full OData select syntax documentation can be found here.

Order By

Order By is an optional parameter that will define how the search results are sorted.

Any field in the index can be used to sort results.
The field's value type will determine how items are sorted.
- String values are sorted alphabetically.
- Datetime values are sorted by oldest or newest date.
- Numerical value types are sorted smallest to largest or largest to smallest.
Sort order can be ascending or descending.
- Add asc after the field's name to sort in ascending order. This is the default direction.
- Add desc after the field's name to sort in descending order.
Multiple fields may be used to sort results.
- Separate each sort expression with a comma (e.g. Field1 desc,Field2)
- The leftmost field will be used to sort the full result list first, then it's sub-sorted by the next, then sub-sub-sorted by the next, and so on.

For example, to order the result list by a "poNumber" Data Field in the search results, you would enter poNumber in the Order By editor.
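
The multi-field rules above can be sketched in Python. In this example, build_order_by assembles the comma-separated expression and sort_results emulates the resulting order on local data by applying the sort keys from right to left with stable sorts. Both functions and the sample records are hypothetical and exist only to illustrate the left-to-right priority of sort fields.

```python
def build_order_by(sort_keys: list[tuple[str, str]]) -> str:
    """Join (field, direction) pairs into an Order By expression."""
    parts = []
    for field, direction in sort_keys:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        # asc is the default direction, so it can be left off.
        parts.append(field if direction == "asc" else f"{field} desc")
    return ",".join(parts)


def sort_results(results: list[dict], sort_keys: list[tuple[str, str]]) -> list[dict]:
    """Emulate the ordering locally: apply keys right to left using stable sorts."""
    ordered = list(results)
    for field, direction in reversed(sort_keys):
        ordered.sort(key=lambda r: r[field], reverse=(direction == "desc"))
    return ordered


keys = [("Field1", "desc"), ("Field2", "asc")]
rows = [
    {"Field1": 1, "Field2": "b"},
    {"Field1": 2, "Field2": "b"},
    {"Field1": 2, "Field2": "a"},
]
print(build_order_by(keys))  # Field1 desc,Field2
print(sort_results(rows, keys))
```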

Full OData orderby documentation

Azure's full OData orderby syntax documentation can be found here.

Creating Data Models for Search

The quality of the search experience depends on how much care goes into the Data Model design. Data Models used as the basis for a search index should be designed with the search experience in mind.

Some key tips include:

Make sure fields are appropriately typed.
- Fields are typed by configuring their "Value Type" property.
- Example: Fields collecting dates should be DateTime fields and not String fields. This will allow users to search dates as dates and not strings.
Provide drop-down lists in all cases where it is possible.
- Drop-down lists are enabled by configuring a field's "List Values" property.
Avoid unnecessary hierarchy.
- Hierarchy is created by adding Data Elements to Data Sections.
- Data Section names must be entered when doing fielded search or filters. Editing searches can be tedious when fields are unnecessarily buried in Data Sections (and nested Data Sections in particular).
- Example: Data_Section_1/Data_Section_2/Data_Section_3/Data_Field
For complicated Data Models, it may even be preferable to have two versions of a Data Model: an "extract model" and a "search model".
- The extract model would be designed to best extract data from the document.
- The search model would be designed with a user's search experience in mind.
- The search model can be assigned to a document using Secondary Types and data can be copied over using Data Rules (using Copy actions) and the Convert Data activity.
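
To see why field typing matters, consider how the same three dates sort when treated as strings versus dates. The short sketch below uses made-up values in an MM/dd/yyyy format and plain Python sorting. It is only an illustration of the general problem, not of how the search index stores values.

```python
from datetime import datetime

raw_dates = ["12/01/2021", "01/15/2023", "03/10/2022"]

# Sorted as strings: compared character by character, so the month decides the order.
as_strings = sorted(raw_dates)

# Sorted as dates: parsed first, so the order is chronological.
as_dates = sorted(raw_dates, key=lambda s: datetime.strptime(s, "%m/%d/%Y"))

print(as_strings)  # ['01/15/2023', '03/10/2022', '12/01/2021']
print(as_dates)    # ['12/01/2021', '03/10/2022', '01/15/2023']
```

The same reasoning applies to range comparisons: "greater than" on a string compares text, not calendar order.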