2024:AI Search and the Search Page
2025 BETA
This article covers new or changed functionality in the current or upcoming beta version of Grooper. Features are subject to change before version 2025's GA release. Configuration and functionality may differ from later beta builds and the final 2025 release.
About
Put simply, Azure AI Search makes it easier to store and retrieve your documents in Grooper. To see how, let's first look at the role Grooper has traditionally played.
Historically, Grooper has been a transient platform for document processing:
- documents come in
- data is collected from those documents
- the data and documents are pushed out of Grooper to some place
It has never been a place to store documents and/or their data.
While it has been possible to keep Batches and their content in Grooper, it has never been a best practice, nor has it been convenient. You could, in theory, devise a hierarchical folder and naming convention for organizing Batches in the node tree, but this is very time-consuming and of limited use. Say you wanted to retrieve all "Invoices" with a "Total Amount" over "$1,000.00". Without "indexing" the documents and their data, and a way to "query" that index, this would be extremely time-consuming even if the Batches were neatly organized. Moreover, the criteria you use to organize documents one day might not match the criteria you want to search by later.
By using Grooper's implementation of Azure AI Search, you can quickly and efficiently index your documents and their data, making them easy to retrieve and helping you gain a deeper understanding of them.
Microsoft Azure AI Search
Azure AI Search, formerly known as Azure Cognitive Search, is a cloud-based search-as-a-service solution provided by Microsoft Azure. It has allowed our developers to build a sophisticated search experience into Grooper. Here are some key features and capabilities:
- Full-Text Search: Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
- Customizable Indexing: Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
- Scalability: The service can scale up or down based on the workload, making it suitable for applications of all sizes.
- Security and Compliance: Azure AI Search ensures data security and compliance with industry standards, offering features like role-based access control (RBAC), data encryption, and integration with Active Directory.
- APIs and SDKs: Azure AI Search provides REST APIs and client libraries for various programming languages, making it easy to integrate with different types of applications.
Integration with Grooper
- API Integration: Grooper can leverage Azure AI Search's REST APIs to automate the indexing of documents and retrieval of search results. This integration can be built into Grooper's workflow to ensure seamless data processing and search capabilities.
- Security and Compliance: Both Grooper and Azure AI Search offer robust security features. Integrating these ensures that document processing and search operations are secure and compliant with industry standards.
- Indexing Processed Documents: Once Grooper processes and extracts data from documents, this data can be sent to Azure AI Search for indexing. This allows users to search through the processed data quickly and efficiently.
- Indexing is the intake process that loads content into an Azure AI Search service and makes it searchable. Azure AI Search processes inbound text into tokens stored in inverted indexes, and stores inbound vectors in vector indexes. Documents are submitted for indexing in JSON format.
- Querying Indexed Documents and Data: Once Azure AI Search has indexed documents and their data from Grooper, users can take advantage of powerful query syntaxes such as Lucene and OData to retrieve information from their documents efficiently.
- Querying is possible once an index is populated with searchable content: Grooper sends query requests to the search service and handles the responses. All queries run against a search index that you control.
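To give a rough idea of what indexing and querying look like at the REST level, the Python sketch below builds (but does not send) the two kinds of requests described above: a `docs/index` batch that uploads JSON documents, and a `docs/search` request that combines full Lucene syntax with an OData filter to find Invoices with a Total Amount over $1,000.00. Grooper makes these calls on your behalf; the sketch only shows their general shape. The endpoint, the index name, the field names (`id`, `TypeName`, `TotalAmount`), and the API version are assumptions for illustration, and the fields Grooper actually creates may be named differently.

```python
from __future__ import annotations

import json

# Assumption: a GA REST API version; use one your search service supports.
API_VERSION = "2023-11-01"


def _index_url(endpoint: str, index_name: str, operation: str) -> str:
    """Build an index-scoped REST URL, e.g. .../indexes/<name>/docs/search."""
    return (f"{endpoint.rstrip('/')}/indexes/{index_name}/docs/{operation}"
            f"?api-version={API_VERSION}")


def index_docs_request(endpoint: str, index_name: str,
                       docs: list[dict]) -> tuple[str, dict]:
    """Return the URL and JSON body for a batch indexing call."""
    # "mergeOrUpload" adds new documents and updates ones already in the index.
    body = {"value": [{"@search.action": "mergeOrUpload", **doc} for doc in docs]}
    return _index_url(endpoint, index_name, "index"), body


def search_request(endpoint: str, index_name: str, search_text: str = "*",
                   odata_filter: str | None = None,
                   top: int = 50) -> tuple[str, dict]:
    """Return the URL and JSON body for a search (POST) call."""
    # queryType "full" enables Lucene syntax; "filter" takes an OData expression.
    body = {"search": search_text, "queryType": "full", "count": True, "top": top}
    if odata_filter:
        body["filter"] = odata_filter
    return _index_url(endpoint, index_name, "search"), body


if __name__ == "__main__":
    endpoint = "https://my-search-service.search.windows.net"  # hypothetical
    url, body = index_docs_request(endpoint, "invoices", [
        {"id": "batch42-doc1", "TypeName": "Invoice", "TotalAmount": 1250.00},
    ])
    print("POST", url)
    print(json.dumps(body, indent=2))

    # "All Invoices with a Total Amount over $1,000.00"
    url, body = search_request(
        endpoint, "invoices",
        odata_filter="TypeName eq 'Invoice' and TotalAmount gt 1000")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Both requests would be sent as HTTP POST with an `api-key` header and `Content-Type: application/json`. Passing a Lucene expression such as `search_text='Content:"net 30"'` performs a fielded search, assuming a searchable `Content` field exists. Keeping free-text matching in `search` and exact numeric or type criteria in `filter` mirrors the Lucene/OData split described above.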
How To
Using Azure AI Search will require a few setup steps:
- Create an Azure AI Search Service
- (This is the only step done outside of Grooper.)
- Configure the AI Search Repository Option
- Configure an Indexing Behavior on a Content Type
- Create the Search Index
- Index Documents and Data from Grooper
- Use the Search Page
Create an Azure AI Search Service
- The following article from Microsoft explains how to create a search service:
- Microsoft's full AI Search documentation is found here:
- You will need the Azure AI Search service's "URL" and either "Primary admin key" or "Secondary admin key" for the next step. These values can be found by accessing the Azure Search service from the Azure portal (portal.azure.com).
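Before entering the URL and admin key in Grooper, you may want to confirm that they work. The Python sketch below calls Azure AI Search's List Indexes REST operation using only the standard library. The API version is an assumption, and the helper names are hypothetical. An HTTP error such as 403 Forbidden typically points to an invalid key, and a connection error to a wrong URL.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: a GA REST API version; use one your search service supports.
API_VERSION = "2023-11-01"


def list_indexes_request(endpoint: str, admin_key: str) -> urllib.request.Request:
    """Build a GET request for the service's index names (not sent yet)."""
    url = f"{endpoint.rstrip('/')}/indexes?api-version={API_VERSION}&$select=name"
    # The admin key goes in the "api-key" header (header names are case-insensitive).
    return urllib.request.Request(url, headers={"api-key": admin_key}, method="GET")


def list_index_names(endpoint: str, admin_key: str, timeout: float = 10.0) -> list[str]:
    """Send the request; raises urllib.error.HTTPError on a bad key or URL."""
    req = list_indexes_request(endpoint, admin_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [idx["name"] for idx in payload.get("value", [])]


if __name__ == "__main__":
    # Hypothetical values; substitute your service URL and admin key.
    print(list_index_names("https://my-search-service.search.windows.net", "<admin-key>"))
```

A brand-new service returns an empty list, which still confirms that both values are valid.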
Configure the AI Search Repository Option
To search documents in Grooper, we use Azure's AI Search service. To connect to an Azure AI Search service, the AI Search option must be added to the list of Repository Options in Grooper. Here, you enter the Azure AI Search URL endpoint to which calls are issued and an admin API key. Both can be obtained from the Microsoft Azure portal once you have created an Azure AI Search resource.
With AI Search added to your Grooper Repository, you will be able to add an Indexing Behavior to one or more Content Types, create a search index, index documents and search them using the Search Page.
- Select the root object in the node tree.
- Click the ellipsis button on the Options property.
- Click the "Add" button in the "Options" window.
- Select AI Search from the drop-down menu.
- Enter the Azure AI Search URL into the URL property.
- Add your Azure AI Search API key to the API Key property.
- Click the "OK" button to close the "Options" window.
- Click the "Save" button to save all changes.
Configure an Indexing Behavior on a Content Type
Before indexing documents, you must add an Indexing Behavior to the Content Types you want to index. Typically, this is done on a Content Model, and all child Document Types inherit the Indexing Behavior and its configuration. (More complex Content Models may require Indexing Behaviors on multiple Content Types.)
The Indexing Behavior defines:
General
- The index's Name in Azure.
- Which documents are added to the index.
- Only documents classified as the Indexing Behavior's Content Type or one of its child Content Types are indexed.
- In other words, when the behavior is set on a Content Model, only documents classified as one of its Document Types are indexed.
- What Included Elements are added to the search index (including which Data Elements from a Data Model are included, if any).
- What Built in Fields are added to the search index. If you use any Built in Fields alongside Included Elements, their names must not conflict. The Built in Fields are common metadata fields, including:
- Content: Index the full text content of the document. This would be the text generated by the Recognize activity.
- Attachment Name: Index the document's attachment filename. This would be the original name of the file as it existed before being acquired by Grooper.
- Type Name: Index the name of the document's Content Type.
- Page Count: Index the number of pages within the document.
- Flag Message: Index the flag message associated with the document. This includes auto-generated messages, such as notices that "required" fields were empty or that a validation error occurred.
- Path: Index the fully qualified UNC path of the storage location of the document.
- All: Enable all Built in Fields.
- Page Limit: The maximum number of pages to include when indexing the full text content of a document.
- Flatten: Specifies that the search index should be flattened. "Flattening" a search index generally means transforming a hierarchical or nested data structure into a flat, non-hierarchical one. What this involves in practice depends on the structure of the data being indexed.
- Auto Index: If set, specifies that the Indexing Service should automatically add new documents to this search index. When not set, the Indexing Service will still remove deleted documents and update changed documents already in the search index.
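To make the idea of flattening more concrete, here is a generic Python sketch that turns nested extracted data into top-level fields. It illustrates the concept only and is not Grooper's implementation. The "_" separator (Azure AI Search field names are limited to letters, digits, and underscores), the choice to gather table columns into collection values, and all field names are assumptions.

```python
from __future__ import annotations

from typing import Any


def flatten(record: dict[str, Any], prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested sections and tables into a single level of fields."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Nested section: prefix its fields with the section name.
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list) and value and all(isinstance(r, dict) for r in value):
            # Table: collect each column across rows into one list-valued field.
            # (Assumes rows share the same columns; otherwise lists won't align.)
            for row in value:
                for column, cell in flatten(row, name, sep).items():
                    flat.setdefault(column, []).append(cell)
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    # Hypothetical extracted data for one invoice.
    invoice = {
        "InvoiceNumber": "INV-001",
        "Vendor": {"Name": "Acme", "City": "Tulsa"},
        "LineItems": [
            {"Description": "Widget", "Amount": 400.0},
            {"Description": "Gadget", "Amount": 850.0},
        ],
    }
    print(flatten(invoice))
```

For the sample invoice this yields `Vendor_Name`, `Vendor_City`, `LineItems_Description`, and `LineItems_Amount` alongside `InvoiceNumber`. A flat shape like this is simpler to filter on than nested complex types, at the cost of losing the row-level pairing between table columns.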
Search Page Options
- Access List: If set, specifies a restricted set of users who may search this index. If not set, all authenticated users may search this index.
- AI Analysts: An optional list of AI Analysts available for chat sessions regarding the search result set.
- Generators: An optional list of AI Generators available for generating documents from the search result set. Each Generator is a collection of LLM Models, Instructions, and Examples that define how the AI structures the generated documents.
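The Access List rule is simple, and the Python sketch below restates it as code: if no list is set, every authenticated user may search; otherwise, only users on the list may. Modeling the list as plain user names, comparing them case-insensitively, and treating an empty list as "not set" are assumptions. Grooper's own check may differ, for example by also accepting groups.

```python
from __future__ import annotations

from typing import Iterable, Optional


def may_search(user: Optional[str], access_list: Optional[Iterable[str]]) -> bool:
    """Decide whether a user may search an index, per the Access List rule."""
    if not user:
        # No authenticated identity: never allowed.
        return False
    # Assumption: names compare case-insensitively; an empty list counts as "not set".
    allowed = {name.casefold() for name in (access_list or ())}
    return not allowed or user.casefold() in allowed


if __name__ == "__main__":
    print(may_search("DOMAIN\\alice", None))               # no Access List set
    print(may_search("DOMAIN\\bob", ["DOMAIN\\alice"]))    # not on the list
```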
- Select the Content Model from the provided Project.
- Click the ellipsis button for the Behaviors property.
- In the "Behaviors" window click the "Add" button.
- Select Indexing Behavior from the drop-down menu.
- An Indexing Behavior will be added to the collection.
- For this example, most properties can be left at their default settings. The only one we'll change is the Included Elements property. Click the ellipsis button for this property.
- In the "Included Elements" window ALT+LeftClick the Content Model to select it and all child elements.
- Click the "OK" button to close the "Included Elements" window.
- Click the "OK" button to close the "Behaviors" window.
- Be sure to save all changes.
Create the Search Index
This step creates the search index in Azure. Documents cannot be added to an index until it has been created. This only needs to be done once per index.
- Right-click on the Content Model from the provided Project.
- Choose "Search > Create Search Index" from the pop-out menu.
- Click the "Execute" button in the "Create Search Index" window.
- If you navigate to your Azure AI Search resource...
- ...and go to your indexes...
- ...you will see a new index named after the Name property of the Indexing Behavior of the Content Type this command was used against.
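Behind this command, an index definition (a name plus a list of typed fields) is sent to Azure. The Python sketch below assembles such a definition from Built in Fields and Included Elements, and it rejects naming conflicts between the two, as the Indexing Behavior requires. The field names, their Edm types, the `id` key field, the searchable/filterable choices, and the API version are all assumptions; Grooper derives the real schema from the Indexing Behavior.

```python
from __future__ import annotations

# Assumption: a GA REST API version; use one your search service supports.
API_VERSION = "2023-11-01"

# Hypothetical index field names for the Built in Fields (Grooper's may differ).
BUILT_IN_FIELDS = {
    "Content": "Edm.String",
    "AttachmentName": "Edm.String",
    "TypeName": "Edm.String",
    "PageCount": "Edm.Int32",
    "FlagMessage": "Edm.String",
    "Path": "Edm.String",
}
LONG_TEXT_FIELDS = {"Content"}  # full text: searchable, but not filterable
KEY_FIELD = "id"                # hypothetical document key field


def field_def(name: str, edm_type: str, key: bool = False) -> dict:
    """One Azure AI Search field definition."""
    return {
        "name": name,
        "type": edm_type,
        "key": key,
        "searchable": edm_type == "Edm.String" and not key,
        "filterable": name not in LONG_TEXT_FIELDS,
    }


def build_index_definition(index_name: str, included: dict[str, str],
                           built_ins: dict[str, str] = BUILT_IN_FIELDS) -> dict:
    """Combine Built in Fields and Included Elements into one index schema."""
    reserved = set(built_ins) | {KEY_FIELD}
    conflicts = sorted(reserved & set(included))
    if conflicts:
        raise ValueError(f"Included Elements clash with reserved field names: {conflicts}")
    fields = [field_def(KEY_FIELD, "Edm.String", key=True)]
    fields += [field_def(n, t) for n, t in {**built_ins, **included}.items()]
    return {"name": index_name, "fields": fields}


def create_index_url(endpoint: str, index_name: str) -> str:
    """URL for Create or Update Index (sent as HTTP PUT with the definition as JSON)."""
    return f"{endpoint.rstrip('/')}/indexes/{index_name}?api-version={API_VERSION}"


if __name__ == "__main__":
    definition = build_index_definition(
        "invoices", {"InvoiceNumber": "Edm.String", "TotalAmount": "Edm.Double"})
    print(create_index_url("https://my-search-service.search.windows.net", "invoices"))
    print([f["name"] for f in definition["fields"]])
```

Calling `build_index_definition("invoices", {"PageCount": "Edm.Int32"})` raises `ValueError`, because that Included Element would collide with a Built in Field of the same name.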
Index Documents and Data from Grooper
With the search index created, you can now add documents and their data to it. Documents must be classified in Grooper before they can be indexed, and only Document Types/Content Types inheriting an Indexing Behavior are eligible. There are four ways to index documents:
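The eligibility rule can be pictured as a walk up the Content Type tree: starting from the document's classification, look for the nearest Content Type that has an Indexing Behavior. The Python sketch below shows that walk. Modeling the tree as a child-to-parent map, and the "nearest ancestor wins" rule for models with several Indexing Behaviors, are assumptions; the type names are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def indexing_owner(type_name: Optional[str],
                   parents: dict[str, Optional[str]],
                   has_behavior: set[str]) -> Optional[str]:
    """Return the Content Type whose Indexing Behavior applies, or None."""
    current, seen = type_name, set()
    while current is not None and current not in seen:  # None = unclassified
        if current in has_behavior:
            return current
        seen.add(current)
        current = parents.get(current)
    return None


if __name__ == "__main__":
    # Hypothetical Content Model with two Document Types.
    parents = {"Invoice": "AP Model", "Purchase Order": "AP Model",
               "AP Model": None, "Memo": "Misc Model", "Misc Model": None}
    print(indexing_owner("Invoice", parents, {"AP Model"}))  # eligible
    print(indexing_owner("Memo", parents, {"AP Model"}))     # not eligible
```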
"Add to Index" Batch Folder Object Command
This is the most manual way of adding documents to the search index. It is an object command run on individual documents (or on several at once via multi-select) in a Batch Viewer.
"Submit Indexing Job" Content Type Object Command
This is another manual approach as it also involves an object command. Because this command is applied to a Content Type, however, it will index all documents that are classified as that Content Type or inherit from it.
Execute Activity with "Add to Index" Command
This is an automated approach as it will create an "Indexing Job" as part of a Batch Process.
Indexing Service
This is the most automated way to index documents. The Indexing Service periodically polls the Grooper database to determine whether the index needs to be updated. If it does, the service submits an "Indexing Job". What the service does depends on the Auto Index property of the Indexing Behavior, described above:
- If set: will add, update, and/or remove documents from the index
- If not set: will only remove deleted documents or update changed documents already in the search index
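The two rules above amount to a small decision table. The Python sketch below encodes it per document. The flag names, and the idea that the service tracks "in index", "deleted", and "changed" state for each document, are assumptions made for illustration.

```python
from __future__ import annotations

from enum import Enum


class Action(Enum):
    ADD = "add"
    UPDATE = "update"
    REMOVE = "remove"
    NONE = "none"


def plan_action(auto_index: bool, in_index: bool, deleted: bool, changed: bool) -> Action:
    """What the Indexing Service would do with one document."""
    if deleted:
        # Deleted documents are always removed, whether or not Auto Index is set.
        return Action.REMOVE if in_index else Action.NONE
    if in_index:
        # Changed documents already in the index are always updated.
        return Action.UPDATE if changed else Action.NONE
    # New documents are only added when Auto Index is set.
    return Action.ADD if auto_index else Action.NONE


if __name__ == "__main__":
    print(plan_action(auto_index=False, in_index=False, deleted=False, changed=False))
    print(plan_action(auto_index=True, in_index=False, deleted=False, changed=False))
```

In REST terms, ADD and UPDATE would typically map to an upload or merge action and REMOVE to a `delete` action in an indexing batch.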