AI Search (Functionality)


This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.


AI Search enables Grooper's document search and retrieval features in the Search page. It provides the framework to create document search indexes by Content Type and submit documents to an index. Once indexed, documents can be retrieved by full text searches in the Search Page with feature rich querying and filtering capabilities. Once retrieved, users can view documents in the Search page, download the results, or submit documents for further processing in Grooper.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

AI Search is a Grooper "Repository Option" that enables document indexing in Grooper. This allows Grooper users to search and retrieve documents in the Grooper Search page. It does this using Microsoft Azure AI Search: Grooper integrates with the service to index content in a Grooper Repository and search for it from the Search Page. With AI Search enabled on the Grooper Root, you can index documents according to their Content Type and search for them using robust query and filtering options in the Search Page.

Put simply, AI Search will make it easier to store and retrieve your documents in Grooper. To understand how, let's first understand what Grooper has been.

Historically Grooper has been a transient platform for document processing:

  • documents come in
  • data is collected from those documents
  • the data and documents are pushed out of Grooper to some place

It has never been a place to store documents and/or their data.

While it has been possible to keep Batches and their content in Grooper it has never been a best practice, nor has it been convenient to do so. You could, theoretically, devise some kind of hierarchical folder and naming convention by which you organize Batches in the node tree, but this is very time consuming and is probably not even that useful. Say you wanted to retrieve all "Invoices" that have a "Total Amount" over "$1,000.00". Without "indexing" the documents and their data, and the ability to "query" that index, this would be extremely time consuming at best, even if they're nicely organized. The criteria by which you organize something one day might not align with the method by which you choose to search for them later.

With Grooper's AI Search you will be able to quickly and efficiently index your documents and their data to allow for ease of retrieval as well as gain a deeper understanding of them.

About Microsoft Azure AI Search

Grooper's AI Search functionality is built using Azure AI Search services (formerly known as Azure Cognitive Search). Azure AI Search is a cloud-based search-as-a-service solution provided by Microsoft Azure. It has allowed our developers to build a sophisticated search experience into Grooper. Here are some key features and capabilities:

  • Full-Text Search: Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
  • Customizable Indexing: Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
  • Scalability: The service can scale up or down based on the workload, making it suitable for applications of all sizes.
  • Security and Compliance: Azure AI Search ensures data security and compliance with industry standards, offering features like role-based access control (RBAC), data encryption, and integration with Active Directory.
  • APIs and SDKs: Azure AI Search provides REST APIs and client libraries for various programming languages, making it easy to integrate with different types of applications.

Need to set up your own AI Search service in Azure? Check out our quickstart guide to get started.


How To

Integrating Azure AI Search with Grooper will require a few setup steps:

  1. Create an Azure AI Search Service
    • This is the only step done outside of Grooper.
  2. Configure the AI Search Repository Option
  3. Configure an Indexing Behavior on a Content Type
  4. Create the search index
  5. Index documents and data from Grooper

Create an Azure AI Search Service

Configure the AI Search Repository Option

To search documents in Grooper, we use Azure's AI Search service. In order to connect to an Azure AI Search service, the AI Search option must be added to the list of Repository Options in Grooper. Here, users will enter the Azure AI Search URL endpoint where calls are issued and an admin's API key. Both of these can be obtained from the Microsoft Azure portal once you have added an Azure AI Search resource.

With AI Search added to your Grooper Repository, you will be able to add an Indexing Behavior to one or more Content Types, create a search index, index documents and search them using the Search Page.

  1. Select the root object in the node tree.
  2. Click the ellipsis button on the Options property.
  3. Click the "Add" button in the "Options" window.
  4. Select AI Search from the drop-down menu.


  1. Enter the Azure AI Search URL into the URL property.
  2. Add your Azure AI Search API key to the API Key property.
  3. Click the "OK" button to close the "Options" window.
  4. Click the "Save" button to save all changes.
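Before entering the URL and API key in Grooper, it can help to confirm that they work against the service. The sketch below builds a request for Azure AI Search's List Indexes REST operation using only the Python standard library. The service name, the key, and the API version shown are placeholder assumptions; substitute the values from your own Azure portal. The function only builds the request, and sending it is left as a comment.

```python
from urllib.request import Request

# Assumption: a stable REST API version; substitute one your service supports.
API_VERSION = "2023-11-01"


def build_list_indexes_request(endpoint: str, api_key: str) -> Request:
    """Build (but do not send) a request that lists the indexes on a search service."""
    url = f"{endpoint.rstrip('/')}/indexes?api-version={API_VERSION}"
    # Azure AI Search reads the key from the "api-key" request header.
    return Request(url, headers={"api-key": api_key}, method="GET")


# Placeholder values, not a real service or key.
request = build_list_indexes_request(
    "https://my-service.search.windows.net/", "<admin-api-key>"
)
print(request.full_url)
# To send it: urllib.request.urlopen(request). A valid URL and key return JSON
# whose "value" list names the indexes on the service.
```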

Configure an Indexing Behavior on a Content Type

An Indexing Behavior is a Content Type Behavior that allows a Content Type, and the Content Types that inherit from it, to be indexed by AI Search.

Before indexing documents, you must add an Indexing Behavior to the Content Types you want to index. Most typically, this will be done on a Content Model. All child Document Types will inherit the Indexing Behavior and its configuration (More complicated Content Models may require Indexing Behaviors configured on multiple Content Types).


The Indexing Behavior defines:

General

  • The index's Name in Azure.
    • Be aware, Azure has some naming rules for its index names and metadata field names. These can be found at the link below.
    https://learn.microsoft.com/en-us/rest/api/searchservice/naming-rules
  • Which documents are added to the index.
    • Only documents that are classified as the Indexing Behavior's Content Type OR any of its child Content Types will be indexed.
    • In other words, when set on a Content Model only documents classified as one of its Document Types will be indexed.
  • What Included Elements are added to the search index (including which Data Elements from a Data Model are included, if any).
  • What Built in Fields are added to the search index. Note, if you leverage any of these built in fields and also want to use Included Elements there cannot be naming conflicts between the Included Elements and the Built in Fields. The Built in Fields are typical meta-data points including:
    • Content: Index the full text content of the document. This would be the text generated by the Recognize activity.
    • Attachment Name: Index the document's attachment filename. This would be the original name of the file as it existed before being acquired by Grooper.
    • Type Name: Index the name of the document's Content Type.
    • Page Count: Index the number of pages within the document.
    • Flag Message: Index the flag message associated with the document. This would include auto-generated messages like whether or not "required" fields were empty, a type of validation error, or even null.
    • Path: Index the path in the "Batches" folder of the node tree where the document exists.
    • All: Enable all Built in Fields.
  • Page Limit: The maximum number of pages to include when indexing the full text content of a document.
  • Flatten: Specifies that the search index should be flattened. "Flattening" a search index generally refers to the process of transforming a hierarchical or nested data structure into a flat, non-hierarchical structure. In the context of a search index, this could involve several different actions depending on the specific needs and the data structure being indexed.
  • Auto Index: If set to True, specifies that the Indexing Service should automatically add new documents to this search index, remove deleted documents from the index and update "changed" documents present in the search index. When set to False (default setting), the Indexing Service will not add new documents to the search index. However, it will still remove deleted documents and update any "changed" documents already in the search index.
    • A "changed" document is one whose index metadata changes. If the data of any of the Included Elements or the Built in Fields changes, the Indexing Service will update the document's index data.
    • "Changed" document example: An "Invoice Number" Data Field is one of the Included Elements for the Indexing Behavior. A document is already present in the search index before it is extracted. Then, the document runs through the Extract step of a Batch Process, populating the "Invoice Number" field in its Data Model. This would constitute a "change" and the index would be updated for the document.
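The Auto Index rules described above can be summarized as a small decision function. This is a hypothetical model written for illustration only, not Grooper's implementation; the function and parameter names are assumptions.

```python
def indexing_action(in_index: bool, deleted: bool, changed: bool, auto_index: bool) -> str:
    """Return the action taken for one document: 'add', 'update', 'delete', or 'none'."""
    if in_index:
        # Documents already in the index are maintained regardless of Auto Index.
        if deleted:
            return "delete"
        if changed:
            return "update"
        return "none"
    # Documents not yet in the index are only added when Auto Index is True.
    if auto_index and not deleted:
        return "add"
    return "none"


# A new document is added only when Auto Index is enabled.
print(indexing_action(in_index=False, deleted=False, changed=False, auto_index=True))   # add
print(indexing_action(in_index=False, deleted=False, changed=False, auto_index=False))  # none
```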

Search Page Options

  • Access List: If set, specifies a restricted set of users who may search this index. If not set, all authenticated users may search this index.
  • AI Analysts: An optional list of AI Analysts available for chat sessions regarding the search result set.
  • Generators: An optional list of AI Generators to be available for generating documents from the search result set. This is a collection of LLM Models, Instructions, and Examples that define how an AI would structure said documents.


  1. Select the Content Model from the provided Project.
  2. Click the ellipsis button for the Behaviors property.
  3. In the "Behaviors" window click the "Add" button.
  4. Select Indexing Behavior from the drop-down menu.


  1. An Indexing Behavior will be added to the collection.
  2. For our purposes the bulk of the properties can be left to their default setting. The only thing we'll change is the Included Elements property. Click the ellipsis button for this property.
  3. In the "Included Elements" window ALT+LeftClick the Content Model to select it and all child elements.
  4. Click the "OK" button to close the "Included Elements" window.
  5. Click the "OK" button to close the "Behaviors" window.
  6. Be sure to save all changes.

The Name property determines the index's name in Azure. This name:

  • Must be all lower case.
  • Can only contain letters, numbers, dashes (-) or underscores (_).
  • Cannot contain consecutive dashes or underscores.
  • May be between 2 and 128 characters long.
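A quick way to check a candidate name against the rules listed above is a short validation function. This is a sketch that covers only those four rules, and it treats any two dashes or underscores in a row (including mixed pairs such as "-_") as consecutive, which is a strict reading. Azure's naming rules page remains the authoritative reference.

```python
import re

# Lowercase letters, digits, dashes, and underscores; 2 to 128 characters long.
_ALLOWED = re.compile(r"[a-z0-9_-]{2,128}")
# Any two dashes/underscores in a row (strict reading of "consecutive").
_CONSECUTIVE = re.compile(r"[-_]{2}")


def is_valid_index_name(name: str) -> bool:
    """Check a proposed index name against the rules listed in this article."""
    return bool(_ALLOWED.fullmatch(name)) and not _CONSECUTIVE.search(name)


for candidate in ["invoices-2024", "Invoices", "my--index", "a"]:
    print(candidate, is_valid_index_name(candidate))
```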

Create the search index

This will create the search index in Azure. Without the search index created, documents can't be added to an index. This only needs to be done once per index.

  1. Right-click on the Content Model from the provided Project.
  2. Choose "Search > Create Search Index" from the pop-out menu.
  3. Click the "Execute" button in the "Create Search Index" window.


  1. If you navigate to your Azure AI Search resource in a web browser...
  2. ...and go to your indexes...
  3. ...you will see a new index named after the Name property of the Indexing Behavior of the Content Type this command was used against.

Index documents and data from Grooper

With the search index created you can now add data to the search index. Documents must be classified in Grooper before they can be indexed. Only Document Types/Content Types inheriting an Indexing Behavior are eligible for indexing.

There are four ways to index documents:

  1. "Add to Index" Batch Folder Object Command - This is a right-click command applied to a single document. It will index a single document. This method is best for one-off testing and submitting small numbers of documents to an index.
  2. The "Submit Indexing Job" command - This is a right-click command executed from the Content Model/Content Type configured with the Indexing Behavior. It will index all documents currently in the Grooper Repository that inherit the Content Type's Indexing Behavior. This method is useful to index a large number of documents that already exist somewhere in a Grooper Repository.
  3. Execute Activity with "Add to Index" command - This gives us a way to index documents in a Batch Process. When configured with an "Add to Index" command, the Execute activity will apply that command to each document in a Batch (therefore indexing them). This is a great way to automate document indexing at a specific point within a Batch Process's flow.
  4. Indexing Service - This is the "set it and forget it" method for document indexing. The Indexing Service is a Grooper service that runs in the background, periodically polling the Grooper Repository for new documents that need to be indexed, documents whose data has changed and need their index entry updated, and documents that have been deleted and need to be removed from the index. This is a great way to automate document indexing in the background.

"Add to Index" Batch Folder Object Command

This is the most manual way of adding to the search index. This is an object command done on a per document basis (or via multi-selecting) in a Batch Viewer.


  1. Select the provided Batch from the node tree.
  2. Click on the "Viewer" tab.
  3. Notice that the document in this Batch is classified as a Document Type that is inheriting from the Content Model with the Indexing Behavior.
  4. Right-click on the document from the provided Batch in the Batch Viewer.
  5. Select "Search > Add To Index" from the pop-out menu.
  6. Click "Execute" in the "Add to Index" window.


Optional: How to verify your index in Azure
  1. Navigate to your Azure AI Search resource in a web browser.
  2. Click the "Indexes" button.
  3. Click on the listed index in use.


  1. Click the "Search" button to perform an open query.
  2. You should see the JSON results of the performed query in the "Results" portion of the site.

The "Submit Indexing Job" command

This is another manual approach as it also involves an object command. Because this command is applied to a Content Type, however, it will index all documents that are classified as that Content Type or inherit from it.


First things first, we need to make sure an Activity Processing Service is running for our repository.

  1. If you click on the "Machines" folder in the node tree...
  2. ...and select the Grooper server where your repository is hosted...
  3. ...you should see an Activity Processing Service is installed and running.
    • If you do not see the appropriate service for your repository of Grooper, please visit the Grooper Command Console article for information on installing the appropriate service.


Once you've confirmed an Activity Processing Service is installed and running, you can use the "Submit Indexing Job" object command.

  1. Right-click the Content Model from the provided Project.
  2. Select "Search > Submit Indexing Job" from the pop-out menu.
  3. Take note of the "Added Documents", "Updated Documents", and "Deleted Documents" properties.
  4. Click the “Execute” button in the “Submit Indexing Job” window.
    • If there are no “added”, “updated”, or “deleted” documents, the window will close and no job will be submitted.
    • If there are, however, an “Indexing Job” will be created and the active “Activity Processing Service” will complete the tasks of the job to update the index.

Execute Activity with "Add to Index" command

This is an automated approach as it will create an "Indexing Job" as part of a Batch Process. This will perform the exact same command as the "Add to Index" object command explained earlier. When an Execute step is reached in a Batch Process a job will be created with a task for each document in scope.

  • If the document in scope does not need to be added, updated, or deleted, no task will be created. If that is true for all documents in scope, no job will be created.
  • An Activity Processing Service will need to be installed and running for the given repository in order for the job to be picked up and worked.


  1. Right-click the provided Project.
  2. Select "Add > Batch Process" from the pop-out menu.
  3. Name the Batch Process.
  4. Click the "Execute" button from the "Add" window.


  1. Right-click on the newly created Batch Process.
  2. Select "Add Activity > Utilities > Execute" from the pop-out menu.
  3. The step will be named based on the Activity chosen.
  4. Click the "Execute" button in the "Add Activity" window.


  1. Set the Scope and Folder Level properties.
    • For our purposes a Scope of Folder, and a Folder Level of 1 are accurate.
  2. Click the ellipsis button for the Steps property.
  3. Click the "Add" button in the "Steps" window.
  4. Choose Execute Command from the drop-down menu.


  1. This will add an "Execute Command" to the collection of "Steps".
  2. Click the drop-down button for the Command property.
  3. Select Batch-Folder > Add to Index from the drop-down menu.
  4. Click the "OK" button from the "Steps" window.
  5. Be sure to save all changes.


  1. Click on the "Activity Tester" tab.
  2. Select the document from the provided Batch in the "Batch Viewer".
  3. Click the "Start" button to create an "Indexing Job" for the selected document.

Indexing Service (Our best practice method)

This is the most automated way to index documents. The Indexing Service will periodically poll the Grooper database to determine if classified documents that inherit from a Content Type with an Indexing Behavior need to be added, updated, or deleted. If any do, it will submit an "Indexing Job" with tasks for each document that needs to be added, updated, or deleted.

Keep in mind how the Auto Index property of an Indexing Behavior described above affects this service.

  • If set: will add, update, and/or remove documents from the index
  • If not set: will only remove deleted documents or update changed documents already in the search index

Also keep in mind:

  • an Indexing Service will need to be installed and running for the given repository in order for the "Indexing Job" to be created.
  • an Activity Processing Service will need to be installed and running for the given repository in order for the job to be picked up and worked


First things first, we need to make sure an Indexing Service and an Activity Processing Service are running for our repository.

  1. If you click on the "Machines" folder in the node tree...
  2. ...and select the Grooper server where your repository is hosted...
  3. ...you should see that an Indexing Service and an Activity Processing Service are installed and running.
    • If you do not see the requisite services for your repository of Grooper, please visit the Grooper Command Console article for information on installing the appropriate services.


At this point there's nothing really left to do but let the service poll the database and look for updates to submit to the index.

Search Page

Once you've got indexed documents, you can start searching for documents in the search index! The Search Page allows you to find documents in your search index.


The image below gives an overview of the Search Page interface.

  • Use the following elements to perform the query in this image to get a result with the setup configured so far:
    • Search:
      invoiceNO: 1*
    • Filter:
      invoiceDate ge 2022-01-01 and invoiceDate le 2024-01-31
      • BE AWARE: When filtering date values, dates must be entered in the format yyyy-MM-dd.
    • Select:
      totalAmount
    • Order By:
      poNumber
  • Once the provided inputs are entered into the appropriate fields, click the magnifying glass button to perform the query.
  • Once the query has been executed, select the returned document from the result list in the bottom portion of the Search page's UI.
  • Once selected you should see the result appear in the document viewer.

Continue reading for more information on these individual fields of the Search page.


The Search Page allows you to build a search query using four components:

  • Search: This is the only required parameter. Here, you will enter your search terms, using the Lucene query syntax.
  • Filter: An optional filter to set inclusion/restriction criteria for documents returned, using the OData syntax.
    • BE AWARE: When filtering date values, dates must be entered in the format yyyy-MM-dd.
  • Select: Optionally selects which fields you want displayed for each document.
  • Order By: Optionally orders the list of documents returned.

Anatomy of a Search query

Documents are retrieved by executing a search query in the Search page. These queries will return one or more documents in the search index that match the query's parameters. Queries are formed by editing the following fields in the Search page.

  • Search - This is for text searching. Each document's full text and the text of its fields will be searched. Simple keyword and phrase searches are supported as well as more advanced searches using the Lucene query syntax. Leave this blank to search all documents in the index.
  • Filter - This will filter out results from the Search query based on metadata field values using comparison operators ("greater than", "less than", "equal to", etc.).
  • Select - This will alter what field values are displayed in the result list.
  • Order By - This will allow users to re-order the result list based on a field value.
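For readers who also work with Azure directly, the four fields above correspond to parameters of Azure AI Search's Search Documents REST operation. The sketch below builds such a request body. How Grooper itself forms its requests is not covered in this article, so treat the mapping as an assumption; the field names in the example come from the sample query earlier in this article, and the numeric filter assumes "totalAmount" is indexed as a number.

```python
import json


def build_search_body(search="", filter_expr=None, select=None, order_by=None) -> dict:
    """Map the Search page's four fields onto a Search Documents request body."""
    body = {
        # An empty search matches every document; Azure expresses that as "*".
        "search": search or "*",
        # "full" selects the full Lucene query syntax described in this article.
        "queryType": "full",
    }
    if filter_expr:
        body["filter"] = filter_expr
    if select:
        body["select"] = select
    if order_by:
        body["orderby"] = order_by
    return body


body = build_search_body(
    search="invoiceNO:1*",
    filter_expr="totalAmount gt 1000",
    select="totalAmount",
    order_by="poNumber",
)
print(json.dumps(body, indent=2))
```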

Search

The Search configuration searches the full text of each document in the index (including text in string fields). This uses the Lucene query syntax to return documents.

Basic Search features

The simplest types of searches are keyword searches and phrase searches.

Keyword search

Format example: searchTerm

Keyword searches search the full text of each document for a single search term. This includes the document full text content and any string fields in the index.

  • Be aware, keyword search terms must be full words in order to match. Full words are surrounded by a word boundary (a space, a punctuation mark or a dash -). To perform substring matching, use a wildcard search or a regex search.
    • "Word boundaries" include spaces, punctuation marks and special characters like dashes -, dollar signs $, number signs #, and asterisks *.
  • Multiple search terms can be entered into the same query. Grooper will return documents containing any of the terms.
    • This is effectively the same as a logical OR operation. searchTerm1 searchTerm2 and searchTerm1 OR searchTerm2 are equivalent searches.
    • When multiple keywords are searched in this way, results are ordered by how many terms are hit on each document by default. Documents at the top of the list will have all search terms matched. Documents at the bottom will only have one term matched. Configuring the Order By field will override this default.
  • Special characters will need to be escaped with a backslash (\). This includes the following:
    + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
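If search terms come from user input or another system, escaping can be automated. The helper below is a minimal sketch (the function name is hypothetical) that prefixes each of the special characters listed above with a backslash.

```python
# The special characters listed above, as a set for fast lookup.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term: str) -> str:
    """Prefix every Lucene special character in the term with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


print(escape_lucene("INV-1001"))  # INV\-1001
print(escape_lucene("A/P (2024)"))  # A\/P \(2024\)
```

Only escape text that should be matched literally. Escaping a deliberate wildcard or operator would turn it into a plain character.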

Phrase search

Format example: "searchTerm1 searchTerm2"

Because multiple search terms can be included in a single search, if you want to search a phrase (like "ACME Insurance") you must enclose the phrase in quotes. This will prevent matching documents which only contain one word in the phrase.

Advanced Search features

Lucene also supports several advanced querying features, including wildcards, fuzzy matching, regex matching, and more.

Wildcard searches

Format example: searchTerm? and searchTerm*

Use ? for a single wildcard character and * for multiple wildcard characters.

  • Be aware, wildcards can only be used for prefix matching (wildcard at the end of a term) and infix matching (wildcard in the middle of a term). Wildcards cannot be used for suffix matching (wildcard at the beginning of a term). However, regex can be used for suffix matching. Examples below:
    • Prefix matching (use wildcards): alpha* returns alphanumeric or alphabetical
    • Infix matching (use wildcards): non*al returns non-numerical or nonsensical
    • Suffix matching (use regex): /.*numeric/ returns alphanumeric
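The three matching styles can be tried out locally against a list of single terms. The sketch below uses Python's re module as a stand-in for Lucene's term matching; it ignores analyzers and tokenization, so it only illustrates the pattern shapes, not Azure's exact behavior. The function names and term list are made up for the example.

```python
import re


def wildcard_matches(pattern: str, term: str) -> bool:
    """Match a whole term against a pattern where * is any run of characters and ? is one."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, term) is not None


def regex_matches(pattern: str, term: str) -> bool:
    """Match a whole term against a regex, like a /regex/ query (slashes omitted)."""
    return re.fullmatch(pattern, term) is not None


terms = ["alphanumeric", "alphabetical", "nonsensical", "numeric"]

prefix = [t for t in terms if wildcard_matches("alpha*", t)]
infix = [t for t in terms if wildcard_matches("non*al", t)]
suffix = [t for t in terms if regex_matches(".*numeric", t)]

print(prefix)  # ['alphanumeric', 'alphabetical']
print(infix)   # ['nonsensical']
print(suffix)  # ['alphanumeric', 'numeric']
```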

Fuzzy matching

Format example: searchTerm~

Fuzzy search can only be applied to single words (not phrases in quotes). Azure's full fuzzy search documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-fuzzy

  • Azure's implementation of "fuzzy matching" is not the same as Grooper's. Terms are matched based on a character edit distance of one or two, meaning the number of single-character edits needed to turn one term into the other.
    • grooper~ or grooper~2 would match any word that was up to two characters different. For example, "trouper" "looper" "groop" or "grooperey".
    • grooper~1 would match any word that was up to one character different. For example, "trooper" "groopr" or "groopers".
  • Be aware, in Lucene full syntax, the tilde (~) is used for both fuzzy search and proximity search. When placed after a quoted phrase, ~ invokes proximity search. When placed at the end of a term, ~ invokes fuzzy search.
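To see why the example words above match, the edit distance can be computed by hand or with a short function. Azure's documentation describes fuzzy search as based on Damerau-Levenshtein distance; the sketch below uses the closely related optimal string alignment variant (insert, delete, substitute, or swap two adjacent characters) and is an illustration, not Azure's implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two terms."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def fuzzy_matches(query: str, term: str, max_edits: int = 2) -> bool:
    """True when the term is within max_edits of the query, like query~max_edits."""
    return edit_distance(query, term) <= max_edits


for word in ["trouper", "looper", "groop", "grooperey", "trooper", "groopr", "groopers"]:
    print(word, edit_distance("grooper", word))
```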

Boolean operators

Format example: AND OR NOT

Boolean operators can help improve the precision of a search query.

Field searching

Format example: fieldName:searchExpression

Search built in fields and extracted Data Model values. For example, Invoice_No:8* would return any document whose extracted "Invoice No" field started with the number "8".

Regular expression matching

Format example: /regex/

Enclose a regex pattern in forward slashes to incorporate it into the Lucene query. For example, /[0-9]{3}[a-z]/

  • Lucene regex searches are matched against single words/terms.
  • Lucene regex does not use the Perl Compatible Regular Expressions (PCRE) library. Most notably, this means it does not use single-letter character classes, such as \d to match a single digit. Instead, enter the full character class in brackets, such as [0-9] to match a single digit.

Proximity Search

Format example: "searchTerm1 searchTerm2"~2

Proximity searches are used to find terms that are "near" each other on a document. For example, "oil gas"~2 would find the terms "oil" and "gas" within two words of each other. So, it would return instances of "oil and gas" as well as "oil and natural gas".

  • Be aware, in Lucene full syntax, the tilde (~) is used for both fuzzy search and proximity search. When placed after a quoted phrase, ~ invokes proximity search. When placed at the end of a term, ~ invokes fuzzy search.
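The idea behind the "oil gas"~2 example can be approximated with a simple word-gap check. This sketch is deliberately simplified: it counts the words between the two terms and ignores their order, whereas Lucene measures how far term positions must move (so reversed terms cost more). The function name is hypothetical.

```python
import re


def within_proximity(text: str, term1: str, term2: str, max_gap: int) -> bool:
    """True when term1 and term2 occur with at most max_gap words between them."""
    words = re.findall(r"\w+", text.lower())
    positions1 = [i for i, w in enumerate(words) if w == term1.lower()]
    positions2 = [i for i, w in enumerate(words) if w == term2.lower()]
    # Words between two positions = distance between them minus one.
    return any(abs(i - j) - 1 <= max_gap for i in positions1 for j in positions2)


print(within_proximity("oil and gas lease", "oil", "gas", 2))          # True
print(within_proximity("oil and natural gas lease", "oil", "gas", 2))  # True
print(within_proximity("oil, coal, and some natural gas", "oil", "gas", 2))  # False
```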

Boosted Search

Format example: searchTerm1^2 searchTerm2

Boosted search adjusts the default relevance scoring mechanism. By default, Grooper will return the "most relevant" results first in the results list. For searches with multiple terms, each term is weighted equally. For the search standard invoice, the term "standard" would be weighted the same as "invoice". If you wanted the term "invoice" to carry more weight and have results with that term bubble to the top of the list, you could boost that term by a factor of two like this: standard invoice^2

  • You can also boost phrases.
  • The higher the boost value, the more relevant the term is relative to other terms.
  • You can dampen a term's relevance with a value less than one (for example, 0.50 would halve the term's weight).

Full Lucene documentation

Azure's full documentation of Lucene query syntax can be found here: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax

Filter

First you search, then you filter. The Filter parameter specifies criteria for documents to be included or excluded from the search results. This gives users an excellent mechanism to further fine tune their search query. Commonly, users will want to filter a search set based on the field values. Both built in index fields and/or values extracted from a Data Model can be incorporated into the filter criteria.

Azure AI Search uses the OData syntax to define filter expressions. Azure's full OData syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter

  • BE AWARE: When filtering date values, dates must be entered in the format yyyy-MM-dd.


For example, this Filter specification would only return search results whose "invoiceDate" Data Field value falls between January 1st, 2022 and January 31st, 2024, inclusive:

invoiceDate ge 2022-01-01 and invoiceDate le 2024-01-31

This query therefore is filtering by the "invoiceDate" field for results greater than or equal to 2022-01-01 and less than or equal to 2024-01-31.

  • Be aware, comparison operators like "greater than" (gt) and "less than" (lt) rely on a field's value type. Make sure the Grooper Data Element's value type is set correctly before indexing documents. For example, if you expect to be able to filter a date value correctly, the Data Field's Value Type should be set to DateTime.

FYI

The Search page UI makes this a bit easier: you do not have to write out the full filter syntax. The expression above is a shortened form of OData syntax. In full OData syntax, it would look like the following:

$filter=invoiceDate ge 2022-01-01T00:00:00.000Z and invoiceDate le 2024-01-31T00:00:00.000Z
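The relationship between the shortened form and the full form can be shown with a small text transformation. This is a sketch of the textual difference only; how Grooper performs the conversion internally is not described here, and the function name is an assumption.

```python
import re

# A bare yyyy-MM-dd date that is not already followed by a time component.
_BARE_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def expand_dates(filter_expr: str) -> str:
    """Append a midnight UTC time to each bare yyyy-MM-dd date in a filter."""
    return _BARE_DATE.sub(lambda m: m.group(0) + "T00:00:00.000Z", filter_expr)


short_form = "invoiceDate ge 2022-01-01 and invoiceDate le 2024-01-31"
full_form = expand_dates(short_form)
print(full_form)
```

Dates that already carry a time component are left unchanged, so applying the function twice gives the same result as applying it once.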

Full OData filter documentation

Azure's full OData filter syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter

Select

The Select parameter is an optional parameter to select which metadata fields are shown in the result list.

  • Select has nothing to do with what results are returned, only what fields are displayed.
  • This can be exceptionally helpful when navigating indexes with a large number of fields. If you have 30 fields but only want to view a handful of them, simply enter which fields you want visible in the Select editor.
  • You can select any of the built in fields or Data Elements in the index (defined in the Indexing Behavior).
  • Multiple fields can be selected using a comma separated list (e.g. Field1,Field2,Field3)


For example, to only display a "totalAmount" Data Field in the search results, you would enter totalAmount in the Select editor.

Full OData select documentation

Azure's full OData select syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-select

Order By

Order By is an optional parameter that will define how the search results are sorted.

  • Any field in the index can be used to sort results.
  • The field's value type will determine how items are sorted.
    • String values are sorted alphabetically.
    • Datetime values are sorted by oldest or newest date.
    • Numerical value types are sorted smallest to largest or largest to smallest.
  • Sort order can be ascending or descending.
    • Add asc after the field's name to sort in ascending order. This is the default direction.
    • Add desc after the field's name to sort in descending order.
  • Multiple fields may be used to sort results.
    • Separate each sort expression with a comma (e.g. Field1 desc,Field2)
    • The leftmost field will be used to sort the full result list first, then it's sub-sorted by the next, then sub-sub-sorted by the next, and so on.
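The sort-then-sub-sort behavior described above can be reproduced on a list of records to preview what an Order By expression will do. The sketch below is an illustration in plain Python, not how Azure sorts internally; the field names and rows are made up for the example.

```python
def apply_order_by(rows: list, order_by: str) -> list:
    """Sort rows by a comma-separated list of 'field [asc|desc]' expressions."""
    result = list(rows)
    expressions = [e.strip() for e in order_by.split(",") if e.strip()]
    # Sort by the rightmost expression first. Python's sort is stable, so the
    # leftmost expression, applied last, ends up as the primary sort.
    for expression in reversed(expressions):
        parts = expression.split()
        field = parts[0]
        descending = len(parts) > 1 and parts[1].lower() == "desc"
        result.sort(key=lambda row: row[field], reverse=descending)
    return result


rows = [
    {"vendor": "Acme", "total": 50},
    {"vendor": "Zeta", "total": 75},
    {"vendor": "Acme", "total": 20},
]
ordered = apply_order_by(rows, "vendor desc,total")
for row in ordered:
    print(row["vendor"], row["total"])
```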


For example, to order the result list by a "poNumber" Data Field in the search results, you would enter poNumber in the Order By editor.

Full OData orderby documentation

Azure's full OData orderby syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-orderby

Search Page Commands

There are several new commands users can execute from the Search page. These commands give users a new way of starting and continuing work in Grooper. They can be divided into three sets: "search query commands", "result set commands" and "document commands".

Search Query Commands

These commands are accessed above the query editor. They are used in conjunction with written queries.

  • Execute Query - This will execute the query written in the Search, Filter, Order By, and Select parameters.
  • Clear Query - This will clear all query parameters.
  • AI Generate - This will leverage AI Generators defined on the Indexing Behavior associated with this index to allow users to craft documents based on the results of the query.
  • Favorites - This will allow users to store and retrieve queries they use frequently.

Result Set Commands

These commands can be accessed from a dropdown list in the Search page UI. They can be applied to the entire result set or a selection from the result set.

  • Create Batch - Creates a Batch from the result set and submits an Import Job to start processing it.
  • Submit Job - Submits a Processing Job for documents in the result set. This command is intended for "on demand" activity processing.
  • Analyst Chat - Select an AI Analyst to start a chat session with the result set.
  • Download - Downloads a document generated from the result set. May be one of the following:
    • Download PDF - Generates a single bookmarked PDF with optional search hit highlights.
    • Download ZIP - Generates a ZIP file containing each document in the result set.
    • Download CSV - Generates a CSV file from the result set's data fields.
    • Download Custom - Generates a custom document using an "AI Generator".

Document Commands

These commands can be accessed from the Search page when right-clicking a document in the result list.

  • Go to Document - Navigates users with Design page permissions to that document in the Grooper node tree.
  • Review Document - Opens the document in a Review Viewer with a Data View, Folder View and Thumbnail View.
  • Copy Link - Creates a URL link to the document. When clicking the link users will be taken to a Review Viewer with a Data View, Folder View and Thumbnail View.

Saving and sharing Search queries

The "Favorites" button allows users to save and share queries.

  • Saving queries allows users to quickly retrieve a set of documents using a pre-configured query.
  • Sharing queries allows users to quickly communicate a document set to another user.

BE AWARE! In version 2024, saved queries are stored in your browser's session cache.

If you switch browsers or clear your browser's cache, you will lose your saved queries. This will change in version 2025, in which queries will be saved to the Grooper database.

Saving a query

Users can save queries they would like to execute later. Saved queries are "user specific": each user only has access to their own saved queries. Because saved queries are stored in the browser cache, they are "browser specific" too. If users switch browsers or computers, or clear their browser cache, they will lose their saved queries. This will change in version 2025, where queries will be saved to the Grooper database.

  1. After entering the query parameters, press the "Favorites" button.
  2. This pulls up a dropdown menu. Press the "Add" button to add the query to your Favorites list.
  3. This brings up the "Add Favorite" window. Name the query.
  4. Press "OK" when done.

You can now bring up this saved query and execute it whenever you need to in the future.

Retrieving a saved query

  1. Press the "Favorites" button.
  2. This pulls up a dropdown menu. Select the query.
    • From this dropdown, you can also edit a saved query's name, delete a query or save over a query. Simply hover over the desired query and use the corresponding button to do so.


  3. This will fill in the Search Query Editor with the saved query.
  4. Press the search button to execute the query.

Sharing a query

Queries can be shared between Grooper users via a copied link. As long as the user has permissions to the Grooper Repository and the Search Page, they will be able to execute the shared query.

  1. After entering the query parameters, press the "Favorites" button.
  2. This pulls up a dropdown menu. Press the "Link" button.
    • This copies the query configuration to a shareable link in your clipboard.
    • This link can be pasted anywhere you wish to share the query (email, chat clients, etc).

Different result set view layouts

More on Lucene

The Lucene query language is a powerful and flexible search language used for querying full-text search engines based on the Apache Lucene library. Lucene provides the foundation for various search platforms, including Elasticsearch, Solr, and most importantly for Grooper, AI Search. The query language allows users to perform complex searches using a syntax that supports a range of operators and expressions.

Key Features of Lucene Query Language:

  • Boolean Operators: Use AND, OR, and NOT to combine or exclude terms.
  • Field-specific Searches: Query specific fields in the indexed documents (e.g., title:Azure AND content:AI).
  • Wildcards: Use * (matches multiple characters) and ? (matches a single character) within terms.
  • Phrase Searches: Use quotes to search for exact phrases (e.g., "artificial intelligence").
  • Proximity Searches: Find terms within a certain distance from each other (e.g., "cloud computing"~5).
  • Fuzzy Searches: Use the tilde ~ symbol to find terms with similar spellings (e.g., search~).
  • Range Searches: Search within a range of values (e.g., date:[20230101 TO 20231231]).
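
The features listed above can be assembled into query strings with a few small helpers. The following Python sketch is illustrative only: the helper names are hypothetical, and it builds text without sending anything to a search service. The set of characters escaped by escape_term follows the special characters of Lucene query syntax.

```python
import re

# Characters that have special meaning in Lucene query syntax.
_SPECIAL_CHARS = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(text):
    """Backslash-escape Lucene special characters so a term is read literally."""
    return _SPECIAL_CHARS.sub(r"\\\1", text)


def phrase(text):
    """Exact phrase: wrap in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def proximity(text, distance):
    """Phrase whose words may appear within the given distance of each other."""
    return f"{phrase(text)}~{distance}"


def fuzzy(term):
    """Term that also matches similar spellings."""
    return f"{escape_term(term)}~"


def fielded(field, expression):
    """Restrict an expression to one field."""
    return f"{field}:{expression}"


def value_range(low, high):
    """Inclusive range between two values."""
    return f"[{low} TO {high}]"


def all_of(*clauses):
    """Require every clause by joining them with the AND operator."""
    return " AND ".join(clauses)


print(all_of(fielded("title", "Azure"), fielded("content", "AI")))
print(phrase("artificial intelligence"))
print(proximity("cloud computing", 5))
print(fuzzy("search"))
print(fielded("date", value_range(20230101, 20231231)))
print(escape_term("C++"))
```

Escaping matters when a search term contains a character such as + or :, which would otherwise be read as query syntax rather than as part of the term.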

How Azure AI Search Uses Lucene Query Language

  • Query Syntax: Azure AI Search allows users to write queries using Lucene syntax directly in the search requests. This enables precise and complex searches, including filtering, scoring, and relevance tuning.
  • Fielded Searches: Azure AI Search supports querying specific fields in your index, much like Lucene. For example, you can search for documents where a certain field matches a given term (e.g., fieldName:searchTerm).
  • Boolean Logic: Users can combine multiple search criteria using Boolean operators to refine search results. This is useful in narrowing down search results by combining conditions (e.g., category:Technology AND date:[20230101 TO 20231231]).
  • Filtering: Azure AI Search leverages OData's capabilities to perform filtered searches, which allow users to filter results by different fields (e.g., price ranges, ratings).
  • Highlighting: Azure AI Search can highlight the parts of the document that match the search query, using Lucene's query syntax to determine what to highlight.
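
To show how a Lucene search and OData parameters travel together, here is a minimal Python sketch that builds a JSON body of the kind accepted by the Azure AI Search "Search Documents" REST operation. The property names (search, queryType, filter, select, orderby, highlight, top, count) are that API's; the build_search_body function, the field names and the values are hypothetical. Nothing is sent over the network, and this is not a description of how Grooper itself assembles its requests.

```python
import json


def build_search_body(search, filter_expr=None, select=None,
                      order_by=None, highlight=None, top=50):
    """Assemble a search request body that uses full Lucene syntax."""
    body = {
        "search": search,      # Lucene query text
        "queryType": "full",   # "full" selects the full Lucene parser
        "count": True,         # also report the total number of matches
        "top": top,            # maximum number of results to return
    }
    optional = {
        "filter": filter_expr,   # OData filter expression
        "select": select,        # comma-separated fields to return
        "orderby": order_by,     # OData sort expressions
        "highlight": highlight,  # comma-separated fields to highlight hits in
    }
    # Leave out any optional parameter that was not supplied.
    body.update({name: value for name, value in optional.items() if value})
    return body


body = build_search_body(
    search='category:Technology AND "artificial intelligence"',
    filter_expr="rating ge 4",
    select="title,category",
    highlight="content",
)
print(json.dumps(body, indent=2))
```

The search text carries the Lucene part of the query, while the filter, select and orderby properties carry the OData parts.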

Examples

Let's consider a scenario where you have a series of invoice documents indexed in Azure AI Search. The documents include key data points like invoice_id, vendor_name, invoice_date, amount, status, and line_items.

Example 1: Finding All Invoices from a Specific Vendor

Suppose you want to find all invoices from the vendor Acme Corp.

  • Lucene Query:
    vendor_name:"Acme Corp."
  • Explanation:
This query searches for all documents where the vendor_name field matches the phrase "Acme Corp.".

Example 2: Finding Invoices Above a Certain Amount

Let's say you need to find all invoices where the amount is $10,000 or more.

  • Lucene Query:
    amount:[10000 TO *]
  • Explanation:
This query searches for all invoices where the amount is greater than or equal to 10,000. The * wildcard indicates there is no upper limit in this range.

Example 3: Finding Unpaid Invoices within a Date Range

You want to find all invoices with the status of Unpaid that were issued in January 2024.

  • Lucene Query:
    status:Unpaid AND invoice_date:[20240101 TO 20240131]
  • Explanation:
The query searches for invoices that have a status of Unpaid and an invoice_date between January 1, 2024, and January 31, 2024.

Example 4: Searching for Specific Items in Line Items

Suppose you want to find invoices that include a line item with the description laptop.

  • Lucene Query:
    line_items.description:laptop
  • Explanation:
This query looks into the line_items table (a table holds an array of rows) and searches for any line item where the description field contains the word laptop.

Example 5: Combining Multiple Conditions

Let's say you want to find all invoices from Acme Corp. issued on or after July 1, 2024, with an amount of $5,000 or more.

  • Lucene Query:
    vendor_name:"Acme Corp." AND invoice_date:[20240701 TO *] AND amount:[5000 TO *]
  • Explanation:
This query combines multiple conditions to find invoices from Acme Corp. where the invoice_date is on or after July 1, 2024, and the amount is greater than or equal to $5,000. Square brackets make both range bounds inclusive.

These examples demonstrate how you can leverage the Lucene query language to perform detailed searches on your indexed invoice documents in Azure AI Search. By using these queries, you can quickly and effectively find specific documents that match complex criteria.
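
To make the meaning of the inclusive ranges concrete, the following Python sketch applies the conditions from Example 5 to a handful of in-memory records. It is a local emulation for illustration, not Lucene: the sample invoices and function names are hypothetical, and plain string equality stands in for the phrase match on vendor_name.

```python
def in_range(value, low=None, high=None):
    """Inclusive range check; None plays the role of the * open bound."""
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Hypothetical invoices, with dates stored as yyyymmdd numbers as in the examples.
invoices = [
    {"invoice_id": "INV-1", "vendor_name": "Acme Corp.", "invoice_date": 20240701, "amount": 5000},
    {"invoice_id": "INV-2", "vendor_name": "Acme Corp.", "invoice_date": 20240630, "amount": 9000},
    {"invoice_id": "INV-3", "vendor_name": "Globex", "invoice_date": 20240815, "amount": 12000},
    {"invoice_id": "INV-4", "vendor_name": "Acme Corp.", "invoice_date": 20240902, "amount": 4999},
]


def matches_example_5(invoice):
    """vendor_name:"Acme Corp." AND invoice_date:[20240701 TO *] AND amount:[5000 TO *]"""
    return (
        invoice["vendor_name"] == "Acme Corp."
        and in_range(invoice["invoice_date"], low=20240701)
        and in_range(invoice["amount"], low=5000)
    )


matching_ids = [inv["invoice_id"] for inv in invoices if matches_example_5(inv)]
print(matching_ids)  # ['INV-1']
```

INV-1 matches because both of its values sit exactly on the inclusive lower bounds. Each of the other invoices fails one condition: date, vendor, or amount.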

In essence, Azure AI Search uses the Lucene query language to enable complex and customizable search functionality, giving developers and users the ability to craft tailored search queries that meet their specific needs.

More on OData

The OData (Open Data Protocol) query language is a standardized protocol for querying and updating data, particularly in web services. It is built on RESTful principles and allows for querying data in a simple and consistent way across various data sources. OData is widely used in services like Azure AI Search, where it complements other query languages, such as Lucene, by providing additional filtering, sorting, and pagination capabilities.

Key Features of OData Query Language

  • Filtering ($filter): Apply conditions to retrieve only the data that matches specified criteria.
    OData's $filter option allows you to apply precise filters to your search results. For example, you can filter results based on date ranges, numerical values, or text matches. This is especially useful when you want to refine search results according to specific conditions.
  • Ordering ($orderby): Sort the results based on specified fields.
    With $orderby, you can sort search results by one or more fields, either in ascending or descending order. This is useful when you need to order search results by relevance, date, or other criteria.
  • Selection ($select): Specify which fields to include in the response.
    OData's $select allows you to specify which fields to include in the search results. This can reduce the payload size by only returning the necessary fields from the documents.
  • Top and Skip ($top and $skip): Control pagination by specifying the number of records to return and the offset.
    OData provides $top and $skip parameters to control pagination in search results. $top specifies how many results to return, while $skip determines how many records to skip. This is useful for handling large datasets where you want to present data in smaller chunks.
  • Expanding ($expand): Retrieve related entities in a single query (useful for relationships in data models).
  • Counting ($count): Get the total number of records matching a query.
    The $count option returns the total number of documents that match a query. In Azure AI Search the count is returned alongside the results, so combine it with $top=0 to get the count without retrieving the actual documents. It can also be combined with other query options to get insights into the dataset size.

Examples

Filter Invoices by Status and Date Range

  • OData Query:
    $filter=status eq 'Unpaid' and invoice_date ge 2024-01-01 and invoice_date le 2024-01-31
  • Explanation:
This query filters for invoices where the status is Unpaid and the invoice_date is between January 1, 2024, and January 31, 2024.
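
A filter like the one above can be assembled from parts, which helps avoid quoting mistakes. The Python sketch below is a minimal illustration with hypothetical helper names. It relies on the OData rule that a single quote inside a string literal is written as two single quotes. The date literals are copied as written in the example; the exact literal format an index accepts depends on the field's type.

```python
COMPARISON_OPERATORS = {"eq", "ne", "gt", "ge", "lt", "le"}


def quote(text):
    """OData string literal: wrap in single quotes, doubling any embedded quote."""
    return "'" + text.replace("'", "''") + "'"


def compare(field, operator, literal):
    """Build one comparison such as: status eq 'Unpaid'."""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator}")
    return f"{field} {operator} {literal}"


def all_of(*conditions):
    """Require every condition by joining them with the and operator."""
    return " and ".join(conditions)


filter_expr = all_of(
    compare("status", "eq", quote("Unpaid")),
    compare("invoice_date", "ge", "2024-01-01"),
    compare("invoice_date", "le", "2024-01-31"),
)
print("$filter=" + filter_expr)
print(compare("vendor_name", "eq", quote("O'Brien Ltd")))
```

The first line printed reproduces the query from this example; the second shows how a vendor name containing an apostrophe is quoted.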

Sort Invoices by Amount in Descending Order

  • OData Query:
    $orderby=amount desc
  • Explanation:
This query sorts the invoices by the amount field in descending order, showing the highest amounts first.

Select Specific Fields from the Results

  • OData Query:
    $select=invoice_id,vendor_name,amount
  • Explanation:
This query returns only the invoice_id, vendor_name, and amount fields in the results, omitting other fields.

Pagination with Top and Skip

  • OData Query:
    $top=10&$skip=20
  • Explanation:
This query retrieves the third page of results, with 10 results per page (i.e., results 21-30).
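
The arithmetic behind $top and $skip is simple but easy to get off by one. As a minimal sketch (the function names are hypothetical), a 1-based page number and a page size translate to the two parameters like this:

```python
import math


def page_to_top_skip(page, page_size):
    """Translate a 1-based page number into ($top, $skip) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must both be at least 1")
    return page_size, (page - 1) * page_size


def page_count(total_results, page_size):
    """Number of pages needed to show every result."""
    return math.ceil(total_results / page_size)


top, skip = page_to_top_skip(page=3, page_size=10)
print(f"$top={top}&$skip={skip}")  # $top=10&$skip=20
print(page_count(95, 10))          # 10
```

Skipping (page - 1) * page_size records means page 1 skips nothing and page 3 skips the first 20, which is why the query above returns results 21-30.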

Counting Matching Documents

  • OData Query:
    $count=true&$filter=status eq 'Paid'
  • Explanation:
This query returns the total count of documents where the status is Paid, along with the matching documents. Adding $top=0 returns the count without retrieving the documents themselves.

In Azure AI Search, OData is used to refine, organize, and manage the results of search queries. It works alongside Lucene to offer a robust and flexible querying mechanism, making it easier to handle complex data retrieval scenarios in search applications.