AI Search: Difference between revisions

From Grooper Wiki




Create an efficient and effective document search and retrieval mechanism in Grooper using Azure's AI Search service.

=== Basic AI Search Setup ===

Before you can start using the Search page to search for documents, there's some basic setup you need to perform.  Some of these steps are performed outside of Grooper.  Most are performed inside of Grooper.
<big>Outside of Grooper</big>
# Create an AI Search service in Azure.  This will require
#*The following article from Microsoft instructs users how to create a Search Service:
#*: https://learn.microsoft.com/en-us/azure/search/search-create-service-portal
#* Microsoft's full AI Search documentation is found here:
#*: https://learn.microsoft.com/en-us/azure/search/
#*<li class="fyi-bullet"> You will need the Azure AI Search service's "URL" and either "Primary admin key" or "Secondary admin key" for the next step. These values can be found by accessing the Azure Search service from the Azure portal ([https://portal.azure.com/ portal.azure.com]).
<big>Inside of Grooper</big>
#<li value=2> Add '''''AI Search''''' to the '''Grooper Root''' node's '''''Repository Options'''''. Enter the URL and admin key for the Azure AI Search service (copied from Azure).
# Add an '''''Indexing Behavior''''' on a '''Content Model'''.
#*<li class="fyi-bullet"> Documents must be classified in Grooper before they can be indexed.  Only '''Document Types'''/'''Content Types''' inheriting an '''''Indexing Behavior''''' are eligible for indexing.
# Create the search index by right-clicking the '''Content Model''' and selecting "Search > Create Search Index"
#*<li class="fyi-bullet"> This creates the search index in Azure.  Without the search index created, documents can't be added to an index.  This only needs to be done ''once'' per index.
# Index documents in one of the following ways:
#* Manually, one document at a time by right-clicking the document and using the "'''''Add to Index'''''" command.
#* Using an '''''Execute''''' activity in a '''Batch Process''' to apply the "'''''Add to Index'''''" command to all documents in a '''Batch'''.
#* Submitting an "Indexing Job" by right-clicking the '''Content Model''' (from step 3) and selecting "Search > Submit Indexing Job".
#* Running the Grooper '''Indexing Service''' to index documents automatically in the background.
#**<li class="attn-bullet"> '''''BE AWARE:''''' The Grooper '''Indexing Service''' periodically polls the Grooper database to determine if the index needs to be updated.  If it does, it will submit an "Indexing Job".  An '''Activity Processing''' service '''''must also''''' be running to fully automate indexing in the background.
 
==== Repository Options: AI Search ====
 
'''''Repository Options''''' are new to Grooper 2024.  They add new functionality to the whole Grooper Repository.  These optional features are added using the '''''Options''''' property editor on the '''Grooper Root''' node.
 
To search documents in Grooper, we use Azure's AI Search service.  In order to connect to an Azure AI Search service, the '''''AI Search''''' option must be added to the list of '''''Repository Options''''' in Grooper.  Here, users will enter the Azure AI Search URL endpoint where calls are issued and an admin's API key.  Both of these can be obtained from the Microsoft Azure portal once you have added an Azure AI Search resource.
 
With '''''AI Search''''' added to your Grooper Repository, you will be able to add an '''''Indexing Behavior''''' to one or more '''Content Types''', create a search index, index documents and search them using the Search Page.
 
==== Indexing documents for search  ====
 
Before documents can be searched, they must be indexed.  The search index holds the content you want to search. This includes each document's full OCR or native text obtained from the '''Recognize''' activity and can optionally include '''Data Model''' results collected from the '''Extract''' activity.  We use the Azure AI Search Service to create search indexes according to an '''''Indexing Behavior''''' defined for '''Content Types''' in Grooper. Documents are made searchable by adding them to a search index.  Once indexed, you can search for documents using Grooper's Search page.
 
===== The Indexing Behavior: Defines the search index =====
Before indexing documents, you must add an '''''Indexing Behavior''''' to the '''Content Types''' you want to index.  Most typically, this will be done on a '''Content Model'''.  All child '''Document Types''' will inherit the '''''Indexing Behavior''''' and its configuration.  (More complicated '''Content Models''' may require '''''Indexing Behaviors''''' configured on multiple '''Content Types'''.)
 
The '''''Indexing Behavior''''' defines:
* The index's name in Azure.
* Which documents are added to the index.
** Only documents classified as the '''''Indexing Behavior's''''' '''Content Type''' OR any of its child '''Content Types''' will be indexed.
** In other words, when set on a '''Content Model''' only documents classified as one of its '''Document Types''' will be indexed.
* What fields are added to the search index (including which '''Data Elements''' from a '''Data Model''' are included, if any).
* Any options for the search index in the Grooper Search page (including access restrictions on the search index).
 
*<li class="attn-bullet"> '''''BE AWARE:''''' Once an '''''Indexing Behavior''''' is added to a '''Content Type''', you must use the "Create Search Index" command to create the index in Azure.  Do this by right-clicking the '''Content Type''' and choosing "Search > Create Search Index".
 
With the '''''Indexing Behavior''''' defined and the search index created, you can start indexing documents.
 
===== Adding documents to the search index =====
 
Documents may be added to a search index in one of the following ways:
* Right-clicking the document and using the "Add to Index" command.
*: This is the most manual way of indexing documents.  Documents may also be removed using the "Remove from Index" command.
* Using an '''''Execute''''' activity in a '''Batch Process''' to apply the "Add to Index" command to all documents in a '''Batch'''.
*: This is a more automated way of indexing documents.  It adds documents to a search index at a specific point in a '''Batch Process'''.
* Running the Grooper '''Indexing Service''' to index documents automatically in the background.
*: This is the ''most'' automated way of indexing documents. The Indexing Service periodically polls the Grooper database and adds newly classified documents to the index, updates the index if changes are made (to their extracted data, for example), or removes documents from the index if they've been deleted.

Revision as of 14:55, 1 July 2024

This is a placeholder for an article on an upcoming new feature in version 2024.


Create an efficient and effective document search and retrieval mechanism in Grooper using Azure's AI Search service.

Basic AI Search Setup

Before you can start using the Search page to search for documents, there's some basic setup you need to perform. Some of these steps are performed outside of Grooper. Most are performed inside of Grooper.

Outside of Grooper

  1. Create an AI Search service in Azure. This will require

Inside of Grooper

  1. Add AI Search to the Grooper Root node's Repository Options. Enter the URL and admin key for the Azure AI Search service (copied from Azure).
  2. Add an Indexing Behavior on a Content Model.
    • Documents must be classified in Grooper before they can be indexed. Only Document Types/Content Types inheriting an Indexing Behavior are eligible for indexing.
  3. Create the search index by right-clicking the Content Model and selecting "Search > Create Search Index"
    • This creates the search index in Azure. Without the search index created, documents can't be added to an index. This only needs to be done once per index.
  4. Index documents in one of the following ways:
    • Manually, one document at a time by right-clicking the document and using the "Add to Index" command.
    • Using an Execute activity in a Batch Process to apply the "Add to Index" command to all documents in a Batch.
    • Submitting an "Indexing Job" by right-clicking the Content Model (from step 3) and selecting "Search > Submit Indexing Job".
    • Running the Grooper Indexing Service to index documents automatically in the background.
      • BE AWARE: The Grooper Indexing Service periodically polls the Grooper database to determine if the index needs to be updated. If it does, it will submit an "Indexing Job". An Activity Processing service must also be running to fully automate indexing in the background.

Repository Options: AI Search

Repository Options are new to Grooper 2024. They add new functionality to the whole Grooper Repository. These optional features are added using the Options property editor on the Grooper Root node.

To search documents in Grooper, we use Azure's AI Search service. In order to connect to an Azure AI Search service, the AI Search option must be added to the list of Repository Options in Grooper. Here, users will enter the Azure AI Search URL endpoint where calls are issued and an admin's API key. Both of these can be obtained from the Microsoft Azure portal once you have added an Azure AI Search resource.

With AI Search added to your Grooper Repository, you will be able to add an Indexing Behavior to one or more Content Types, create a search index, index documents and search them using the Search Page.
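
Before pasting the URL and admin key into Grooper, it can help to confirm that the pair works against the Azure service itself. The following is a minimal sketch, not part of Grooper: it builds a "List Indexes" call against the Azure AI Search REST API. The function name, the example service URL, and the chosen API version (2023-11-01) are assumptions for illustration.

```python
import urllib.request

# Assumed REST API version; use whichever version your service supports.
API_VERSION = "2023-11-01"


def build_list_indexes_request(service_url: str, admin_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists the indexes on a search service.

    service_url is the "URL" value shown for the service in the Azure portal,
    admin_key is its primary or secondary admin key.
    """
    base = service_url.rstrip("/")  # tolerate a trailing slash copied from the portal
    url = f"{base}/indexes?api-version={API_VERSION}"
    headers = {"api-key": admin_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical values for illustration only.
request = build_list_indexes_request("https://example.search.windows.net/", "YOUR-ADMIN-KEY")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request, timeout=10)` should return a JSON list of indexes when the URL and key are valid. An HTTP error response usually means the key or the URL was copied incorrectly.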

Indexing documents for search

Before documents can be searched, they must be indexed. The search index holds the content you want to search. This includes each document's full OCR or native text obtained from the Recognize activity and can optionally include Data Model results collected from the Extract activity. We use the Azure AI Search Service to create search indexes according to an Indexing Behavior defined for Content Types in Grooper. Documents are made searchable by adding them to a search index. Once indexed, you can search for documents using Grooper's Search page.
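
To make the idea of "what the index holds" concrete, here is a rough sketch of an Azure AI Search index definition with one full-text field plus optional data fields. The schema Grooper actually generates is not documented here, so the field names (`id`, `content`, `invoice_number`, `vendor_name`) and the helper function are hypothetical. Only the general shape (a `name` and a list of `fields` with attributes such as `key` and `searchable`) follows the Azure index definition format.

```python
import json
from typing import Dict, List


def build_index_definition(index_name: str, data_field_names: List[str]) -> Dict:
    """Return an index definition with a key field, a full-text field, and data fields."""
    fields = [
        # Every Azure AI Search index needs exactly one key field.
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
        # Hypothetical field holding the document's OCR or native text.
        {"name": "content", "type": "Edm.String", "searchable": True},
    ]
    for field_name in data_field_names:
        # Hypothetical fields for extracted Data Model values.
        fields.append(
            {"name": field_name, "type": "Edm.String", "searchable": True, "filterable": True}
        )
    return {"name": index_name, "fields": fields}


definition = build_index_definition("invoices", ["invoice_number", "vendor_name"])
print(json.dumps(definition, indent=2))
```

In Grooper this definition is produced for you from the Indexing Behavior. The sketch only illustrates why the set of indexed fields has to be decided before the index is created.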

The Indexing Behavior: Defines the search index

Before indexing documents, you must add an Indexing Behavior to the Content Types you want to index. Most typically, this will be done on a Content Model. All child Document Types will inherit the Indexing Behavior and its configuration. (More complicated Content Models may require Indexing Behaviors configured on multiple Content Types.)

The Indexing Behavior defines:

  • The index's name in Azure.
  • Which documents are added to the index.
    • Only documents classified as the Indexing Behavior's Content Type OR any of its child Content Types will be indexed.
    • In other words, when set on a Content Model only documents classified as one of its Document Types will be indexed.
  • What fields are added to the search index (including which Data Elements from a Data Model are included, if any).
  • Any options for the search index in the Grooper Search page (including access restrictions on the search index).
  • BE AWARE: Once an Indexing Behavior is added to a Content Type, you must use the "Create Search Index" command to create the index in Azure. Do this by right-clicking the Content Type and choosing "Search > Create Search Index".

With the Indexing Behavior defined and the search index created, you can start indexing documents.
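
The inheritance rule above (a document is indexed only if its Content Type, or one of that type's ancestors, carries an Indexing Behavior) can be sketched as a walk up a parent chain. This is an illustrative model only. The `ContentType` class and both functions are hypothetical and do not reflect Grooper's internal object model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentType:
    """Hypothetical stand-in for a Content Model, Content Category, or Document Type."""
    name: str
    parent: Optional["ContentType"] = None
    has_indexing_behavior: bool = False


def indexing_owner(content_type: Optional[ContentType]) -> Optional[ContentType]:
    """Return the nearest type (self or ancestor) that defines an Indexing Behavior."""
    node = content_type
    while node is not None:
        if node.has_indexing_behavior:
            return node
        node = node.parent
    return None


def is_eligible(document_type: Optional[ContentType]) -> bool:
    """An unclassified document (None) or one with no inherited behavior is not indexed."""
    return indexing_owner(document_type) is not None


# Hypothetical Content Models for illustration.
invoices_model = ContentType("Invoices Model", has_indexing_behavior=True)
invoice = ContentType("Invoice", parent=invoices_model)
letters_model = ContentType("Letters Model")
letter = ContentType("Letter", parent=letters_model)

print(is_eligible(invoice), is_eligible(letter), is_eligible(None))
```

Here `Invoice` is eligible because it inherits from a model with an Indexing Behavior, while `Letter` and an unclassified document are not.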

Adding documents to the search index

Documents may be added to a search index in one of the following ways:

  • Right-clicking the document and using the "Add to Index" command.
    This is the most manual way of indexing documents. Documents may also be removed using the "Remove from Index" command.
  • Using an Execute activity in a Batch Process to apply the "Add to Index" command to all documents in a Batch.
    This is a more automated way of indexing documents. It adds documents to a search index at a specific point in a Batch Process.
  • Running the Grooper Indexing Service to index documents automatically in the background.
    This is the most automated way of indexing documents. The Indexing Service periodically polls the Grooper database and adds newly classified documents to the index, updates the index if changes are made (to their extracted data, for example), or removes documents from the index if they've been deleted.
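
The add, update, and remove behavior described for the Indexing Service amounts to comparing the repository's state with the index's state on each poll. The sketch below shows that comparison in isolation. It is an assumption about the general technique, not Grooper's implementation, and the version-stamp dictionaries and function name are hypothetical.

```python
from typing import Dict, List


def plan_index_changes(repository: Dict[str, int], index: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare two {document id: version stamp} maps and list the work needed.

    repository: classified documents currently in the (hypothetical) database.
    index: documents currently present in the search index.
    """
    to_add = sorted(doc_id for doc_id in repository if doc_id not in index)
    to_update = sorted(
        doc_id
        for doc_id in repository
        if doc_id in index and repository[doc_id] != index[doc_id]
    )
    to_remove = sorted(doc_id for doc_id in index if doc_id not in repository)
    return {"add": to_add, "update": to_update, "remove": to_remove}


# doc-1 is new, doc-2 changed after it was indexed, doc-4 was deleted from the repository.
repository_state = {"doc-1": 1, "doc-2": 2, "doc-3": 1}
index_state = {"doc-2": 1, "doc-3": 1, "doc-4": 1}
print(plan_index_changes(repository_state, index_state))
# prints {'add': ['doc-1'], 'update': ['doc-2'], 'remove': ['doc-4']}
```

A polling service would run a comparison like this on a timer and submit the resulting work as a job, which is why the processing service that executes those jobs must also be running.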