| {{beta}}
| | #Redirect [[AI Search and the Search Page]] |
| <blockquote>
| |
| '''''AI Search''''' is a Grooper '''''Repository Option'''''. Enabling this option creates an efficient and effective document search and retrieval mechanism in Grooper using Azure's AI Search service.
| |
| </blockquote>
| |
| | |
| == About ==
| |
| | |
| '''''AI Search''''' is a Grooper '''''Repository Option'''''. When enabled on the '''Grooper Root''', it integrates Grooper with an existing Azure AI Search service, giving users a document search and retrieval mechanism inside Grooper. With this integration, Grooper can:
| |
| * Create a search index for documents of a specified '''Content Type''' ('''Content Model''')
| |
| * Add documents assigned this '''Content Type''' ('''Document Types''' in the '''Content Model''') to the search index.
| |
| * Use the Search page to search for documents in the search index.
| |
| ** Simple full text search is supported.
| |
| ** More advanced search querying is supported through Azure's implementation of the Lucene query syntax and OData filter syntax.
| |
| <section begin="ai_search_basics" />
| |
| === Basic AI Search Setup ===
| |
| | |
| Before you can start using the Search page to search for documents, there's some basic setup you need to perform. Some of these steps are performed outside of Grooper. Most are performed inside of Grooper.
| |
| | |
| <big>Outside of Grooper</big>
| |
| # Create an AI Search service in Azure.
| |
| #*The following article from Microsoft instructs users how to create a Search Service:
| |
| #*: https://learn.microsoft.com/en-us/azure/search/search-create-service-portal
| |
| #* Microsoft's full AI Search documentation is found here:
| |
| #*: https://learn.microsoft.com/en-us/azure/search/
| |
| #*<li class="fyi-bullet"> You will need the Azure AI Search service's "URL" and either "Primary admin key" or "Secondary admin key" for the next step. These values can be found by accessing the Azure Search service from the Azure portal ([https://portal.azure.com/ portal.azure.com]).
| |
| <big>Inside of Grooper</big>
| |
| #<li value=2> Add '''''AI Search''''' to the '''Grooper Root''' node's '''''Repository Options'''''. Enter the URL and admin key for the Azure AI Search service (copied from Azure).
| |
| # Add an '''''Indexing Behavior''''' on a '''Content Model'''.
| |
| #*<li class="fyi-bullet"> Documents must be classified in Grooper before they can be indexed. Only '''Document Types'''/'''Content Types''' inheriting an '''''Indexing Behavior''''' are eligible for indexing.
| |
| # Create the search index. To do this, right-click the '''Content Model''' and select "Search > Create Search Index".
| |
| #*<li class="fyi-bullet"> This creates the search index in Azure. Without the search index created, documents can't be added to an index. This only needs to be done ''once'' per index.
| |
| # Submit an "Indexing Job" to index any documents classified using the '''Content Model''' ''currently'' in the Grooper Repository. To do this, right-click the '''Content Model''' and select "Search > Submit Indexing Job".
| |
| #*<li class="fyi-bullet"> This is one of many ways to index documents using AI Search. For a full list (including ways to automate document indexing), [[#Adding documents to the search index|see below]].
| |
| | |
| ==== Repository Options: AI Search ====
| |
| | |
| '''''Repository Options''''' are new to Grooper 2024. They add new functionality to the whole Grooper Repository. These optional features are added using the '''''Options''''' property editor on the '''Grooper Root''' node.
| |
| | |
| To search documents in Grooper, we use Azure's AI Search service. In order to connect to an Azure AI Search service, the '''''AI Search''''' option must be added to the list of '''''Repository Options''''' in Grooper. Here, users will enter the Azure AI Search URL endpoint where calls are issued and an admin's API key. Both of these can be obtained from the Microsoft Azure portal once you have added an Azure AI Search resource.
| |
| | |
| With '''''AI Search''''' added to your Grooper Repository, you will be able to add an '''''Indexing Behavior''''' to one or more '''Content Types''', create a search index, index documents and search them using the Search Page.
| |
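As a quick sanity check outside of Grooper, the URL and admin key can be tried against Azure's REST API before they are entered in the '''''AI Search''''' option. The sketch below only builds the request for Azure's "List Indexes" call; it is not how Grooper itself connects. The service URL and key are placeholders, the function name is hypothetical, and the API version shown is an assumption (use any version your service supports).

```python
import urllib.request

# Assumed REST API version; substitute one your Azure AI Search service supports.
API_VERSION = "2023-11-01"


def build_list_indexes_request(service_url: str, admin_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists the indexes on a search service."""
    url = f"{service_url.rstrip('/')}/indexes?api-version={API_VERSION}"
    # Azure AI Search reads the admin key from the "api-key" request header.
    return urllib.request.Request(url, headers={"api-key": admin_key}, method="GET")


# Placeholder values: replace with the URL and admin key copied from the Azure portal.
request = build_list_indexes_request(
    "https://example-service.search.windows.net/",
    "PLACEHOLDER-ADMIN-KEY",
)
print(request.full_url)
```

Passing the request to <code>urllib.request.urlopen</code> should return a JSON list of indexes if the URL and key are valid.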
| | |
| ==== Indexing documents for search ====
| |
| | |
| Before documents can be searched, they must be indexed. The search index holds the content you want to search. This includes each document's full OCR or native text obtained from the '''Recognize''' activity and can optionally include '''Data Model''' results collected from the '''Extract''' activity. We use the Azure AI Search Service to create search indexes according to an '''''Indexing Behavior''''' defined for '''Content Types''' in Grooper. Documents are made searchable by adding them to a search index. Once indexed, you can search for documents using Grooper's Search page.
| |
| | |
| ===== The Indexing Behavior: Defines the search index =====
| |
| Before indexing documents, you must add an '''''Indexing Behavior''''' to the '''Content Types''' you want to index. Typically, this is done on a '''Content Model''', and all child '''Document Types''' inherit the '''''Indexing Behavior''''' and its configuration. More complicated '''Content Models''' may require '''''Indexing Behaviors''''' configured on multiple '''Content Types'''.
| |
| | |
| | |
| The '''''Indexing Behavior''''' defines:
| |
| * The index's name in Azure.
| |
| * Which documents are added to the index.
| |
| ** Only documents that are classified as the '''''Indexing Behavior's''''' '''Content Type''' OR any of its child '''Content Types''' will be indexed.
| |
| ** In other words, when set on a '''Content Model''' only documents classified as one of its '''Document Types''' will be indexed.
| |
| * What fields are added to the search index (including which '''Data Elements''' from a '''Data Model''' are included, if any).
| |
| * Any options for the search index in the Grooper Search page (including access restrictions to the search index).
| |
| | |
| {|class="attn-box"
| |
| |
| |
| ⚠
| |
| |
| |
| '''''BE AWARE:''''' Once an '''''Indexing Behavior''''' is added to a '''Content Type''', you must use the "Create Search Index" command to create the index in Azure. Do this by right-clicking the '''Content Type''' and choosing "Search > Create Search Index".
| |
| |}
| |
| | |
| With the '''''Indexing Behavior''''' defined, and the search index created, now you can start indexing documents.
| |
| | |
| ===== Adding documents to the search index =====
| |
| | |
| Documents may be added to a search index in one of the following ways:
| |
| * Using the "'''''Add to Index'''''" command.
| |
| ** This is the most "manual" way of doing things.
| |
| ** Select one or more documents, right-click them, and select "Search > Add to Index" to add only the selected documents to the search index.
| |
| ** Documents may also be manually removed from the search index in this way by using the "Remove From Index" command.
| |
| * Using the "'''''Submit Indexing Job'''''" command.
| |
| ** This is a manual way of indexing all existing documents for the '''Content Model'''.
| |
| ** The Indexing Job will add newly classified documents to the index, update the index if changes are made (to their extracted data for example), and remove documents from the index if they've been deleted.
| |
| ** Select the '''Content Model''', right-click it and select "Search > Submit Indexing Job".
| |
| * Using an '''''Execute''''' activity in a '''Batch Process''' to apply the "'''''Add to Index'''''" command to all documents in a '''Batch'''.
| |
| ** This is one way to automate document indexing.
| |
| ** Bear in mind, if documents or their data change ''after'' this step runs, they must be re-indexed before the search index will reflect those changes.
| |
| * Running the Grooper '''Indexing Service''' to index documents automatically in the background.
| |
| ** This is the most automated way to index documents.
| |
| ** The Grooper '''Indexing Service''' periodically polls the Grooper database to determine if the index needs to be updated. If it does, it will submit an "Indexing Job".
| |
| ** The Indexing Job will add newly classified documents to the index, update the index if changes are made (to their extracted data for example), and remove documents from the index if they've been deleted.
| |
| ** The '''''Indexing Behavior's''''' '''''Auto Index''''' property must also be enabled for the '''Indexing Service''' to submit an "Indexing Job".
| |
| **<li class="attn-bullet"> '''''BE AWARE:''''' An '''Activity Processing''' service '''''must also''''' be running to fully automate indexing in the background.
| |
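The add/update/remove behavior of an Indexing Job can be pictured as a comparison between what is in the Grooper Repository and what is in the search index. This is a conceptual sketch only, not Grooper's implementation: the function name and the idea of tracking a per-document version number are assumptions made for illustration.

```python
from typing import Dict, List


def plan_indexing_job(repository: Dict[str, int], index: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare two {document id: version} maps and decide what the index needs.

    'repository' stands in for the classified documents currently in Grooper;
    'index' stands in for what the search index already holds.
    """
    return {
        # Newly classified documents that are not in the index yet.
        "add": sorted(d for d in repository if d not in index),
        # Documents whose content or extracted data changed since they were indexed.
        "update": sorted(d for d in repository if d in index and repository[d] != index[d]),
        # Documents still in the index but deleted from the repository.
        "remove": sorted(d for d in index if d not in repository),
    }


plan = plan_indexing_job(
    repository={"doc-1": 2, "doc-2": 1, "doc-4": 1},
    index={"doc-1": 1, "doc-2": 1, "doc-3": 1},
)
print(plan)
```

In this example, <code>doc-4</code> is new, <code>doc-1</code> has changed, <code>doc-3</code> has been deleted, and <code>doc-2</code> needs no action.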
| <section end="ai_search_basics" />
| |
| | |
| === The Search Page ===
| |
| Once you've got indexed documents, you can start searching for documents in the search index! The Search page allows you to find documents in your search index.
| |
| | |
| The Search page allows you to build a search query using four components:
| |
| * '''''Search''''': This is the only required parameter. Here, you will enter your search terms, using the Lucene query syntax.
| |
| * '''''Filter''''': An optional filter that sets inclusion/exclusion criteria for the documents returned, using the OData syntax.
| |
| * '''''Select''''': Optionally selects which fields you want displayed for each document.
| |
| * '''''Order By''''': Optionally orders the list of documents returned.
| |
| | |
| ==== Search ====
| |
| | |
| The '''''Search''''' configuration searches the full text of each document in the index. This uses the Lucene query syntax to return documents. For a simple search query, just enter a word or phrase (enclosed in quotes <code>""</code>) in the '''''Search''''' editor. Grooper will return a list of any documents with that word or phrase in their text data.
| |
| | |
| | |
| Lucene also supports several advanced querying features, including:
| |
| * Wildcard searches: <code>?</code> and <code>*</code>
| |
| *: Use <code>?</code> for a single wildcard character and <code>*</code> for multiple wildcard characters.
| |
| * Fuzzy matching: <code>searchTerm~</code>
| |
| *: Fuzzy search can only be applied to individual terms. To fuzzy search a phrase, append <code>~</code> to each term and do not enclose the phrase in quotes. Azure's full fuzzy search documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-fuzzy
| |
| * Regular expression matching: <code>/regex/</code>
| |
| *: Enclose a regex pattern in forward slashes to incorporate it into the Lucene query. For example, <code>/[0-9]{3}[a-z]/</code>
| |
| * Boolean operators: <code>AND</code> <code>OR</code> <code>NOT</code>
| |
| *: Boolean operators can help improve the precision of a search query.
| |
| * Field searching: <code>fieldName:searchExpression</code>
| |
| *: Search built-in fields and extracted '''Data Model''' values. For example, <code>Invoice_No:8*</code> would return any document whose extracted "Invoice No" field starts with the number "8".
| |
| | |
| Azure's full documentation of Lucene query syntax can be found here: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax
| |
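When Lucene queries are assembled programmatically (for example, to paste into the '''''Search''''' editor), special characters in literal terms need to be escaped with a backslash. The sketch below is illustrative: the helper names are hypothetical, and the list of special characters is taken from Azure's Lucene query syntax documentation. <code>Invoice_No</code> is the example field used in this article.

```python
import re

# Characters with special meaning in Azure's full Lucene syntax:
# + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(text: str) -> str:
    """Backslash-escape every special character so the text is matched literally."""
    return _SPECIAL.sub(r"\\\1", text)


def phrase(text: str) -> str:
    """Wrap text in double quotes for an exact phrase search."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def fuzzy(term: str) -> str:
    """Fuzzy match a single term by appending a tilde."""
    return escape_term(term) + "~"


def field(name: str, expression: str) -> str:
    """Scope an expression to one field; the expression is passed through unescaped."""
    return f"{name}:{expression}"


query = " AND ".join([phrase("purchase order"), fuzzy("invoise"), field("Invoice_No", "8*")])
print(query)
```

The wildcard in <code>8*</code> is deliberately left unescaped, which is why <code>field</code> does not call <code>escape_term</code> on its expression.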
| | |
| ==== Filter ====
| |
| | |
| First you search, then you filter. The '''''Filter''''' parameter specifies criteria for documents to be included in or excluded from the search results. This gives users a mechanism to fine-tune their search query. Commonly, users will want to filter a search set based on field values. Both built-in index fields and values extracted from a '''Data Model''' can be incorporated into the filter criteria.
| |
| | |
| Azure AI Search uses the OData syntax to define filter expressions. Azure's full OData syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter
| |
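As a rough illustration of what a filter expression looks like, the sketch below assembles a simple OData comparison filter. The helper names and the field names <code>Vendor_Name</code> and <code>Invoice_Total</code> are hypothetical; substitute fields that exist in your own index. String literals are wrapped in single quotes, and an embedded single quote is escaped by doubling it.

```python
from typing import Union

COMPARISON_OPERATORS = {"eq", "ne", "gt", "ge", "lt", "le"}


def odata_literal(value: Union[str, int, float, bool]) -> str:
    """Render a Python value as an OData literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings use single quotes; an embedded single quote is doubled.
    return "'" + value.replace("'", "''") + "'"


def compare(field_name: str, operator: str, value: Union[str, int, float, bool]) -> str:
    """Build one comparison such as: Invoice_Total gt 1000"""
    if operator not in COMPARISON_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{field_name} {operator} {odata_literal(value)}"


# Hypothetical field names, for illustration only.
filter_expression = " and ".join([
    compare("Vendor_Name", "eq", "O'Brien Supply"),
    compare("Invoice_Total", "gt", 1000),
])
print(filter_expression)
```

The resulting string can be entered in the '''''Filter''''' editor as-is.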
| | |
| ==== Select ====
| |
| | |
| The '''''Select''''' parameter defines what field data is returned in the result list. You can select any of the built-in fields or '''Data Elements''' defined in the '''''Indexing Behavior'''''. This can be exceptionally helpful when navigating indexes with a large number of fields. Multiple fields can be selected using a comma-separated list (e.g. <code>Field1,Field2,Field3</code>).
| |
| | |
| ==== Order By ====
| |
| | |
| '''''Order By''''' is an optional parameter that will define how the search results are sorted.
| |
| * Any field in the index can be used to sort results.
| |
| * The field's value type will determine how items are sorted.
| |
| ** String values are sorted alphabetically.
| |
| ** Datetime values are sorted by oldest or newest date.
| |
| ** Numerical value types are sorted smallest to largest or largest to smallest.
| |
| * Sort order can be ascending or descending.
| |
| ** Add <code>asc</code> after the field's name to sort in ascending order. This is the default direction.
| |
| ** Add <code>desc</code> after the field's name to sort in descending order.
| |
| * Multiple fields may be used to sort results.
| |
| ** Separate each sort expression with a comma (e.g. <code>Field1 desc,Field2</code>)
| |
| ** The leftmost field will be used to sort the full result list first, then it's sub-sorted by the next, then sub-sub-sorted by the next, and so on.
| |
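To see how the four Search page components fit together, it can help to look at the equivalent request body for Azure AI Search's REST "Search Documents" call. How Grooper issues the query internally is not covered here, so treat this as a sketch under that assumption. The function name and the field names other than <code>Invoice_No</code> are hypothetical.

```python
import json
from typing import Dict, List, Optional


def build_search_body(
    search: str,
    filter_expr: Optional[str] = None,
    select: Optional[List[str]] = None,
    order_by: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Combine Search, Filter, Select and Order By into one request body."""
    # "queryType": "full" asks Azure to parse the search text as full Lucene syntax.
    body = {"search": search, "queryType": "full"}
    if filter_expr:
        body["filter"] = filter_expr
    if select:
        body["select"] = ",".join(select)  # comma-separated field list
    if order_by:
        body["orderby"] = ",".join(order_by)  # leftmost expression sorts first
    return body


body = build_search_body(
    search="Invoice_No:8*",
    filter_expr="Invoice_Total gt 1000",
    select=["Invoice_No", "Invoice_Total"],
    order_by=["Invoice_Total desc", "Invoice_No"],
)
print(json.dumps(body, indent=2))
```

Only <code>search</code> is required, mirroring the Search page; the other three keys are omitted from the body when they are not supplied.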