AI Search

From Grooper Wiki

Revision as of 13:55, 23 July 2024

2025 BETA

This article covers new or changed functionality in the current or upcoming beta version of Grooper. Features are subject to change before version 2025's GA release. Configuration and functionality may differ from later beta builds and the final 2025 release.

Create an efficient and effective document search and retrieval mechanism in Grooper using Azure's AI Search service.


Once you've got indexed documents, you can use the Search page to find documents in your search index!

The Search page allows you to build a search query using four components:

  • Search: This is the only required parameter. Here, you will enter your search terms, using the Lucene query syntax.
  • Filter: An optional filter, written in OData syntax, that sets criteria for including or excluding documents from the results.
  • Select: Optionally selects which fields you want displayed for each document.
  • Order By: Optionally orders the list of documents returned.
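Grooper's search runs on Azure AI Search, so these four parameters map naturally onto fields of the Azure AI Search REST API's Search Documents (POST) request body. The sketch below illustrates that mapping under that assumption. It only assembles a request body and sends nothing. The helper name build_search_body and the field names (Invoice_No, Vendor, Invoice_Date) are hypothetical, and Grooper's own request handling may differ.

```python
import json
from typing import Any, Dict, Optional


def build_search_body(
    search: str,
    filter_expr: Optional[str] = None,
    select: Optional[str] = None,
    order_by: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a body for Azure AI Search's Search Documents (POST) operation.

    Hypothetical helper for illustration; it mirrors the Search page's
    Search, Filter, Select and Order By parameters.
    """
    body: Dict[str, Any] = {
        "search": search,     # required: the Lucene query text
        "queryType": "full",  # "full" enables the full Lucene query syntax
    }
    if filter_expr:
        body["filter"] = filter_expr  # optional OData filter expression
    if select:
        body["select"] = select       # optional comma-separated field list
    if order_by:
        body["orderby"] = order_by    # optional comma-separated sort expressions
    return body


if __name__ == "__main__":
    body = build_search_body(
        search='Invoice_No:8* AND "net 30"',
        filter_expr="Vendor eq 'Acme'",
        select="Invoice_No,Vendor,Invoice_Date",
        order_by="Invoice_Date desc",
    )
    print(json.dumps(body, indent=2))
```

To send a query like this directly to Azure, you would POST the JSON to https://[service name].search.windows.net/indexes/[index name]/docs/search with an api-version query parameter and an api-key header.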

Search

The Search parameter searches the full text of each document in the index using the Lucene query syntax. For a simple query, enter a single word, or enclose a phrase in double quotes (""). Grooper returns a list of every document whose text data contains that word or phrase.


Lucene also supports several advanced querying features, including:

  • Wildcard searches: ? and *
    Use ? for a single wildcard character and * for multiple wildcard characters.
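  • Fuzzy matching: searchTerm~
    Fuzzy search can only be applied to individual terms, not phrases. To fuzzy-match a multi-word value, append ~ to each term and do not enclose the terms in quotes. A quoted phrase followed by ~ is treated as a proximity search instead. You can optionally follow ~ with an edit distance of 0, 1, or 2; the default is 2. Azure's full fuzzy search documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-fuzzy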
  • Regular expression matching: /regex/
    Enclose a regex pattern in forward slashes to incorporate it into the Lucene query. For example, /[0-9]{3}[a-z]/ matches terms made up of three digits followed by a lowercase letter.
  • Boolean operators: AND OR NOT
    Boolean operators help make a search query more precise. Always write them in uppercase.
  • Field searching: fieldName:searchExpression
    Search built-in fields and extracted Data Model values. For example, Invoice_No:8* returns any document whose extracted "Invoice No" field value starts with "8".

Azure's full documentation of Lucene query syntax can be found here: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax
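To make the syntax above concrete, here is a small sketch of hypothetical Python helpers that build Lucene query strings for each feature. They only produce text you could enter in the Search editor. The helper names and the field name Invoice_No are illustrative assumptions and are not part of Grooper.

```python
from typing import Optional


def phrase(text: str) -> str:
    """Exact phrase search: wrap the words in double quotes."""
    return f'"{text}"'


def fuzzy(text: str, distance: Optional[int] = None) -> str:
    """Fuzzy search: append ~ to each term separately (no quotes)."""
    if distance is not None and distance not in (0, 1, 2):
        raise ValueError("edit distance must be 0, 1, or 2")
    suffix = "~" if distance is None else f"~{distance}"
    return " ".join(term + suffix for term in text.split())


def regex(pattern: str) -> str:
    """Regular expression search: enclose the pattern in forward slashes."""
    return f"/{pattern}/"


def field(name: str, expression: str) -> str:
    """Fielded search: fieldName:searchExpression."""
    return f"{name}:{expression}"


def combine(operator: str, *clauses: str) -> str:
    """Join clauses with an uppercase Boolean operator, grouping each in parentheses."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError("operator must be AND, OR, or NOT (uppercase)")
    return f" {operator} ".join(f"({clause})" for clause in clauses)


print(combine("AND", field("Invoice_No", "8*"), fuzzy("Acme Corporaton")))
# (Invoice_No:8*) AND (Acme~ Corporaton~)
```

Each clause is wrapped in parentheses so that a multi-term clause, such as the fuzzy one, stays grouped as a unit next to the Boolean operator.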

Filter

First you search, then you filter. The Filter parameter specifies criteria for including or excluding documents from the search results, which lets you fine-tune a search further. Filters are commonly based on field values. Both built-in index fields and values extracted from a Data Model can be used in filter criteria.

Azure AI Search uses the OData syntax to define filter expressions. Azure's full OData syntax documentation can be found here: https://learn.microsoft.com/en-us/azure/search/search-query-odata-filter
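As an illustration, the sketch below builds an OData filter string. It assumes hypothetical fields named Vendor, Invoice_Total, and Invoice_Date. In Azure AI Search, these fields would also need to be marked filterable in the index. OData string literals are enclosed in single quotes, and any embedded single quote is doubled.

```python
from datetime import date


def odata_string(value: str) -> str:
    """Quote an OData string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def invoice_filter(vendor: str, min_total: int, since: date) -> str:
    """Build a filter for one vendor's invoices at or above a total, since a date.

    Vendor, Invoice_Total and Invoice_Date are hypothetical field names.
    """
    clauses = [
        f"Vendor eq {odata_string(vendor)}",
        f"Invoice_Total ge {min_total}",
        # DateTimeOffset literals are written unquoted, in ISO 8601 form.
        f"Invoice_Date ge {since.isoformat()}T00:00:00Z",
    ]
    return " and ".join(clauses)


print(invoice_filter("O'Brien Supply", 1000, date(2024, 1, 1)))
# Vendor eq 'O''Brien Supply' and Invoice_Total ge 1000 and Invoice_Date ge 2024-01-01T00:00:00Z
```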

Select

The Select parameter defines which field data is returned in the result list. You can select any of the built-in fields or the Data Elements defined in the Indexing Behavior. This is especially helpful when navigating indexes with a large number of fields. To select multiple fields, use a comma-separated list (e.g. Field1,Field2,Field3).
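One way to picture the effect of Select is to project each result down to only the listed fields. The sketch below is a local, hypothetical simulation of that idea. It does not show how Azure or Grooper implement Select, and the documents and field names are made up.

```python
from typing import Any, Dict, List


def parse_select(select: str) -> List[str]:
    """Split a comma-separated Select value into field names, ignoring stray spaces."""
    return [name.strip() for name in select.split(",") if name.strip()]


def project(documents: List[Dict[str, Any]], select: str) -> List[Dict[str, Any]]:
    """Keep only the selected fields of each document (missing fields become None)."""
    fields = parse_select(select)
    return [{name: doc.get(name) for name in fields} for doc in documents]


results = [
    {"Invoice_No": "8012", "Vendor": "Acme", "Invoice_Total": 1250, "Pages": 3},
    {"Invoice_No": "8107", "Vendor": "Globex", "Invoice_Total": 980, "Pages": 1},
]
print(project(results, "Invoice_No,Vendor"))
# [{'Invoice_No': '8012', 'Vendor': 'Acme'}, {'Invoice_No': '8107', 'Vendor': 'Globex'}]
```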

Order By

Order By is an optional parameter that will define how the search results are sorted.

  • Any field in the index can be used to sort results.
  • The field's value type will determine how items are sorted.
    • String values are sorted alphabetically.
    • Datetime values are sorted chronologically, from oldest to newest or newest to oldest.
    • Numerical value types are sorted smallest to largest or largest to smallest.
  • Sort order can be ascending or descending.
    • Add asc after the field's name to sort in ascending order. This is the default direction.
    • Add desc after the field's name to sort in descending order.
  • Multiple fields may be used to sort results.
    • Separate each sort expression with a comma (e.g. Field1 desc,Field2)
    • The leftmost field is used to sort the full result list first. The list is then sub-sorted by the next field, then sub-sub-sorted by the next, and so on.
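The sorting rules above can be simulated locally. This hypothetical sketch parses an Order By value such as Field1 desc,Field2 and applies it to a list of result records. It relies on Python's stable sort: it sorts by the rightmost expression first and the leftmost expression last, which produces the "sort, then sub-sort" behavior described above. It assumes every record has a non-null, comparable value for each sort field. It is an illustration, not Azure's implementation.

```python
from typing import Any, Dict, List, Tuple


def parse_order_by(order_by: str) -> List[Tuple[str, bool]]:
    """Parse 'Field1 desc,Field2' into [('Field1', True), ('Field2', False)].

    The boolean is True for descending; asc is the default direction.
    """
    keys = []
    for expression in order_by.split(","):
        parts = expression.split()
        if not parts:
            continue
        direction = parts[1].lower() if len(parts) > 1 else "asc"
        if direction not in ("asc", "desc"):
            raise ValueError(f"unknown sort direction: {direction}")
        keys.append((parts[0], direction == "desc"))
    return keys


def order_results(documents: List[Dict[str, Any]], order_by: str) -> List[Dict[str, Any]]:
    """Sort records by each expression, with the leftmost taking priority."""
    results = list(documents)
    # Sort by the least significant key first; list.sort is stable (even with
    # reverse=True), so earlier orderings survive as sub-sorts.
    for name, descending in reversed(parse_order_by(order_by)):
        results.sort(key=lambda doc: doc[name], reverse=descending)
    return results


rows = [
    {"Vendor": "Globex", "Invoice_Total": 500},
    {"Vendor": "Acme", "Invoice_Total": 300},
    {"Vendor": "Globex", "Invoice_Total": 900},
    {"Vendor": "Acme", "Invoice_Total": 700},
]
for row in order_results(rows, "Vendor,Invoice_Total desc"):
    print(row["Vendor"], row["Invoice_Total"])
# Acme 700
# Acme 300
# Globex 900
# Globex 500
```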