2024:AI Search and the Search Page: Difference between revisions

From Grooper Wiki
initial post // via Wikitext Extension for VSCode
edits made for brevity // via Wikitext Extension for VSCode
Line 8: Line 8:
Put simply, Azure AI Search will make it easier to keep your documents in '''Grooper'''. To understand how, let's first understand what '''Grooper''' has been.
Put simply, Azure AI Search will make it easier to keep your documents in '''Grooper'''. To understand how, let's first understand what '''Grooper''' has been.
Historically '''Grooper''' has been a transient platform for document processing: documents come in, data is collected from those documents, then the data and documents are pushed out of Grooper to some place. It has never been a place to store documents and/or their data.
Historically '''Grooper''' has been a transient platform for document processing:  
* documents come in
* data is collected from those documents
* the data and documents are pushed out of '''Grooper''' to some place
 
It has never been a place to store documents and/or their data.
While it has been possible to keep '''Batches''' and their content in '''Grooper''' it has never been a best practice, nor has it been convenient, to do so. You could, theoretically, devise some kind of hierarchical foldering and naming convention by which you organize '''Batches''' in the node tree, but this is very time consuming and is probably not even that useful. Say you wanted to retrieve all "Invoices" that have a "Total Amount" over "$1,000.00". Without "indexing" the documents and their data, and the ability to "query" those indices, this would be extremely time consuming at best.
While it has been possible to keep '''Batches''' and their content in '''Grooper''' it has never been a best practice, nor has it been convenient, to do so. You could, theoretically, devise some kind of hierarchical foldering and naming convention by which you organize '''Batches''' in the node tree, but this is very time consuming and is probably not even that useful. Say you wanted to retrieve all "Invoices" that have a "Total Amount" over "$1,000.00". Without "indexing" the documents and their data, and the ability to "query" that index, this would be extremely time consuming at best.
With Azure AI Search you will be able to quickly and efficiently index your documents and their data to allow for ease of retrieval as well as gain a deeper understanding of them.
With Azure AI Search you will be able to quickly and efficiently index your documents and their data to allow for ease of retrieval as well as gain a deeper understanding of them.
<div style="padding-left: 1.5em;">
<div style="padding-left: 1.5em;">
=== Microsoft Azure AI Search ===
=== Microsoft Azure AI Search ===
[https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search Azure AI Search], formerly known as Azure Cognitive Search, is a cloud-based search-as-a-service solution provided by [https://en.wikipedia.org/wiki/Microsoft_Azure Microsoft Azure]. It allows developers to build sophisticated search experiences into custom applications. Here are some key features and capabilities:
[https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search Azure AI Search], formerly known as Azure Cognitive Search, is a cloud-based search-as-a-service solution provided by [https://en.wikipedia.org/wiki/Microsoft_Azure Microsoft Azure]. It has allowed our developers to build a sophisticated search experience into '''Grooper'''. Here are some key features and capabilities:
* '''Full-Text Search''': Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
* '''Full-Text Search''': Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
* '''Cognitive Skills Integration''': It integrates with [https://azure.microsoft.com/en-us/products/ai-services/ Azure AI Services] to apply AI skills such as image recognition, language understanding, and text extraction to the indexed content. This makes it possible to enhance search results with AI-driven insights.
* '''Customizable Indexing''': Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
* '''Customizable Indexing''': Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
* '''Faceted Navigation''': The service supports faceted navigation, enabling users to filter and drill down into search results based on predefined categories or attributes.
* '''Faceted Navigation''': The service supports faceted navigation, enabling users to filter and drill down into search results based on predefined categories or attributes.
* '''Synonym Mapping''': Azure AI Search includes synonym maps, which help handle variations in user queries by treating different terms with similar meanings as equivalent.
* '''Search Analytics''': It provides insights into search patterns and behaviors, allowing developers to optimize the search experience based on user interactions.
* '''Search Analytics''': It provides insights into search patterns and behaviors, allowing developers to optimize the search experience based on user interactions.
* '''Scalability''': The service can scale up or down based on the workload, making it suitable for applications of all sizes.
* '''Scalability''': The service can scale up or down based on the workload, making it suitable for applications of all sizes.
* '''Security and Compliance''': Azure AI Search ensures data security and compliance with industry standards, offering features like [https://en.wikipedia.org/wiki/Role-based_access_control role-based access control (RBAC)], data encryption, and integration with [https://en.wikipedia.org/wiki/Active_Directory Active Directory].
* '''Security and Compliance''': Azure AI Search ensures data security and compliance with industry standards, offering features like [https://en.wikipedia.org/wiki/Role-based_access_control role-based access control (RBAC)], data encryption, and integration with [https://en.wikipedia.org/wiki/Active_Directory Active Directory].
* '''Geospatial Search''': It supports geospatial search capabilities, allowing users to perform location-based searches and filter results based on geographical data.
* '''APIs and SDKs''': Azure AI Search provides [https://en.wikipedia.org/wiki/REST REST APIs] and client libraries for various programming languages, making it easy to integrate with different types of applications.
* '''APIs and SDKs''': Azure AI Search provides [https://en.wikipedia.org/wiki/REST REST APIs] and client libraries for various programming languages, making it easy to integrate with different types of applications.
Azure AI Search is used in a variety of applications, including e-commerce sites, enterprise search portals, document management systems, and any other scenario where efficient and effective search capabilities are required.
=== Relevance of Azure AI Search with Grooper ===
* '''Enhanced Search Capabilities''': Azure AI Search provides powerful full-text search functionalities that can be used to index and search large volumes of documents processed by Grooper.
* '''Cognitive Skills''': Azure AI Search's cognitive skills can augment '''Grooper's''' capabilities by applying AI to extract insights, recognize entities, and understand the context within documents. This can enhance the data extraction and classification processes in Grooper.
* '''Scalability''': Azure AI Search's ability to scale with the workload makes it suitable for handling the dynamic and often large-scale document processing tasks managed by Grooper.
* '''Advanced Filtering and Faceting''': With features like faceting and filtering, users can refine their search results efficiently, making it easier to locate specific documents or information within a large dataset.


=== Integration with Grooper ===
=== Integration with Grooper ===
* '''API Integration''': '''Grooper''' can leverage Azure AI Search's REST APIs to automate the indexing of documents and retrieval of search results. This integration can be built into Grooper's workflow to ensure seamless data processing and search capabilities.
* '''API Integration''': '''Grooper''' can leverage Azure AI Search's REST APIs to automate the indexing of documents and retrieval of search results. This integration can be built into '''Grooper's''' workflow to ensure seamless data processing and search capabilities.
* '''Security and Compliance''': Both '''Grooper''' and Azure AI Search offer robust security features. Integrating these ensures that document processing and search operations are secure and compliant with industry standards.
* '''Security and Compliance''': Both '''Grooper''' and Azure AI Search offer robust security features. Integrating these ensures that document processing and search operations are secure and compliant with industry standards.
* '''Indexing Processed Documents''': Once '''Grooper''' processes and extracts data from documents, this data can be sent to Azure AI Search for indexing. This allows users to search through the processed data quickly and efficiently.
* '''Indexing Processed Documents''': Once '''Grooper''' processes and extracts data from documents, this data can be sent to Azure AI Search for indexing. This allows users to search through the processed data quickly and efficiently.
* '''Querying Indexed Documents and Data''': Once Azure AI Search has indexed documents and their data from '''Grooper''', users can leverage powerful query syntax like [https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax Lucene] and [https://learn.microsoft.com/en-us/odata/overview OData] to efficiently retrieve the information from their documents.
** Indexing is an intake process that loads content into Azure AI Search service and makes it searchable. Through Azure AI Search, inbound text is processed into tokens and stored in inverted indexes, and inbound vectors are stored in vector indexes. The document format that Azure AI Search can index is [https://en.wikipedia.org/wiki/JSON JSON].
 
* '''Querying Indexed Documents and Data''': Once Azure AI Search has indexed documents and their data from '''Grooper''', users can leverage powerful query syntax like [https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax Lucene] and [https://learn.microsoft.com/en-us/odata/overview OData] to efficiently retrieve the information from their documents.
=== Example Workflow ===
** Querying can happen once an index is populated with searchable content, when '''Grooper''' sends query requests to a search service and handles responses. All query execution is over a search index that you control.
# '''Document Acquisition''': '''Grooper''' acquires documents from various sources (scanned images, [https://en.wikipedia.org/wiki/PDF PDFs], [https://en.wikipedia.org/wiki/Email emails], etc.).
# '''Data Extraction''': '''Grooper''' processes these documents to extract structured data (text, [https://en.wikipedia.org/wiki/Metadata metadata], images, etc.).
# '''Transformation and Enrichment''': The extracted data can be enriched using '''Grooper's''' capabilities or Azure AI Services.
# '''Indexing''': The processed and enriched data is sent to Azure AI Search for indexing.
# '''Search and Retrieval''': Users can perform searches on the indexed data using Azure AI Search's advanced search features. The results can be used within '''Grooper's''' '''Search''' interface.
</div>
</div>
== How To ==
== How To ==
Line 56: Line 46:


=== Index Documents and Data from Grooper ===
=== Index Documents and Data from Grooper ===
<div style="padding-left: 1.5em;">
==== "Add to Index" Batch Folder Object Command ====


==== "Create Search Index" Content Type Object Command ====
==== Execute Activity with "Add to Index" Command ====
==== Indexing Service ====
</div>
=== Use the Search Page ===
=== Use the Search Page ===
</div>
</div>

Revision as of 08:58, 13 August 2024

2025 BETA

This article covers new or changed functionality in the current or upcoming beta version of Grooper. Features are subject to change before version 2025's GA release. Configuration and functionality may differ from later beta builds and the final 2025 release.

Glossary

About

Put simply, Azure AI Search will make it easier to keep your documents in Grooper. To understand how, let's first understand what Grooper has been.

Historically, Grooper has been a transient platform for document processing:

  • documents come in
  • data is collected from those documents
  • the data and documents are pushed out of Grooper to some place

It has never been a place to store documents and/or their data.

While it has been possible to keep Batches and their content in Grooper, it has never been a best practice, nor has it been convenient, to do so. You could, theoretically, devise some kind of hierarchical foldering and naming convention by which you organize Batches in the node tree, but this is very time-consuming and probably not even that useful. Say you wanted to retrieve all "Invoices" that have a "Total Amount" over "$1,000.00". Without "indexing" the documents and their data, and the ability to "query" that index, this would be extremely time-consuming at best.

With Azure AI Search you will be able to quickly and efficiently index your documents and their data to allow for ease of retrieval as well as gain a deeper understanding of them.
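
To make "indexing" and "querying" concrete, here is a toy, in-memory sketch in Python. It is not how Grooper or Azure AI Search is implemented; the documents, field names, and helper functions are all hypothetical. It only illustrates the idea that a term lookup against a prebuilt index avoids rereading every document.

```python
from collections import defaultdict

# Hypothetical documents with extracted data; all names and values are made up.
documents = [
    {"id": "doc-1", "type": "Invoice", "total_amount": 1250.00,
     "text": "Invoice from Acme for drilling services"},
    {"id": "doc-2", "type": "Invoice", "total_amount": 310.50,
     "text": "Invoice from Acme for office supplies"},
    {"id": "doc-3", "type": "Purchase Order", "total_amount": 4800.00,
     "text": "Purchase order for drilling equipment"},
]


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc["text"].lower().split():
            index[token].add(doc["id"])
    return index


def search(index, term):
    """Look a term up in the index instead of scanning every document's text."""
    return sorted(index.get(term.lower(), set()))


def filter_docs(docs, doc_type, min_total):
    """Return ids of documents of one type whose total exceeds min_total.

    This is a plain scan kept short for illustration; a real search service
    evaluates filters against its own index structures.
    """
    return [d["id"] for d in docs
            if d["type"] == doc_type and d["total_amount"] > min_total]


index = build_inverted_index(documents)
print(search(index, "drilling"))
print(filter_docs(documents, "Invoice", 1000.00))
```

The second call corresponds to the "Invoices with a Total Amount over $1,000.00" example above.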

Microsoft Azure AI Search

Azure AI Search, formerly known as Azure Cognitive Search, is a cloud-based search-as-a-service solution provided by Microsoft Azure. It has allowed our developers to build a sophisticated search experience into Grooper. Here are some key features and capabilities:

  • Full-Text Search: Azure AI Search supports full-text search with capabilities like faceting, filtering, and scoring, allowing users to search through large volumes of text efficiently.
  • Customizable Indexing: Developers can define custom indexes tailored to their specific data schema. This flexibility allows for a more relevant and precise search experience.
  • Faceted Navigation: The service supports faceted navigation, enabling users to filter and drill down into search results based on predefined categories or attributes.
  • Search Analytics: It provides insights into search patterns and behaviors, allowing developers to optimize the search experience based on user interactions.
  • Scalability: The service can scale up or down based on the workload, making it suitable for applications of all sizes.
  • Security and Compliance: Azure AI Search ensures data security and compliance with industry standards, offering features like role-based access control (RBAC), data encryption, and integration with Active Directory.
  • APIs and SDKs: Azure AI Search provides REST APIs and client libraries for various programming languages, making it easy to integrate with different types of applications.
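
As a rough illustration of "Customizable Indexing" and "Faceted Navigation", the Python sketch below builds the JSON body of an Azure AI Search index definition. The index name and field names are hypothetical, and this is not the schema Grooper generates; it only shows how per-field attributes such as filterable and facetable are declared. The helper functions are illustrative, not part of any SDK.

```python
import json


def field(name, edm_type, **attributes):
    """Build one field definition; attributes are flags such as filterable=True."""
    definition = {"name": name, "type": edm_type}
    definition.update(attributes)
    return definition


def build_index_definition(index_name):
    """Return an index definition with hypothetical invoice fields."""
    return {
        "name": index_name,
        "fields": [
            # Exactly one field acts as the document key.
            field("id", "Edm.String", key=True, filterable=True),
            # Facetable fields can drive drill-down navigation.
            field("Document_Type", "Edm.String", filterable=True, facetable=True),
            field("Vendor_Name", "Edm.String", searchable=True, facetable=True),
            # Filterable numeric fields support range comparisons.
            field("Total_Amount", "Edm.Double", filterable=True, sortable=True),
            # Full text of the document for full-text search.
            field("content", "Edm.String", searchable=True),
        ],
    }


index_definition = build_index_definition("invoices")
print(json.dumps(index_definition, indent=2))
```

With the REST API, a body like this would be sent in a create-index request to the search service, authenticated with an API key or a role-based credential.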

Integration with Grooper

  • API Integration: Grooper can leverage Azure AI Search's REST APIs to automate the indexing of documents and retrieval of search results. This integration can be built into Grooper's workflow to ensure seamless data processing and search capabilities.
  • Security and Compliance: Both Grooper and Azure AI Search offer robust security features. Integrating these ensures that document processing and search operations are secure and compliant with industry standards.
  • Indexing Processed Documents: Once Grooper processes and extracts data from documents, this data can be sent to Azure AI Search for indexing. This allows users to search through the processed data quickly and efficiently.
    • Indexing is an intake process that loads content into an Azure AI Search service and makes it searchable. Through Azure AI Search, inbound text is processed into tokens and stored in inverted indexes, and inbound vectors are stored in vector indexes. Azure AI Search indexes documents in JSON format.
  • Querying Indexed Documents and Data: Once Azure AI Search has indexed documents and their data from Grooper, users can leverage powerful query syntax like Lucene and OData to efficiently retrieve the information from their documents.
    • Querying can happen once an index is populated with searchable content, when Grooper sends query requests to a search service and handles responses. All query execution is over a search index that you control.
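
The indexing and querying steps above can be sketched as raw REST request bodies. Grooper builds and sends its own requests, so the Python below is only an assumption-laden illustration: the service name, index name, and field names are hypothetical, and no network call is made. It shows a JSON upload batch, plus a query that combines a Lucene search expression with an OData filter.

```python
import json

API_VERSION = "2023-11-01"  # a published stable REST API version; adjust as needed


def docs_url(service, index_name, operation):
    """Build the documents endpoint URL ('index' to load, 'search' to query)."""
    return (f"https://{service}.search.windows.net/indexes/{index_name}"
            f"/docs/{operation}?api-version={API_VERSION}")


def build_upload_batch(docs):
    """Wrap documents in the batch envelope; 'upload' adds or replaces by key."""
    return {"value": [{"@search.action": "upload", **doc} for doc in docs]}


def build_invoice_query(min_total):
    """Query body: Lucene fielded fuzzy search plus an OData filter and a facet."""
    return {
        "search": "Vendor_Name:acme~1",  # full Lucene syntax: field scope + fuzzy
        "queryType": "full",             # enables the full Lucene parser
        "filter": f"Document_Type eq 'Invoice' and Total_Amount gt {min_total}",
        "facets": ["Vendor_Name"],
        "count": True,
        "top": 10,
    }


# Hypothetical extracted data for one document.
batch = build_upload_batch([
    {"id": "doc-1", "Document_Type": "Invoice",
     "Vendor_Name": "Acme", "Total_Amount": 1250.00},
])
query = build_invoice_query(1000)

print(docs_url("my-service", "invoices", "index"))
print(json.dumps(batch, indent=2))
print(docs_url("my-service", "invoices", "search"))
print(json.dumps(query, indent=2))
```

Each body would be sent with an HTTP POST to the printed URL. The filter expresses the earlier example of retrieving all "Invoices" with a "Total Amount" over "$1,000.00".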

How To

Create an Azure AI Search Service

Please refer to Microsoft's documentation on how to create an Azure AI Search service in the Azure portal.

Configure the AI Search Repository Option

Configure an Indexing Behavior on a Content Type

Index Documents and Data from Grooper

"Add to Index" Batch Folder Object Command

"Create Search Index" Content Type Object Command

Execute Activity with "Add to Index" Command

Indexing Service

Use the Search Page