Search Classifier (Classify Method)

"Search Classifier" is a Classify Method that classifies documents (Batch Folders) by finding similar documents in a document search index. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same Document Type as its closest match in the search index.
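
To make the idea of "closest match by vector similarity" concrete, here is a minimal sketch in Python. It is not Grooper's implementation. The entry layout (`document_type`, `vector`), the tiny three-dimensional vectors, and the choice of cosine similarity as the similarity measure are all assumptions made for illustration; a real embeddings model produces much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def classify_by_nearest(query_vector, index_entries):
    """Return (document_type, similarity) of the most similar index entry.

    Returns (None, 0.0) when the index has no entries.
    """
    if not index_entries:
        return None, 0.0
    best = max(
        index_entries,
        key=lambda entry: cosine_similarity(query_vector, entry["vector"]),
    )
    return best["document_type"], cosine_similarity(query_vector, best["vector"])


# Hypothetical index contents: one already-classified example per Document Type.
index_entries = [
    {"document_type": "Invoice", "vector": [0.9, 0.1, 0.0]},
    {"document_type": "Purchase Order", "vector": [0.1, 0.9, 0.1]},
    {"document_type": "Contract", "vector": [0.0, 0.2, 0.9]},
]

# The unclassified document's embedding points mostly the same way as "Invoice".
doc_type, score = classify_by_nearest([0.8, 0.2, 0.1], index_entries)
print(doc_type)  # Invoice
```

The unclassified document simply inherits the label of whichever indexed document it most resembles, which is why the quality and stability of the indexed examples matter so much.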

Best Practices: How to avoid index alignment errors

The Search Classifier method is unique in that it uses both documents stored in Grooper Batches and search index data in AI Search to classify documents. For things to run smoothly, the search index and the documents used for classification in Grooper need to be aligned. If they are not aligned, Search Classifier can error out when attempting to classify documents.

Misalignment Example 1: A document is deleted without updating the search index.
Misalignment Example 2: A document's Document Type is changed without updating the search index.
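
The two misalignment cases above can be pictured with a small Python sketch. It assumes, purely for illustration, that both the Grooper Repository and the search index can be reduced to a mapping from a document ID to a Document Type name. The function name, the IDs, and the data are hypothetical and do not reflect how Grooper stores either side.

```python
def find_misalignments(repository_docs, index_entries):
    """Compare index entries against repository documents.

    Both arguments are dicts of document ID -> Document Type name.
    Returns a list of (document_id, problem) tuples.
    """
    problems = []
    for doc_id, indexed_type in index_entries.items():
        if doc_id not in repository_docs:
            # Example 1: the document was deleted, but its index entry remains.
            problems.append((doc_id, "indexed, but deleted from the repository"))
        elif repository_docs[doc_id] != indexed_type:
            # Example 2: the Document Type changed, but the index was not updated.
            current_type = repository_docs[doc_id]
            problems.append(
                (doc_id, f"indexed as {indexed_type!r}, but now {current_type!r}")
            )
    return problems


# Hypothetical state: doc-2 was deleted and doc-3 was reclassified after indexing.
repository_docs = {"doc-1": "Invoice", "doc-3": "Contract"}
index_entries = {"doc-1": "Invoice", "doc-2": "Invoice", "doc-3": "Purchase Order"}

for doc_id, problem in find_misalignments(repository_docs, index_entries):
    print(doc_id, "->", problem)
```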

To avoid errors during a Classify step, you should keep your search index aligned with documents the Search Classifier method uses to determine a document's Document Type. This can be done in one of two ways:

Filter Search Classifier to a dedicated set of classification examples.
- Search Classifier's "Filter" property allows users to pick a subset of documents in the search index for classification. If you restrict this to a set of documents that you know will never be deleted or changed, you can avoid these types of index alignment errors even when other or newer documents are in flux.
Keep the Indexing Service on.
- The Indexing Service continually indexes documents as they are brought into Grooper, have their data changed, or are deleted. This service runs in the background, periodically polling the Grooper Repository to align documents with the search index.

Filter Search Classifier

Do this if you have a small set of documents you want to use for classification examples.

This scenario presumes you have a dedicated set of documents you want to use for classification. These documents will stay in one or more "Classification Example" Batches you create. You should add a hidden Boolean field to your Data Model called something like "Is Training Example". This field should be "False" by default and manually set to "True" by Grooper designers for documents in the "Classification Example" Batches. Then, you can use Search Classifier's Filter property to only use documents whose "Is Training Example" field is "True". This will ensure the Search Classifier never compares unclassified documents to anything in the search index besides documents in the "Classification Example" Batches.

The general steps for this setup would be as follows:

Add a Data Field named "Is Training Example" to the root Data Model for your Content Model (or whichever Content Type has the Indexing Behavior configured).
- Make it a Boolean field by setting its "Value Type" property to "Boolean".
- Make its default value "false" by setting its "Default Value" property to False.

Gather up documents that best represent each of the Document Types in your Content Model.
- You can organize them however you want in your Grooper Repository. However, a good practice is to create one or more "Classification Example" Batches in the Test branch of the Batches folder.
Assign them their correct Document Type.
For each document, change the "Is Training Example" field from False to True.
Add these documents to the search index.
- Ex: Right-click each Batch Folder and execute the "Search > Add to Index" command.
Navigate to the Content Model and expand the "Search Classifier" (Classification Method) properties.
Select the "Filter" property and enter this expression: Is_Training_Example eq true
- true must be in all lowercase letters.
- Is_Training_Example is whatever you named your Boolean Data Field, with its name cleansed to meet the search filter's requirements.
Make the "Is Training Example" Data Field a hidden field by setting its "Visible" property to False.
- This is optional but encouraged. Doing this will obscure this property from Review users and prevent them from accidentally editing it.

This will ensure the Search Classifier only compares documents to those whose "Is Training Example" field is set to "true" when attempting to classify new documents.
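
As a rough illustration of how the filter expression relates to the field name, the Python sketch below builds the expression and applies the same restriction to an in-memory list. The cleansing rule shown (replace spaces with underscores) is an assumption inferred from the "Is Training Example" to Is_Training_Example example above; Grooper's actual cleansing rules may cover more cases. The helper names and the sample entries are hypothetical.

```python
def cleanse_field_name(display_name):
    """Turn a Data Field's display name into a filter-safe name.

    Assumption: replacing spaces with underscores is sufficient. This is
    inferred from the single example in this article, not from Grooper's rules.
    """
    return display_name.strip().replace(" ", "_")


def build_boolean_filter(display_name, value):
    """Build a '<field> eq <literal>' expression with a lowercase Boolean literal."""
    literal = "true" if value else "false"  # never 'True' or 'TRUE'
    return f"{cleanse_field_name(display_name)} eq {literal}"


def restrict_to_examples(index_entries, field_name):
    """Keep only the entries whose Boolean field is set to True."""
    return [entry for entry in index_entries if entry.get(field_name) is True]


filter_expression = build_boolean_filter("Is Training Example", True)
print(filter_expression)  # Is_Training_Example eq true

# Hypothetical index contents: only the first entry is a curated example.
entries = [
    {"id": "doc-1", "Is_Training_Example": True},
    {"id": "doc-2", "Is_Training_Example": False},
    {"id": "doc-3"},  # field missing, so treated as not an example
]
examples = restrict_to_examples(entries, cleanse_field_name("Is Training Example"))
print([entry["id"] for entry in examples])  # ['doc-1']
```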

Enable the Indexing Service

Do this if you want to classify according to all documents in the search index.

The Indexing Service is a Grooper service that runs in the background, continuously updating your search index(es). Installing this service will keep your index up to date as documents flow through a Grooper Repository. It continuously polls the Grooper Repository, looking for documents that need to be added to or removed from the search index, and for documents whose search index data needs to be changed.

Importantly, having an Indexing Service running will help avoid issues with Search Classifier throwing errors when a document in the Grooper Repository is misaligned with its values in the search index. The Indexing Service will automatically update the search index when:

A document is assigned a Content Type (e.g. a Document Type).
A document's Content Type changes.
A document is deleted.

With an up-to-date search index that accurately matches documents in the Grooper Repository, Search Classifier will not throw an error due to an index being misaligned with documents in Grooper.

You can still use the filtering method described in the previous section and run an Indexing Service.
Be aware: Between the Indexing Service's polling cycles, and while new entries are still being indexed, the Grooper Repository is briefly out of alignment with the search index. However, this is unlikely to cause issues in real-world scenarios.
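
The polling behavior described in this section can be sketched as a single reconciliation pass. This Python example is a conceptual model only: it assumes the repository and the index can each be represented as a mapping of document ID to Document Type, and it updates the index directly, whereas the real Indexing Service hands the work off as a Processing Job. All names and data are hypothetical.

```python
def plan_index_updates(repository_docs, index_entries):
    """Work out what one polling pass would need to change in the index.

    Returns three lists of document IDs: (to_add, to_update, to_remove).
    """
    to_add = [d for d in repository_docs if d not in index_entries]
    to_update = [
        d for d in repository_docs
        if d in index_entries and index_entries[d] != repository_docs[d]
    ]
    to_remove = [d for d in index_entries if d not in repository_docs]
    return to_add, to_update, to_remove


def apply_polling_pass(repository_docs, index_entries):
    """Bring index_entries in line with repository_docs, in place."""
    to_add, to_update, to_remove = plan_index_updates(repository_docs, index_entries)
    for doc_id in to_add + to_update:
        index_entries[doc_id] = repository_docs[doc_id]
    for doc_id in to_remove:
        del index_entries[doc_id]
    return len(to_add), len(to_update), len(to_remove)


# Between polling cycles: one new document, one reclassified, one deleted.
repo_state = {"doc-1": "Invoice", "doc-3": "Contract", "doc-4": "Invoice"}
index_state = {"doc-1": "Invoice", "doc-2": "Invoice", "doc-3": "Purchase Order"}

counts = apply_polling_pass(repo_state, index_state)
print(counts)                     # (1, 1, 1)
print(index_state == repo_state)  # True
```

Until the pass runs, the index still contains the stale entries, which is the brief window of misalignment noted above.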

To install an Indexing Service:

Open Grooper Command Console (GCC).
- GCC must be run as an administrator.
- GCC can be accessed from the Windows Start menu.
- Or, the executable gcc.exe can be found in the Grooper install directory.
Use the following GCC command to install the Indexing Service:
```
services install <connectionNo> IndexingService <userName> <password>
```
- <connectionNo> is a required parameter. Enter the connection number for the Grooper Repository using the service. If you don't know the connection number, run the "connections list" command to list all Grooper Repository connections.
- <userName> is a required parameter. Enter the user name to run the service under. This user must have the "Log on as Service" permission in Windows.
- <password> is a required parameter. Enter the password for the provided user name.
  - Enter ? in place of the password to be prompted for it instead. The password you type at the prompt will be masked.
After attempting the install, GCC will present an installation log. At the end of this log, it will inform you whether:
- The service was successfully installed.
- Or, the service installation FAILED.
Verify you also have an Activity Processing service installed.
- After polling the Grooper Repository, the Indexing Service creates a "Processing Job" to update the search index. An Activity Processing service needs to be running in order to start and complete the Processing Job.
- To install an Activity Processing service, execute this command in GCC:
```
services install <connectionNo> ActivityProcessing <userName> <password> [threadCount] [queueName]
```
  - [threadCount] and [queueName] are optional.
  - To set a specific thread count, replace [threadCount] with an appropriate integer. If you omit it, the service uses the default setting of "multiple" threads.
  - To use a specific Processing Queue, replace [queueName] with that Processing Queue's name. If you omit it, the service uses the Default processing queue.
Start both the Indexing Service and Activity Processing services.
- You can do this either from GCC with the "services start" command or from the Design page using the "Machines" node.