Search Classifier (Classify Method): Difference between revisions

Revision as of 11:21, 11 August 2025

STUB

This article is a stub. It contains minimal information on the topic and should be expanded.

Would you like to see this article expanded? Let us know at groopereducation@bisok.com.

"Search Classifier" is a Classify Method that classifies documents (Batch Folders) by finding similar documents in a document search index. The Search Classifier method uses an embeddings model and vector similarity to give an unclassified document the same Document Type as its closest match in the search index.

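To illustrate the idea (this is not Grooper's actual implementation), the following Python sketch classifies a document by copying the Document Type of its nearest neighbor under cosine similarity. The function names, the dictionary layout, and the three-dimensional toy vectors are assumptions made for illustration. A real embeddings model produces much longer vectors, and the similarity measure Grooper uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_by_nearest(query_vector, indexed_docs):
    """Return (document_type, score) of the most similar indexed document.

    Returns (None, 0.0) when the index holds no documents.
    """
    if not indexed_docs:
        return None, 0.0
    best_type, best_score = None, float("-inf")
    for doc in indexed_docs:
        score = cosine_similarity(query_vector, doc["embedding"])
        if score > best_score:
            best_type, best_score = doc["document_type"], score
    return best_type, best_score


# Hypothetical index entries with toy embeddings.
index = [
    {"id": "doc-1", "document_type": "Invoice", "embedding": [1.0, 0.0, 0.0]},
    {"id": "doc-2", "document_type": "Purchase Order", "embedding": [0.0, 1.0, 0.0]},
]

# The unclassified document's vector points mostly in the "Invoice" direction.
doc_type, score = classify_by_nearest([0.9, 0.1, 0.0], index)
print(doc_type)
```

Because the result is copied from a single closest match, the quality of the indexed examples directly determines the quality of classification.
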
Best Practices: How to avoid index alignment errors

The Search Classifier method is unique in that it uses both documents stored in Grooper Batches and search index data in AI Search to classify documents. For things to run smoothly, the search index and the documents used for classification in Grooper need to be aligned. If they are not aligned, Search Classifier can error out when attempting to classify documents.

Misalignment Example 1: A document is deleted without updating the search index.
Misalignment Example 2: A document's Document Type is changed without updating the search index.
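
The two misalignment cases can be made concrete with a small Python sketch. It assumes, purely for illustration, that both the Grooper Repository and the search index can be summarized as a mapping from a document ID to a Document Type name. The function and variable names are hypothetical and are not part of Grooper.

```python
def find_misalignments(repository_docs, index_entries):
    """Compare two {document_id: document_type} mappings.

    Returns the IDs that are still indexed but no longer exist in the
    repository (Example 1) and the IDs whose indexed Document Type differs
    from the repository's (Example 2).
    """
    missing_from_repository = sorted(
        doc_id for doc_id in index_entries if doc_id not in repository_docs
    )
    type_mismatch = sorted(
        doc_id
        for doc_id, indexed_type in index_entries.items()
        if doc_id in repository_docs and repository_docs[doc_id] != indexed_type
    )
    return {
        "missing_from_repository": missing_from_repository,
        "type_mismatch": type_mismatch,
    }


# doc-2 was deleted and doc-3 was reclassified, but the index was not updated.
repository = {"doc-1": "Invoice", "doc-3": "Purchase Order"}
search_index = {"doc-1": "Invoice", "doc-2": "Invoice", "doc-3": "Invoice"}

report = find_misalignments(repository, search_index)
print(report)
```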

To avoid errors during a Classify step, you should keep your search index aligned with documents the Search Classifier method uses to determine a document's Document Type. This can be done in one of two ways:

Filter Search Classifier to a dedicated set of classification examples.
- Search Classifier's "Filter" property lets users pick a subset of documents in the search index for classification. If you restrict this to a known set of documents that will never be deleted or changed, you can avoid these index alignment errors even while other or newer documents are in flux.
Keep the Indexing Service on.
- The Indexing Service continually indexes documents as they are brought into Grooper, have their data changed, or are deleted. This service runs in the background, periodically polling the Grooper Repository to align documents with the search index.
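
How the Indexing Service works internally is not described here, so the following Python sketch only illustrates the general idea of one reconciliation pass that a polling service might perform. The dictionaries standing in for the repository and the index, and the function name, are assumptions.

```python
def sync_index_once(repository_docs, index_entries):
    """Bring index_entries in line with repository_docs in a single pass.

    Both arguments map a document ID to a dict of indexed fields.
    index_entries is modified in place. Returns (upserted_ids, removed_ids).
    """
    upserted = []
    for doc_id, fields in repository_docs.items():
        # New documents and documents whose data changed are (re)indexed.
        if index_entries.get(doc_id) != fields:
            index_entries[doc_id] = dict(fields)
            upserted.append(doc_id)

    # Entries for documents that no longer exist are removed from the index.
    removed = [doc_id for doc_id in index_entries if doc_id not in repository_docs]
    for doc_id in removed:
        del index_entries[doc_id]

    return sorted(upserted), sorted(removed)


repository = {
    "doc-1": {"document_type": "Invoice"},
    "doc-3": {"document_type": "Purchase Order"},  # reclassified
    "doc-4": {"document_type": "Invoice"},         # newly added
}
search_index = {
    "doc-1": {"document_type": "Invoice"},
    "doc-2": {"document_type": "Invoice"},          # deleted from the repository
    "doc-3": {"document_type": "Invoice"},
}

upserted, removed = sync_index_once(repository, search_index)
print(upserted, removed)
```

A background service would repeat a pass like this on a timer. Between passes the index can still be briefly out of date, which is one reason to consider the filtered approach when the example set is small.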

Filter Search Classifier

Do this if you have a small set of documents you want to use for classification examples.

This scenario presumes you have a dedicated set of documents you want to use for classification. These documents will stay in one or more "Classification Example" Batches you create. You should add a hidden Boolean field to your Data Model called something like "Is Training Example". This field should be "False" by default and manually set to "True" by Grooper designers for documents in the "Classification Example" Batches. Then, you can use Search Classifier's Filter property to only use documents whose "Is Training Example" field is "True". This will ensure the Search Classifier never compares unclassified documents to anything in the search index besides documents in the "Classification Example" Batches.

The general steps for this setup would be as follows:

1. Add a Data Field named "Is Training Example" to the root Data Model for your Content Model (or whichever Content Type has the Indexing Behavior configured).
2. Make it a Boolean field by setting its "Value Type" to "Boolean".
3. Make its default value "false" by setting its "Default Value" property to False.
4. Gather up documents that best represent each of the Document Types in your Content Model.
   - You can organize them however you want in your Grooper Repository. However, a good practice is to create one or more "Classification Example" Batches in the Test branch of the Batches folder.
5. Assign them their correct Document Type.
6. For each document, change the "Is Training Example" field from False to True.
7. Add these documents to the search index.
   - Ex: Right-click each Batch Folder and execute the "Search > Add to Index" command.
8. Navigate to the Content Model and expand the "Search Classifier" (Classification Method) properties.
9. Select the "Filter" property and enter this expression: Is_Training_Example eq true
   - true must be in all lowercase letters.
   - Is_Training_Example is whatever you named your Boolean Data Field, with its name cleansed for the search filter's requirements.
10. Make the "Is Training Example" Data Field a hidden field by setting its "Visible" property to False.
   - This is optional but encouraged. Doing this hides the field from Review users and prevents them from accidentally editing it.

This ensures that, when classifying new documents, the Search Classifier compares them only to documents whose "Is Training Example" field is set to "true".
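
As a rough illustration of the filter expression, the Python sketch below builds the string from a field's display name and shows the effect of the filter on a list of index entries. The exact name-cleansing rule Grooper applies is not documented here; replacing each run of non-alphanumeric characters with an underscore is an assumption that happens to reproduce the example above. All function names are hypothetical.

```python
import re


def cleanse_field_name(display_name):
    """Assumed cleansing rule: runs of non-alphanumeric characters become one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", display_name.strip()).strip("_")


def build_boolean_filter(display_name, value):
    """Build a '<field> eq true|false' expression with a lowercase Boolean literal."""
    literal = "true" if value else "false"
    return f"{cleanse_field_name(display_name)} eq {literal}"


def training_examples_only(index_entries, field_name):
    """Keep only the entries whose Boolean field is explicitly True."""
    return [entry for entry in index_entries if entry.get(field_name) is True]


filter_expression = build_boolean_filter("Is Training Example", True)
print(filter_expression)  # Is_Training_Example eq true

# Hypothetical index entries; doc-3 was indexed before the field existed.
entries = [
    {"id": "doc-1", "Is_Training_Example": True},
    {"id": "doc-2", "Is_Training_Example": False},
    {"id": "doc-3"},
]
candidates = training_examples_only(entries, cleanse_field_name("Is Training Example"))
print([entry["id"] for entry in candidates])  # ['doc-1']
```

Only the entries that pass the filter remain candidates for the similarity comparison, so documents outside the "Classification Example" Batches cannot influence classification.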

Enable the Indexing Service

Do this if you want to classify according to all documents in the search index.

@@ Line 1: / Line 1: @@
 {{stubs}}
 <blockquote>{{#lst:Glossary|Search Classifier}}</blockquote>
+== Best Practices: How to avoid index alignment errors ==
+The Search Classifier method is unique in that it uses both documents stored in Grooper Batches and search index data in AI Search to classify documents. For things to run smoothly, the search index and the documents used for classification in Grooper need to be aligned. If they are not aligned, Search Classifier can error out when attempting to classify documents.
+* Misalignment Example 1: A document is deleted without updating the search index.
+* Misalignment Example 2: A document's Document Type is changed without updating the search index.
+To avoid errors during a Classify step, you should keep your search index aligned with documents the Search Classifier method uses to determine a document's Document Type. This can be done in one of two ways:
+* Filter Search Classifier to a dedicated set of classification examples.
+*: Search Classifier's "Filter" property allows users to pick a subset of documents in the search index for classification. If you restrict this to a known set of documents that you know will never be deleted or changed, you can avoid these types of index alignment errors when other/new documents are more in flux.
+* Keep the Indexing Service on.
+*: The Indexing Service will continually index documents as they are brought into Grooper, have their data changed or are deleted. This service runs in the background, periodically polling the Grooper Repository to align documents with the search index.
+=== Filter Search Classifier ===
+'''Do this if you have a small set of documents you want to use for classification examples.'''
+This scenario presumes you have a dedicated set of documents you want to use for classification. These documents will stay in one or more "Classification Example" Batches you create. You should add a hidden Boolean field to your Data Model called something like "Is Training Example". This field should be "False" by default and manually set to "True" by Grooper designers for documents in the "Classification Example" Batches. Then, you can use Search Classifier's Filter property to only use documents whose "Is Training Example" field is "True". This will ensure the Search Classifier never compares unclassified documents to anything in the search index besides documents in the "Classification Example" Batches.
+The general steps for this setup would be as follows:
+# Add a Data Field named "Is Training Example" to the root Data Model for your Content Model (or whichever Content Type has the Indexing Behavior configured).
+# Make it a Boolean field by setting its "Value Type" to "Boolean".
+# Make its default value "false" by setting its "Default Value" property to <code>False</code>.
+# Gather up documents that best represent each of the Document Types in your Content Model.
+#*<li class="fyi-bullet"> You can organize them however you want in your Grooper Repository. However, a good practice is to create one or more "Classification Example" Batches in the Test branch of the Batches folder.
+# Assign them their correct Document Type.
+# For each document, change the "Is Training Example" field from False to ''True''.
+# Add these documents to the search index.
+#*<li class="fyi-bullet"> Ex: Right click each Batch Folder and execute the "Search > Add to Index" command.
+# Navigate to the Content Model and expand the "Search Classifier" (Classification Method) properties.
+# Select the "Filter" property and enter this expression: <code>Is_Training_Example eq true</code>
+#*<li class="attn-bullet"> <code>true</code> must be in all lower case letters.
+#*<li class="fyi-bullet"> <code>Is_Training_Example</code> is just whatever you named your Boolean Data Field with its name cleansed for the search filter's requirements.
+# Make the "Is Training Example" Data Field a hidden field by setting its "Visible" property to ''False''.
+#*<li class="fyi-bullet"> This is optional but encouraged. Doing this will obscure this property from Review users and prevent them from accidentally editing it.
+This will ensure the Search Classifier only compares documents to those whose "Is Training Example" field is set to "true" when attempting to classify new documents.
+=== Enable the Indexing Service ===
+'''Do this if you want to classify according to ''all'' documents in the search index.'''