GPT Embeddings (Classify Method)

This article is about the current version of Grooper (2025).

Note that some content may still need to be updated.

This article is a stub. It contains minimal information on the topic and should be expanded.

Would you like to see this article expanded? Let us know at groopereducation@bisok.com.

BE AWARE: GPT Embeddings is obsolete as of version 2025. The LLM Classifier and Search Classifier methods are the new and improved AI-enabled classification methods.

GPT Embeddings is a Classify Method that uses an OpenAI embeddings model and trained document samples to tell one document from another.

GPT Embeddings should be considered a BETA feature.

This feature was recently added by the development team without a specific use case in mind.
Rather, it was developed in response to ChatGPT's growing popularity.
It should work in theory, but because no specific use case drove its development, it has not been extensively tested.
As use cases suited to this feature emerge, this section's documentation will be expanded.

An embedding is a vector (list) of numbers. You can measure how related two embeddings are by the distance between their vectors. A small distance between embeddings suggests they are highly related. A large distance between embeddings suggests they are less related.
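To make the idea of distance concrete, here is a minimal Python sketch. It assumes cosine distance as the measure, which is a common choice for embeddings; this article does not state which measure Grooper uses internally. The three-number vectors and their names are made up for illustration, and real embeddings have far more dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy embeddings (illustration only).
invoice_a = [0.9, 0.1, 0.0]
invoice_b = [0.8, 0.2, 0.1]
contract = [0.0, 0.2, 0.9]

# The two invoice vectors point in nearly the same direction,
# so their distance is much smaller than invoice vs. contract.
print(cosine_distance(invoice_a, invoice_b))
print(cosine_distance(invoice_a, contract))
```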

When using GPT Embeddings to classify documents, you will train the Content Model by giving Grooper example documents for each Document Type. The OpenAI embeddings model generates embeddings for each Document Type from the text content of its trained documents. When documents are classified (using the Classify activity), embeddings from the unclassified document are compared to the trained embedding values for each Document Type. Each document is then assigned the Document Type with the most similar embeddings.
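The train-then-compare flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Grooper's implementation: it assumes each Document Type is represented by the average of its training embeddings and that similarity is measured with cosine similarity. The function names, Document Type names, and toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(samples):
    """Map each Document Type to the average embedding of its samples (assumed approach)."""
    return {doc_type: mean_vector(vecs) for doc_type, vecs in samples.items()}

def classify(embedding, trained):
    """Return the Document Type whose trained embedding is most similar."""
    return max(trained, key=lambda doc_type: cosine_similarity(embedding, trained[doc_type]))

# Hypothetical toy embeddings standing in for trained document samples.
samples = {
    "Invoice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "Contract": [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]],
}
trained = train(samples)

# Embedding of an unclassified document (also made up).
print(classify([0.7, 0.3, 0.1], trained))
```

In a real pipeline the vectors would come from the embeddings model rather than being typed in by hand.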

For more information on embeddings, visit the following OpenAI documentation:

https://platform.openai.com/docs/guides/embeddings

⚠ Please be aware that embeddings models accept a maximum number of input tokens per request. This means there is a cutoff point for longer documents: text beyond the limit cannot be included in the request. How many input tokens are available depends on the embeddings model you're using.

OpenAI recommends using the "text-embedding-ada-002" model for embeddings.
This model has 8191 maximum input tokens available.
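As a rough illustration of what the cutoff means for a long document, the Python sketch below splits a token list at the 8191-token limit. It assumes the text has already been tokenized by the model's tokenizer; the token IDs here are stand-in integers, and the function name is hypothetical. How Grooper itself handles the overflow is not described in this article.

```python
MAX_INPUT_TOKENS = 8191  # limit stated above for text-embedding-ada-002

def split_at_limit(tokens, max_tokens=MAX_INPUT_TOKENS):
    """Return (kept, dropped): the tokens that fit in one request and the overflow."""
    if max_tokens < 0:
        raise ValueError("max_tokens must not be negative")
    return tokens[:max_tokens], tokens[max_tokens:]

# Stand-in token IDs for a long document (not real tokenizer output).
tokens = list(range(10_000))
kept, dropped = split_at_limit(tokens)

print(len(kept), len(dropped))  # 8191 1809
```

Anything in the dropped portion would not influence the embedding, so distinguishing content that appears late in a long document may be ignored during classification.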