2023:GPT Lookup (Lookup Specification)

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model. PLEASE NOTE: GPT Lookup should be considered a "beta" feature. It was implemented as a prototype and has not been extensively tested.

About

Lookup Properties

Following are brief descriptions of properties that are unique to GPT Lookup.

API Key

You must fill this property with a valid API key from OpenAI in order to leverage GPT integration with Grooper. See the Obtain an API Key section of the GPT Integration article for instructions on how to obtain a key.

Model

The API Key you use determines which GPT models are available to you. Different GPT models can produce noticeably different text depending on their size, training data, capabilities, prompt engineering, and fine-tuning potential. GPT-3's larger size and training data, in particular, can result in more sophisticated, diverse, and contextually appropriate text compared to GPT-2. However, the actual performance and quality of the generated text also depend on various other factors, such as prompt engineering, the input provided, and the requirements of the specific use case. GPT-4 is the latest version as of this writing and extends these capabilities further.
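
If you are unsure which models your key can access, you can check directly against the OpenAI API. Below is a minimal sketch, assuming the openai Python package (the 0.x API that was current for Grooper 2023); Grooper performs this query for you, so this is for verification only.

  import openai

  openai.api_key = "sk-..."  # your OpenAI API key

  # List the models this key can access; availability varies by account.
  for model in openai.Model.list()["data"]:
      print(model["id"])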

Parameters

Temperature

In the context of text generation using language models like ChatGPT, the temperature parameter is a setting that controls the randomness of the generated text. It is used during the sampling process, where the model selects the next word or token to generate based on its predicted probabilities.

When generating text, the language model assigns probabilities to different words or tokens based on their likelihood of occurring next in the context of the input text. The temperature parameter is used to scale these probabilities before sampling from them. A higher temperature value (e.g., 1.0) makes the probabilities more uniform and increases randomness, resulting in more varied and diverse text. On the other hand, a lower temperature value (e.g., 0.2) makes the probabilities more concentrated and biased towards the most likely word, resulting in more deterministic and focused text.

For example, with a higher temperature setting, the model may generate sentences like:

"The weather is hot and sunny. I love to go swimming or hiking."

With a lower temperature setting, the model may generate sentences like:

"The weather is hot. I love to go swimming."

The choice of temperature depends on the desired output. Higher values are useful when you want more creativity and diversity in the generated text, but they may lead to less coherent or nonsensical sentences. Lower values are useful when you want more deterministic and focused text, but they may result in repetitive or overly conservative output. Temperature is a hyperparameter that can be tuned to achieve the desired balance between randomness and coherence in the generated text.
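
To make the scaling concrete, the following sketch (illustrative only, not Grooper or OpenAI internals) shows how dividing the model's raw scores by the temperature before the softmax flattens or sharpens the resulting probabilities:

  import math

  def softmax_with_temperature(logits, temperature):
      scaled = [x / temperature for x in logits]
      m = max(scaled)  # subtract the max for numeric stability
      exps = [math.exp(x - m) for x in scaled]
      total = sum(exps)
      return [e / total for e in exps]

  logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
  print(softmax_with_temperature(logits, 1.0))  # flatter: more varied output
  print(softmax_with_temperature(logits, 0.2))  # peaked: near-deterministic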

TopP

TopP, also known as "nucleus sampling" or "stochastic decoding with dynamic vocabulary," is a text generation technique that is used to improve the diversity and randomness of generated text. It is often used as an alternative to traditional approaches like random sampling or greedy decoding in language models, such as GPT-2 and GPT-3.

In TopP sampling, instead of sampling from the entire probability distribution of possible next words or tokens, the model narrows down the choices to a subset of the most likely options. The subset is determined dynamically based on a predefined probability threshold, denoted as "p". The model considers only the words or tokens whose cumulative probability mass (probability of occurrence) falls within the top "p" value. The remaining words or tokens with lower probabilities are pruned from the selection.

Mathematically, given a probability distribution over all possible words or tokens, TopP sampling works as follows:

  1. Sort the token probabilities in descending order.
  2. Compute the cumulative sum of the sorted probabilities, from highest to lowest.
  3. Stop once the cumulative sum meets or exceeds the threshold "p". For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered.
  4. Sample the next token from this remaining set; all lower-probability tokens outside it are pruned.

By using TopP sampling, the model can generate text that is more diverse, as it allows for the possibility of selecting less frequent or rarer words or tokens, and it introduces randomness in the selection process. It can prevent the model from becoming overly deterministic or repetitive in its generated output, leading to more creative and varied text generation results.
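
The cutoff logic is easy to see on a toy distribution. The sketch below is illustrative only; a real model samples over tens of thousands of tokens, but the pruning works the same way:

  import random

  def top_p_sample(token_probs, p):
      # Rank tokens by probability, highest first.
      ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
      nucleus, cumulative = [], 0.0
      for token, prob in ranked:
          nucleus.append((token, prob))
          cumulative += prob
          if cumulative >= p:  # stop once the top-p mass is covered
              break
      # Renormalize the surviving tokens and sample from them.
      total = sum(prob for _, prob in nucleus)
      tokens = [token for token, _ in nucleus]
      weights = [prob / total for _, prob in nucleus]
      return random.choices(tokens, weights=weights)[0]

  probs = {"swimming": 0.5, "hiking": 0.3, "skydiving": 0.15, "knitting": 0.05}
  print(top_p_sample(probs, p=0.8))  # only "swimming" and "hiking" survive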

Presence Penalty

The "presence penalty" is a technique used in text generation to encourage the model to generate more concise and focused outputs by penalizing the repetition of the same words or tokens in the generated text. It is a regularization technique that aims to reduce redundancy and promote diversity in the generated output.

The presence penalty is typically applied at sampling time rather than during training: before each new token is selected, the score (logit) of any token that has already appeared in the generated text is reduced by a fixed amount. The penalty can be formulated in different ways, depending on the specific model and implementation, but the general idea is to make words or tokens the model has already produced less likely to be chosen again.

The presence penalty encourages the model to generate text that is more concise, avoids repetitive patterns, and promotes the use of a wider vocabulary. It helps prevent the model from generating overly verbose or redundant text, which can be undesirable in certain text generation tasks, such as story generation or summarization.

The magnitude of the presence penalty can be tuned to control the level of repetition allowed in the generated text. A higher penalty value would result in stricter avoidance of repetition, while a lower penalty value would allow for more repetition. The presence penalty is one of the techniques that can be used in combination with other regularization methods, such as temperature scaling, top-k sampling, or fine-tuning, to improve the quality and diversity of generated text.
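
As a minimal sketch, assuming the penalty is applied to raw token scores (logits) at sampling time as described above, a flat presence penalty might look like this:

  def apply_presence_penalty(logits, generated_tokens, penalty):
      # Flat, one-time deduction for every token already present in the text.
      seen = set(generated_tokens)
      return {
          token: score - (penalty if token in seen else 0.0)
          for token, score in logits.items()
      }

  logits = {"hot": 2.1, "sunny": 1.8, "humid": 1.5}
  print(apply_presence_penalty(logits, generated_tokens=["hot"], penalty=0.6))
  # "hot" drops to 1.5, making a repeat less likely but still possible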

Frequency Penalty

The frequency penalty is closely related to the presence penalty, but it scales with repetition: a token's score is reduced in proportion to the number of times that token has already appeared in the generated text. Where the presence penalty is a flat, one-time deduction for any token that has appeared at all, the frequency penalty punishes frequent repeaters progressively harder. This promotes the use of less frequent words or tokens and helps control the balance of word or token frequencies in the generated text.
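
As a companion sketch to the presence penalty above, here each token's deduction grows with its count, matching the count-scaled adjustment OpenAI documents for its frequency penalty:

  from collections import Counter

  def apply_frequency_penalty(logits, generated_tokens, penalty):
      # Deduction grows with how many times each token has already appeared.
      counts = Counter(generated_tokens)
      return {
          token: score - penalty * counts[token]
          for token, score in logits.items()
      }

  logits = {"hot": 2.1, "sunny": 1.8, "humid": 1.5}
  history = ["hot", "hot", "hot"]  # "hot" has been generated three times
  print(apply_frequency_penalty(logits, history, penalty=0.4))
  # "hot" drops by 1.2 (0.4 x 3); unseen tokens are untouched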

Response Format

This specifies the format in which data will be exchanged with the web service. It can be one of the following values:

  • Text - The response will be plain text. Record and value selectors should be specified using regular expressions.
  • JSON - The response will be in JSON format. Record and value selectors should be specified using JSONPath syntax.
  • XML - The request and response body will be in XML format. Record and value selectors should be specified using XPath syntax.

The format selected here will be used both for sending POST data and interpreting responses. It is currently not possible to send an XML request then interpret the response as JSON, or vice-versa.

Record Selector

This is a JSONPath or XPath expression which selects records in the response.

The record selector is used to specify which JSON or XML entities represent records in the result set.

JSON Notes
In a JSON response, the Record Selector may be used as follows:
  • If the selector matches an array, one record will be generated for each element of the array.
  • If the selector matches one or more objects, one record will be generated for each object.
  • Leave the property empty to select an array or object at the root of the JSON document.
XML Notes
In an XML response, the Record Selector may be used as follows:
  • One record will be generated for each XML element matched by the selector.
  • Leave the property empty to select a singleton record at the root of the XML.
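
Grooper evaluates the Record Selector internally, but the same JSONPath can be tested outside Grooper with a library such as jsonpath-ng. The response shape below is hypothetical:

  from jsonpath_ng import parse

  # A hypothetical JSON response from the web service.
  response = {
      "results": [
          {"name": "Acme Leasing", "role": "Lessor"},
          {"name": "Jane Doe", "role": "Lessee"},
      ]
  }

  # A Record Selector of "results[*]" yields one record per array element.
  records = [match.value for match in parse("results[*]").find(response)]
  for record in records:
      print(record["role"], "->", record["name"])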

Timeout

The amount of time, in seconds, to wait for a response from the web service before raising a timeout error.
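
The behavior matches an ordinary HTTP client timeout. As an illustration only (Grooper makes the request itself), assuming the Python requests library and a placeholder URL:

  import requests

  try:
      # The URL is a placeholder for the web service being queried.
      response = requests.post(
          "https://api.example.com/lookup",
          json={"query": "..."},
          timeout=30,  # seconds to wait for a response before giving up
      )
  except requests.exceptions.Timeout:
      print("No response within 30 seconds; a timeout error is raised.")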

How To

Following is a simple example that demonstrates how to use the GPT Lookup functionality. As with everything else regarding GPT Integration in Grooper 2023, this feature is fairly untested and needs more experimentation to reveal its full potential. If nothing else, this example is intended to give you a basic understanding of how to establish the lookup so you can experiment on your own.

  1. Start by deleting all fields in the example Data Model other than "Lessor" and "Lessee".
    • This is meant to reduce the number of calls you will be making to OpenAI for GPT results as "Lessor" and "Lessee" are the only Data Fields that will be leveraged in the following lookup example.


  1. Right-click the Data Model.
  2. Add a Data Field.


  1. Name it "Letter of Thanks".
  2. Click the "Execute" button.


  1. With the newly created Data Field object selected, set the Display Width property to 500.
  2. Set the Multi-line property to Enabled.
  3. Expand the sub-properties of the Multi-line property and set the Display Lines property to 15.
  4. Set the Word Wrap property to True.


  1. Select the Data Model.
  2. Click the ellipsis button on the Lookups property.


  1. In the "Lookups" window, click the "Add new lookups specification" button.
  2. Select the "GPT Lookup" option.


  1. With the "GPT Lookup" added to the "List of Lookup Specification" and selected, paste in your API key to the API Key property.
  2. Click the ellipsis button for the Prompt property.


  1. In the "Prompt" editor, type the following string:
    • Write a letter of thanks regarding the ease of purchase and clean state of the property from @Lessor to @Lessee.
    • As you type this out (rather than copy-pasting), you will notice IntelliSense pop-ups when you use the @ symbol. Using the @ symbol allows you to leverage elements from your Data Model when creating your lookup.
  2. When you have completed writing your prompt, click the "OK" button.
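
For context, the sketch below is a rough approximation of the kind of request the lookup sends once the @ tokens are resolved, using the openai 0.x Python package. Grooper handles all of this internally; the field values are placeholders, and the exact model and parameters depend on your configuration:

  import openai

  openai.api_key = "sk-..."  # the same key as the API Key property

  # Placeholder values standing in for whatever @Lessor and @Lessee
  # resolve to on the document being processed.
  lessor, lessee = "Acme Leasing", "Jane Doe"
  prompt = (
      "Write a letter of thanks regarding the ease of purchase and clean "
      f"state of the property from {lessor} to {lessee}."
  )

  response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[{"role": "user", "content": prompt}],
  )
  print(response["choices"][0]["message"]["content"])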


  1. Click the ellipsis button for the Value Selectors property.


  1. In the "Value Selectors" window click the "Add new value selector" button.


  1. With "Value Selector" added to the "list of Value Selector" and selected, click the drop-down button for the Target Field property.
  2. Select the "Letter_of_Thanks" field.
    • Based on this configuration, the value generated by GPT in response to our prompt will populate this field.


  1. Back in the "Lookups" menu, scroll down in the property grid, and in the "Lookup Options" area click the drop-down button for the Trigger Mode property.
  2. Because @ symbols are being used in the prompt to leverage elements from the Data Model, the Conditional setting should be selected.


  1. At the bottom of the property grid notice the Lookup Fields and Target Fields are populated because elements were targeted in the prompt, and a field was targeted with the Value Selectors property.
  2. Click "OK" to close this menu.


  1. With the lookup configured, it's time to test. Click the "Tester" tab.
  2. Select "Folder (1)" from the "GPT Complete Examples" batch.
  3. Click the "Test the data element" button.
  4. Notice the "Lessee" value is successfully returned ...
  5. ... and that it is being leveraged as the salutation in the value created for the "Letter of Thanks" field.


  1. Also notice the "Lessor" value being returned ...
  2. ... and that it is being leveraged as the complimentary close in the value created for the "Letter of Thanks" field.
  3. Feel free to take a look at the text GPT created for the letter.