2023:GPT Integration (Concept): Difference between revisions

From Grooper Wiki
No edit summary
 
(9 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{AutoVersion}}
__NOINDEX__
{|class="important-box"
|
'''!!'''
|
'''LEGACY TECHNOLOGY DETECTED!!'''
 
"GPT Integration" refers to Grooper's early attempts at integrating OpenAI's GPT models into the product. The information in this article is largely obsolete.


For more current information on Grooper's integration with AI technologies, refer to the following resources:
* [[Grooper and AI]] - An overview of Grooper's AI integrations.
* [[Ask AI]] - An LLM-based extractor.
* [[AI Extract]] - A "Fill Method" using LLMs for large scale data extraction with minimal setup.
* [[Clause Detection]] - An LLM embeddings based Data Section extract method.
|}
[[File:OpenAI Logo.svg.png|right|thumb|500px|Enhancing Grooper by integrating with modern AI technology.]]
[[File:OpenAI Logo.svg.png|right|thumb|500px|Enhancing Grooper by integrating with modern AI technology.]]


Line 18: Line 31:
== ABOUT ==
== ABOUT ==
GPT (Generative Pre-trained Transformer) integration can be used for three things in '''Grooper''':
GPT (Generative Pre-trained Transformer) integration can be used for three things in '''Grooper''':
* '''[[#GPT Complete (Extractor Type)|Extraction]]''' - Prompt the GPT model to return information it finds in a document.
* '''[[#GPT Complete (Value Extractor)|Extraction]]''' - Prompt the GPT model to return information it finds in a document.
* '''[[#GPT Embeddings (Classification Method)|Classification]]''' - GPT has been trained against a massive corpus of information, which allows for a lot of potential when it comes to classifying documents. The idea here is that because it's seen so much, the amount of training required in '''Grooper''' should be less.
* '''[[#GPT Embeddings (Classify Method)|Classification]]''' - GPT has been trained against a massive corpus of information, which allows for a lot of potential when it comes to classifying documents. The idea here is that because it's seen so much, the amount of training required in '''Grooper''' should be less.
* '''[[#GPT Lookup (Lookup)|Lookup]]''' - With a GPT lookup you can provide information collected from a model in '''Grooper''' as <code><span style="color:#ff00ff">@</span></code> variables in a prompt to have GPT generate data.
* '''[[#GPT Lookup (Lookup Specification)|Lookup]]''' - With a GPT lookup you can provide information collected from a model in '''Grooper''' as <code><span style="color:#ff00ff">@</span></code> variables in a prompt to have GPT generate data.


In this article you will be shown how '''Grooper''' leverages GPT for the aforementioned methods. Some example use cases will be given to demonstrate a basic approach. Given the nature of the way this technology works, it will be up to the user to get creative about how this can be used for their needs.
In this article you will be shown how '''Grooper''' leverages GPT for the aforementioned methods. Some example use cases will be given to demonstrate a basic approach. Given the nature of the way this technology works, it will be up to the user to get creative about how this can be used for their needs.
Line 65: Line 78:
[[Image:GPT Integration 003.png]]
[[Image:GPT Integration 003.png]]


=== GPT Complete (Extractor Type) ===
=== GPT Complete (Value Extractor) ===
{{#lst:Glossary|GPT Complete}}
{{#lst:Glossary|GPT Complete}}
Please visit the '''''[[GPT Complete (Extractor Type)|GPT Complete]]''''' article for more information.
Please visit the '''''[[GPT Complete]]''''' article for more information.


=== GPT Embeddings (Classification Method) ===
=== GPT Embeddings (Classification Method) ===
{{#lst:Glossary|GPT Embeddings}}
Please visit the '''''[[GPT Embeddings (Classification Method)|GPT Embeddings]]''''' article for more information.
Please visit the '''''[[GPT Embeddings (Classification Method)|GPT Embeddings]]''''' article for more information.


=== GPT Lookup (Lookup) ===
=== GPT Lookup (Lookup Specification) ===
Following is a simple of example that will demonstrate how to use the ''GPT Lookup'' functionality. As with everything else regarding GPT Integration in Grooper 2023, this is fairly untested and needs more experimentation to see its full potential. If nothing else, this example is intended to give you a basic understanding of how to establish the lookup so you can try things out on your own.
{{#lst:Glossary|GPT Lookup}}
 
Please visit the '''''[[GPT Lookup]]''''' article for more information.
# Start by deleting all other fields in the example '''Data Model''' other than "Lessor" and "Lessee".
#* This is meant to reduce the number of calls you will be making to OpenAI for GPT results as "Lessor" and "Lessee" are the only '''Data Fields''' that will be leveraged in the following lookup example.
 
[[Image:GPT Integration 034.png]]
 
 
# <li value=2> Right-click the '''Data Model'''.
# Add a '''Data Field'''.
 
[[Image:GPT Integration 035.png]]
 
 
# <li value=4> Name it "Letter of Thanks".
# Click the "Execute" button.
 
[[Image:GPT Integration 036.png]]
 
 
# <li value=6> With the newly created '''Data Field''' object selected, set the '''''Display Width''''' property to ''500''.
# Set the '''''Multi-line''''' property to ''Enabled''.
# Expand the sub-properties of the '''''Multi Line''''' property and set the '''''Display Lines''''' property to ''15''.
# Set the '''''Word Wrap''''' property to ''True''.
 
[[Image:GPT Integration 037.png]]
 
 
# <li value=10> Select the '''Data Model'''.
# Click the ellipsis button on the '''''Lookups''''' property.
 
[[Image:GPT Integration 038.png]]
 
 
# <li value=12> In the "Lookups" window, click the "Add new lookups specification" button.
# Select the "GPT Lookup" option.
 
[[Image:GPT Integration 039.png]]
 
 
# <li value=14> With the "GPT Lookup" added to the "List of Lookup Specification" and selected, paste in your API key to the '''''API Key''''' property.
# Click the ellipsis button for the '''''Prompt''''' property.
 
[[Image:GPT Integration 040.png]]
 
 
# <li value=16> In the "Prompt" editor, type the following string:
#* <code>Write a letter of thanks regarding the ease of purchase and clean state of the property from <span style="color:#00b400;">@Lessor</span> to <span style="color:#00b400;">@Lessee</span>.</code>
#* As you type this out (if you do instead of copy pasting) you will notice intellisense pop-up for when you use the <code style="color:#00b400;">@</code> symbol. Using the <code style="color:#00b400;">@</code> symbol allows you to leverage elements from your '''Data Model''' when creating your lookup.
# When you have completed writing your prompt, click the "OK" button.
 
[[Image:GPT Integration 041.png]]
 
 
# <li value=18> Click the ellipsis button for the '''''Value Selectors''''' property.
 
[[Image:GPT Integration 042.png]]
 
 
# <li value=19> In the "Value Selectors" window click the "Add new value selector" button.
 
[[Image:GPT Integration 043.png]]
 
 
# <li value=20> With "Value Selector" added to the "list of Value Selector" and selected, click the drop-down button for the '''''Target Field''''' property.
# Select the "Letter_of_Thanks" field.
#* Based on this configuration, the value generated by our prompt from our lookup will populate this field with the information generated by GPT.
 
[[Image:GPT Integration 044.png]]
 
 
# <li value=22> Back in the "Lookups" menu, scroll down in the property grid, and in the "Lookup Options" area click the drop-down button for the '''''Trigger Mode''''' property.
# Because <code style="color:#00b400;">@</code> symbols are being used in the prompt to leverage elements from the '''Data Model''' the ''Conditional'' setting should be selected.
 
[[Image:GPT Integration 045.png]]
 
 
# <li value=24> At the bottom of the property grid notice the '''''Lookup Fields''''' and '''''Target Fields''''' are populated because elements were targeted in the prompt, and a field was targeted with the '''''Value Selectors''''' property.
# Click "OK" to close this menu.
 
[[Image:GPT Integration 046.png]]
 
 
# <li value=26> With the lookup configured it's time to test. Click the "Tester" tab.
# Select "Folder (1)" from the "GPT Complete Examples" batch.
# Click the "Test the data element" button.
# Notice the "Lessee" value is successfully returned ...
# ... and that it is being leveraged as the salutation in the value created for the "Letter of Thanks" field.
 
[[Image:GPT Integration 047.png]]
 
 
# <li value=31> Also notice the "Lessor" value being returned ...
# ... and that it is being leveraged as the complementary close in the value created for the "Letter of Thanks" field.
# Feel free to take a look at the text created for the letter from the GPT AI.
 
[[Image:GPT Integration 048.png]]
 
==== Lookup Properties ====
Following are brief descriptions of properties that are unique to GPT Lookup. Properties that overlap with previously explained properties, or are self explanatory, will be skipped.
 
===== Response Format =====
This specifies the format in which data will be exchanged with the web service. Can be one of the following values:
* '''Text''' - The response will be plain text. Record and value selectors should be specified using regular expressions.
* '''JSON''' - The response will be in JSON format. Record and value selectors should be specified using [https://www.ietf.org/archive/id/draft-goessner-dispatch-jsonpath-00.html JSONPath] syntax.
* '''XML''' - The request and response body will be in XML format. Record and value selectors should be specified using [https://en.wikipedia.org/wiki/XPath XPath] syntax.
 
The format selected here will be used both for sending POST data and interpreting responses. It is currently not possible to send an XML request then interpret the response as JSON, or vice-versa.
 
===== Record Selector =====
This is a JSONPath or XPath expression which selects records in the response.
 
The record selector is used to specify which JSON or XML entities represent records in the result set.
 
: '''JSON Notes'''
:: In a JSON response, the Record Selector may be used as follows:
::* If the selector matches an array, one record will be generated for each element of the array.
::* If the selector matches one or more objects, one record will be generated for each object.
::* Leave the property empty to select an array or object at the root of the JSON document.
 
: '''XML Notes'''
:: In an XML response, the Record Selector may be used as follows:
::* One record will be generated for each XML element matched by the selector.
::* Leave the property empty to select a singleton record at the root of the XML.

Latest revision as of 16:43, 27 August 2025

!!

LEGACY TECHNOLOGY DETECTED!!

"GPT Integration" refers to Grooper's early attempts at integrating OpenAI's GPT models into the product. The information in this article is largely obsolete.

For more current information on Grooper's integration with AI technologies, refer to the following resources:

  • Grooper and AI - An overview of Grooper's AI integrations.
  • Ask AI - An LLM-based extractor.
  • AI Extract - A "Fill Method" using LLMs for large scale data extraction with minimal setup.
  • Clause Detection - An LLM embeddings based Data Section extract method.
Enhancing Grooper by integrating with modern AI technology.

Grooper's GPT Integration is refers to the usage of OpenAI's GPT models within Grooper to enhance the capabilities of data extractors, classification, and lookups.

OpenAI's GPT model has made waves in the world of computing. Our Grooper developers recognized the potential for this to grow Grooper's capabilities. Adding its functionality will allow for users to explore and find creative solutions for processing their documents using this advanced technology.

You may download the ZIP(s) below and upload it into your own Grooper environment (version 2023). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

ABOUT

GPT (Generative Pre-trained Transformer) integration can be used for three things in Grooper:

  • Extraction - Prompt the GPT model to return information it finds in a document.
  • Classification - GPT has been trained against a massive corpus of information, which allows for a lot of potential when it comes to classifying documents. The idea here is that because it's seen so much, the amount of training required in Grooper should be less.
  • Lookup - With a GPT lookup you can provide information collected from a model in Grooper as @ variables in a prompt to have GPT generate data.

In this article you will be shown how Grooper leverages GPT for the aforementioned methods. Some example use cases will be given to demonstrate a basic approach. Given the nature of the way this technology works, it will be up to the user to get creative about how this can be used for their needs.

Things to Consider

Before moving forward it would be prudent to mention a few things about GPT and how to use it.

Prompt Engineering

This first thing to consider is how to structure a good prompt so that you get the results you are expecting. There is a bit of an art to knowing how to do this. GPT can tell bad jokes and write accidentally hilarious poems about your life, but it can also help you do your job better. The catch: you need to help it do its job better, too. At its most basic level, OpenAI's GPT-3 and GPT-4 predict text based on an input called a prompt. But to get the best results, you need to write a clear prompt with ample context. Further on in this article when the GPT Complete Value Extractor is being demonstrated you will see an example of prompt engineering.

Follow this link, or perhaps even this one, for more information on prompt engineering.

Tokens and Pricing

Another consideration is the way GPT pricing works. You are going to be charged for the "tokens" used when interacting with GPT. To that end, the prompt that you write, the text that you leverage to get a result, and the result that is returned to you are all considered part of the token consumption. You will need to be considerate of this as you build and use GPT in your models.

Follow this link for more information on what tokens are.

Follow this link for more information on GPT pricing.

Location Data for Data Extraction

The final thing to consider is in regards to the GPT Complete Value Extractor type (more on this soon.) If you have used Grooper before then you are probably familiar with how a returned value is highlighted with a green box in the document viewer. One of the main strengths of Grooper's text synthesis is that it collects location information for each character which allows this highlighting to occur. The GPT model does not consider location information when generating its results which means there will be no highlighting on the document for values collected with this method. The main impact this will have is on your ability to validate information returned by the GPT model.

How To

With the discussion of concepts out of the way, it is time to get into Grooper and see how and where to use the GPT integration.

Obtain an API Key

Grooper is able to integrate with OpenAI's GPT model because they have provided a web API. All we need in order use the Grooper GPT functionality is an API key. Here you will learn how to obtain an API key for yourself so you can start using GPT with Grooper.

  1. The first thing you should do is visit OpenAI API site and login or create an account.
  2. Once logged in, click the "Personal" menu in the top right.
  3. Within in this menu click the "View API Keys" option, which will take you to the "API keys" page.


  1. On the "API keys" page, click the "+ Create new secret key" button, which will make an "API key generated" pop-up.


  1. Highlight and copy, or click the copy button to copy the key string to your clipboard.
    • A word of warning here. You WILL NOT get another chance to copy this string. You can always create a new one, but once you close this pop-up, you will not have another chance to copy the key string out.

GPT Complete (Value Extractor)

GPT Complete is a Value Extractor that leverages Open AI's GPT models to generate chat completions for inputs, returning one hit for each result choice provided by the model's response.

PLEASE NOTE: GPT Complete is a deprecated Value Extractor. It uses an outdated method to call the OpenAI API. Please use the Ask AI extractor going forward.

Please visit the GPT Complete article for more information.

GPT Embeddings (Classification Method)

BE AWARE: GPT Embeddings is obsolete as of version 2025. The LLM Classifier and Search Classifier methods are the new and improved AI-enabled classification methods. GPT Embeddings is a Classify Method that uses an OpenAI embeddings model and trained document samples to tell one document from another. Please visit the GPT Embeddings article for more information.

GPT Lookup (Lookup Specification)

PLEASE NOTE: GPT Lookup is obsolete as of version 2025. Much of its functionality was replaced by newer and better LLM-based extraction methods, such as AI Extract. If absolutely necessary, its functionality could also be replicated with a Web Service Lookup implementation. GPT Lookup is a Lookup Specification that performs a lookup using an OpenAI GPT model. Please visit the GPT Lookup article for more information.