Ask AI (Extractor Type)

From Grooper Wiki

This article is about the current version of Grooper.



Ask AI is an Extractor Type that executes a chat completion using a large language model (LLM), such as OpenAI's GPT models. It includes a document's text content and user-defined instructions (a question about the document) in the chat prompt, then returns the model's response as the extractor's result. Ask AI is a powerful, LLM-based extraction method that can be used anywhere in Grooper an Extractor Type is referenced, and it can complete a wide array of tasks with simple text prompts.

You may download the ZIPs below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Ask AI is a new Extractor Type in Grooper 2024. It replaces the now-outdated GPT Complete Extractor Type.

  • GPT Complete uses a deprecated method to call the OpenAI API.
  • Ask AI can be used with any LLM Service Provider.
  • Ask AI has other advantages over GPT Complete.

The general idea behind Ask AI is simple: prompt an LLM chatbot with a question about a document to return a result. With appropriate instructions, Ask AI can even parse JSON responses into a Data Model's instance hierarchy. For example, you can use Ask AI as a Row Match Table Extract Method.
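The chat-completion pattern behind Ask AI can be sketched in a few lines of Python. This is an illustrative sketch only — the function name, prompt layout, and the commented-out OpenAI-style call are assumptions for demonstration, not Grooper's actual implementation.

```python
# Illustrative sketch of the Ask AI pattern: the document's text supplies
# context, and the user's instructions ask a question about it. The prompt
# layout here is an assumption for demonstration, not Grooper's internals.

def build_prompt(document_text: str, instructions: str) -> list:
    """Combine document text and instructions into chat messages."""
    return [{"role": "user",
             "content": f"{document_text}\n\n{instructions}"}]

messages = build_prompt(
    "PURCHASE AND SALE AGREEMENT\nBuyer: Acme Energy LLC\nSeller: ...",
    "Who is the buyer? Respond with just the name.",
)
# An OpenAI-style client would then be invoked roughly like:
#   response = client.chat.completions.create(model="gpt-4o", messages=messages)
#   value = response.choices[0].message.content
```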

Ask AI Pros and Cons

Pros

  • Returns data using a natural language prompt.
  • Less knowledge of Grooper extractors is required to return data.
  • Quicker time to value.
  • Easier to maintain over time.

Cons

  • LLM responses can be unpredictable.
  • LLM responses can be inaccurate.
  • LLMs answer by predicting the next best word one after another. They do not “know” anything.
  • As an extractor, “Ask AI” must be configured “per field”.
  • An API call is made every time the extractor executes.
    • This is very important to consider. Limiting the text sent to the LLM whenever possible should be a consideration. The Context Extractor property can play a big role in helping with this.
  • No result highlighting.

Properties

Model
The API key you use determines which GPT models are available to you. The different GPT models can affect the generated text based on their size, training data, capabilities, prompt engineering, and fine-tuning potential. GPT-3's larger size and training data, in particular, can potentially produce more sophisticated, diverse, and contextually appropriate text than GPT-2. However, the actual performance and quality of the generated text also depend on other factors, such as prompt engineering, the input provided, and the requirements of the specific use case. GPT-4o is the latest version as of this writing and takes the GPT model line even further.

Parameters
Please see the Parameters article for more information.

Instructions
The instructions or question to include in the prompt. The prompt sent to OpenAI consists of text content from the document, which provides context, plus the text entered here. This property should ask a question about the content or provide instructions for generating output. For example, "What is the effective date?", "Summarize this document", or "Your task is to generate a comma-separated list of assignors".

Preprocessing
Please visit the Preprocessing article for more information.

Context Extractor
An optional extractor which filters the document content included in the prompt. All Value Extractor types are available.

Max Response Length
The maximum length of the output, in tokens. 1 token is equivalent to approximately 4 characters for English text. Increasing this value decreases the maximum size of the context.
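The 4-characters-per-token rule of thumb makes it easy to estimate how much context a document will consume. A quick sketch (the function is hypothetical; a real tokenizer gives exact counts):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of
    thumb for English text. Real tokenizers give exact counts."""
    return max(1, round(len(text) / 4))

# A three-page document of roughly 6,000 characters would consume
# about 1,500 tokens of context:
estimate_tokens("x" * 6000)  # → 1500
```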

Parse JSON Response
If this property is enabled, JSON returned in the response will be parsed into a Data Instance hierarchy.

Use this mechanism to capture complex data and generate output instances with named children, producing output similar to using named regex groups with Pattern Match, or using Ordered Array collation. This type of Data Instance hierarchy can be consumed by the Row Match Table Extract Method or the Simple Section Extract Method.

When this property is enabled, the instructions should ask the AI to respond with JSON, providing instructions and examples as needed to ensure the AI understands the desired JSON format. The JSON may contain a single JSON object or an array of JSON objects.

If a single JSON object is returned, a single output instance will be generated, containing one named child for each property of the JSON object. This type of output would be appropriate for capturing a single-instance Data Section.

If a JSON array is returned, one output instance will be generated for each object in the array. Each output instance will have named children reflecting the properties of the JSON object. This type of output is appropriate for capturing a Data Table or multi-instance Data Section.

The AI must respond with JSON only, or with the JSON delimited using the prefix and suffix shown below. OpenAI models are typically trained for this out of the box, but models trained by other organizations may require special instructions.
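The parsing behavior described above can be illustrated with a short Python sketch (hypothetical code, not Grooper's implementation): a single JSON object becomes one output instance, while a JSON array becomes one instance per element.

```python
import json

def parse_json_response(response_text: str) -> list:
    """Sketch of the Parse JSON Response behavior: a JSON object yields
    one output instance; a JSON array yields one instance per element."""
    data = json.loads(response_text)
    if isinstance(data, dict):
        return [data]   # single-instance Data Section
    if isinstance(data, list):
        return data     # Data Table / multi-instance Data Section
    raise ValueError("Response must be a JSON object or array")

# One instance with named children "section" and "township":
parse_json_response('{"section": "14", "township": "2N"}')
# Two instances, one per array element:
parse_json_response('[{"section": "14"}, {"section": "22"}]')
```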

How To

Establish an LLM Connector

First, we need to establish an LLM Connector within the Options property on the Root object.

Please visit the LLM Connector article for more information.

Configure Ask AI Extractor Type

Buyer

  1. Select the "buyer" Data Field.
  2. Click the drop-down button for the Value Extractor property.
  3. Select Ask AI from the drop-down menu.


  1. With the Value Extractor property set, click the ellipsis button to configure the extractor.


  1. In the "Value Extractor" window click the ellipsis button for the Model property.
  2. In the "Model" window select gpt-4o. As of the writing of this article, it is the most accurate model, and the best choice.


  1. Click the ellipsis button on the Instructions property.
  2. Add a prompt in the "Instructions" window.
    Who is the buyer? Respond with just the name.


  1. In the Batch Viewer be sure to select the document from the provided Batch.
  2. If the toggle for auto extraction is on, you will automatically get a result. It may be a good practice to turn this off when testing to limit token consumption.
    • Speaking of limiting token consumption, this may be a good opportunity to look at the Context Extractor property. Since the buyer's name is almost certainly listed on the first page, you could configure an extractor that returns only the first page's text. Because this document is three pages long, limiting the text sent to the AI would greatly reduce token consumption.
  3. Observe the returned value, but take note there is no highlighting.
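A Context Extractor is configured inside Grooper, but the idea of trimming context can be sketched in Python. This assumes pages in the extracted text are separated by form-feed characters, a common text-extraction convention; the helper name is hypothetical.

```python
def first_page_text(document_text: str) -> str:
    """Keep only the first page, assuming pages are separated by
    form-feed characters. In Grooper you would configure a Context
    Extractor instead of writing code like this."""
    return document_text.split("\f", 1)[0]

three_pages = "Page 1: Buyer is Acme Energy LLC\fPage 2 ...\fPage 3 ..."
first_page_text(three_pages)  # → "Page 1: Buyer is Acme Energy LLC"
```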

Seller

  1. Select the "seller" Data Field.
  2. Set the Value Extractor to Ask AI and click the ellipsis button.


  1. Set the Model property to your choice of model.
  2. Add a prompt in the Instructions property.
    Who is the seller? Respond with just the name.
  3. In the Batch Viewer select the document from the provided Batch.
  4. Use auto extract or manually run extraction.
  5. Observe the result in the Results list view.

Summary

  1. Select the "summary" Data Field.
  2. Set the Value Extractor to Ask AI and click the ellipsis button.


  1. Set the Model property to your choice of model.
  2. Add a prompt in the Instructions property.
    Give a summary in 50 words or fewer.
  3. Select the document from the provided Batch.
  4. Use auto extract or manually run extraction.
  5. Observe the result in the Results list view.

Lands and Leases

  1. Select the "LandsAndLeases" Data Table.
  2. Set the Extract Method property to Row Match.
  3. Click the drop-down for the Row Extractor property and select Ask AI.


  1. Click the ellipsis button on the Row Extractor property.


  1. Set the Model property to your choice of model.
  2. Add a prompt in the Instructions property.
    • What is the content of the "Additional Lands and Leases" table? Answer with JSON formatted as below. The quote property should contain a quote from the document which includes each individual full row: [{section: "", township: "", range: "", county: "", state: "", quote: ""}]
    • Notice how this prompt has been engineered to return values as JSON objects. These objects will be created as sub-elements of the main extracted result. Because the provided JSON in the prompt defines the objects using the exact names of the Data Columns, the returned sub-elements will be able to populate their respective Data Columns.
  3. Enable the Parse JSON Response property.
    • This property must be enabled to take advantage of the parsed JSON objects as sub-elements described in the previous step.
  4. Select the document from the provided Batch.
  5. Use auto extract or manually run extraction.
  6. Observe the result in the Results list view.
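To see why the prompt's key names matter, consider a hypothetical response in the requested format. Because each key matches a Data Column name ("section", "township", and so on), the parsed sub-elements line up with their respective columns. The sample values below are invented for illustration.

```python
import json

# Hypothetical response text in the format the prompt requests.
response = '''[
  {"section": "14", "township": "2N", "range": "3W",
   "county": "Garfield", "state": "OK",
   "quote": "Section 14, Township 2N, Range 3W, Garfield County, OK"}
]'''

for row in json.loads(response):
    # Each key maps to the Data Column of the same name.
    print(row["section"], row["township"], row["range"],
          row["county"], row["state"])
```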


  1. Select the Data Model from the provided Project.
  2. Click the "Tester" tab.
  3. Select the document from the provided Batch in the Batch Viewer.
  4. Click the "Test" button.
  5. View the extracted results in the Data Model Preview.