2024:AI Extract (Fill Method): Difference between revisions

From Grooper Wiki
Created page with "{{AutoVersion}} <blockquote></blockquote> == About == == How To =="
 
initial post // via Wikitext Extension for VSCode
Line 1: Line 1:
{{AutoVersion}}
{{AutoVersion}}


<blockquote></blockquote>
<blockquote>{{#lst:Glossary|AI Extract}}</blockquote>


== About ==
== About ==
The goal of '''''AI Extract''''' is to extract data by simply describing the data to be extracted as a result of the "self-descriptive" nature of the "container" '''Data Elements''' ('''Data Model''', '''Data Section''', or '''Data Table''') itself. This fill method presents an AI chatbot with all or part of the document content, and asks it to generate [https://en.wikipedia.org/wiki/JSON JSON] data matching the structure of descendant '''Data Elements'''. The JSON data is then used to populate descendant '''Data Sections''', '''Data Tables''', and '''Data Fields'''.


== How To ==
== How To ==
The following walkthrough will use an invoice as part of the supplied materials. Invoices are easy to understand and the setup too, will be easy.
'''<span style="color: red;"><big>HOWEVER, PLEASE BE AWARE'''</big></span>:</br>
Invoices are not the best use case for '''''AI Extract'''''. Typical invoice processing relies on accuracy of information. This is a problem for '''''AI Extract''''' for three main reasons.
# [https://en.wikipedia.org/wiki/Large_language_model LLM] [https://en.wikipedia.org/wiki/Chatbot chatbots] are best suited for documents with natural language flow such as legal contracts. Invoices do not represent documents with natural language flow, and instead of a type of structured data document.
# LLM chatbots do not return reliable information as they inherently do not "understand" information and typically can either "make things up", or be inconsistent across a large set of documents.
# Reliance on accurate information will typically involve human review to verify data integrity. An important aspect of reviewing data in '''Grooper''' is the highlighting of returned results in the '''Document Viewer'''. At best it is difficult to accurately highlight data from a chatbot as there is not technically any character coordinate information being returned from the chat bot as part of its delivered data. There are some settings of properties in the configuration of '''''AI Extract''''' that can aid in this highlighting, but it isn't perfect. At worst, it can be impossible at times to highlight data from a chat bot at all.
<div style="padding-left: 1.5em;">
=== Establish an LLM Connector ===
First, we need to establish an '''''LLM Connector''''' within the '''''Options''''' property on the '''Root''' object.
# Click on the '''Grooper''' '''Root''' node.
# Click the ellipsis button for the '''''Options''''' property.
# In the "Options" window click the "Add" button.
# Choose "LLM Connector" from the drop-down menu.
[[image: 2024_AI-Extract_01_01.png]]
# <li value=5> Click the ellipsis button for the "Service Providers" property of the newly added '''''LLM Connector'''''.</li>
# Click the "Add" button in the "Service Providers" window.
# Choose the '''''Open AI Provider''''' option from the drop-down menu.
[[image: 2024_AI-Extract_01_02.png]]
# <li value=8>Enter your API Key into the '''''API Key''''' property.</li>
[[image: 2024_AI-Extract_01_03.png]]
=== Configure AI Extract Fill Method ===
With an '''''LLM Connector''''' established we can now configure our '''Data Model''' with the '''''AI Extract''''' '''''Fill Method'''''.
# Select the '''Data Model''' from the provided '''Project'''.
# Click the ellipsis button on the '''''Fill Methods''''' property.
# Click the "Add" button in the "Fill Methods" window.
# Choose "AI Extract" from the drop-down menu.
[[image: 2024_AI-Extract_02_01.png]]
# <li value=5>Click the ellipsis button on the '''''Model''''' property of the newly added '''''AI Extract''''' '''''Fill Method'''''.</li>
# Select "gpt-4o" in the "Model" window. As of the writing of this article, it is the most accurate model, and the best choice.
#* Feel free to experiment with the other models to test results.
[[image: 2024_AI-Extract_02_02.png]]
# <li value=7>In the '''''Parameters''''' property group, lower the '''''Temperature''''' property to ''0.2''. This can help the AI be less "creative" with its responses.</li>
#* You may want to go as low as ''0'' to completely eliminate "creativity".
#* Please see the '''''[[Parameters (Property)|Parameters]]''''' article for more information.
[[image: 2024_AI-Extract_02_03.png]]
# <li value=8>Click the ellipsis button on the '''''Instructions''''' property.</li>
# Write a prompt in the "Instructions" window.
#* Prompts written here should be considered as "global" prompts for the entire model, not specific to individual '''Data Elements'''.
[[image: 2024_AI-Extract_02_04.png]]
# <li value=10>If you want to choose specific elements to extract you can do so. Click the ellipsis button on the '''''Included Elements''''' property.</li>
# In the "Included Elements" window you can choose specific elements.
#* Leaving this property default, or blank, will consider all '''Data Elements'''. Choosing specific '''Data Elements''' will only include those selected as part of the prompt given to the AI.
[[image: 2024_AI-Extract_02_05.png]]
# <li value=12>'''''Document Quoting''''' controls the text fed to the AI.</li>
#* Please see the '''''[[Document Quoting (Property)|Document Quoting]]''''' article for more information.
# '''''Preprocessing''''' controls the text supplied to the AI by adding or removing control characters.
#* Please see the '''''[[Preprocessing (Property)|Preprocessing]]''''' article for more information.
# '''''... Alignment''''' controls the highlighting of results in the '''Data Model Preview'''.
# Please see the '''''[[Alignment (Property)|Alignment]]''''' article for more information.
[[image: 2024_AI-Extract_02_06.png]]
</div>

Revision as of 08:53, 24 July 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

20252024

AI Extract is a Fill Method that leverages a Large Language Model (LLM) to return extraction results to Data Elements in a data_table Data Model or insert_page_break Data Section. This mechanism provides powerful AI-based data extraction with minimal setup.

About

The goal of AI Extract is to extract data by simply describing the data to be extracted as a result of the "self-descriptive" nature of the "container" Data Elements (Data Model, Data Section, or Data Table) itself. This fill method presents an AI chatbot with all or part of the document content, and asks it to generate JSON data matching the structure of descendant Data Elements. The JSON data is then used to populate descendant Data Sections, Data Tables, and Data Fields.

How To

The following walkthrough will use an invoice as part of the supplied materials. Invoices are easy to understand and the setup too, will be easy.

HOWEVER, PLEASE BE AWARE:
Invoices are not the best use case for AI Extract. Typical invoice processing relies on accuracy of information. This is a problem for AI Extract for three main reasons.

  1. LLM chatbots are best suited for documents with natural language flow such as legal contracts. Invoices do not represent documents with natural language flow, and instead of a type of structured data document.
  2. LLM chatbots do not return reliable information as they inherently do not "understand" information and typically can either "make things up", or be inconsistent across a large set of documents.
  3. Reliance on accurate information will typically involve human review to verify data integrity. An important aspect of reviewing data in Grooper is the highlighting of returned results in the Document Viewer. At best it is difficult to accurately highlight data from a chatbot as there is not technically any character coordinate information being returned from the chat bot as part of its delivered data. There are some settings of properties in the configuration of AI Extract that can aid in this highlighting, but it isn't perfect. At worst, it can be impossible at times to highlight data from a chat bot at all.

Establish an LLM Connector

First, we need to establish an LLM Connector within the Options property on the Root object.

  1. Click on the Grooper Root node.
  2. Click the ellipsis button for the Options property.
  3. In the "Options" window click the "Add" button.
  4. Choose "LLM Connector" from the drop-down menu.


  1. Click the ellipsis button for the "Service Providers" property of the newly added LLM Connector.
  2. Click the "Add" button in the "Service Providers" window.
  3. Choose the Open AI Provider option from the drop-down menu.


  1. Enter your API Key into the API Key property.

Configure AI Extract Fill Method

With an LLM Connector established we can now configure our Data Model with the AI Extract Fill Method.

  1. Select the Data Model from the provided Project.
  2. Click the ellipsis button on the Fill Methods property.
  3. Click the "Add" button in the "Fill Methods" window.
  4. Choose "AI Extract" from the drop-down menu.


  1. Click the ellipsis button on the Model property of the newly added AI Extract Fill Method.
  2. Select "gpt-4o" in the "Model" window. As of the writing of this article, it is the most accurate model, and the best choice.
    • Feel free to experiment with the other models to test results.


  1. In the Parameters property group, lower the Temperature property to 0.2. This can help the AI be less "creative" with its responses.
    • You may want to go as low as 0 to completely eliminate "creativity".
    • Please see the Parameters article for more information.


  1. Click the ellipsis button on the Instructions property.
  2. Write a prompt in the "Instructions" window.
    • Prompts written here should be considered as "global" prompts for the entire model, not specific to individual Data Elements.


  1. If you want to choose specific elements to extract you can do so. Click the ellipsis button on the Included Elements property.
  2. In the "Included Elements" window you can choose specific elements.
    • Leaving this property default, or blank, will consider all Data Elements. Choosing specific Data Elements will only include those selected as part of the prompt given to the AI.


  1. Document Quoting controls the text fed to the AI.
  2. Preprocessing controls the text supplied to the AI by adding or removing control characters.
  3. ... Alignment controls the highlighting of results in the Data Model Preview.
  4. Please see the Alignment article for more information.