2024:Fine Tuning (LLM Construct)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Fine-tuning in the context of OpenAI models refers to the process of taking a pre-trained model, like GPT-4, and further training it on a specific dataset to make it more specialized for a particular task or domain. This allows the model to adapt its general language understanding to better handle the unique vocabulary, style, and structure of the domain it's fine-tuned on.

How Fine-Tuning Works

  • Base Model: The process starts with a large, general-purpose language model that has already been pre-trained on a vast corpus of text data.
  • Custom Dataset: You provide a dataset specific to your application, such as technical documentation, support logs, or chat transcripts. This dataset should align with the tasks you want the model to perform.
  • Training: The base model is further trained on this dataset, modifying the model's internal parameters so it can better predict and generate content related to the dataset.
  • Evaluation: After fine-tuning, the model is evaluated to ensure it performs well on the target task.
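The custom dataset described above is typically supplied to OpenAI as a JSONL file, one training example per line. The sketch below, in Python, builds a tiny chat-format dataset of this kind; the field names and document text are illustrative placeholders, not taken from a real Grooper export.

```python
import json

# A minimal sketch of an OpenAI-style fine-tuning dataset in JSONL format.
# Each line is one training example in chat format; the prompts and the
# extracted-field names here are illustrative assumptions.
examples = [
    {
        "messages": [
            {"role": "system", "content": "Extract the invoice fields as JSON."},
            {"role": "user", "content": "Invoice #1001 from Acme Corp, total $250.00"},
            {"role": "assistant", "content": '{"InvoiceNumber": "1001", "Vendor": "Acme Corp", "Total": "250.00"}'},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "Extract the invoice fields as JSON."},
            {"role": "user", "content": "Invoice #1002 from Globex, total $99.95"},
            {"role": "assistant", "content": '{"InvoiceNumber": "1002", "Vendor": "Globex", "Total": "99.95"}'},
        ]
    },
]

# Write one JSON object per line, as the fine-tuning endpoint expects.
with open("training.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# Basic sanity check: every line must parse as standalone JSON.
with open("training.jsonl", encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
print(len(lines))  # 2
```

A real dataset would pair each document's text with its verified extracted values, and OpenAI recommends at least ten examples, which matches the Batch-size prerequisite later in this article.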

Benefits

  • Improved Performance: Fine-tuning makes the model more accurate and efficient for the specific tasks it was trained for, such as answering questions about a product, generating legal documents, or responding in a customer support chat.
  • Faster Development: By using a pre-trained model as a base, fine-tuning significantly reduces the time and computational resources needed compared to training a model from scratch.
  • Customization: It allows developers to incorporate their proprietary data, which the general model might not have seen before, into the model's understanding.

Example Use Cases

  • Fine-tuning based on a Data Model in Grooper to facilitate better extraction.
  • Training a model on legal documents to generate contracts or provide legal advice.
  • Customizing a chatbot to respond in the tone and style preferred by a company's brand.

Fine Tuning in Grooper

Fine-tuning in Grooper, particularly in the context of data extraction and Data Models, can significantly improve the accuracy and efficiency of data extraction processes. Here's how fine-tuning affects Grooper's ability to extract data more accurately:

Customization for Specific Data Types

  • Fine-Tuning with Target Data: By fine-tuning Grooper models with specific data types, patterns, and documents that are regularly processed by your organization, Grooper can better learn and understand how to extract data in those contexts. This would allow it to adapt its extraction techniques to more accurately capture the fields and formats specific to your use cases, reducing the need for manual configuration or rule-setting.

Improved Recognition of Complex or Varied Formats

  • Handling Variations: Fine-tuning can help Grooper handle more complex and varied document formats. For example, if your Data Models involve extracting information from forms or tables with slight variations, a fine-tuned model would be better at recognizing those differences and still extracting the correct data fields. It essentially allows Grooper to generalize and adapt to more variations in the data without losing accuracy.

Increased Efficiency with Domain-Specific Knowledge

  • Industry-Specific Data: If Grooper is extracting data for a specialized domain (such as legal, financial, or medical documents), fine-tuning the model on domain-specific language can significantly improve Grooper's ability to understand and extract complex or technical information accurately. This leads to fewer false positives and less manual correction.

Reduction of Errors and Manual Overrides

  • Model Refinement: Fine-tuning Grooper's models based on historical data can reduce the number of errors during extraction, as the model becomes more familiar with the nuances of your data structure. Over time, this leads to a decrease in manual validation or correction, streamlining the process and saving effort.

Adaptation to Evolving Data Sources

  • Continuous Improvement: As your data evolves, Grooper can remain up-to-date by continuously fine-tuning on new document types or formats. This helps it stay accurate even as the characteristics of your data change, which is especially important for businesses dealing with dynamic, ever-changing sources of information.

Example Workflow with Grooper's Fine-Tuning Commands

  • Build Fine Tuning File: Generates the dataset based on labeled or extracted data that Grooper will use for fine-tuning. This helps build a custom training file specific to your document extraction needs.
  • Start Fine Tuning Job: Fine-tunes the existing extraction model using the generated dataset. This ensures the model adapts to specific nuances in your data.
  • Delete Fine Tuned Model: If a fine-tuned model underperforms or no longer suits the requirements, it can be removed, allowing Grooper to fall back to a general model or a new fine-tuned version.
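For reference, the three commands above map roughly onto calls in OpenAI's fine-tuning API. The sketch below is not Grooper's actual implementation; the file name, base model, and suffix are placeholder assumptions, and the network calls are skipped unless an API key is configured.

```python
import os

# Rough OpenAI-API equivalents of Grooper's three fine-tuning commands.
# All names below are placeholders, not values produced by Grooper.

TRAINING_FILE = "training.jsonl"          # output of "Build Fine Tuning File"
BASE_MODEL = "gpt-4o-mini-2024-07-18"     # must be a fine-tunable base model
SUFFIX = "invoice-extraction"             # mirrors Grooper's Name Suffix property

job_params = {
    "model": BASE_MODEL,
    "suffix": SUFFIX,
    # "training_file" is filled in with the uploaded file's id below.
}

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()

    # "Build Fine Tuning File" -> upload the JSONL dataset
    upload = client.files.create(file=open(TRAINING_FILE, "rb"), purpose="fine-tune")
    job_params["training_file"] = upload.id

    # "Start Fine Tuning Job" -> submit the job
    job = client.fine_tuning.jobs.create(**job_params)
    print(job.id)

    # "Delete Fine Tuned Model" -> remove a finished model by its full name,
    # e.g. client.models.delete("ft:<base-model>:<org>:<suffix>:<id>")
else:
    print("No API key set; job parameters:", job_params)
```

Within Grooper these steps are exposed as right-click commands on the Data Model, as shown in the How To section later in this article.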

In summary, fine-tuning in Grooper can lead to more accurate, domain-specific, and efficient data extraction, improving both the speed and reliability of the extraction processes, especially for complex or customized data environments.

How To

Verify Prerequisites

Before fine-tuning, you will need a few things in place.

  • An established LLM Connector.
  • A fleshed out Data Model with all appropriate Data Elements.
  • A configured Fill Method on the Data Model.
  • A Batch with at least ten (10) documents.
    • The documents in the Batch should be classified.
    • The documents in the Batch should have extracted data. Verify that the extracted data is correct.

The Batch and Project associated with this article will suffice for these purposes. You will, however, need to supply your own API key for your LLM Connector.


Verify LLM Connector

  1. On the Grooper Root node...
  2. ...within the Options property...
  3. ...an LLM Connector has been added.
  4. The Service Provider in this configuration is an Open AI LLM Provider. It has been configured with an API Key.


Verify Established Data Model

  1. A Data Model with appropriate Data Elements is established.
    • The Project provided with this article can be used as an example.


Verify Configured Fill Method

  1. On the Data Model...
  2. ...in the Fill Methods property...
  3. ...an AI Extract Fill Method is established and configured.


Verify Batch with Classified, Extracted Documents

  1. A Batch with ten documents is present. The documents are classified as a Content Type using the Data Model configured with the appropriate Fill Method.
  2. On the "Advanced" tab of one of the classified documents, you can see a "DocumentData.json" file indicating that extraction has been performed. This is true for each document.


  1. Using a Batch Process Step with a Review activity, configured with a Data View, you can verify that the extracted data accurately reflects what is on the document.

Perform Fine Tuning

  1. Right-click on the Data Model.
  2. Choose "Fine Tuning" > "Build Fine Tuning File".
  3. In the "Build Fine Tuning File" window, click the button on the Batches property.
  4. In the "Batches" window, select the corresponding Batch.
  5. Once the Batch is selected, click the "OK" button to close the "Batches" window.
  6. Back in the "Build Fine Tuning File" window, click the "Execute" button.


  1. With the Data Model still selected...
  2. ...on the "Advanced" tab...
  3. ...you will see a newly added JSONL file.


  1. Right-click the Data Model.
  2. Choose "Fine Tuning" > "Start Fine Tuning Job".
  3. Click the ellipsis button on the Base Model property and choose a base model.
    • As of the writing of this article, only the following models can be used for Fine Tuning:
      • gpt-4o-2024-08-06
      • gpt-4o-mini-2024-07-18
      • gpt-4-0613
      • gpt-3.5-turbo-0125
      • gpt-3.5-turbo-1106
      • gpt-3.5-turbo-0613
      • babbage-002
      • davinci-002
  4. Click the ellipsis button for the Name Suffix property to give the fine-tuned model a suffix so it is easily recognizable.
  5. Click the "Execute" button to submit the Fine Tuning job.


  1. On the "Fine Tuning" section of your OpenAI platform...
  2. ...you will see a newly created Fine Tuned model.
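Besides the OpenAI platform page, a job's progress can also be checked programmatically. The sketch below uses a hypothetical job id and only contacts the API if a key is configured; otherwise it just exercises the status check locally.

```python
import os

# A sketch of checking a fine-tuning job's progress outside the OpenAI web UI.
# JOB_ID is a placeholder for the id returned when the job was submitted.
JOB_ID = "ftjob-example123"

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def is_done(status: str) -> bool:
    """Return True once the job has reached a terminal state."""
    return status in TERMINAL_STATES

if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI
    client = OpenAI()
    job = client.fine_tuning.jobs.retrieve(JOB_ID)
    # fine_tuned_model is populated once the job succeeds.
    print(job.status, job.fine_tuned_model)
else:
    print(is_done("running"), is_done("succeeded"))  # False True
```

Once the job succeeds, the returned model name (prefixed with "ft:") is what identifies the fine-tuned model you will see listed on the platform.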