Clause Detection (Section Extract Method)

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

Clause Detection is a Data Section Extract Method. It uses LLM text embedding models to compare supplied samples of text against the text of a document and returns the "chunk" of text that the model scores as most similar to the supplied samples.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

How To

In the following walkthrough, we will set up Clause Detection on the Data Section of a provided Project. The Data Section, a "container" for several descendant Data Fields, will use the AI Extract Fill Method to collect the data for those fields.

The document we will extract from in the provided Batch spans several pages and contains a large amount of text, so running AI Extract against the entire document would be costly. To solve this problem, we will use the Data Section for one of its key functions: defining a subset of data within the document. Doing so drastically reduces the amount of text given to the LLM, which greatly reduces both the tokens consumed and the time taken to run extraction.
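To make the cost argument concrete, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about four characters per token for English text. Real tokenizers vary by model, and the document and clause sizes below are hypothetical, not measurements from the sample Batch.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumed heuristic; real tokenizers differ)."""
    return max(1, round(len(text) / chars_per_token))


def token_savings(full_text: str, section_text: str) -> float:
    """Fraction of prompt tokens avoided by sending only the section."""
    full = estimate_tokens(full_text)
    section = estimate_tokens(section_text)
    return 1.0 - section / full


# Hypothetical sizes: a 40,000-character lease and a 1,200-character clause.
full_document = "x" * 40_000
granting_clause = "x" * 1_200

print(estimate_tokens(full_document))    # 10000
print(estimate_tokens(granting_clause))  # 300
print(f"{token_savings(full_document, granting_clause):.0%}")  # 97%
```

Under these assumed sizes, sending only the clause avoids roughly 97% of the input tokens for the AI Extract call. The embedding lookup that locates the clause has its own cost, which this sketch does not count.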

Because we do not know the exact wording of the clause that will define our Data Section, defining the section's structure via pattern matching can prove quite challenging. This is where Clause Detection comes into play. We can provide a sample of what the language of the clause may look like. A text embeddings model (which is faster and cheaper than standard chatbot queries) uses this sample to find a clause within the text of the document that is highly similar to it.

In so doing, we use AI not only to easily extract the data we are after, but also to make that use of AI more efficient.
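The general idea of similarity search against a sample can be sketched as follows. This is a minimal illustration, not Grooper's implementation: `toy_embed` is a hypothetical stand-in that uses word counts in place of a real embedding model such as text-embedding-3-large, and the chunks are invented example text.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_chunk(sample: str, chunks: list[str]) -> tuple[int, float]:
    """Return the index and score of the chunk most similar to the sample."""
    sample_vec = toy_embed(sample)
    scores = [cosine_similarity(sample_vec, toy_embed(c)) for c in chunks]
    index = max(range(len(chunks)), key=scores.__getitem__)
    return index, scores[index]


sample = "Lessor hereby grants, leases, and lets unto Lessee the oil and gas rights"
chunks = [
    "This lease shall remain in force for a primary term of three years.",
    "Lessor does hereby grant, lease, and let unto Lessee all oil and gas in the land.",
    "Lessee shall pay royalties on production to Lessor each month.",
]

index, score = best_chunk(sample, chunks)
print(index)  # 1, the chunk worded like a granting clause
```

A real embedding model captures meaning rather than shared words, so it can match clauses whose wording differs far more from the sample than this toy version tolerates. The ranking step, scoring every chunk against the sample and keeping the highest, is the part this sketch is meant to show.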

Select the "Granting Clause" Data Section from the provided Project.
Click the drop-down for the Extract Method property.
Select Clause Detection from the drop-down menu.

Expand the sub-properties and click the ellipsis button for the Model property.
In the "Model" window select text-embedding-3-large. Feel free to experiment with the other models.

Click the ellipsis button on the Queries property.
In the "Queries" window click the "Add" button.
This will add an entry to the "Sample Content".
Click the ellipsis button on the Sample Content property.

In the "Sample Content" window add the provided sample clause.

AGREEMENT, Made and entered into this [Effective Date], by and between [Lessor Name] whose address is [Lessor Address] hereinafter called Lessor and [Lessee Name] whose address is [Lessee Address] hereinafter called Lessee. Lessor hereby grants, leases, and lets unto Lessee, for the purpose of investigating, exploring, drilling, developing, and producing oil, gas, and other hydrocarbons, and storing, handling, and transporting the same, all the oil and gas rights and interests in and under the land described as follows: [Legal Description of Property], containing approximately [Number of Acres] acres, more or less (hereinafter referred to as the "Leased Premises").

Click the "Tester" tab.
Be sure to select the document from the supplied Batch in the Batch Viewer.
Click the "Test" button.
View the extracted results in the Data Model Preview and see the highlighting in the Document Viewer.