Quoting Method

This article is about the current version of Grooper (2025). Note that some content may still need to be updated.

Quoting Methods provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, allowing users to supply only the context the model needs to respond and to reduce costs by lowering the number of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of the document's text, data extracted from the document, layout data, or a combination of these.
"Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including AI Extract) using a "Document Quoting" property.

Quoting Methods control how document content is selected, formatted, and supplied to artificial intelligence (AI) models for Grooper's LLM-based features. By determining what information is included in an AI prompt and how it is presented, Quoting Methods play a critical role in maximizing extraction accuracy, efficiency, and contextual relevance.

What is a Quoting Method?

A Quoting Method defines the strategy for presenting document content to an AI model. It acts as a bridge between Grooper’s document data and the AI, ensuring that only the most pertinent, well-structured, and cost-effective content is included in the prompt. Quoting Methods can target specific regions, fields, or structures within a document, apply preprocessing, and format the output in various ways (such as plain text, JSON, or layout schemas).

Purpose and Utility

Quoting Methods serve several key purposes:

Targeted context: Focus the AI’s attention on only the necessary portions of a document, such as a specific Data Field, table, or region.
Cost efficiency: Minimize prompt size to stay within LLM token limits and reduce usage costs.
Context awareness: Apply preprocessing or structure the data to help the AI interpret complex layouts, tables, or natural language.
Adaptability: Support different quoting strategies for various tasks, such as extraction, summarization, or classification.

How Quoting Methods are Used

Quoting Methods are configured within Grooper’s AI-powered activities, such as AI Extract, Clause Detection, and other LLM integrations. When an AI operation is performed, the selected Quoting Method determines:

The scope of content to extract (entire document, specific region, extracted values, etc.).
Any preprocessing or formatting to apply.
The format of the result (plain text, JSON, layout schema, etc.).
How the quote is supplied to the AI, often with a standard prefix or instructions.

Quoting Methods are typically selected and configured in the properties of a Data Model, Data Field, or extraction activity.

Types of Quoting Methods

Grooper provides several built-in Quoting Methods, each suited to different scenarios:

Extracted (most common)

The Extracted method allows you to select all or part of a document, optionally applying preprocessing such as tab or paragraph marking. You can target specific content using a Value Extractor, or quote the entire document if no extractor is specified. Preprocessing options help the AI interpret complex layouts or natural language.

The Extracted method is generally the default Quoting Method. When unconfigured, the Extracted method simply sends the LLM the entire document's text data.

Use cases: Supplying only relevant extracted values, improving AI understanding of tables or paragraphs, quoting entire documents.

Labeled Region

The Labeled Region method extracts content that appears after a header label, optionally ending at a footer label or after a specified number of lines. This is ideal for quoting tables, sections, or other structured regions demarcated by recognizable headers and/or footers.

Use cases: Extracting tables or sections following a label, focusing the AI on relevant regions, handling repeating or variably-placed regions.
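
The header/footer/line-count behavior described above can be illustrated with a short Python sketch. This is not Grooper's implementation (the Labeled Region method is configured through properties, not code). The function name `quote_labeled_region`, its parameters, and the sample text are all hypothetical, and the sketch assumes the header and footer lines themselves are excluded from the quote.

```python
import re
from typing import Optional


def quote_labeled_region(
    text: str,
    header_pattern: str,
    footer_pattern: Optional[str] = None,
    max_lines: Optional[int] = None,
) -> str:
    """Return the lines that follow the first line matching header_pattern.

    The region stops before the first line matching footer_pattern, or once
    max_lines lines are collected, whichever comes first.
    Returns an empty string if no header is found.
    """
    lines = text.splitlines()
    # Locate the first line that contains the header label.
    start = next(
        (i for i, line in enumerate(lines) if re.search(header_pattern, line)),
        None,
    )
    if start is None:
        return ""
    region = []
    for line in lines[start + 1:]:
        if footer_pattern and re.search(footer_pattern, line):
            break  # footer label reached; assumed not part of the quote
        if max_lines is not None and len(region) >= max_lines:
            break  # line limit reached
        region.append(line)
    return "\n".join(region)


# Hypothetical document text for illustration only.
sample = """Invoice 1001
Line Items
Widget  2  10.00
Gadget  1  25.00
Total  45.00
Thank you"""

print(quote_labeled_region(sample, r"Line Items", footer_pattern=r"^Total"))
```

Only the two line-item rows between the "Line Items" header and the "Total" footer reach the model, which is the token-saving effect this method is meant to provide.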

Data Values

The Data Values method is used to send an LLM data collected by the Extract activity, instead of the document's text data. The document's Data Model must be extracted first for this Quoting Method to work.

The Data Values method serializes the document’s Data Model hierarchy—including all Data Fields, Data Sections, and Data Tables—into structured JSON. This provides the AI with a complete, machine-readable representation of the document’s extracted data.

Use cases: Supplying the AI with a full export of extracted data, enabling advanced validation, summarization, or integration workflows.
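
To give a rough idea of what a serialized Data Model quote might look like, the sketch below turns a nested structure into JSON with Python's standard `json` module. The field names, the nesting used for sections and tables, and the function `quote_data_values` are assumptions made for illustration; the exact JSON that Grooper produces may differ.

```python
import json

# Hypothetical extracted data: two fields, one section, and one table.
extracted = {
    "Invoice Number": "1001",
    "Invoice Date": "2025-01-15",
    # A Data Section modeled as a nested object (assumption).
    "Vendor": {"Name": "Acme Corp", "City": "Tulsa"},
    # A Data Table modeled as a list of row objects (assumption).
    "Line Items": [
        {"Description": "Widget", "Quantity": "2", "Amount": "10.00"},
        {"Description": "Gadget", "Quantity": "1", "Amount": "25.00"},
    ],
}


def quote_data_values(data: dict) -> str:
    """Serialize the extracted data hierarchy into indented JSON text."""
    return json.dumps(data, indent=2, ensure_ascii=False)


quote = quote_data_values(extracted)
print(quote)
```

Because the quote is built from extracted values rather than raw text, it is only as complete as the extraction that ran before it.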

Layout Objects

The Layout Objects method supplies the AI with a page-by-page map of all visible content and layout elements. This includes text segments' coordinates and Layout Data obtained by an IP Profile (including lines, barcodes, and checkboxes). The output includes a detailed JSON schema and corresponding data, enabling spatially-aware extraction and analysis.

Use cases: Layout analysis, table extraction, form understanding, OMR extraction.

Semantic

The Semantic method selects portions of the document that are semantically similar to a set of example queries. It uses natural language similarity search powered by LLM-compatible embeddings, enabling precise and context-aware content selection.

The "Semantic" Quoting Method is identical in configuration to the "Clause Detection" Section Extract Method. Clause Detection is simply the Semantic method packaged as a Section Extract Method for Data Sections. For more information on the Semantic method's property configuration, visit the Clause Detection article.

Use cases: Clause detection, policy matching, locating passages with varied wording, minimizing prompt size while maximizing relevance.
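
The general idea of similarity-based selection can be sketched in Python as follows. This is a conceptual illustration, not Grooper's implementation. In particular, `embed` is a toy stand-in that counts words; a real system would obtain embedding vectors from an embeddings model, which captures meaning rather than exact word overlap. All function names and the sample text are hypothetical, and at least one query is assumed.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def quote_semantic(paragraphs: List[str], queries: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k paragraphs most similar to any query, in document order."""
    query_vecs = [embed(q) for q in queries]
    # Score each paragraph by its best match against any example query.
    scored = [
        (max(cosine(embed(p), q) for q in query_vecs), i)
        for i, p in enumerate(paragraphs)
    ]
    best = sorted(scored, reverse=True)[:top_k]
    return [paragraphs[i] for _, i in sorted(best, key=lambda s: s[1])]


# Hypothetical document paragraphs and example query.
paragraphs = [
    "The parties entered into this contract on the date below.",
    "Either party may terminate this agreement with thirty days written notice.",
    "Payment is due within sixty days of the invoice date.",
]
queries = ["terminate agreement written notice"]

print(quote_semantic(paragraphs, queries, top_k=1))
```

Only the termination paragraph is selected here, so the prompt carries one relevant passage instead of the whole document.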

Multi Quote

The Multi Quote method combines multiple Quoting Methods to generate a comprehensive document quote. Each method is executed in sequence, and their outputs are included in the final quote, clearly labeled for the AI. This is ideal for complex extraction scenarios where a single strategy may not provide sufficient context.

Use cases: Extracting data from documents with complex or ambiguous layouts, comparing different quoting strategies, improving extraction accuracy.
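
The "run each method in sequence and label its output" behavior can be sketched in Python as below. This is an illustration of the concept only: the helper quoting functions, the `--- label ---` delimiter format, and the sample text are hypothetical and are not taken from Grooper.

```python
from typing import Callable, List, Tuple

# A quoting method is modeled here as a function from document text to a quote.
QuotingMethod = Callable[[str], str]


def first_lines(n: int) -> QuotingMethod:
    """Hypothetical method: quote the first n lines of the document."""
    return lambda text: "\n".join(text.splitlines()[:n])


def lines_containing(word: str) -> QuotingMethod:
    """Hypothetical method: quote every line that mentions a word."""
    return lambda text: "\n".join(
        line for line in text.splitlines() if word.lower() in line.lower()
    )


def multi_quote(text: str, methods: List[Tuple[str, QuotingMethod]]) -> str:
    """Run each method in order and join the labeled outputs into one quote."""
    parts = []
    for label, method in methods:
        # The label tells the model which strategy produced each part.
        parts.append(f"--- {label} ---\n{method(text)}")
    return "\n\n".join(parts)


# Hypothetical document text for illustration only.
document = "Invoice 1001\nVendor: Acme Corp\nTotal: 45.00\nThank you"

combined = multi_quote(
    document,
    [("Header", first_lines(1)), ("Totals", lines_containing("total"))],
)
print(combined)
```

Each added method increases the size of the final quote, so a combined quote trades higher token usage for broader context.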

Areas of Grooper That Use Quoting Methods

Quoting Methods are utilized in several key areas of Grooper:

LLM-enabled separation and classification capabilities
- AI Separate: Controls what content is supplied to the AI to determine if a page is the first page of a new document.
- LLM Classifier: Controls what content is supplied to the AI to determine which Document Type to assign to a Batch Folder during classification.
- Mark Attachments: When an Attachment Rule's "Generative AI" option is enabled, controls what document content is supplied to the AI to determine whether a document should be attached to the one before it or after it.
LLM-enabled data extraction capabilities
- AI Extract: Controls what content is supplied to the AI for extraction tasks.
- LLM-enabled Section Extract Methods
  - AI Collection Reader: Controls what content is supplied to the AI to locate multi-instance sections on a document.
  - AI Section Reader: Controls what content is supplied to the AI to locate single instance sections on a document.
  - AI Transaction Detection: Controls what content is supplied to the AI for its "Boundary Detector" and "Transaction Extractor" configurations.
  - Clause Detection: Uses the "Semantic" method to locate and extract clauses within unstructured documents using semantic similarity.
- AI Table Reader: Controls what content is supplied to the AI for Data Table extraction.
Other LLM-enabled capabilities
- AI Schema Extractor: Controls what content is supplied to the AI to generate the JSON response.
- AI Generator: Controls what source content is used to generate the document.

Summary

Quoting Methods in Grooper provide flexible, powerful mechanisms for selecting and formatting document content for AI operations. By choosing the appropriate method and configuration, users can optimize extraction accuracy, efficiency, and contextual relevance across a wide range of document types and workflows.