Quoting Method: Difference between revisions
Dgreenwood (talk | contribs) Removed redirect to Document Quoting (Property) Tag: Removed redirect |
Revision as of 13:10, 9 September 2025
Quoting Methods provide various mechanisms to feed "quotes" from a document to an AI model for Grooper's LLM-based features. Quoting Methods control what text is fed to the AI, so users can supply only the context the model needs to respond and cut costs by reducing the number of input tokens sent to the LLM service. Depending on which Quoting Method is selected and configured, the quote may be the entire document text, a portion of the document's text, data extracted from the document, layout data, or a combination of these.
- "Quoting Method" is a class of embedded objects that feed quotes to an LLM. Quoting Methods are selected and configured by various items (including AI Extract) using a "Document Quoting" property.
Quoting Methods control how document content is selected, formatted, and supplied to artificial intelligence (AI) models for Grooper's LLM-based features. By determining what information is included in an AI prompt and how it is presented, Quoting Methods play a critical role in maximizing extraction accuracy, efficiency, and contextual relevance.
What is a Quoting Method?
A Quoting Method defines the strategy for presenting document content to an AI model. It acts as a bridge between Grooper’s document data and the AI, ensuring that only the most pertinent, well-structured, and cost-effective content is included in the prompt. Quoting Methods can target specific regions, fields, or structures within a document, apply preprocessing, and format the output in various ways (such as plain text, JSON, or layout schemas).
Purpose and Utility
Quoting Methods serve several key purposes:
- Targeted context: Focus the AI’s attention on only the necessary portions of a document, such as a specific Data Field, table, or region.
- Cost efficiency: Minimize prompt size to stay within LLM token limits and reduce usage costs.
- Context awareness: Apply preprocessing or structure the data to help the AI interpret complex layouts, tables, or natural language.
- Adaptability: Support different quoting strategies for various tasks, such as extraction, summarization, or classification.
How Quoting Methods are Used
Quoting Methods are configured within Grooper’s AI-powered activities, such as AI Extract, Clause Detection, and other LLM integrations. When an AI operation is performed, the selected Quoting Method determines:
- The scope of content to extract (entire document, specific region, extracted values, etc.).
- Any preprocessing or formatting to apply.
- The format of the result (plain text, JSON, layout schema, etc.).
- How the quote is supplied to the AI, often with a standard prefix or instructions.
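The four decisions listed above can be pictured as a small pipeline. The following Python sketch is purely illustrative and is not Grooper's implementation: the names QuoteConfig and build_quote, the JSON wrapper, and the default prefix text are all assumptions made for this example.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class QuoteConfig:
    """Hypothetical settings mirroring the four decisions listed above."""
    select: Callable[[str], str]        # scope: which content to quote
    preprocess: Callable[[str], str]    # preprocessing applied to that content
    as_json: bool = False               # result format: plain text or JSON
    prefix: str = "Document quote:"     # assumed prefix placed before the quote


def build_quote(document_text: str, config: QuoteConfig) -> str:
    """Select, preprocess, format, and prefix a quote, in that order."""
    selected = config.select(document_text)
    prepared = config.preprocess(selected)
    body = json.dumps({"quote": prepared}) if config.as_json else prepared
    return f"{config.prefix}\n{body}"
```

For instance, a configuration whose select function keeps only the first line and whose preprocess function strips whitespace would turn a two-line invoice into a prefixed one-line quote.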
Quoting Methods are typically selected and configured in the properties of a Data Model, Data Field, or extraction activity.
Types of Quoting Methods
Grooper provides several built-in Quoting Methods, each suited to different scenarios:
Extracted (most common)
The Extracted method allows you to select all or part of a document, optionally applying preprocessing such as tab or paragraph marking. You can target specific content using a Value Extractor, or quote the entire document if no extractor is specified. Preprocessing options help the AI interpret complex layouts or natural language.
- The Extracted method is generally the default Quoting Method. When unconfigured, it simply sends the LLM the entire document's text data.
- Use cases: Supplying only relevant extracted values, improving AI understanding of tables or paragraphs, quoting entire documents.
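As a rough illustration of the behavior described above, the Python sketch below quotes the whole text when no extractor is given and only the matching spans when one is. A regular expression stands in for a Value Extractor, and the "<P>" paragraph marker is a made-up placeholder; neither reflects how Grooper actually implements extractors or paragraph marking.

```python
import re
from typing import Optional


def extracted_quote(text: str,
                    pattern: Optional[str] = None,
                    mark_paragraphs: bool = False) -> str:
    """Quote the full text, or only the spans matched by `pattern`.

    `pattern` is a plain regex used here as a stand-in for a Value Extractor.
    """
    if pattern is not None:
        # Keep only the matched spans, one per line.
        text = "\n".join(m.group(0) for m in re.finditer(pattern, text))
    if mark_paragraphs:
        # Hypothetical marker: replace blank-line gaps with an explicit tag
        # so paragraph boundaries survive in the prompt.
        text = re.sub(r"\n\s*\n", "\n<P>\n", text)
    return text
```

Calling extracted_quote(text) with no other arguments returns the text unchanged, which corresponds to the unconfigured case described above.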
Labeled Region
The Labeled Region method extracts content that appears after a header label, optionally ending at a footer label or after a specified number of lines. This is ideal for quoting tables, sections, or other structured regions demarcated by recognizable headers and/or footers.
- Use cases: Extracting tables or sections following a label, focusing the AI on relevant regions, handling repeating or variably-placed regions.
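The header, footer, and line-count logic can be sketched in a few lines of Python. This is a simplified illustration under assumed rules (plain substring matching, header and footer lines excluded from the quote), not Grooper's label-matching logic; the function and parameter names are hypothetical.

```python
from typing import Optional


def labeled_region(text: str,
                   header: str,
                   footer: Optional[str] = None,
                   max_lines: Optional[int] = None) -> str:
    """Return the lines after `header`, stopping at `footer` or `max_lines`."""
    region = []
    inside = False
    for line in text.splitlines():
        if not inside:
            # Skip everything up to and including the header line.
            if header in line:
                inside = True
            continue
        if footer is not None and footer in line:
            break  # the footer line ends the region and is not quoted
        if max_lines is not None and len(region) >= max_lines:
            break  # the line limit ends the region
        region.append(line)
    return "\n".join(region)
```

If the header never appears, the sketch returns an empty string, so a caller can tell that nothing was quoted.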
Data Values
The Data Values method is used to send an LLM data collected by the Extract activity, instead of the document's text data. The document's Data Model must be extracted first for this Quoting Method to work.
The Data Values method serializes the document’s Data Model hierarchy—including all Data Fields, Data Sections, and Data Tables—into structured JSON. This provides the AI with a complete, machine-readable representation of the document’s extracted data.
- Use cases: Supplying the AI with a full export of extracted data, enabling advanced validation, summarization, or integration workflows.
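To give a sense of what a serialized hierarchy might look like, the Python sketch below dumps a nested dictionary to JSON. The mapping used here (fields as strings, sections as nested objects, tables as lists of row objects) and every field name are assumptions for illustration; the JSON Grooper actually produces may be structured differently.

```python
import json


def data_values_quote(data_model: dict) -> str:
    """Serialize an extracted data hierarchy to indented JSON text."""
    return json.dumps(data_model, indent=2, ensure_ascii=False)


# Hypothetical extracted data: fields, one section, and one table.
invoice = {
    "Invoice Number": "INV-001",                  # Data Field
    "Vendor": {"Name": "Acme", "City": "Tulsa"},  # Data Section
    "Line Items": [                               # Data Table rows
        {"Description": "Widget", "Quantity": "2"},
        {"Description": "Gadget", "Quantity": "5"},
    ],
}
quote = data_values_quote(invoice)
```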
Layout Objects
The Layout Objects method supplies the AI with a page-by-page map of all visible content and layout elements. This includes text segments' coordinates and Layout Data obtained by an IP Profile (including lines, barcodes, and checkboxes). The output includes a detailed JSON schema and corresponding data, enabling spatially-aware extraction and analysis.
- Use cases: Layout analysis, table extraction, form understanding, OMR extraction.
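A page-by-page map of this kind could be pictured as follows. The Python sketch covers only text segments with bounding boxes; the TextSegment fields and the output shape are assumptions for illustration and do not reproduce the schema Grooper emits, which also describes lines, barcodes, and checkboxes.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TextSegment:
    """Hypothetical text segment with a bounding box in page units."""
    text: str
    left: float
    top: float
    width: float
    height: float


def layout_quote(pages: List[List[TextSegment]]) -> str:
    """Build a page-by-page JSON map of text segments and their coordinates."""
    payload = [
        {"page": number, "segments": [asdict(segment) for segment in segments]}
        for number, segments in enumerate(pages, start=1)
    ]
    return json.dumps(payload, indent=2)
```

Keeping coordinates next to each piece of text is what lets a model reason about position, for example which value sits to the right of a given label.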
Semantic
The Semantic method selects portions of the document that are semantically similar to a set of example queries. It uses natural language similarity search powered by LLM-compatible embeddings, enabling precise and context-aware content selection.
- The "Semantic" Quoting Method is identical in configuration to the "Clause Detection" Section Extract Method. Clause Detection is just the Semantic method wrapped up as an extract method for Data Sections. For more information on the Semantic method's property configuration, visit the Clause Detection article.
- Use cases: Clause detection, policy matching, locating passages with varied wording, minimizing prompt size while maximizing relevance.
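The ranking idea behind similarity search can be sketched in Python as shown below. A real deployment would use embeddings from an LLM service; here a bag-of-words count vector is swapped in as a crude stand-in so the example runs with no external dependencies. All function names are hypothetical, and word overlap is far weaker than true semantic similarity.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def semantic_quote(chunks: List[str], queries: List[str], top_k: int = 1) -> List[str]:
    """Return the `top_k` chunks most similar to any of the example queries."""
    query_vectors = [embed(query) for query in queries]

    def best_score(chunk: str) -> float:
        chunk_vector = embed(chunk)
        return max((cosine(chunk_vector, qv) for qv in query_vectors), default=0.0)

    return sorted(chunks, key=best_score, reverse=True)[:top_k]
```

Only the highest-scoring chunks are quoted, which is how this style of selection keeps the prompt small while keeping the relevant passages in it.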
Multi Quote
The Multi Quote method combines multiple Quoting Methods to generate a comprehensive document quote. Each method is executed in sequence, and their outputs are included in the final quote, clearly labeled for the AI. This is ideal for complex extraction scenarios where a single strategy may not provide sufficient context.
- Use cases: Extracting data from documents with complex or ambiguous layouts, comparing different quoting strategies, improving extraction accuracy.
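Running several methods in sequence and labeling each output could look like the Python sketch below. The "###" label format and the multi_quote name are assumptions for illustration; the source does not specify how Grooper labels each quote.

```python
from typing import Callable, List, Tuple

# A quoter is any function that turns document text into a quote.
Quoter = Callable[[str], str]


def multi_quote(text: str, quoters: List[Tuple[str, Quoter]]) -> str:
    """Run each quoter in order and join the labeled outputs."""
    sections = []
    for label, quoter in quoters:
        # Hypothetical label format so the model can tell the quotes apart.
        sections.append(f"### {label}\n{quoter(text)}")
    return "\n\n".join(sections)
```

Passing a list such as [("First line", first_line), ("Full text", full_text)], where both functions are supplied by the caller, yields one combined quote with a labeled section per method.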
Areas of Grooper That Use Quoting Methods
Quoting Methods are utilized in several key areas of Grooper:
- AI Extract: Controls what content is supplied to the AI for extraction tasks.
- Clause Detection: Locates and extracts clauses within structured documents using semantic similarity.
- Data Model and Data Field configuration: Allows users to specify quoting strategies for individual fields or sections.
- Table and region extraction: Supports advanced extraction workflows for tables, forms, and labeled regions.
- Prompt engineering: Enables users to tailor the context and format of AI prompts for optimal results.
Best Practices
- Choose the Quoting Method that best matches your document structure and extraction goals.
- Use targeted extraction to minimize prompt size and improve AI relevance.
- Apply preprocessing for documents with complex layouts or natural language.
- Combine multiple quoting strategies with Multi Quote for challenging documents.
- Review diagnostic artifacts (such as generated JSON) to verify the structure and content supplied to the AI.
Summary
Quoting Methods in Grooper provide flexible, powerful mechanisms for selecting and formatting document content for AI operations. By choosing the appropriate method and configuration, users can optimize extraction accuracy, efficiency, and contextual relevance across a wide range of document types and workflows.