Extracted (Quoting Method): Difference between revisions

From Grooper Wiki
 


== Introduction ==
'''Extracted''' is a [[Quoting Method]] that sends document text to an AI activity as plain text, either by quoting the full document content or only the portions matched by a configured Value Extractor. It is useful when you want precise control over what the AI sees, because you can limit the quote to relevant content, apply preprocessing to improve readability, and optionally preserve page or line references for easier review and troubleshooting.
In practice, Extracted is used when you want the AI to read:
* the whole document
* a specific set of matched values
* a cleaned-up version of the source text
* page-separated text
* text with line numbers for easier reference
=== Benefits and drawbacks ===
The primary benefit of the Extracted Quoting Method is flexible control over what you send to the LLM: you can narrow the quote using any extractor you like.
However, it requires some understanding of the available extractors, how they work, and how to target exactly the content you want to send. Knowledge of regular expressions is highly recommended when using this method.
=== How Extracted differs from other quoting methods ===
Extracted sends '''plain text''' to the AI, either from the whole document or from only the portions matched by a configured Value Extractor. It is best when you want flexible control over what text the AI sees and how that text is cleaned up with preprocessing. Other Quoting Methods are typically more specialized: some are better for sending structured [[Data Model]] values, some focus on text near labels or headers, and others preserve document layout in a more structured form. In short, choose Extracted when the goal is to quote readable document text rather than structured data or layout information.
=== Common use cases ===
* Quote only invoice numbers, dates, or totals found by a [[Value Extractor]].
* Quote an entire letter or contract so the AI can summarize it.
* Quote table text after preprocessing so spaced columns become easier for the AI to read.
* Quote page-separated content when page identity matters.
* Quote text with line numbers when the AI must cite or discuss specific lines.
== How to ==
=== Configure the Extracted Quoting Method ===
# Open the AI activity or configuration object where the Quoting Method is used, such as the AI Extract Fill Method.
# Add or select '''Extracted''' as the Quoting Method.
# Set "Description" if you want to tell the AI what this quote contains or how it should use it.
# Set "Quote Extractor" to a [[Value Extractor]] if you want only matched content.
# If your extractor may return multiple matches, set "Result Delimiter" to control how those matches are joined.
# Configure "Preprocessing" if the source text needs cleanup before being sent to the AI.
# Turn on "Include Line Numbers" if you want each quoted line numbered.
# Turn on "Paginated" if you want the quote split into separate pages.
# Save the configuration and test the AI activity with representative documents.
<div style="position: relative; box-sizing: content-box; max-height: 80vh; max-height: 80svh; width: 100%; aspect-ratio: 1.78; padding: 40px 0 40px 0;"><iframe src="https://app.supademo.com/embed/cmqjld1h302whyv0kyym9f9sj?embed_v=2&utm_source=embed" loading="lazy" title="01 Configuring the Extracted Quoting Method" allow="clipboard-write" frameborder="0" webkitallowfullscreen="true" mozallowfullscreen="true" allowfullscreen style="position: absolute; top: 0; left: 0; width: 100%; height: 100%;"></iframe></div>
== Property reference ==
=== Common Quoting Method properties ===
==== "Name" ====
A friendly label for this quoting method.
Use it when:
* multiple quoting methods are used together
* you want a clearer label in configuration and diagnostics
* you want the AI to see a more meaningful section name
If left blank, Grooper uses the quoting method's type name.
==== "Description" ====
Text that explains what the quote contains or how the AI should use it.
Use it when:
* the quote contains a specific kind of content
* the formatting needs explanation
* you want to guide the AI's interpretation of the quote
Example:
<pre>
This quote contains invoice header values extracted from the document.
Use it to identify the invoice number, invoice date, and total amount.
</pre>
=== Extracted properties ===
==== "Quote Extractor" ====
A [[Value Extractor]] that selects the content to quote.
Behavior:
* If set, Grooper finds all matching values and combines them into the quote.
* If not set, Grooper quotes the entire source text from the selected scope.
Use "Quote Extractor" when:
* you only want specific values
* you want to reduce prompt size
* you want to keep the AI focused on key content
Example regular expression:
<pre>
Invoice\s*No\.?:?\s*(\d+)
</pre>
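As a rough illustration of what this pattern captures, the Python sketch below runs it against a made-up sample string. Grooper evaluates patterns with its own regular expression engine, so this only approximates the behavior. The sample text and the <code>find_invoice_numbers</code> helper are hypothetical and not part of Grooper.

```python
import re

# Pattern copied from the example above.
INVOICE_PATTERN = re.compile(r"Invoice\s*No\.?:?\s*(\d+)")

def find_invoice_numbers(text):
    """Return the digits captured by group 1 for every match in the text."""
    return INVOICE_PATTERN.findall(text)

# Hypothetical sample text with three label variants.
sample = "Invoice No.: 12345\nInvoice No 678\nInvoice Number: 999"
print(find_invoice_numbers(sample))
```

In this sample the third line is not matched, because the pattern expects digits right after "No" and its optional punctuation. A label variant such as "Invoice Number" would need a broader pattern.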
==== "Result Delimiter" ====
The separator used when "Quote Extractor" returns more than one match.
Common values:
* <code>\n</code> for one result per line
* <code>\t</code> for tab-separated results
* <code>, </code> for inline lists
* <code>\f</code> when page-like separation is needed
Use this property to control how multiple matches appear in the final quote.
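To make the effect of the delimiter concrete, here is a minimal Python sketch that joins the same set of matches with two of the delimiters listed above. The match values and the <code>join_matches</code> helper are hypothetical; the sketch only assumes that the delimiter is placed between consecutive matches.

```python
# Hypothetical values standing in for what a Quote Extractor might return.
matches = ["INV-1001", "2024-06-01", "$1,234.56"]

def join_matches(values, delimiter):
    """Place the delimiter between consecutive values."""
    return delimiter.join(values)

# One result per line.
print(join_matches(matches, "\n"))

# Inline list.
print(join_matches(matches, ", "))
```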
==== "Preprocessing" ====
A [[Text Preprocessor]] applied before the quote is sent to the AI.
This is useful when:
* paragraphs are wrapped across lines
* wide spaces should be treated like tabs
* tabular or multi-column text is hard to read as plain text
* control characters interfere with matching or interpretation
Use preprocessing carefully. It can improve readability and extraction, but it also changes the text the AI receives.
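The Python sketch below illustrates two of the cleanups mentioned above: treating wide spaces as tabs and removing control characters. It is not Grooper's Text Preprocessor. The function names, the three-space threshold, and the sample rows are assumptions chosen for illustration.

```python
import re

def mark_tabs(text, min_spaces=3):
    """Replace each run of at least min_spaces spaces with a single tab."""
    return re.sub(r" {%d,}" % min_spaces, "\t", text)

def strip_control_chars(text):
    """Remove control characters, keeping tab, line feed, form feed, and carriage return."""
    return re.sub(r"[\x00-\x08\x0b\x0e-\x1f\x7f]", "", text)

# A hypothetical table row with wide gaps between columns.
row = "Widget A" + " " * 6 + "2" + " " * 5 + "$10.00"
print(repr(mark_tabs(row)))

# A hypothetical line polluted with NUL and BEL characters.
print(repr(strip_control_chars("Total\x00: 5\x07")))
```

The single space inside "Widget A" is left alone because it falls below the threshold, which is why the threshold choice matters for column detection.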
==== "Document Level" ====
Controls quote scope.
* '''On''' — Quote from the full [[Document Instance]].
* '''Off''' — Quote only from the current [[Section Instance]] or other field container scope.
Turn this on when:
* the AI needs document-wide context
* the target content may exist outside the current [[Data Section]] or [[Data Field]]
Leave this off when:
* only local context is needed
* you want a smaller, more focused quote
==== "Include Line Numbers" ====
Adds a line number to each quoted line.
Use this when:
* you want easier review in diagnostics
* the AI may need to refer to exact lines
* you are troubleshooting what text was sent
Example:
<pre>
01: Invoice Number: 12345
02: Date: 2024-06-01
03: Total: $1,234.56
</pre>
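A minimal Python sketch of this kind of numbering follows. The two-digit, colon-separated prefix is inferred from the example above and may not match Grooper's exact output in every case. The <code>number_lines</code> helper is hypothetical.

```python
def number_lines(text, width=2):
    """Prefix each line with a zero-padded, 1-based line number."""
    lines = text.split("\n")
    return "\n".join(
        f"{index:0{width}d}: {line}" for index, line in enumerate(lines, start=1)
    )

quote = "Invoice Number: 12345\nDate: 2024-06-01\nTotal: $1,234.56"
print(number_lines(quote))
```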
==== "Paginated" ====
Splits the quote into separate pages and labels each page.
Use this when:
* page boundaries matter
* the document is long
* the AI may need to discuss content page by page
This is especially helpful for contracts, reports, and multi-page forms.
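The idea can be sketched in Python as follows. This assumes pages are separated by a form feed character in the source text, and the <code>--- Page N ---</code> label is a made-up format, not the label Grooper emits. The <code>paginate</code> helper is hypothetical.

```python
def paginate(text, page_break="\f"):
    """Split text on the page break and label each page (label format is illustrative)."""
    pages = text.split(page_break)
    return "\n\n".join(
        f"--- Page {number} ---\n{page.strip()}"
        for number, page in enumerate(pages, start=1)
    )

# Hypothetical two-page document.
document = "Terms and conditions\fSignatures"
print(paginate(document))
```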
== What Extracted does during execution ==
At runtime, Extracted follows this general sequence:
# Determine the source scope from "Document Level".
# If "Quote Extractor" is set, find the matching values; otherwise use the whole source text.
# If no matches are found, the quote reports that no match content was found.
# Apply "Preprocessing" if configured.
# If "Paginated" is on, split the text into pages.
# If "Include Line Numbers" is on, number the lines.
# Send the final quote text to the AI activity.
This order is helpful when troubleshooting, because it explains why the final quote may look different from the raw source text.
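The sequence above can be approximated with the Python sketch below. It is an illustration of the ordering only, not Grooper's implementation. It assumes the source text has already been scoped (step 1), that pages are separated by form feeds, and that line numbering restarts on each page. The function name, page label, and no-match message wording are all hypothetical.

```python
import re

# Placeholder wording; the actual message Grooper reports may differ.
NO_MATCH_MESSAGE = "(no match content found)"

def build_quote(source_text, pattern=None, delimiter="\n", preprocess=None,
                paginated=False, line_numbers=False):
    """Assemble a quote following the documented order of operations."""
    # Steps 2 and 3: use the matches if a pattern is set, otherwise the whole text.
    if pattern is not None:
        matches = [m.group(0) for m in re.finditer(pattern, source_text)]
        if not matches:
            return NO_MATCH_MESSAGE
        text = delimiter.join(matches)
    else:
        text = source_text

    # Step 4: apply preprocessing if configured.
    if preprocess is not None:
        text = preprocess(text)

    # Step 5: split into pages if requested.
    pages = text.split("\f") if paginated else [text]

    # Step 6: number the lines of each page if requested.
    if line_numbers:
        pages = [
            "\n".join(f"{i:02d}: {line}" for i, line in enumerate(page.split("\n"), 1))
            for page in pages
        ]

    # Label pages, then return the final quote text (step 7 would send it to the AI).
    if paginated:
        pages = [f"--- Page {n} ---\n{page}" for n, page in enumerate(pages, 1)]
    return "\n\n".join(pages)

print(build_quote("Invoice No: 42\nTotal: 10", pattern=r"Total: \d+", line_numbers=True))
```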
== Testing and diagnostics ==
=== How to test the configuration ===
# Start with a document that clearly contains the text you expect to quote.
# Run the AI activity using Extracted.
# Review the output and confirm the AI received the right content.
# If the quote is too large, add or tighten the "Quote Extractor".
# If the quote is hard to read, adjust "Result Delimiter" or "Preprocessing".
# If page context matters, turn on "Paginated".
# If you need easier review, turn on "Include Line Numbers".
=== How to review diagnostics ===
To understand what Extracted actually did:
# Run the activity in Grooper Design with diagnostics available for that operation.
# Open the diagnostic details for the AI step.
# Review the quote content that was generated.
# Confirm:
#* whether the full document or current scope was used
#* whether "Quote Extractor" returned the expected matches
#* whether multiple matches were joined correctly by "Result Delimiter"
#* whether "Preprocessing" changed spacing, paragraphs, or control characters
#* whether line numbers were added
#* whether the quote was split into separate pages
If diagnostics show a message that no match content was found, the extractor did not return any matches in the chosen scope.
== Troubleshooting ==
=== Problem: The quote is too large ===
Possible causes:
* "Quote Extractor" is blank
* "Document Level" is on when only local content is needed
* "Paginated" or line numbering adds extra text
Try this:
* Add a targeted "Quote Extractor"
* Turn off "Document Level" if the current container is sufficient
* Remove extra formatting options unless they are needed
=== Problem: The AI did not see the expected value ===
Possible causes:
* The extractor did not match the source text
* The scope is too narrow
* Preprocessing changed the text in a way that affected the match
Try this:
* Test the extractor against the actual document text
* Turn on "Document Level" if the value may be outside the current scope
* Temporarily simplify or disable "Preprocessing" and test again
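One way to see how a text change can break a match is the Python sketch below. It compares a pattern that expects literal spaces with one that accepts any whitespace, before and after a hypothetical cleanup that turns wide spaces into tabs. The patterns, sample text, and <code>spaces_to_tabs</code> helper are assumptions for illustration and do not reflect how Grooper orders or applies these steps.

```python
import re

def spaces_to_tabs(text):
    """Hypothetical cleanup: replace runs of three or more spaces with a tab."""
    return re.sub(r" {3,}", "\t", text)

raw = "Total:" + " " * 4 + "99"
cleaned = spaces_to_tabs(raw)

# Expects one or more literal spaces after the label.
strict = re.compile(r"Total: +(\d+)")
# Accepts any whitespace, including tabs, after the label.
tolerant = re.compile(r"Total:\s+(\d+)")

def is_match(pattern, text):
    """Report whether the pattern is found anywhere in the text."""
    return pattern.search(text) is not None

for name, pattern in (("strict", strict), ("tolerant", tolerant)):
    print(name, "raw:", is_match(pattern, raw), "cleaned:", is_match(pattern, cleaned))
```

Here the strict pattern stops matching once the spaces become a tab, while the tolerant pattern matches both versions. Writing extractors with <code>\s</code> rather than literal spaces makes them less sensitive to this kind of change.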
=== Problem: Multiple extracted values are hard to read ===
Possible causes:
* "Result Delimiter" is not appropriate for the content
Try this:
* Use <code>\n</code> for one value per line
* Use <code>\t</code> for row-like or table-like output
* Use a visible separator such as <code>, </code> only when inline text is preferred
=== Problem: Table text is confusing ===
Possible causes:
* Raw spacing does not clearly show columns
Try this:
* Configure "Preprocessing" to mark tabs or handle paragraph flow
* Use a delimiter that preserves row readability
* Test with and without line numbers to see which is easier to inspect
=== Problem: The quote is correct, but hard to reference ===
Try this:
* Turn on "Include Line Numbers"
* Turn on "Paginated" for multi-page documents
* Add a better "Name" and "Description" when multiple quotes are combined

Revision as of 13:52, 18 June 2026

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.
