AI Response Alignment

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.

When using AI Extract to extract structured data from documents, Grooper can align the LLM's extracted values back to their original locations in the document. This is controlled by the "Alignment" settings for Data Fields, Data Sections, and Data Tables. With alignment, users can navigate to, review, and highlight the relevant content from the Data Grid and Data Viewer, which improves the review experience and the data validation workflow.

What is "alignment"?

Alignment is the process of mapping values extracted by the large language model (LLM) back to their source locations in the document. When alignment is configured (using AI Extract's "Alignment" properties), Grooper:

  • Highlights extracted values in the document viewer
  • Enables click-to-navigate from the Data Grid to the document location
  • Provides visual context for reviewing and validating extracted data
  • Improves confidence scoring and provenance tracking

Without alignment, extracted values are stored with no link to their location in the original document. Extraction is faster, but reviewers get no visual context.

When to use alignment

Use alignment when:

  • Documents will be manually reviewed by users
  • Visual validation of extracted values is important
  • You need to trace values back to their source for quality control
  • Confidence scoring and provenance tracking are required

Disable alignment when:

  • Processing is fully automated with no human review
  • Performance is more important than review context
  • Documents are purely digital/structured with no OCR or layout variations

Configuring default alignment settings

Default alignment settings are configured on the AI Extract fill method itself, under the "Alignment" property. These defaults apply to all fields, sections, and tables unless overridden at the element level.

  1. Select your Data Model, Data Section, or Data Table
  2. In the Property Grid, expand the "Fill Methods" property
  3. Select or add an AI Extract fill method
  4. Expand the "Alignment" property
  5. Configure the following default alignment modes:
    • Field Alignment - Controls how field values are aligned (default: Natural)
    • Section Alignment - Controls how section instances are aligned (default: Auto)
    • Row Alignment - Controls how table rows are aligned (default: Auto)
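
The relationship between the method-level defaults and an element-level override can be sketched as below. This is a minimal illustration in Python, not Grooper's configuration API: the `AlignmentDefaults` class and the `resolve_mode` function are hypothetical names, and only the mode names and default values are taken from this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlignmentDefaults:
    """Hypothetical stand-in for the AI Extract "Alignment" property."""
    field_alignment: str = "Natural"
    section_alignment: str = "Auto"
    row_alignment: str = "Auto"


def resolve_mode(defaults: AlignmentDefaults, element_kind: str,
                 override: Optional[str] = None) -> str:
    """Return the element-level override if one is set, else the default."""
    if override is not None:
        return override
    by_kind = {
        "field": defaults.field_alignment,
        "section": defaults.section_alignment,
        "table": defaults.row_alignment,
    }
    return by_kind[element_kind]


defaults = AlignmentDefaults()
print(resolve_mode(defaults, "field"))                     # Natural
print(resolve_mode(defaults, "field", override="Quoted"))  # Quoted
print(resolve_mode(defaults, "table"))                     # Auto
```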

Field alignment modes

Field alignment determines how individual field values are mapped back to the document. The following modes are available:

None

  • No alignment is performed
  • Only the value is requested from the LLM
  • Fastest option, but no highlighting or navigation
  • Use when: Field provenance is not needed or alignment is not possible

Natural

  • Default mode for fields
  • Grooper searches for an exact match of the value in the document text
  • Matching is normalized and type-specific, so numbers and dates can still match when the document formats them differently
  • Use when: Field values are unique and well-formed (e.g., IDs, invoice numbers, amounts)
  • Limitation: May fail for ambiguous or repeated values
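
To make the idea of type-specific matching concrete, here is a rough sketch of how a value might be located when the document formats a number differently from the extracted value. It is an illustration only and does not describe Grooper's actual matching logic; `find_natural`, `to_decimal`, and the `NUMBER_TOKEN` pattern are all assumptions made for the example.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

# Assumed pattern for number-like tokens such as "1,234.50" or "$99".
NUMBER_TOKEN = re.compile(r"\$?\d[\d,]*(?:\.\d+)?")


def to_decimal(text: str) -> Optional[Decimal]:
    """Parse a number-like string, ignoring currency signs and commas."""
    try:
        return Decimal(text.replace("$", "").replace(",", ""))
    except InvalidOperation:
        return None


def find_natural(value: str, document_text: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) character span of the first match, or None."""
    # 1. Try a literal match first.
    start = document_text.find(value)
    if start != -1:
        return start, start + len(value)
    # 2. Otherwise compare numeric values, so the formats may differ.
    target = to_decimal(value)
    if target is None:
        return None
    for match in NUMBER_TOKEN.finditer(document_text):
        if to_decimal(match.group()) == target:
            return match.span()
    return None


text = "Total due: $1,234.50 by 03/01"
span = find_natural("1234.5", text)
print(text[span[0]:span[1]])  # $1,234.50
```

Because the sketch returns the first match it finds, a value that occurs more than once would be aligned to the earliest occurrence, which illustrates the limitation noted above.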

Quoted

  • LLM returns both the value and the exact text as it appears in the document (originalValue)
  • Grooper uses the quoted text to locate the value in OCR results
  • Use when: Values may be ambiguous or appear multiple times
  • Best for: Long text fields, descriptions, or values embedded in larger passages
  • Limitation: Relies on LLM returning accurate quotes
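
As an illustration of locating quoted text, the sketch below searches OCR text for the quote while tolerating differences in whitespace, line breaks, and letter case. The response shape (a dictionary with `value` and `originalValue` keys) and the `find_quote` helper are assumptions for the example, not Grooper's documented behavior.

```python
import re
from typing import Optional, Tuple


def find_quote(original_value: str, ocr_text: str) -> Optional[Tuple[int, int]]:
    """Locate a quoted passage, tolerating whitespace and case differences."""
    words = original_value.split()
    if not words:
        return None
    # Escape each word, then allow any run of whitespace between words.
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, ocr_text, flags=re.IGNORECASE)
    return match.span() if match else None


# Hypothetical LLM response for a single field.
response = {"value": "Annual maintenance contract",
            "originalValue": "annual maintenance contract"}
ocr_text = "Description:  Annual\nmaintenance  contract\nTotal: 500"

span = find_quote(response["originalValue"], ocr_text)
print(repr(ocr_text[span[0]:span[1]]))  # 'Annual\nmaintenance  contract'
```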

Labeled

  • LLM returns both the value and a label/field name from the document
  • Grooper locates the label and associates the nearest value
  • Use when: Documents have clear field labels or headers
  • Best for: Forms, labeled fields, or documents with consistent label-value patterns
  • Limitation: May fail if labels are not unique or are missing
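
The label-then-value idea can be sketched as follows. The example assumes that "nearest" means the first occurrence of the value after the label in reading order; that interpretation and the `find_by_label` helper are assumptions for illustration and may differ from what Grooper does.

```python
import re
from typing import Optional


def find_by_label(label: str, value: str, text: str) -> Optional[int]:
    """Return the offset of the first occurrence of `value` after `label`."""
    label_match = re.search(re.escape(label), text, flags=re.IGNORECASE)
    if label_match is None:
        return None  # Label missing: this approach cannot align the value.
    position = text.find(value, label_match.end())
    return position if position != -1 else None


text = "Deposit: 100.00\nBalance Due: 100.00"
# The same amount appears twice; the label selects the second occurrence.
print(find_by_label("Balance Due", "100.00", text))  # 29
```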

Labeled And Quoted

  • LLM returns the value, a label, and the original quoted text
  • Combines both label and quote for maximum accuracy
  • Use when: Neither label nor quote alone is sufficient
  • Best for: Complex documents, multi-line fields, or when maximum review accuracy is required
  • Most robust: Falls back gracefully if either label or quote is missing
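
One way to picture combining a label with a quote, including the fallback when either is missing, is sketched below. The `locate` function and its "closest occurrence to the label" rule are assumptions chosen for the example, not a description of Grooper's internals.

```python
from typing import Optional


def locate(text: str, label: Optional[str], quote: Optional[str]) -> Optional[int]:
    """Return a character offset using the label and quote, whichever are usable."""
    label_at = text.find(label) if label else -1
    if quote:
        # Collect every position where the quote occurs.
        hits = []
        start = text.find(quote)
        while start != -1:
            hits.append(start)
            start = text.find(quote, start + 1)
        if hits and label_at != -1:
            # Both usable: choose the quote occurrence closest to the label.
            return min(hits, key=lambda pos: abs(pos - label_at))
        if hits:
            return hits[0]  # Label unusable: fall back to the first quote hit.
    # Quote unusable: fall back to the label position, if there is one.
    return label_at if label_at != -1 else None


text = "Ship To: 12 Elm St, Springfield\nBill To: 12 Elm St, Springfield"
print(locate(text, "Bill To", "12 Elm St"))  # 41
print(locate(text, None, "12 Elm St"))       # 9
print(locate(text, "Bill To", None))         # 32
```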

Geometric

  • Requires the Layout Objects quoting method
  • LLM returns page number and bounding box coordinates for the value
  • Grooper uses spatial information to directly extract and highlight the region
  • Use when: Spatial accuracy is critical
  • Best for: Tables, structured layouts, or visually complex documents
  • Limitation: Fails if the LLM returns invalid coordinates
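
Since this mode depends on the coordinates the LLM returns, it can help to think about what "invalid" might mean. The sketch below shows one plausible set of checks; the `Box` type, the 1-based page numbering, and the top-left origin are assumptions for the example rather than documented Grooper conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page units (assumed origin: top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def is_valid_region(page: int, box: Box, page_count: int,
                    page_width: float, page_height: float) -> bool:
    """Check that a returned page number and box describe a real page region."""
    if not 1 <= page <= page_count:
        return False  # Page number outside the document.
    if box.left >= box.right or box.top >= box.bottom:
        return False  # Empty or inverted rectangle.
    return (0 <= box.left and box.right <= page_width
            and 0 <= box.top and box.bottom <= page_height)


print(is_valid_region(1, Box(10, 10, 100, 40), 2, 612, 792))  # True
print(is_valid_region(3, Box(10, 10, 100, 40), 2, 612, 792))  # False
```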

Section alignment modes

Section alignment determines how section instances (single or repeating) are mapped back to the document. Configure this on Data Section elements using the AI Extract Section Options extension.

None

  • No alignment performed
  • Only section data is requested
  • Use when: Single-instance sections where provenance is not important

Auto

  • Default mode for sections
  • For multi-instance sections: Grooper infers boundaries from descendant field positions
  • For single-instance sections: Behaves like None
  • Use when: Section boundaries can be determined from field positions

Quote

  • LLM returns a quote containing the entire section content
  • Use when: Single-instance sections or sections with unique content
  • Best for: Sections that can be uniquely identified by their text

Start Quote

  • LLM returns a quote from the first line of each section instance
  • Grooper splits the document at these positions
  • Use when: Multi-instance sections with clear starting lines
  • Best for: Repeating sections like claims, line items, or transactions
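
Splitting a document at the start of each instance can be sketched as below. This is a simplified illustration over plain text, with a hypothetical `split_at_start_quotes` helper; it assumes the quotes arrive in document order and says nothing definitive about how Grooper performs the split.

```python
from typing import List


def split_at_start_quotes(text: str, start_quotes: List[str]) -> List[str]:
    """Split `text` into instances, each beginning at one start quote."""
    offsets = []
    cursor = 0
    for quote in start_quotes:
        found = text.find(quote, cursor)
        if found == -1:
            continue  # Quote not located: skip it rather than guess.
        offsets.append(found)
        cursor = found + len(quote)
    # Each instance runs from its start to the next start, or to the end.
    ends = offsets[1:] + [len(text)]
    return [text[start:end] for start, end in zip(offsets, ends)]


text = "Header\nClaim 001\nAmount 50\nClaim 002\nAmount 75\n"
for instance in split_at_start_quotes(text, ["Claim 001", "Claim 002"]):
    print(repr(instance))
# 'Claim 001\nAmount 50\n'
# 'Claim 002\nAmount 75\n'
```

Note that in this sketch the last instance runs to the end of the text, so any trailing content after the final instance would be absorbed into it.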

Bounding Quotes

  • LLM returns quotes from both first and last lines of each section instance
  • Provides precise boundaries for each instance
  • Use when: Both start and end lines are needed for accuracy
  • Best for: Variable-length sections or complex multi-instance scenarios
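
To show why a closing quote gives tighter boundaries, the sketch below computes the span of one instance from its first-line and last-line quotes. The `span_from_bounding_quotes` helper is hypothetical and works on plain text only; it is meant as an illustration under those assumptions.

```python
from typing import Optional, Tuple


def span_from_bounding_quotes(text: str, first_line: str,
                              last_line: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span running from first_line through last_line."""
    start = text.find(first_line)
    if start == -1:
        return None
    # Search from the opening quote onward, so a one-line instance
    # (identical first and last quotes) still resolves.
    last_at = text.find(last_line, start)
    if last_at == -1:
        return None
    return start, last_at + len(last_line)


text = ("Claim 001\nItem A\nTotal 50\n"
        "Claim 002\nItem B\nItem C\nTotal 75\n"
        "Page 1 of 1")
span = span_from_bounding_quotes(text, "Claim 002", "Total 75")
print(repr(text[span[0]:span[1]]))  # 'Claim 002\nItem B\nItem C\nTotal 75'
```

In this example the page footer after the last instance is excluded, because the end of the span is set by the closing quote rather than by the end of the text.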

Geometric

  • Requires the Layout Objects quoting method
  • LLM returns page number and bounding box for the section
  • Use when: Spatial accuracy is critical for visually distinct sections

Bound Child Fields

  • Sets section location to the bounding box of direct child fields
  • Use when: Section boundaries should match immediate child field locations

Bound Fields

  • Sets section location to the bounding box of all descendant fields
  • Use when: Section boundaries should encompass all nested fields
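
The difference between Bound Child Fields and Bound Fields comes down to which field locations are included in the bounding box. The sketch below illustrates that difference with a hypothetical `Node` tree and box tuples of the form (left, top, right, bottom); none of these names come from Grooper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class Node:
    """Hypothetical data element: a field has a box, a section has children."""
    name: str
    box: Optional[Box] = None
    children: List["Node"] = field(default_factory=list)


def union(boxes: List[Box]) -> Optional[Box]:
    """Smallest rectangle containing every box, or None if there are none."""
    if not boxes:
        return None
    lefts, tops, rights, bottoms = zip(*boxes)
    return min(lefts), min(tops), max(rights), max(bottoms)


def field_boxes(section: Node, descendants: bool) -> List[Box]:
    """Collect boxes of direct child fields, or of all nested fields."""
    boxes = []
    for child in section.children:
        if child.box is not None:
            boxes.append(child.box)
        if descendants:
            boxes.extend(field_boxes(child, descendants=True))
    return boxes


address = Node("Address", children=[Node("City", box=(10, 200, 80, 215))])
vendor = Node("Vendor", children=[Node("Name", box=(10, 100, 120, 115)), address])

print(union(field_boxes(vendor, descendants=False)))  # (10, 100, 120, 115)
print(union(field_boxes(vendor, descendants=True)))   # (10, 100, 120, 215)
```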

Table row alignment modes

Table row alignment determines how table rows are mapped back to the document. Configure this on Data Table elements using the AI Extract Table Options extension.

None

  • No alignment performed
  • Only table data is requested
  • Use when: Row provenance is not needed

Auto

  • Default mode for tables
  • Grooper infers row boundaries from cell value positions
  • Use when: Row boundaries can be determined from extracted cell values

Geometric

  • Requires the Layout Objects quoting method
  • LLM returns page number and bounding box for each row
  • Use when: Spatial accuracy is critical
  • Best for: Complex tables or when precise row highlighting is required

Best practices for alignment

Start with defaults

  • Use the default modes (Natural for fields, Auto for sections and tables) for most scenarios
  • Only override when specific alignment issues arise

Match alignment to document structure

  • Use Labeled modes for forms with clear field labels
  • Use Quoted modes for long text or ambiguous values
  • Use Geometric modes for complex layouts when using Layout Objects quoting

Consider performance

  • More complex alignment modes (Geometric, Labeled And Quoted) increase processing time
  • Disable alignment entirely for fully automated workflows
  • Test different modes to find the right balance for your use case

Use Layout Objects quoting for geometric alignment

  • Geometric alignment modes require the Layout Objects quoting method
  • This provides the LLM with location information for each text segment
  • Configure this in the "Document Quoting" property of AI Extract
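
The dependency between Geometric modes and Layout Objects quoting lends itself to a simple pre-flight check. The sketch below is one hypothetical way to express it; the `geometric_mode_errors` function, the plain-string settings, and the "Other" placeholder for any non-Layout-Objects quoting method are assumptions and not part of Grooper.

```python
from typing import Dict, List


def geometric_mode_errors(quoting_method: str,
                          element_modes: Dict[str, str]) -> List[str]:
    """List the elements whose Geometric mode lacks Layout Objects quoting."""
    if quoting_method == "Layout Objects":
        return []
    return sorted(name for name, mode in element_modes.items()
                  if mode == "Geometric")


modes = {"Total": "Geometric", "Invoice Date": "Natural", "Line Items": "Geometric"}
print(geometric_mode_errors("Other", modes))           # ['Line Items', 'Total']
print(geometric_mode_errors("Layout Objects", modes))  # []
```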

Test and iterate

  • Review extracted data to verify alignment is working correctly
  • Adjust modes if values are not highlighting properly
  • Use diagnostic artifacts to troubleshoot alignment issues

Troubleshooting alignment issues

Values not highlighting:

  • Verify the correct alignment mode is selected
  • Check that OCR data is available (alignment requires OCR)
  • For Geometric mode, confirm Layout Objects quoting is enabled
  • Review diagnostic artifacts to see what the LLM returned

Incorrect highlighting:

  • Try a more specific alignment mode (e.g., Labeled instead of Natural)
  • For repeated values, use Quoted or Labeled modes
  • Verify LLM is returning accurate quotes or labels

Performance issues:

  • Reduce alignment complexity (use Auto or Natural instead of Geometric)
  • Consider disabling alignment for fields that don't need review
  • Use conditional triggers to skip alignment for certain scenarios

Example configuration

Invoice extraction with mixed alignment:

  • Invoice Number field: Natural (unique value, easy to match)
  • Invoice Date field: Natural (date matching handles format variations)
  • Description field: Quoted (long text, may be ambiguous)
  • Amount field: Labeled (may appear multiple times, use label to disambiguate)
  • Line Items table: Auto (infer rows from cell positions)
  • Notes section: Quote (capture entire section content)

This configuration balances accuracy, performance, and review usability for a typical invoice document.
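
For readers who keep configuration notes alongside their Data Model, the same example can be written down as a plain mapping. This is only a note-taking sketch in Python, not a Grooper file format; as a simplification, it treats Natural and Auto as the defaults regardless of element type.

```python
# Element name -> alignment mode, as listed in the example above.
invoice_alignment = {
    "Invoice Number": "Natural",
    "Invoice Date": "Natural",
    "Description": "Quoted",
    "Amount": "Labeled",
    "Line Items": "Auto",
    "Notes": "Quote",
}

# Defaults named in this article: Natural for fields, Auto for sections and tables.
DEFAULT_MODES = {"Natural", "Auto"}

# Elements that depart from the defaults and so need an element-level setting.
overrides = {name: mode for name, mode in invoice_alignment.items()
             if mode not in DEFAULT_MODES}
print(overrides)
# {'Description': 'Quoted', 'Amount': 'Labeled', 'Notes': 'Quote'}
```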