AI Response Alignment

This article is about the current version of Grooper.

When using AI Extract to extract structured data from documents, Grooper can align the LLM's extracted values back to their original locations in the document. This is controlled by the "Alignment" settings for Data Fields, Data Sections, and Data Tables. Alignment lets users navigate to, review, and highlight relevant content in the Data Grid and Data Viewer, improving the review experience and data validation workflow.

What is "alignment"?

Alignment is the process of mapping values extracted by the large language model (LLM) back to their source locations in the document. When alignment is configured (using AI Extract's "Alignment" properties), Grooper:

Highlights extracted values in the document viewer
Enables click-to-navigate from the Data Grid to the document location
Provides visual context for reviewing and validating extracted data
Improves confidence scoring and provenance tracking

Without alignment, extracted values are stored with no link to their location in the original document. Extraction is faster, but reviewers get no highlighting or click-to-navigate context.

When to use alignment

Use alignment when:

Documents will be manually reviewed by users
Visual validation of extracted values is important
You need to trace values back to their source for quality control
Confidence scoring and provenance tracking are required

Disable alignment when:

Processing is fully automated with no human review
Performance is more important than review context
Documents are purely digital/structured with no OCR or layout variations

Configuring default alignment settings

Default alignment settings are configured on the AI Extract fill method itself, under the "Alignment" property. These defaults apply to all fields, sections, and tables unless overridden at the element level.

Select your Data Model, Data Section, or Data Table
In the Property Grid, expand the "Fill Methods" property
Select or add an AI Extract fill method
Expand the "Alignment" property
Configure the following default alignment modes:
- Field Alignment - Controls how field values are aligned (default: Natural)
- Section Alignment - Controls how section instances are aligned (default: Auto)
- Row Alignment - Controls how table rows are aligned (default: Auto)
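
The relationship between the defaults and element-level overrides can be pictured with a short sketch. This is not Grooper's object model or API: the `AlignmentDefaults` class and the `resolve_alignment` function are hypothetical names, used only to illustrate that an element-level setting, when present, takes precedence over the fill method default.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical model of the defaults listed above; not Grooper's object model.
@dataclass(frozen=True)
class AlignmentDefaults:
    field_alignment: str = "Natural"
    section_alignment: str = "Auto"
    row_alignment: str = "Auto"


def resolve_alignment(default_mode: str, override: Optional[str]) -> str:
    """Return the element-level override if one is set, else the default."""
    return override if override is not None else default_mode


defaults = AlignmentDefaults()

# A Description field overridden to Quoted; an Invoice Number field left alone.
description_mode = resolve_alignment(defaults.field_alignment, "Quoted")
invoice_number_mode = resolve_alignment(defaults.field_alignment, None)
print(description_mode, invoice_number_mode)
```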

Field alignment modes

Field alignment determines how individual field values are mapped back to the document. The following modes are available:

None

No alignment is performed
Only the value is requested from the LLM
Fastest option, but no highlighting or navigation
Use when: Field provenance is not needed or alignment is not possible

Natural

Default mode for fields
Grooper searches for an exact match of the value in the document text
Works with normalization and type-specific matching (handles different formats for numbers and dates)
Use when: Field values are unique and well-formed (e.g., IDs, invoice numbers, amounts)
Limitation: May fail for ambiguous or repeated values
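
As a rough illustration of type-aware matching, the sketch below locates a numeric value in document text by comparing normalized numbers rather than raw strings. It is an assumption about the general idea, not Grooper's actual matching logic; `find_amount` and `NUMBER_PATTERN` are hypothetical names.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

# Digits with optional thousands separators and an optional decimal part.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?")


def find_amount(text: str, value: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first number in text equal to value, or None."""
    try:
        target = Decimal(value.replace(",", ""))
    except InvalidOperation:
        return None
    for match in NUMBER_PATTERN.finditer(text):
        candidate = Decimal(match.group().replace(",", ""))
        if candidate == target:
            return match.span()
    return None


text = "Invoice 10442  Total due: $1,250.00"
span = find_amount(text, "1250")  # matches "1,250.00" despite the formatting
```

Because the function returns the first match only, a value that occurs more than once resolves to its first occurrence, which mirrors the limitation noted above for ambiguous or repeated values.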

Quoted

LLM returns both the value and the exact text as it appears in the document (originalValue)
Grooper uses the quoted text to locate the value in OCR results
Use when: Values may be ambiguous or appear multiple times
Best for: Long text fields, descriptions, or values embedded in larger passages
Limitation: Relies on LLM returning accurate quotes
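
The sketch below shows one way a quoted string could be located in OCR text while tolerating differences in whitespace and line breaks. The response shape is an assumption: the source names `originalValue`, but the `value` key, the dictionary structure, and the `locate_quote` helper are hypothetical.

```python
import re
from typing import Optional, Tuple


def locate_quote(ocr_text: str, quote: str) -> Optional[Tuple[int, int]]:
    """Find quote in ocr_text, treating any run of whitespace as equivalent."""
    words = quote.split()
    if not words:
        return None
    pattern = r"\s+".join(re.escape(word) for word in words)
    match = re.search(pattern, ocr_text)
    return match.span() if match else None


# Assumed response shape for a single field (illustrative only).
llm_field = {"value": "2024-03-05", "originalValue": "March 5,  2024"}

# OCR text in which the date wraps onto a second line.
ocr_text = "Invoice Date: March 5,\n2024\nDue Date: April 4, 2024"
span = locate_quote(ocr_text, llm_field["originalValue"])
```

If the LLM paraphrases instead of quoting verbatim, the search finds nothing, which is why this mode depends on accurate quotes.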

Labeled

LLM returns both the value and a label/field name from the document
Grooper locates the label and associates the nearest value
Use when: Documents have clear field labels or headers
Best for: Forms, labeled fields, or documents with consistent label-value patterns
Limitation: May fail if labels are not unique or are missing
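
A simplified sketch of label-based disambiguation follows. It works on character offsets in plain text and treats "nearest" as "first occurrence after the label"; real layouts are two-dimensional, so this is only an approximation of the idea. `find_all` and `locate_by_label` are hypothetical helpers, not Grooper functions.

```python
from typing import List, Optional


def find_all(text: str, needle: str) -> List[int]:
    """Return the start offset of every occurrence of needle in text."""
    offsets = []
    start = text.find(needle)
    while start != -1:
        offsets.append(start)
        start = text.find(needle, start + 1)
    return offsets


def locate_by_label(text: str, label: str, value: str) -> Optional[int]:
    """Return the offset of the first occurrence of value after label."""
    label_pos = text.find(label)
    if label_pos == -1:
        return None  # a missing label means this approach cannot align
    label_end = label_pos + len(label)
    candidates = [pos for pos in find_all(text, value) if pos >= label_end]
    return min(candidates, default=None)


text = "Subtotal: 100.00\nTax: 0.00\nTotal Due: 100.00"
pos = locate_by_label(text, "Total Due:", "100.00")
```

Here the value 100.00 appears twice, and the label selects the second occurrence rather than the first.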

Labeled And Quoted

LLM returns the value, a label, and the original quoted text
Combines both label and quote for maximum accuracy
Use when: Neither label nor quote alone is sufficient
Best for: Complex documents, multi-line fields, or when maximum review accuracy is required
Most robust: Falls back gracefully if either label or quote is missing

Geometric

Requires the Layout Objects quoting method
LLM returns page number and bounding box coordinates for the value
Grooper uses spatial information to directly extract and highlight the region
Use when: Spatial accuracy is critical
Best for: Tables, structured layouts, or visually complex documents
Limitation: Requires Layout Objects quoting; fails if coordinates are invalid
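
Since invalid coordinates cause this mode to fail, a validity check along the following lines may be useful when inspecting what the LLM returned. The response shape is an assumption: the source says only that a page number and bounding box are returned, so the `Box` fields (left, top, width, height in page units), the 1-based page number, and `is_valid_region` are all hypothetical.

```python
from dataclasses import dataclass


# Assumed box representation, in the same units as the page dimensions.
@dataclass(frozen=True)
class Box:
    left: float
    top: float
    width: float
    height: float


def is_valid_region(page: int, box: Box, page_count: int,
                    page_width: float, page_height: float) -> bool:
    """Check that the page exists and the box lies fully inside it."""
    if not 1 <= page <= page_count:
        return False
    if box.width <= 0 or box.height <= 0:
        return False
    return (box.left >= 0 and box.top >= 0
            and box.left + box.width <= page_width
            and box.top + box.height <= page_height)


inside = is_valid_region(1, Box(100, 200, 300, 40), 2, 850, 1100)
off_page = is_valid_region(1, Box(700, 200, 300, 40), 2, 850, 1100)
```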

Section alignment modes

Section alignment determines how section instances (single or repeating) are mapped back to the document. Configure this on Data Section elements using the AI Extract Section Options extension.

None

No alignment performed
Only section data is requested
Use when: Single-instance sections where provenance is not important

Auto

Default mode for sections
For multi-instance sections: Grooper infers boundaries from descendant field positions
For single-instance sections: Behaves like None
Use when: Section boundaries can be determined from field positions

Quote

LLM returns a quote containing the entire section content
Use when: Single-instance sections or sections with unique content
Best for: Sections that can be uniquely identified by their text

Start Quote

LLM returns a quote from the first line of each section instance
Grooper splits the document at these positions
Use when: Multi-instance sections with clear starting lines
Best for: Repeating sections like claims, line items, or transactions
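
Splitting at start quotes can be sketched as follows, using plain text and character offsets. This illustrates the concept only and is not Grooper's implementation; `split_at_start_quotes` is a hypothetical helper, and the quotes are assumed to arrive in document order.

```python
from typing import List


def split_at_start_quotes(text: str, start_quotes: List[str]) -> List[str]:
    """Split text into instances, each beginning at one start quote."""
    starts = []
    cursor = 0
    for quote in start_quotes:
        pos = text.find(quote, cursor)
        if pos == -1:
            continue  # quote not found; skip it rather than guess a boundary
        starts.append(pos)
        cursor = pos + len(quote)
    # Each instance runs until the next start, the last one to the end of text.
    ends = starts[1:] + [len(text)]
    return [text[start:end] for start, end in zip(starts, ends)]


text = "Claim 001\nPatient: A\nClaim 002\nPatient: B\n"
instances = split_at_start_quotes(text, ["Claim 001", "Claim 002"])
```

Each instance ends where the next one starts, so this approach needs no end marker. When the end of an instance is not simply the start of the next, the Bounding Quotes mode described next is the better fit.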

Bounding Quotes

LLM returns quotes from both first and last lines of each section instance
Provides precise boundaries for each instance
Use when: Both start and end lines are needed for accuracy
Best for: Variable-length sections or complex multi-instance scenarios

Geometric

Requires the Layout Objects quoting method
LLM returns page number and bounding box for the section
Use when: Spatial accuracy is critical for visually distinct sections

Bound Child Fields

Sets section location to the bounding box of direct child fields
Use when: Section boundaries should match immediate child field locations

Bound Fields

Sets section location to the bounding box of all descendant fields
Use when: Section boundaries should encompass all nested fields
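
The difference between the two bounding modes can be shown with a small sketch that computes the union of field rectangles, first for direct children only and then for all descendants. The `Section` class, the `(left, top, right, bottom)` rectangle convention, and the function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def union(rects: List[Rect]) -> Optional[Rect]:
    """Return the smallest rectangle containing every rect, or None if empty."""
    if not rects:
        return None
    lefts, tops, rights, bottoms = zip(*rects)
    return (min(lefts), min(tops), max(rights), max(bottoms))


@dataclass
class Section:
    field_rects: List[Rect] = field(default_factory=list)
    subsections: List["Section"] = field(default_factory=list)


def bound_child_fields(section: Section) -> Optional[Rect]:
    """Bounding box of the section's direct child fields only."""
    return union(section.field_rects)


def bound_fields(section: Section) -> Optional[Rect]:
    """Bounding box of all descendant fields, including nested sections."""
    rects = list(section.field_rects)
    stack = list(section.subsections)
    while stack:
        sub = stack.pop()
        rects.extend(sub.field_rects)
        stack.extend(sub.subsections)
    return union(rects)


inner = Section(field_rects=[(50, 200, 120, 220)])
outer = Section(field_rects=[(10, 10, 60, 30), (10, 40, 90, 60)],
                subsections=[inner])
```

With these sample rectangles, the direct-child box stops at the second field, while the descendant box stretches down to include the nested section's field.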

Table row alignment modes

Table row alignment determines how table rows are mapped back to the document. Configure this on Data Table elements using the AI Extract Table Options extension.

None

No alignment performed
Only table data is requested
Use when: Row provenance is not needed

Auto

Default mode for tables
Grooper infers row boundaries from cell value positions
Use when: Row boundaries can be determined from extracted cell values

Geometric

Requires the Layout Objects quoting method
LLM returns page number and bounding box for each row
Use when: Spatial accuracy is critical
Best for: Complex tables or when precise row highlighting is required

Best practices for alignment

Start with defaults

Use the default modes (Natural for fields, Auto for sections and tables) for most scenarios
Only override when specific alignment issues arise

Match alignment to document structure

Use Labeled modes for forms with clear field labels
Use Quoted modes for long text or ambiguous values
Use Geometric modes for complex layouts when using Layout Objects quoting

Consider performance

More complex alignment modes (Geometric, Labeled And Quoted) increase processing time
Disable alignment entirely for fully automated workflows
Test different modes to find the right balance for your use case

Use Layout Objects quoting for geometric alignment

Geometric alignment modes require the Layout Objects quoting method
This provides the LLM with location information for each text segment
Configure this in the "Document Quoting" property of AI Extract

Test and iterate

Review extracted data to verify alignment is working correctly
Adjust modes if values are not highlighting properly
Use diagnostic artifacts to troubleshoot alignment issues

Troubleshooting alignment issues

Values not highlighting:

Verify the correct alignment mode is selected
Check that OCR data is available (alignment requires OCR)
For Geometric mode, confirm Layout Objects quoting is enabled
Review diagnostic artifacts to see what the LLM returned

Incorrect highlighting:

Try a more specific alignment mode (e.g., Labeled instead of Natural)
For repeated values, use Quoted or Labeled modes
Verify LLM is returning accurate quotes or labels

Performance issues:

Reduce alignment complexity (use Auto or Natural instead of Geometric)
Consider disabling alignment for fields that don't need review
Use conditional triggers to skip alignment for certain scenarios

Example configuration

Invoice extraction with mixed alignment:

Invoice Number field: Natural (unique value, easy to match)
Invoice Date field: Natural (date matching handles format variations)
Description field: Quoted (long text, may be ambiguous)
Amount field: Labeled (may appear multiple times, use label to disambiguate)
Line Items table: Auto (infer rows from cell positions)
Notes section: Quote (capture entire section content)

This configuration balances accuracy, performance, and review usability for a typical invoice document.
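
For planning or documentation purposes, the same configuration can be written down as data and checked against the mode names listed in this article. The representation below is hypothetical: Grooper stores these settings in its own property grid, not in a Python dictionary, and `invalid_entries` is an illustrative helper.

```python
# Mode names are taken from the sections of this article.
FIELD_MODES = {"None", "Natural", "Quoted", "Labeled",
               "Labeled And Quoted", "Geometric"}
SECTION_MODES = {"None", "Auto", "Quote", "Start Quote", "Bounding Quotes",
                 "Geometric", "Bound Child Fields", "Bound Fields"}
ROW_MODES = {"None", "Auto", "Geometric"}

ALLOWED = {"field": FIELD_MODES, "section": SECTION_MODES, "table": ROW_MODES}

# Hypothetical representation: (element kind, element name) -> alignment mode.
invoice_alignment = {
    ("field", "Invoice Number"): "Natural",
    ("field", "Invoice Date"): "Natural",
    ("field", "Description"): "Quoted",
    ("field", "Amount"): "Labeled",
    ("table", "Line Items"): "Auto",
    ("section", "Notes"): "Quote",
}


def invalid_entries(config):
    """Return the keys whose mode is not valid for that element kind."""
    return [key for key, mode in config.items()
            if mode not in ALLOWED[key[0]]]


problems = invalid_entries(invoice_alignment)
```

A check like this catches mismatches such as assigning Quoted, a field mode, to a table, which supports only None, Auto, and Geometric.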