Alignment (Property): Difference between revisions

Revision as of 13:37, 30 July 2024

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

"Alignment" refers to how Grooper highlights text from an AI response on a document in a Document Viewer. Alignment properties can be configured to alter how Grooper highlights results when using LLM-based extraction methods, such as AI Extract.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

[[Media: ]]
[[Media: ]]

Glossary

About

As data values are copied into the Grooper document, they may optionally be aligned to a location in the document content. The process of mapping AI responses back to the document allows Grooper to highlight regions on the document as users tab into the associated fields, and can be critical for efficient review of the documents. If the documents will not be reviewed by human operators after data extraction, then alignment may be unnecessary.

Keep in mind that the Alignment setting affects the length of the prompt given to the AI. This in turn affects the number of tokens consumed, which impacts the "cost" of the prompt in both processing time and money. Setting Alignment to None therefore costs the least but produces no highlighting, which suits fully automated processing with no human review.

Alignment is controlled by first configuring the alignment properties of the AI Extract Fill Method with defaults for how Data Sections, Data Tables, and Data Fields will be aligned. After the defaults have been established, they can be overridden on individual Data Sections, Data Tables, and Data Fields using the AI Extract Section Options, AI Extract Table Options, and AI Extract Field Options properties on these Data Elements.

Alignment Properties of AI Extract

These properties are set within AI Extract on container Data Elements (Data Model, Data Section, Data Table). Think of these as the "global" settings for descendant elements.

Default Field Alignment: This is the default alignment mode for all descendant Data Fields.
- None: Data values captured by the AI will not be aligned to the document and no highlighting will be produced.
- Natural: Data values will be aligned if their value appears verbatim in the document.
  - For example, consider a Data Field set to a Value Type of DateTime with a Format Specifier of d. If the date appears on the document as "01/01/2001", the Data Field would return "01/01/2001" and the value would be highlighted on the document. If the document instead read "January 1st, 2001", the Data Field would still return "01/01/2001" but there would be no highlight.
  - Token consumption is higher than None.
- Quoted: Data values will be aligned using a quote from the document content.
  - Consider the previous example. In this case, if the value on the document was "January 1st, 2001" and the displayed value in the Data Field was "01/01/2001" this setting would produce a result.
  - Token consumption is higher than Natural.
- Labeled: Data values will be aligned using a label and a quote from the document content.
  - If multiple dates with similar values are on a document, this setting allows the specific date related to a label to be returned. Without a label, the first occurrence of the value will always be highlighted.
  - Token consumption is higher than Quoted. This is the most expensive option, but may prove to be the most accurate for highlighting.
Default Section Alignment: This is the default alignment mode for all descendant Data Sections.
- None: The section will not be aligned to the document and no highlighting will be produced.
- Split: Ask the AI to quote the first line of each section instance, and then split the input at these positions.
  - Mainly used for Multi-instance Data Sections. Think of it like the Data Section Extract Method of Divider with a Split Position of Begin. The pink bounding box drawn will start on the first line of the instance of the section, and end at the first line of the next repeated section.
  - Token consumption is higher than None.
- BoundingQuotes: Ask the AI to quote the first and last lines of the section instance.
  - Think of it like the Data Section Extract Method of Divider with a Split Position of Between. The pink bounding box drawn will start on the first line of the instance of the section, and end at the last line of the section.
  - Token consumption is higher than Split.
- Quote: Ask the AI to quote the entire section content.
  - Think of it like the Data Section Extract Method of Simple. In this case you would create an extractor that would return an entire section instance.
  - Token consumption is higher than BoundingQuotes. This is the most expensive option, but may prove to be the most accurate for highlighting.
Default Table Alignment: This is the default alignment mode for all descendant Data Tables.
- None: The table will not be aligned to the document and no highlighting will be produced.
- Split: Ask the AI to quote the first line of each row, and then split the input at these positions.
  - Useful when considering multi-line table rows, but could be functional for single line table rows.
  - Token consumption is higher than None.
- BoundingQuotes: Ask the AI to quote the first and last lines of the row.
  - This is only really useful when considering multi-line table rows.
  - Token consumption is higher than Split.
- Quote: Ask the AI to quote the entire row.
  - Token consumption is higher than BoundingQuotes. This is the most expensive option, but may prove to be the most accurate for highlighting.
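
The field and Split modes listed above can be illustrated with a small Python sketch. This is not Grooper's implementation; the function names (`find_span`, `align_field`, `split_at_quotes`), the character-offset spans, and the plain substring matching are all assumptions made for illustration. It only shows why a Natural lookup can miss a reformatted value, why a quote or a label can recover it, and how a Split-style alignment could carve text into instances.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # hypothetical (start, end) character offsets in the text


def find_span(text: str, needle: str, start: int = 0) -> Optional[Span]:
    """Return the first verbatim occurrence of needle at or after start."""
    if not needle:
        return None
    i = text.find(needle, start)
    return None if i < 0 else (i, i + len(needle))


def align_field(text: str, value: str, mode: str = "Natural",
                quote: Optional[str] = None,
                label: Optional[str] = None) -> Optional[Span]:
    """Illustrative field alignment: map a returned value to a text span."""
    if mode == "None":
        return None  # no alignment requested, so nothing to highlight
    if mode == "Natural":
        return find_span(text, value)  # only works if the value is verbatim
    if mode == "Quoted":
        return find_span(text, quote or value)  # search for the AI's quote
    if mode == "Labeled":
        # Assumption: search for the quote only after the label's position.
        anchor = find_span(text, label) if label else None
        return find_span(text, quote or value, anchor[1] if anchor else 0)
    raise ValueError(f"unknown alignment mode: {mode}")


def split_at_quotes(text: str, first_lines: List[str]) -> List[Span]:
    """Illustrative Split: each instance runs from its quoted first line
    to the start of the next instance (or the end of the text)."""
    starts: List[int] = []
    cursor = 0
    for line in first_lines:
        hit = find_span(text, line, cursor)
        if hit is None:
            continue  # quote not found verbatim; skip this instance
        starts.append(hit[0])
        cursor = hit[1]
    ends = starts[1:] + [len(text)]
    return list(zip(starts, ends))
```

With a document that reads "January 1st, 2001" and a field value of "01/01/2001", `align_field(..., mode="Natural")` in this sketch returns `None`, while the Quoted and Labeled modes can still locate the text because they search for the quote rather than the normalized value.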

Alignment Properties of Data Elements

These properties are set on individual Data Elements. Think of these as "overrides" to the "global" properties set on parent containers.

AI Extract Field Options is a property found on Data Fields that is set by configuring its sub-property Field Alignment. The settings found in the drop-down for Field Alignment mirror those seen in the Default Field Alignment property described above.
AI Extract Section Options is a property found on Data Sections that is set by configuring its sub-properties Section Alignment and Field Alignment. The settings in the drop-down for Section Alignment mirror those seen in the Default Section Alignment property described above. The settings in the drop-down for Field Alignment mirror those seen in the Default Field Alignment property described above, and affect descendant Data Fields.
AI Extract Table Options is a property found on Data Tables that is set by configuring its sub-properties Table Alignment and Field Alignment. The settings in the drop-down for Table Alignment mirror those seen in the Default Table Alignment property described above. The settings in the drop-down for Field Alignment mirror those seen in the Default Field Alignment property described above, and affect descendant Data Columns. Think of the "label" in this case as the column header.
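
The relationship between container defaults and element-level overrides can be pictured as a lookup that walks up the Data Element tree. The Python sketch below is only a mental model, assuming that the nearest configured setting wins; the `Element` class, its `field_alignment` attribute, and the `"None"` fallback are hypothetical and do not reflect Grooper's object model or its actual default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical stand-in for a Data Element in a Data Model tree."""
    name: str
    parent: Optional["Element"] = None
    # A configured Field Alignment; None here means "not set, inherit".
    field_alignment: Optional[str] = None


def effective_field_alignment(element: Element, fallback: str = "None") -> str:
    """Walk from the element toward the root; the nearest setting wins."""
    node: Optional[Element] = element
    while node is not None:
        if node.field_alignment is not None:
            return node.field_alignment
        node = node.parent
    return fallback  # assumed fallback when nothing is configured anywhere


# Example tree: the model sets a default, the table and one field override it.
model = Element("Data Model", field_alignment="Natural")
table = Element("Line Items", parent=model, field_alignment="Labeled")
column = Element("Unit Price", parent=table)
date_field = Element("Invoice Date", parent=model, field_alignment="Quoted")
total_field = Element("Total", parent=model)
```

In this model, `column` resolves to Labeled through its table, `date_field` uses its own Quoted override, and `total_field` falls back to the model-level Natural default.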

How To

@@ Line 15: / Line 15: @@
 == About ==
 As data values are copied into the '''Grooper''' document, they may optionally be aligned to a location in the document content. The process of mapping AI responses back to the document allows '''Grooper''' to highlight regions on the document as users tab into the associated fields, and can be critical for efficient review of the documents. If the documents will not be reviewed by human operators after data extraction, then alignment may be unnecessary.
+Keep in mind that the '''''Alignment''''' setting affects the length of the prompt given to the AI. This in turn affects the number of tokens consumed, which impacts the "cost" of the prompt in both processing time and money. Setting '''''Alignment''''' to ''None'' therefore costs the least but produces no highlighting, which suits fully automated processing with no human review.
 '''''Alignment''''' is controlled by first configuring the alignment properties of the '''''AI Extract''''' '''''Fill Method'''''  with defaults for how '''Data Sections''', '''Data Tables''', and '''Data Fields''' will be aligned. After the defaults have been established, they can be overridden on individual '''Data Sections''', '''Data Tables''', and '''Data Fields''' using the '''''AI Extract Section Options''''', '''''AI Extract Table Options''''', and '''''AI Extract Field Options''''' properties on these '''Data Elements'''.
+<div style="padding-left: 1.5em;">
+=== Alignment Properties of AI Extract ===
+These properties are set within '''''AI Extract''''' on container '''Data Elements''' ('''Data Model''', '''Data Section''', '''Data Table'''). Think of these as the "global" settings for descendant elements.
+* '''''Default Field Alignment''''': This is the default alignment mode for all descendant '''Data Fields'''.
+** ''None'': Data values captured by the AI will not be aligned to the document and no highlighting will be produced.
+** ''Natural'': Data values will be aligned if their value appears verbatim in the document.
+*** For example consider a '''Data Field''' set to a '''''Value Type''''' of ''DateTime'' with a '''''Format Specifier''''' of ''d''. A date value is on a document with a format of "01/01/2001". The value in the '''Data Field''' would return as "01/01/2001" and would highlight on the document. If the value on the document was instead "January 1st, 2001" the value in the '''Data Field''' would return as "01/01/2001" and there would be no highlight.
+*** Token consumption is higher than ''None''.
+** ''Quoted'': Data values will be aligned using a quote from the document content.
+*** Consider the previous example. In this case, if the value on the document was "January 1st, 2001" and the displayed value in the '''Data Field''' was "01/01/2001" this setting would produce a result.
+*** Token consumption is higher than ''Natural''.
+** ''Labeled'': Data values will be aligned using a label and a quote from the document content.
+*** If multiple dates with similar values are on a document, this setting allows the specific date related to a label to be returned. Without a label, the first occurrence of the value will always be highlighted.
+*** Token consumption is higher than ''Quoted''. This is the most expensive option, but may prove to be the most accurate for highlighting.
+* '''''Default Section Alignment''''': This is the default alignment mode for all descendant '''Data Sections'''.
+** ''None'': The section will not be aligned to the document and no highlighting will be produced.
+** ''Split'': Ask the AI to quote the first line of each section instance, and then split the input at these positions.
+*** Mainly used for Multi-instance '''Data Sections'''. Think of it like the '''Data Section''' '''''Extract Method''''' of ''Divider'' with a '''''Split Position''''' of ''Begin''. The pink bounding box drawn will start on the first line of the instance of the section, and end at the first line of the next repeated section.
+*** Token consumption is higher than ''None''.
+** ''BoundingQuotes'': Ask the AI to quote the first and last lines of the section instance.
+*** Think of it like the '''Data Section''' '''''Extract Method''''' of ''Divider'' with a '''''Split Position''''' of ''Between''. The pink bounding box drawn will start on the first line of the instance of the section, and end at the last line of the section.
+*** Token consumption is higher than ''Split''.
+** ''Quote'': Ask the AI to quote the entire section content.
+*** Think of it like the '''Data Section''' '''''Extract Method''''' of ''Simple''. In this case you would create an extractor that would return an entire section instance.
+*** Token consumption is higher than ''BoundingQuotes''. This is the most expensive option, but may prove to be the most accurate for highlighting.
+* '''''Default Table Alignment''''': This is the default alignment mode for all descendant '''Data Tables'''.
+** ''None'': The table will not be aligned to the document and no highlighting will be produced.
+** ''Split'': Ask the AI to quote the first line of each row, and then split the input at these positions.
+*** Useful when considering multi-line table rows, but could be functional for single line table rows.
+*** Token consumption is higher than ''None''.
+** ''BoundingQuotes'': Ask the AI to quote the first and last lines of the row.
+*** This is only really useful when considering multi-line table rows.
+*** Token consumption is higher than ''Split''.
+** ''Quote'': Ask the AI to quote the entire row.
+*** Token consumption is higher than ''BoundingQuotes''. This is the most expensive option, but may prove to be the most accurate for highlighting.
+=== Alignment Properties of Data Elements ===
+These properties are set on individual '''Data Elements'''. Think of these as "overrides" to the "global" properties set on parent containers.
+* '''''AI Extract Field Options''''' is a property found on '''Data Fields''' that is set by configuring its sub-property '''''Field Alignment'''''. The settings found in the drop-down for '''''Field Alignment''''' mirror those seen in the '''''Default Field Alignment''''' property described above.
+* '''''AI Extract Section Options''''' is a property found on '''Data Sections''' that is set by configuring its sub-properties '''''Section Alignment''''' and '''''Field Alignment'''''. The settings in the drop-down for '''''Section Alignment''''' mirror those seen in the '''''Default Section Alignment''''' property described above. The settings in the drop-down for '''''Field Alignment''''' mirror those seen in the '''''Default Field Alignment''''' property described above, and affect descendant '''Data Fields'''.
+* '''''AI Extract Table Options''''' is a property found on '''Data Tables''' that is set by configuring its sub-properties '''''Table Alignment''''' and '''''Field Alignment'''''. The settings in the drop-down for '''''Table Alignment''''' mirror those seen in the '''''Default Table Alignment''''' property described above. The settings in the drop-down for '''''Field Alignment''''' mirror those seen in the '''''Default Field Alignment''''' property described above, and affect descendant '''Data Columns'''. Think of the "label" in this case as the column header.
+</div>
 == How To ==