Vertical Wrap Detection

From Grooper Wiki
Latest revision as of 15:16, 10 September 2025

Vertical Wrap Detection enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a Labeling Behavior and by the List Match and Label Match Value Extractors.

  • "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.

Vertical Wrap detects and extracts multi-line text segments that are stacked vertically within a document. This is especially useful for labels, headers, or values that formatting splits across multiple lines, such as table headers or field names in forms. By grouping the stacked lines, Grooper can extract text that would otherwise be missed or fragmented, which improves data extraction from complex or variable document layouts.

What is Vertical Wrap for?

Vertical Wrap addresses scenarios where important information, such as a field label or table header, is stacked across two or more lines instead of appearing on a single line. For example, a label like "Purchase Order Number" may appear as:

Purchase
Order
Number

or

Purchase
Order Number

or

Purchase Order
Number

or

Purchase Order Number

Vertical Wrap allows Grooper to recognize these stacked words as a single logical label or value, ensuring that extraction rules remain robust even when document formatting varies.
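
To make the goal concrete, here is a minimal Python sketch (not Grooper code) showing that all four layouts above reduce to the same logical label once line breaks are treated as spaces. The function name `normalize_label` is hypothetical and used only for illustration.

```python
import re


def normalize_label(text: str) -> str:
    """Collapse line breaks and runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


# The four layouts of the same label shown above.
layouts = [
    "Purchase\nOrder\nNumber",
    "Purchase\nOrder Number",
    "Purchase Order\nNumber",
    "Purchase Order Number",
]

# Every layout normalizes to one and the same label.
normalized = {normalize_label(layout) for layout in layouts}
print(normalized)
```

This only illustrates the desired outcome. On a real page, text from neighboring columns typically sits between the stacked lines in reading order, so a plain string comparison like this is usually not enough. That is why the grouping has to be based on where the text sits on the page.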

How does Vertical Wrap Detection work?

When Vertical Wrap is enabled, Grooper analyzes the spatial relationship between lines of text. It groups vertically adjacent text segments based on configurable criteria, such as:

  • Maximum Line Spacing: Controls how far apart vertically adjacent lines can be and still be grouped.
  • Alignment: Ensures that only lines with matching horizontal alignment (left, center, right, or any combination) are grouped.
  • Alignment Tolerance: Allows for minor misalignments due to scanning or OCR inaccuracies.
  • Allow Horizontal Rule: Permits horizontal lines (such as underlines or table rules) between wrapped lines.
  • Allow Vertical Rule: Permits vertical lines (such as table cell borders) between wrapped segments.

Once grouped, the combined text is compared to the set of search terms or values defined in the extractor’s vocabulary. If a match is found, the value is extracted as a single entity.
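
The grouping and matching steps described above can be sketched roughly as follows. This is an illustrative Python approximation under stated assumptions, not Grooper's actual implementation: the `Segment` class, the function names, and the default values are all hypothetical; segment height stands in for font size; and the horizontal and vertical rule options are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A text segment with its bounding box in inches (hypothetical model)."""
    text: str
    left: float
    right: float
    top: float
    bottom: float

    @property
    def height(self) -> float:
        return self.bottom - self.top


def aligned(a: Segment, b: Segment, tol: float) -> bool:
    """True if the segments share a left, center, or right edge within tol."""
    left = abs(a.left - b.left) <= tol
    center = abs((a.left + a.right) / 2 - (b.left + b.right) / 2) <= tol
    right = abs(a.right - b.right) <= tol
    return left or center or right


def wraps_onto(upper: Segment, lower: Segment, max_spacing_pct: float, tol: float) -> bool:
    """True if `lower` sits directly beneath `upper` and is aligned with it."""
    gap = lower.top - upper.bottom
    max_gap = upper.height * max_spacing_pct / 100  # height approximates font size
    return 0 <= gap <= max_gap and aligned(upper, lower, tol)


def find_stacked(segments, vocabulary, max_spacing_pct=100.0, tol=0.05):
    """Chain vertically adjacent segments and return chains found in vocabulary."""
    ordered = sorted(segments, key=lambda s: s.top)
    matches = []
    for i, start in enumerate(ordered):
        chain = [start]
        for nxt in ordered[i + 1:]:
            if wraps_onto(chain[-1], nxt, max_spacing_pct, tol):
                chain.append(nxt)
        # Compare every prefix of the chain to the search terms.
        for n in range(1, len(chain) + 1):
            candidate = " ".join(s.text for s in chain[:n])
            if candidate in vocabulary:
                matches.append(candidate)
    return matches


# A stacked, left-aligned label next to an unrelated single-line label.
page = [
    Segment("Purchase", 1.00, 1.80, 1.00, 1.15),
    Segment("Order", 1.00, 1.50, 1.18, 1.33),
    Segment("Number", 1.00, 1.65, 1.36, 1.51),
    Segment("Date", 3.00, 3.40, 1.00, 1.15),
]
print(find_stacked(page, {"Purchase Order Number", "Date"}))
```

In this sketch, "Date" is never joined to the stacked label: although it is close enough vertically to "Order", it fails the alignment check. The same two checks, spacing and alignment, are what the settings listed above control.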

Where is Vertical Wrap Detection used in Grooper?

Vertical Wrap Detection is an available option for the following Grooper Value Extractors and Behaviors:

  • Labeling Behavior
    • When applied to a Content Type, this Behavior enables label-driven extraction and provides centralized configuration for label matching parameters, including Vertical Wrap.
    • Various "label set enabled" functionality utilizes its Vertical Wrap settings.
  • List Match
    • This Value Extractor is used to extract values from document text that match any entry in a list of search terms.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.
  • Label Match
    • This Value Extractor works identically to List Match but inherits the Fuzzy Matching, Vertical Wrap, and Constrained Wrap settings configured in the document's Labeling Behavior.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.

Vertical Wrap Detection is enabled and configured for these items using their "Vertical Wrap" property.

How to configure Vertical Wrap Detection

Vertical Wrap Detection is configured through the "Vertical Wrap" property in supported Value Extractors and Behaviors.

To enable Vertical Wrap Detection:

  1. Select an item that can use Vertical Wrap Detection (such as a List Match, Label Match, or Labeling Behavior configuration).
  2. Locate the "Vertical Wrap" property.
  3. Enable it.

In most cases, the default settings work well. However, several configurable properties are available to better target edge cases.

Vertical Wrap Detection settings include:

  • "Maximum Line Spacing": Set as a percentage of font size to control vertical grouping.
  • Alignment: Choose required horizontal alignment for grouped lines.
  • Alignment Tolerance: Specify tolerance in inches for alignment analysis.
  • Allow Horizontal Rule: Enable or disable horizontal dividers between lines.
  • Allow Vertical Rule: Enable or disable vertical dividers between segments.
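
Because the two numeric settings use different units, a short worked example may help when reasoning about values. The Python sketch below only illustrates the arithmetic implied by "percentage of font size" and "inches". The function names and sample values are hypothetical and are not Grooper defaults.

```python
def max_gap_points(font_size_pt: float, max_line_spacing_pct: float) -> float:
    """Largest vertical gap, in points, allowed between lines to be grouped."""
    return font_size_pt * max_line_spacing_pct / 100.0


def tolerance_pixels(tolerance_in: float, dpi: int) -> float:
    """Alignment tolerance in inches converted to pixels at a given scan resolution."""
    return tolerance_in * dpi


# Hypothetical values: 10 pt text, 150% maximum line spacing.
gap = max_gap_points(font_size_pt=10, max_line_spacing_pct=150)

# Hypothetical values: 0.125 inch tolerance on a 300 DPI scan.
tol = tolerance_pixels(tolerance_in=0.125, dpi=300)

print(gap, tol)
```

Since the spacing limit scales with font size, the same percentage allows a larger absolute gap for large headers than for small body text. The alignment tolerance, by contrast, is a fixed physical distance regardless of font size.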

Tips

  • Vertical Wrap Detection is used on structured and semi-structured documents to capture short text segments that span multiple lines.
  • You should not use Vertical Wrap Detection on unstructured documents or paragraphs. It is not designed to work for these scenarios. Use Paragraph Detection instead.
  • Use Vertical Wrap Detection for extracting multi-line labels, table headers, or any field that may be split vertically.
  • Adjust "Maximum Line Spacing", "Alignment", and "Alignment Tolerance" settings to fine-tune which lines are grouped.
  • Enable or disable Horizontal and Vertical Rule options based on the presence of lines between wrapped text segments.
  • Test extraction with representative samples to verify matching results and adjust settings as needed.

Summary

Vertical Wrap is an essential feature for extracting multi-line, vertically arranged text in Grooper. By configuring Vertical Wrap options in supported extractors and behaviors, users can ensure accurate and consistent data extraction from documents with complex or variable layouts.