Vertical Wrap Detection

Latest revision as of 15:16, 10 September 2025

Vertical Wrap Detection enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a Labeling Behavior and by the List Match and Label Match Value Extractors.

  • "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.

Vertical Wrap detects and extracts multi-line text segments that are stacked vertically within a document. This is especially useful for labels, headers, or values that formatting splits across multiple lines, such as table headers or field names in forms. With Vertical Wrap, Grooper can group and extract text that would otherwise be missed or fragmented in complex or variable document layouts.

What is vertical wrap for?

Vertical Wrap is designed for cases where important information, such as a field label or table header, is stacked vertically across several lines. For example, a label like "Purchase Order Number" may appear as:

Purchase
Order
Number

or

Purchase
Order Number

or

Purchase Order
Number

or

Purchase Order Number

Vertical Wrap allows Grooper to recognize these stacked words as a single logical label or value, ensuring that extraction rules remain robust even when document formatting varies.
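
The idea that all four layouts above represent one logical label can be illustrated with a small Python sketch. This is not Grooper code and says nothing about Grooper's internals; `flatten_label` is a hypothetical helper that simply treats line breaks as word separators.

```python
def flatten_label(layout: str) -> str:
    """Collapse a stacked label into one space-separated string.

    Hypothetical helper for illustration only: any run of whitespace,
    including line breaks, is treated as a single word separator.
    """
    return " ".join(layout.split())


# The four layouts of "Purchase Order Number" shown above.
layouts = [
    "Purchase\nOrder\nNumber",
    "Purchase\nOrder Number",
    "Purchase Order\nNumber",
    "Purchase Order Number",
]

flattened = {flatten_label(text) for text in layouts}
print(flattened)  # {'Purchase Order Number'}
```

Every layout reduces to the same string, which is why a single search term can cover all of them once the stacked lines have been grouped.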

How does Vertical Wrap Detection work?

When Vertical Wrap is enabled, Grooper analyzes the spatial relationship between lines of text. It groups vertically adjacent text segments based on configurable criteria, such as:

  • Maximum Line Spacing: Controls how far apart vertically adjacent lines can be and still be grouped.
  • Alignment: Ensures that only lines with matching horizontal alignment (left, center, right, or any combination) are grouped.
  • Alignment Tolerance: Allows for minor misalignments due to scanning or OCR inaccuracies.
  • Allow Horizontal Rule: Permits horizontal lines (such as underlines or table rules) between wrapped lines.
  • Allow Vertical Rule: Permits vertical lines (such as table cell borders) between wrapped segments.

Once grouped, the combined text is compared to the set of search terms or values defined in the extractor’s vocabulary. If a match is found, the value is extracted as a single entity.
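
The group-then-match flow described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Grooper's algorithm: the `Line` class, the `find_stacked_matches` function, and its parameters are hypothetical, only left alignment is checked, and line spacing is measured as a multiple of line height.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    left: float    # x position of the left edge, in inches
    top: float     # y position of the top edge, in inches
    height: float  # line height, in inches


def find_stacked_matches(lines, vocabulary, max_spacing=1.0, tolerance=0.1):
    """Return vocabulary terms found on one line or stacked over several.

    max_spacing: largest allowed vertical gap, as a multiple of line height.
    tolerance:   largest allowed difference between left edges, in inches.
    Both parameters are assumptions made for this sketch.
    """
    ordered = sorted(lines, key=lambda line: line.top)
    matches = []
    for i, first in enumerate(ordered):
        group = [first]
        if first.text in vocabulary:
            matches.append(first.text)
        for candidate in ordered[i + 1:]:
            previous = group[-1]
            gap = candidate.top - (previous.top + previous.height)
            aligned = abs(candidate.left - previous.left) <= tolerance
            if not aligned or gap < 0:
                continue  # different column or same row: not part of this stack
            if gap > max_spacing * previous.height:
                break  # too far below the previous line to be wrapped text
            group.append(candidate)
            combined = " ".join(line.text for line in group)
            if combined in vocabulary:
                matches.append(combined)
    return matches


lines = [
    Line("Purchase", left=1.0, top=1.00, height=0.2),
    Line("Order", left=1.0, top=1.25, height=0.2),
    Line("Number", left=1.0, top=1.50, height=0.2),
    Line("Date", left=3.0, top=1.00, height=0.2),
    Line("Total", left=1.0, top=4.00, height=0.2),
]
vocabulary = {"Purchase Order Number", "Date"}

print(find_stacked_matches(lines, vocabulary))
# ['Purchase Order Number', 'Date']
```

Tightening `max_spacing` (for example to `0.1`) stops the three stacked words from being grouped, so only the single-line term is returned. This mirrors how a stricter spacing setting reduces which lines are treated as one label.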

Where is Vertical Wrap Detection used in Grooper?

Vertical Wrap Detection is an available option for the following Grooper Value Extractors and Behaviors:

  • Labeling Behavior
    • When applied to a Content Type, this Behavior enables label-driven extraction and provides centralized configuration for label matching parameters, including Vertical Wrap.
    • Various "label set enabled" functionality utilizes its Vertical Wrap settings.
  • List Match
    • This Value Extractor is used to extract values from document text that match any entry in a list of search terms.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.
  • Label Match
    • This Value Extractor works identically to List Match but inherits the Fuzzy Matching, Vertical Wrap, and Constrained Wrap settings configured in the document's Labeling Behavior.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.

Vertical Wrap Detection is enabled and configured for these items using their "Vertical Wrap" property.

How to configure Vertical Wrap Detection

Vertical Wrap Detection is configured through the "Vertical Wrap" property in supported Value Extractors and Behaviors.

To enable Vertical Wrap Detection:

  1. Select an item that can use Vertical Wrap Detection (such as a List Match, Label Match, or Labeling Behavior configuration).
  2. Locate the "Vertical Wrap" property.
  3. Enable it.

In most cases, the default settings work well. However, several configurable properties are available to better target edge cases.

Vertical Wrap Detection settings include:

  • Maximum Line Spacing: Set as a percentage of font size to control vertical grouping.
  • Alignment: Choose required horizontal alignment for grouped lines.
  • Alignment Tolerance: Specify tolerance in inches for alignment analysis.
  • Allow Horizontal Rule: Enable or disable horizontal dividers between lines.
  • Allow Vertical Rule: Enable or disable vertical dividers between segments.
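
To make the units of these settings concrete, here is a minimal Python sketch of how a spacing limit expressed as a percentage of font size and an alignment tolerance expressed in inches might be applied to two lines. Everything in it is an assumption for illustration: the `WrapSettings` class, its field names, and its default values are hypothetical and are not Grooper's property names or defaults. The sketch also assumes that when several alignment modes are selected, matching on any one of them is enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WrapSettings:
    # Hypothetical values chosen for the example, not Grooper defaults.
    max_line_spacing_pct: float = 100.0  # percent of font size
    alignment: frozenset = frozenset({"left", "center", "right"})
    alignment_tolerance_in: float = 0.1  # inches


def edge_positions(left: float, right: float) -> dict:
    """Return the x positions used for each alignment mode, in inches."""
    return {"left": left, "center": (left + right) / 2, "right": right}


def may_wrap(upper, lower, gap_in, font_size_in, settings=WrapSettings()):
    """Decide whether two lines could be grouped as wrapped text.

    upper, lower: (left, right) horizontal extents of each line, in inches.
    gap_in:       vertical gap between the two lines, in inches.
    font_size_in: font size expressed in inches for simplicity.
    """
    max_gap = font_size_in * settings.max_line_spacing_pct / 100
    if gap_in > max_gap:
        return False
    upper_edges = edge_positions(*upper)
    lower_edges = edge_positions(*lower)
    # Assumption: agreement on any one selected alignment mode is sufficient.
    return any(
        abs(upper_edges[mode] - lower_edges[mode]) <= settings.alignment_tolerance_in
        for mode in settings.alignment
    )


# Two center-aligned lines of different widths, 0.05 inches apart.
centered_upper, centered_lower = (1.0, 3.0), (1.5, 2.5)
print(may_wrap(centered_upper, centered_lower, gap_in=0.05, font_size_in=0.2))
left_only = WrapSettings(alignment=frozenset({"left"}))
print(may_wrap(centered_upper, centered_lower, 0.05, 0.2, left_only))
```

In this sketch the centered pair is grouped when center alignment is allowed and rejected when only left alignment is required, which is the kind of edge case the alignment settings are meant to target.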

Tips

  • Vertical Wrap Detection is used on structured and semi-structured documents to capture short text segments that span multiple lines.
  • You should not use Vertical Wrap Detection on unstructured documents or paragraphs. It is not designed to work for these scenarios. Use Paragraph Detection instead.
  • Use Vertical Wrap Detection for extracting multi-line labels, table headers, or any field that may be split vertically.
  • Adjust "Maximum Line Spacing", "Alignment", and "Alignment Tolerance" settings to fine-tune which lines are grouped.
  • Enable or disable Horizontal and Vertical Rule options based on the presence of lines between wrapped text segments.
  • Test extraction with representative samples to verify matching results and adjust settings as needed.

Summary

Vertical Wrap is an essential feature for extracting multi-line, vertically arranged text in Grooper. By configuring Vertical Wrap options in supported extractors and behaviors, users can ensure accurate and consistent data extraction from documents with complex or variable layouts.