Vertical Wrap Detection

From Grooper Wiki

Vertical Wrap Detection enables simplified extraction of multi-line text segments that are stacked vertically within a document. Vertical Wrap Detection can be used by Content Types configured with a Labeling Behavior and by the List Match and Label Match Value Extractors.

  • "Vertical Wrap Detection" is the embedded object that actually performs wrap detection in Grooper. Vertical Wrap Detection is enabled and configured with the "Vertical Wrap" property found in configuration items that support it.

Vertical Wrap detects and extracts multi-line text segments that are stacked vertically within a document. This is especially useful for labels, headers, or values that the document's formatting splits across multiple lines, such as table headers or field names on forms. With Vertical Wrap, Grooper can group and extract text that would otherwise be missed or fragmented, which improves extraction from complex or variable document layouts.

What is Vertical Wrap for?

Vertical Wrap is designed for scenarios where important information, such as field labels or table headers, is arranged vertically. For example, a label like "Purchase Order Number" may appear in any of these layouts:

Purchase
Order
Number

or

Purchase
Order Number

or

Purchase Order
Number

or

Purchase Order Number

Vertical Wrap allows Grooper to recognize each of these layouts as the same logical label or value, so extraction rules keep working when document formatting varies.
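A minimal Python sketch of the idea, assuming the stacked lines have already been grouped: once the fragments are joined and whitespace is normalized, every layout above reduces to the same label. The `join_wrapped` helper is hypothetical and is not part of Grooper.

```python
# Hypothetical illustration only; not Grooper's implementation.
def join_wrapped(lines: list[str]) -> str:
    """Join vertically stacked line fragments into one space-separated label."""
    # Split each fragment on whitespace and rejoin, so line breaks and
    # extra spaces between words do not affect the result.
    return " ".join(word for line in lines for word in line.split())


layouts = [
    ["Purchase", "Order", "Number"],
    ["Purchase", "Order Number"],
    ["Purchase Order", "Number"],
    ["Purchase Order Number"],
]

# Every layout normalizes to the same logical label.
for layout in layouts:
    print(join_wrapped(layout))
```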

How does Vertical Wrap Detection work?

When Vertical Wrap is enabled, Grooper analyzes the spatial relationship between lines of text. It groups vertically adjacent text segments based on configurable criteria, such as:

  • Maximum Line Spacing: Controls how far apart vertically adjacent lines can be and still be grouped.
  • Alignment: Ensures that only lines with matching horizontal alignment (left, center, right, or any combination) are grouped.
  • Alignment Tolerance: Allows for minor misalignments due to scanning or OCR inaccuracies.
  • Allow Horizontal Rule: Permits horizontal lines (such as underlines or table rules) between wrapped lines.
  • Allow Vertical Rule: Permits vertical lines (such as table cell borders) between wrapped segments.

Grooper then compares the combined text of each group against the search terms or labels being matched. If a match is found, the wrapped text is extracted as a single value.
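The grouping logic described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Grooper's actual implementation: the `Line`, `WrapSettings`, `Alignment`, and `find_wrapped_match` names are hypothetical, all measurements are assumed to be in inches, line spacing is assumed to be measured from the bottom of one line to the top of the next, and the Allow Vertical Rule option is omitted for brevity.

```python
# Hypothetical sketch of vertical wrap grouping; not Grooper's implementation.
from dataclasses import dataclass
from enum import Flag, auto
from typing import List, Optional


class Alignment(Flag):
    """Hypothetical model of the "Alignment" option; members can be combined."""
    LEFT = auto()
    CENTER = auto()
    RIGHT = auto()


@dataclass
class Line:
    """One line of OCR text with its bounding box, all measurements in inches."""
    text: str
    left: float
    right: float
    top: float
    bottom: float
    font_size: float          # assumed: text height in inches
    rule_above: bool = False  # True if a horizontal rule separates it from the line above


@dataclass
class WrapSettings:
    """Illustrative values only; these are not Grooper's defaults."""
    max_line_spacing_pct: float = 100.0
    alignment: Alignment = Alignment.LEFT | Alignment.CENTER | Alignment.RIGHT
    alignment_tolerance: float = 0.05  # inches
    allow_horizontal_rule: bool = False


def is_aligned(upper: Line, lower: Line, s: WrapSettings) -> bool:
    """True if the lines share at least one permitted alignment within tolerance."""
    deltas = {
        Alignment.LEFT: abs(upper.left - lower.left),
        Alignment.CENTER: abs((upper.left + upper.right) / 2 - (lower.left + lower.right) / 2),
        Alignment.RIGHT: abs(upper.right - lower.right),
    }
    return any(mode in s.alignment and d <= s.alignment_tolerance for mode, d in deltas.items())


def can_wrap(upper: Line, lower: Line, s: WrapSettings) -> bool:
    """Decide whether `lower` continues the text of `upper`."""
    gap = lower.top - upper.bottom
    # Reject overlapping lines and gaps larger than the allowed share of the font size.
    if gap < 0 or gap > upper.font_size * s.max_line_spacing_pct / 100:
        return False
    if lower.rule_above and not s.allow_horizontal_rule:
        return False
    return is_aligned(upper, lower, s)


def normalize(text: str) -> str:
    # Collapse whitespace; case-insensitive comparison is an assumption.
    return " ".join(text.split()).lower()


def find_wrapped_match(lines: List[Line], terms: List[str], s: WrapSettings) -> Optional[str]:
    """Return the first search term matched by a run of vertically wrapped lines.

    `lines` is assumed to be a single column of segments sorted top to bottom.
    """
    wanted = {normalize(t): t for t in terms}
    for start in range(len(lines)):
        group = [lines[start]]
        while True:
            # Test the combined text of the current run before extending it.
            hit = wanted.get(normalize(" ".join(line.text for line in group)))
            if hit is not None:
                return hit
            nxt = start + len(group)
            if nxt >= len(lines) or not can_wrap(group[-1], lines[nxt], s):
                break
            group.append(lines[nxt])
    return None


lines = [
    Line("Purchase", left=1.00, right=1.80, top=1.00, bottom=1.12, font_size=0.12),
    Line("Order Number", left=1.00, right=2.10, top=1.16, bottom=1.28, font_size=0.12),
    Line("Ship To", left=1.00, right=1.60, top=2.00, bottom=2.12, font_size=0.12),
]
print(find_wrapped_match(lines, ["Purchase Order Number"], WrapSettings()))  # Purchase Order Number
```

In this sketch "Ship To" sits far below "Order Number", so it fails the line-spacing check and is never grouped with the label above it. Checking every progressively longer run, rather than only whole blocks, lets a term match the first few lines of a taller stack.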

Where is Vertical Wrap Detection used in Grooper?

Vertical Wrap Detection is an available option for the following Grooper Value Extractors and Behaviors:

  • Labeling Behavior
    • When applied to a Content Type, this Behavior enables label-driven extraction and provides centralized configuration for label matching parameters, including Vertical Wrap.
    • Other "label set enabled" features in Grooper use its Vertical Wrap settings.
  • List Match
    • This Value Extractor is used to extract values from document text that match any entry in a list of search terms.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.
  • Label Match
    • This Value Extractor works identically to List Match but inherits the Fuzzy Matching, Vertical Wrap, and Constrained Wrap settings configured in the document's Labeling Behavior.
    • Vertical Wrap enables detection of terms split across multiple lines, such as stacked field labels and column headers in tabular data.

Vertical Wrap Detection is enabled and configured for these items using their "Vertical Wrap" property.

How to configure Vertical Wrap Detection

Vertical Wrap Detection is configured through the "Vertical Wrap" property in supported Value Extractors and Behaviors.

To enable Vertical Wrap Detection:

  1. Select an item that can use Vertical Wrap Detection (such as a List Match, Label Match, or Labeling Behavior configuration).
  2. Locate the "Vertical Wrap" property.
  3. Enable it.

In most cases, the default settings work well. However, several configurable properties are available to better target edge cases.

Vertical Wrap Detection settings include:

  • "Maximum Line Spacing": Set as a percentage of font size to control how far apart lines can be and still be grouped.
  • "Alignment": Choose the horizontal alignment (left, center, right, or any combination) that grouped lines must share.
  • "Alignment Tolerance": Specify the tolerance, in inches, allowed when comparing line alignment.
  • "Allow Horizontal Rule": Enable or disable horizontal dividers between lines.
  • "Allow Vertical Rule": Enable or disable vertical dividers between segments.
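The units behind these settings can be checked with a little arithmetic. The Python sketch below is illustrative only: it assumes font size is expressed in points (1 pt = 1/72 inch) and that "Maximum Line Spacing" limits the gap between lines to that percentage of the font size. The helper names and example values are hypothetical, not Grooper defaults.

```python
# Hypothetical helpers illustrating the setting units; not part of Grooper.
POINTS_PER_INCH = 72.0


def max_gap_inches(font_size_pt: float, max_line_spacing_pct: float) -> float:
    """Largest vertical gap (inches) allowed between wrapped lines.

    Assumes the gap limit is max_line_spacing_pct percent of the font size.
    """
    return (font_size_pt / POINTS_PER_INCH) * (max_line_spacing_pct / 100.0)


def within_tolerance(x1_in: float, x2_in: float, tolerance_in: float) -> bool:
    """True if two edge positions (inches) differ by no more than the tolerance."""
    return abs(x1_in - x2_in) <= tolerance_in


# 12 pt text with a 50% spacing limit: 12/72 * 0.5 = 0.0833... inches.
print(round(max_gap_inches(12, 50), 4))  # 0.0833
# Left edges 0.02" apart pass a 0.05" tolerance but fail a 0.01" tolerance.
print(within_tolerance(1.00, 1.02, 0.05), within_tolerance(1.00, 1.02, 0.01))  # True False
```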

Tips

  • Vertical Wrap Detection is intended for structured and semi-structured documents, to capture short text segments that span multiple lines.
  • Do not use Vertical Wrap Detection on unstructured documents or paragraphs; it is not designed for these scenarios. Use Paragraph Detection instead.
  • Use Vertical Wrap Detection for extracting multi-line labels, table headers, or any field that may be split vertically.
  • Adjust "Maximum Line Spacing", "Alignment", and "Alignment Tolerance" settings to fine-tune which lines are grouped.
  • Enable or disable Horizontal and Vertical Rule options based on the presence of lines between wrapped text segments.
  • Test extraction with representative samples to verify matching results and adjust settings as needed.

Summary

Vertical Wrap is an essential feature for extracting multi-line, vertically arranged text in Grooper. By configuring Vertical Wrap options in supported extractors and behaviors, users can ensure accurate and consistent data extraction from documents with complex or variable layouts.