Labeling Behavior (Behavior)

From Grooper Wiki
Revision as of 16:53, 15 October 2025 by Dgreenwood

This article is about the current version of Grooper.

Note that some content may still need to be updated.

The 2025 Label Sets articles are still in progress. For a more comprehensive view of Label Sets and Labeling Behavior, see the 2023 version of the Labeling Behavior wiki article.

A Labeling Behavior extends "label set" functionality to Document Types. It lets you collect field labels and other labels present on a document and use them for classification, field extraction, table extraction, and section extraction.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

What does a Labeling Behavior do?

A Labeling Behavior is a configuration applied to a Content Type that enables Label Sets. Adding a Labeling Behavior does the following:

  • Makes Label Sets available for the Content Type and its descendants.
  • Adds a Labels tab to the Design Page for managing Label Sets.
  • Allows configuration of label and header matching options, such as similarity thresholds and fuzzy matching options.
  • Enables rapid onboarding of new Document Types by mapping document labels to Data Elements.

Why use a Labeling Behavior?

Now that we know what adding a Labeling Behavior does, why would we want to add one to our model in the first place?

  • Adding a Labeling Behavior streamlines extraction from semi-structured documents where field locations and labels may vary.
  • It reduces setup time for new Document Types: create a new Document Type and configure the Label Sets for that type.
  • It improves extraction accuracy by allowing flexible label matching and handling of OCR or typographical errors.
  • It supports classification and extraction for a wide range of document layouts.

Where is Labeling Behavior located?

A Labeling Behavior is configured on a Content Type within the Content Model. See the following instructions on how to access it:

  1. In your node tree, select the desired Content Type. This is normally the Content Model itself, but Label Sets can also be applied to other Content Types, such as a Document Type or Content Category.
  2. In the property grid, locate the Behaviors section.
  3. Click the "+" icon and select Labeling Behavior.
  4. Configure any Labeling Behavior properties you wish to change.
  5. Click "OK".
  6. Save your changes to the Content Type.
  7. After adding, refresh the Content Type to access the Labels tab.
  8. Use the Labels tab to define one or more Label Sets, mapping document text to Data Elements.
  9. Configure the properties of Labeling Behavior as needed (see below).

Labeling Behavior properties

Each property controls how labels are matched and extracted. Adjust these to fit your document set:

"Label Similarity"
The minimum similarity required for a label match to occur. Value is between 0.01 and 1.0 (1.0 = exact match).
Why use it? Lower values allow for fuzzy matching (helpful for typos or OCR errors). Higher values require stricter matches.
Example: If your documents have consistent labels, set this to 0.95 or higher. For variable or noisy documents, use 0.8.
"Header Similarity"
The minimum similarity required for a header match (used for tables and sections). Value is between 0.01 and 1.0.
Why use it? It controls how closely table or section headers must match. Lower values help with inconsistent headers.
Example: If table headers are sometimes misspelled, set this to 0.8.
"Weightings"
Specifies weighting factors for fuzzy label and header matching. Adjusts the cost of character swaps, insertions, and deletions.
Why use it? Fine-tune matching for common OCR or typographical errors (e.g., 'O' vs. '0').
Example: If 'O' and '0' are often confused, lower the swap cost for this pair.
"Constrained Wrap"
Enables or disables constrained wrap detection for label matching. Allows labels that wrap across lines at allowed positions (spaces, hyphens).
Why use it? Useful for multi-line labels in tables or forms.
Example: "Date of
Service" can be matched as "Date of Service" if enabled.
"Vertical Wrap"
Enables or disables vertical wrap detection for label matching. Allows multi-word labels stacked vertically to be matched as a single label.
Why use it? Useful for forms or tables with vertically arranged labels.
Example:
Purchase
Order
Number
will be matched as "Purchase Order Number" if enabled.
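
The matching properties above can be illustrated with a small Python sketch. This is not Grooper's implementation; it assumes a weighted Levenshtein distance normalized by the longer string's length, and every name in it (weighted_edit_distance, similarity, is_label_match, join_wrapped, swap_costs) is hypothetical. It shows how a similarity threshold, a lowered swap cost for 'O' and '0', and joining wrapped label lines might interact.

```python
def weighted_edit_distance(a, b, swap_costs=None, insert_cost=1.0, delete_cost=1.0):
    """Levenshtein distance with configurable per-pair swap costs (illustrative only)."""
    swap_costs = swap_costs or {}
    # prev[j] is the cost of turning the processed prefix of `a` into b[:j].
    prev = [j * insert_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * delete_cost]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                swap = 0.0
            else:
                # Unlisted character pairs fall back to a full-cost swap.
                swap = swap_costs.get(frozenset((ca, cb)), 1.0)
            curr.append(min(
                prev[j] + delete_cost,      # delete ca
                curr[j - 1] + insert_cost,  # insert cb
                prev[j - 1] + swap,         # swap ca for cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b, **costs):
    """Map the distance onto a score where 1.0 means an exact match."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(a, b, **costs) / longest


def is_label_match(candidate, label, threshold=0.8, **costs):
    """Accept the candidate text if it is at least `threshold` similar to the label."""
    return similarity(candidate, label, **costs) >= threshold


def join_wrapped(lines):
    """Simplified wrap handling: join stacked or wrapped label lines with spaces."""
    return " ".join(part.strip() for part in lines if part.strip())


# An OCR misread of 'O' as '0' fails a strict threshold at full swap cost...
strict = is_label_match("P0 Number", "PO Number", threshold=0.95)
# ...but passes once the swap cost for that pair is lowered.
tuned = is_label_match("P0 Number", "PO Number", threshold=0.95,
                       swap_costs={frozenset("O0"): 0.2})
stacked = join_wrapped(["Purchase", "Order", "Number"])
```

In this sketch, lowering the threshold and lowering a swap cost both make matching more permissive, but the swap cost only relaxes matching for the specific character pair that is known to be confused.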

Example scenario

Suppose you process invoices from different vendors, each using different terms for the same field (e.g., "Invoice Number", "Inv #", "Bill No."). By enabling a Labeling Behavior and defining a Label Set for each vendor, you can map all variations to a single Data Field, making extraction more reliable and reducing maintenance.
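
As a rough mental model of this scenario, the Python sketch below treats each vendor's Label Set as a mapping from a shared Data Field name to the label text that vendor prints. The vendor names, the LABEL_SETS structure, and the extract_field helper are all hypothetical, and the exact prefix match stands in for Grooper's own label matching and value extraction, which this sketch does not reproduce.

```python
# Hypothetical label sets: one per vendor Document Type, keyed by Data Field name.
LABEL_SETS = {
    "Vendor A": {"Invoice Number": "Invoice Number"},
    "Vendor B": {"Invoice Number": "Inv #"},
    "Vendor C": {"Invoice Number": "Bill No."},
}


def extract_field(document_type, field, lines):
    """Return the text following the vendor's label for `field`, or None."""
    label = LABEL_SETS[document_type].get(field)
    if label is None:
        return None
    for line in lines:
        # Simplification: exact prefix match, value assumed to sit right of the label.
        if line.startswith(label):
            return line[len(label):].strip(" :")
    return None


vendor_b_value = extract_field(
    "Vendor B", "Invoice Number", ["Date: 2025-01-01", "Inv #: 12345"]
)
vendor_c_value = extract_field("Vendor C", "Invoice Number", ["Bill No. 778"])
```

Onboarding another vendor in this model means adding one more entry to the mapping, while the downstream Data Field name stays the same.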

Related concepts