Labeled Value (Value Extractor)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Labeled Value is a Value Extractor that identifies and extracts a value next to a label. It is one of the most commonly used extractors for pulling data from structured documents (such as a standardized form) and static values from semi-structured documents (such as the header details on an invoice).

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Introduction

A Labeled Value in Grooper is a method for extracting data from documents where a specific piece of information (the value) is identified by a nearby descriptive text (the label). This approach is common in forms, invoices, and other semi-structured documents, where fields such as "Invoice Total: $1,234.56" or "Date of Birth: 01/01/2000" appear.

A Label is the descriptive text that gives context to the data being extracted (for example, "Invoice Total" or "Date of Birth"). A Value is the actual data associated with that label (such as "$1,234.56" or "01/01/2000"). Labels are essential because they clarify what the value means, ensuring that extracted data is accurate and meaningful.

We use Labeled Values in Grooper to:

  • Increase accuracy by associating data with its context.
  • Simplify configuration for documents with similar data but different layouts or terminology.
  • Speed up onboarding of new document types by focusing on label-value relationships.

However, there are some drawbacks:

  • Labeled Value extraction may not work well if labels are missing or inconsistent.
  • Complex layouts or noisy documents may require additional configuration.

How to

Labeled Values work by pairing a label with its corresponding value based on their spatial relationship in the document. The Labeled Value extractor identifies candidate labels and values, then determines which pairs belong together by analyzing their proximity and layout.
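The pairing step can be pictured with a minimal Python sketch. This is not Grooper's implementation, only a model of the general idea under stated assumptions: each label and value candidate is assumed to carry a bounding box in inches, and each label is paired with the closest value that sits to its right on the same line or directly below it. The `Candidate` class and the `gap` and `pair_labels` functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical text match with a bounding box measured in inches."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def gap(label: Candidate, value: Candidate) -> Optional[float]:
    """Return the gap between label and value, or None if the value is not
    to the right of the label (same line) or directly below it."""
    same_line = value.top < label.bottom and value.bottom > label.top
    same_column = value.left < label.right and value.right > label.left
    if same_line and value.left >= label.right:
        return value.left - label.right
    if same_column and value.top >= label.bottom:
        return value.top - label.bottom
    return None


def pair_labels(labels: List[Candidate],
                values: List[Candidate]) -> List[Tuple[str, str, float]]:
    """Pair each label with its nearest value to the right or below."""
    pairs = []
    for label in labels:
        # Keep only values in an allowed position, with their distance.
        scored = [(d, v) for v in values if (d := gap(label, v)) is not None]
        if scored:
            distance, best = min(scored, key=lambda item: item[0])
            pairs.append((label.text, best.text, distance))
    return pairs
```

For instance, with a label box ending at 2.0 in and two currency values on the same line starting at 2.5 in and 5.0 in, `pair_labels` keeps the closer value (a 0.5 in gap) and ignores the other.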

For example, in a Data Field for "Invoice Total", the extractor can be configured to recognize label variants like "Invoice Amount", "Total Due", or "Amount Owed", and match them with the correct value (such as a currency amount).
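Recognizing several label variants amounts to matching any one of a list of phrases. The Python sketch below only approximates that idea with a case-insensitive regular expression; in Grooper the variants are configured in an extractor (for example, a List Match) rather than in code, and its matching options may differ. `LABEL_VARIANTS`, `build_label_pattern`, and `find_label` are hypothetical names.

```python
import re
from typing import List, Optional

# Hypothetical label variants for an "Invoice Total" Data Field.
LABEL_VARIANTS = ["Invoice Total", "Invoice Amount", "Total Due", "Amount Owed"]


def build_label_pattern(variants: List[str]) -> re.Pattern:
    """Compile one case-insensitive alternation that matches any variant.

    Longer variants are tried first, so a longer label is preferred over a
    shorter one starting at the same position.
    """
    ordered = sorted(variants, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def find_label(text: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first label variant found in the text, if any."""
    match = pattern.search(text)
    return match.group(0) if match else None
```

With this pattern, both "TOTAL DUE: $1,234.56" and "Amount Owed $50.00" yield a label, while "Subtotal: $40.00" does not.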

Labeled Values are used in many contexts within Grooper, including:

  • Data Fields in a Data Model (e.g., extracting patient names, dates, or totals)
  • Batch Processes that require field-level data extraction
  • Automated classification and data validation

How to Configure a Labeled Value Extractor

You can use the Labeled Value extractor anywhere you can select an Extractor, but this example uses a Data Field.

  1. In Grooper Design Studio, navigate to your Data Model and select the desired Data Field.
  2. In the Data Field's property grid, find the "Value Extractor" property.
  3. Click the "☰" to the right of the property to open the drop-down list.
  4. Choose Labeled Value from the list of available extractors.
  5. Set the "Label Extractor" property to an extractor that matches all possible label variants for your field (e.g., "Invoice Total", "Total Due"). Configure the chosen extractor to collect the labels.
    • A List Match extractor is one of the most common choices for extracting the labels of a Labeled Value.
  6. Set the Labeled Value's own "Value Extractor" property to an extractor that matches the expected value format (e.g., a Pattern Match extractor that extracts currency amounts). Configure the chosen extractor to collect the values.
  7. Save your changes and test extraction on sample documents.
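Pattern Match extractors are regular-expression based, so the value side of step 6 can be approximated in Python. This is a minimal sketch under an assumed currency format (a "$", an optional space, digits with optional thousands separators, and optional two-digit cents); Python's `re` syntax is close to, but not identical to, other regex engines. `CURRENCY` and `find_currency_values` are hypothetical names.

```python
import re
from typing import List

# Assumed currency format: "$", optional space, digits with optional
# thousands separators, then optional two-digit cents.
CURRENCY = re.compile(r"\$\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def find_currency_values(text: str) -> List[str]:
    """Return every currency-like value in the text, in reading order."""
    # The pattern has only non-capturing groups, so findall returns whole matches.
    return CURRENCY.findall(text)
```

Applied to "Invoice Total: $1,234.56  Tax: $98.00", this returns both amounts; it is the Labeled Value's label-to-value pairing, not the pattern itself, that decides which amount belongs to "Invoice Total".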

Advanced options

The Labeled Value extractor includes advanced properties to fine-tune extraction:

Maximum Distance
Sets how far the value can be from the label (in units like inches or millimeters). This helps control which values are considered "close enough" to a label to be paired.
Maximum Noise
Sets how many unrelated characters are allowed in the region between the label and value. This helps prevent incorrect pairings in cluttered layouts.
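Conceptually, the two settings act as filters on candidate pairs. The Python sketch below is a simplified model, not Grooper's logic: it assumes distances are already measured in inches per direction and assumes "noise" means the non-whitespace characters found between the label and the value. `MaxDistance`, `noise_count`, and `accept_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaxDistance:
    """Hypothetical per-direction limits in inches. In this sketch, a
    direction left at 0.0 only accepts values that touch the label."""
    left: float = 0.0
    top: float = 0.0
    right: float = 0.0
    bottom: float = 0.0


def noise_count(between: str) -> int:
    """Assumption: noise = non-whitespace characters between label and value."""
    return sum(1 for ch in between if not ch.isspace())


def accept_pair(direction: str, distance: float, between: str,
                max_distance: MaxDistance, max_noise: int) -> bool:
    """Keep a label/value pair only if it is within both limits."""
    if direction not in ("left", "top", "right", "bottom"):
        raise ValueError(f"unknown direction: {direction}")
    limit = getattr(max_distance, direction)
    return distance <= limit and noise_count(between) <= max_noise
```

With the example settings used in this article (2in to the right and a noise limit of 5), a value 0.5 in to the right with only spaces in between is accepted, while one separated by "Ref #12345" (9 noise characters) is rejected.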

Adjusting Maximum Distance and Maximum Noise

  1. Select your Data Field and open the Labeled Value extractor's properties.
  2. Find the "Maximum Distance" property. Enter values for left, top, right, and bottom distances (e.g., 2in for right).
  3. Find the "Maximum Noise" property. Enter the maximum number of noise characters allowed (e.g., 5).
  4. Save and test extraction. If values are missed, increase the distances or noise limit. If incorrect values are paired, decrease them.

Guidance:

  • Use smaller distances and lower noise limits for tightly grouped, clean documents.
  • Increase these settings for documents with more variation or clutter.

Labeled Value with Label Sets

What are Label Sets?

A Label Set is a mapping between Data Elements (such as Data Fields) and the text labels used to identify them on a specific document type. Label Sets enable rapid onboarding of new document types by allowing you to define which labels correspond to which fields.
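The mapping can be pictured as a nested dictionary: document type, then Data Field, then the label text collected for it. This Python sketch is only an illustration of the data structure, not how Grooper stores Label Sets; the document type names, `LABEL_SETS`, and `labels_for` are hypothetical.

```python
from typing import Dict, List

# Hypothetical Label Sets: document type -> Data Field name -> label text(s).
# The same Data Field can be labeled differently on each document type.
LabelSets = Dict[str, Dict[str, List[str]]]

LABEL_SETS: LabelSets = {
    "Acme Invoice": {
        "Invoice Total": ["Total Due"],
        "Invoice Date": ["Date"],
    },
    "Globex Invoice": {
        "Invoice Total": ["Amount Owed"],
        "Invoice Date": ["Invoice Date"],
    },
}


def labels_for(label_sets: LabelSets, document_type: str, field: str) -> List[str]:
    """Return the labels collected for a field on a document type, if any."""
    return label_sets.get(document_type, {}).get(field, [])
```

Onboarding another vendor in this model means adding one more entry to the dictionary; the lookup code, like the extractor in Grooper, stays unchanged.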

Pros and cons of using Label Sets with Labeled Values

Pros:

  • Centralizes label management for each document type.
  • Makes it easy to update or add new label variants without editing extractors.
  • Supports rapid onboarding and template creation.

Cons:

  • Requires initial setup of Label Sets for each document type.
  • May not work for fields without labels or with highly variable layouts.

Step-by-step: Collecting Data Field labels for a Label Set

  1. In Grooper Design Studio, select your Content Type.
  2. Go to the "Labels" tab (visible if a Labeling Behavior is defined).
  3. For each Data Field, highlight the label text on a sample document and assign it to the field.
  4. Repeat for all fields you want to include in the Label Set.
  5. Save the Label Set.

Step-by-step: Configuring a Data Field to use a Label Set with Labeled Value

  1. Select the Data Field you want to configure.
  2. Set the Data Field's "Value Extractor" property to Labeled Value.
  3. Leave the Labeled Value's "Label Extractor" property blank. This tells Grooper to use the Label Set for label detection.
  4. (Optional) Leave the Labeled Value's own "Value Extractor" property blank to extract all content to the right of or below the label, or to use a static value from the Label Set.
  5. Save and test extraction.
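The blank-property fallbacks in these steps can be modeled in a few lines of Python. This is a deliberately simplified, hypothetical sketch, not Grooper's behavior: it works on a single line of text, falls back to the Label Set's labels when no label extractor is given, and, when no value extractor is given, returns the remaining text to the right of the label. It does not model values below the label or static values from the Label Set. `extract_from_line` is a hypothetical name.

```python
import re
from typing import List, Optional


def extract_from_line(line: str, label_set_labels: List[str],
                      label_extractor: Optional[re.Pattern] = None,
                      value_extractor: Optional[re.Pattern] = None) -> Optional[str]:
    """Simplified single-line model of a Labeled Value backed by a Label Set."""
    if label_extractor is None:
        # Blank Label Extractor: fall back to the label(s) from the Label Set.
        if not label_set_labels:
            return None
        label_extractor = re.compile(
            "|".join(re.escape(label) for label in label_set_labels))
    label_match = label_extractor.search(line)
    if label_match is None:
        return None
    rest = line[label_match.end():]
    if value_extractor is None:
        # Blank Value Extractor: take the remaining text right of the label.
        value = rest.strip(" :\t")
        return value or None
    value_match = value_extractor.search(rest)
    return value_match.group(0) if value_match else None
```

For "Total Due: $1,234.56 USD" with the Label Set label "Total Due", leaving the value extractor blank returns "$1,234.56 USD", while supplying a currency pattern narrows the result to "$1,234.56".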

Labeled Value summary

Labeled Values in Grooper extract data by associating labels with their corresponding values. They adapt to documents with different layouts and terminology, offer advanced options such as Maximum Distance and Maximum Noise for handling complex layouts, and work with Label Sets for rapid onboarding and template management.

See Also: