Labeled Value (Value Extractor)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Labeled Value is a Value Extractor that identifies and extracts a value next to a label. It is one of the most commonly used extractors for collecting data from structured documents (such as a standardized form) and static values on semi-structured documents (such as the header details on an invoice).

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

Introduction

A Labeled Value in Grooper is a method for extracting data from documents where a specific piece of information (the value) is identified by nearby descriptive text (the label). This approach is common in forms, invoices, and other semi-structured documents, where fields such as "Invoice Total: $1,234.56" or "Date of Birth: 01/01/2000" appear.

A Label is the descriptive text that gives context to the data being extracted (for example, "Invoice Total" or "Date of Birth"). A Value is the actual data associated with that label (such as "$1,234.56" or "01/01/2000"). Labels clarify what the value means, ensuring that extracted data is accurate and meaningful.

We use Labeled Values in Grooper to:

  • Increase accuracy by associating data with its context.
  • Simplify configuration for documents with similar data but different layouts or terminology.
  • Speed up onboarding of new Document Types by focusing on label-value relationships.

However, there are some drawbacks:

  • Labeled Value extraction may not work well if labels are missing or inconsistent.
  • Complex layouts or noisy documents may require additional configuration.

How to

Labeled Values work by pairing a label with its corresponding value based on their spatial relationship in the document. The Labeled Value extractor identifies candidate labels and values, then determines which pairs belong together by analyzing their proximity and layout.

For example, in a Data Field for "Invoice Total", the extractor can be configured to recognize label variants like "Invoice Amount", "Total Due", or "Amount Owed", and match them with the correct value (such as a currency amount).
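The pairing idea can be sketched in a few lines of Python. This is only an illustration of proximity-based matching, not Grooper's implementation: the `Hit` class, the page coordinates, and the "nearest value wins" rule are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Hit:
    """A piece of text found on the page (hypothetical structure)."""
    text: str
    x: float  # left edge, in inches
    y: float  # top edge, in inches

# Label variants for an "Invoice Total" field, taken from the example above.
LABEL_VARIANTS = {"invoice total", "invoice amount", "total due", "amount owed"}

def pair_label_with_value(words, values):
    """Return (label text, value text) for the value nearest the first matching label."""
    labels = [w for w in words if w.text.lower() in LABEL_VARIANTS]
    if not labels or not values:
        return None
    label = labels[0]
    nearest = min(values, key=lambda v: math.hypot(v.x - label.x, v.y - label.y))
    return label.text, nearest.text

# Made-up sample data: one real label, one unrelated label, two candidate amounts.
words = [Hit("Invoice Date", 5.0, 1.0), Hit("Total Due", 5.0, 9.0)]
values = [Hit("$99.00", 6.5, 4.0), Hit("$1,234.56", 6.5, 9.0)]
result = pair_label_with_value(words, values)
print(result)  # ('Total Due', '$1,234.56')
```

The amount on the same line as "Total Due" is chosen because it is 1.5 inches away, while the other candidate is more than 5 inches away.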

Labeled Values are used in many contexts within Grooper, including:

  • Data Fields in a Data Model (e.g., extracting patient names, dates, or totals)
  • Batch Processes that require field-level data extraction
  • Automated classification and data validation

How to Configure a Labeled Value Extractor

You can use the Labeled Value extractor anywhere you can select an Extractor, but this example uses a Data Field.

  1. In your Node Tree, navigate to your Data Model and select the desired Data Field.
  2. In the Data Field's property grid, find the "Value Extractor" property.
  3. Click the "☰" to the right of the property to open the drop-down list.
  4. Choose Labeled Value from the list of available extractors.
  5. Expand the Labeled Value's sub-properties.
  6. Set the "Label Extractor" property to an extractor that matches all possible label variants for your field (e.g., "Invoice Total", "Total Due"), then configure that extractor to collect the labels.
    • A List Match extractor is one of the more common choices for collecting the labels for a Labeled Value.
  7. Set the "Value Extractor" property to an extractor that matches the expected value format (e.g., a Pattern Match extractor for currency amounts), then configure that extractor to collect the values.
  8. Save your changes and test extraction on sample documents.
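As a rough analogy for steps 6 and 7, the sketch below uses a plain Python list in place of a List Match extractor and a regular expression in place of a Pattern Match extractor. The names `LABELS`, `CURRENCY`, and `extract_total` are hypothetical, and the currency pattern is an assumption chosen for the sample text. Grooper itself is configured through the property grid rather than through code like this.

```python
import re

# Stand-in for a List Match extractor: every label variant expected for the field.
LABELS = ["Invoice Total", "Total Due"]

# Stand-in for a Pattern Match extractor: a simple US-style currency pattern.
CURRENCY = re.compile(r"\$\d{1,3}(?:,\d{3})*\.\d{2}")

def extract_total(text):
    """Return the first currency amount that follows a known label on the same line."""
    for line in text.splitlines():
        lowered = line.lower()
        for label in LABELS:
            start = lowered.find(label.lower())
            if start == -1:
                continue
            # Only search to the right of the label.
            match = CURRENCY.search(line, start + len(label))
            if match:
                return match.group()
    return None

sample = "Invoice No. 1001\nSubtotal      $1,200.00\nTotal Due     $1,234.56\n"
print(extract_total(sample))  # $1,234.56
```

The "Subtotal" amount is ignored because no configured label appears on that line, which is the same reason the label list should cover every variant you expect to see.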

Advanced options

The Labeled Value extractor includes advanced properties to fine-tune extraction:

Maximum Distance
Sets how far the value can be from the label (in inches). This helps control which values are considered "close enough" to a label to be paired.
Maximum Noise
Sets how many unrelated alphanumeric characters are allowed in the region between the label and value. This helps prevent incorrect pairings in cluttered layouts.

Adjusting Maximum Distance and Maximum Noise

  1. Select your object in the Node Tree with the Labeled Value extractor set on it.
  2. Expand the Labeled Value extractor's sub properties.
  3. Find the "Maximum Distance" property and expand its sub properties.
    • Edit the Left, Top, Right, and Bottom distances as needed for your extraction. Enter each value in inches (e.g., 3in for 3 inches).
  4. Find the "Maximum Noise" property. Edit the maximum number of noise characters allowed as needed for your extraction (e.g., 5).
  5. Save and test extraction. If values are missed or incorrect values are paired with a label, try adjusting these two properties further.

Guidance:

  • Use smaller distances and lower noise limits for tightly grouped, clean documents.
  • Increase these settings for documents with more variation or clutter.
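To make the effect of the two properties concrete, here is a simplified model in Python. It is an assumption-laden sketch, not Grooper's actual algorithm: offsets are measured between single reference points rather than between text regions, and the `MaxDistance` class and `accept` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class MaxDistance:
    """Allowed offset of the value from the label, in inches, per direction."""
    left: float = 0.0
    top: float = 0.0
    right: float = 0.0
    bottom: float = 0.0

def within_distance(label_xy, value_xy, limit):
    """True if the value's offset from the label stays inside every directional limit."""
    dx = value_xy[0] - label_xy[0]  # positive means the value is to the right
    dy = value_xy[1] - label_xy[1]  # positive means the value is below
    return -limit.left <= dx <= limit.right and -limit.top <= dy <= limit.bottom

def noise_count(text_between):
    """Count alphanumeric characters found between the label and the value."""
    return sum(ch.isalnum() for ch in text_between)

def accept(label_xy, value_xy, text_between, limit, max_noise):
    """A pairing is kept only if it passes both the distance and the noise check."""
    return within_distance(label_xy, value_xy, limit) and noise_count(text_between) <= max_noise

limit = MaxDistance(right=3.0)  # only look up to 3 inches to the right
close_and_clean = accept((1.0, 2.0), (3.5, 2.0), " : ", limit, max_noise=5)
too_far = accept((1.0, 2.0), (5.0, 2.0), "", limit, max_noise=5)
too_noisy = accept((1.0, 2.0), (3.5, 2.0), "see note 12 below", limit, max_noise=5)
print(close_and_clean, too_far, too_noisy)  # True False False
```

Raising `right` to 4.0 would admit the second candidate, and raising `max_noise` to 14 would admit the third, which mirrors the guidance above about loosening both settings for cluttered documents.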


Labeled Value with Label Sets

What are Label Sets?

A Label Set is a mapping between Data Elements (such as Data Fields) and the text labels used to identify them on a specific Document Type. Label Sets enable rapid onboarding of new Document Types by allowing you to define which labels correspond to which fields.

For more information on Label Sets, visit our Label Sets wiki article.
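Conceptually, a Label Set behaves like a lookup table keyed by Document Type and then by field. The Python sketch below shows that idea only. The Document Type names, field names, and labels are invented for illustration, and Grooper does not store Label Sets as Python dictionaries.

```python
# Hypothetical Label Sets: one mapping of field name to on-page label per Document Type.
LABEL_SETS = {
    "Vendor A Invoice": {
        "Invoice Total": "Total Due",
        "Invoice Date": "Date",
    },
    "Vendor B Invoice": {
        "Invoice Total": "Amount Owed",
        "Invoice Date": "Invoice Date",
    },
}

def label_for(document_type, field):
    """Return the label collected for a field on a Document Type, or None if not collected."""
    return LABEL_SETS.get(document_type, {}).get(field)

print(label_for("Vendor A Invoice", "Invoice Total"))  # Total Due
print(label_for("Vendor B Invoice", "Invoice Total"))  # Amount Owed
```

The same field resolves to a different label on each Document Type, so onboarding a new layout means adding one more mapping rather than editing an extractor.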

Pros and cons of using Label Sets with Labeled Values

Pros:

  • Centralizes label management for each Document Type.
  • Makes it easy to update or add new label variants without editing extractors.
  • Supports rapid onboarding and template creation.

Cons:

  • Requires multiple Document Types, one for each variation in labels.
  • Requires initial setup of Label Sets for each Document Type.
  • Will not work with unstructured documents without labels.

Step-by-step: Collecting Data Field labels for a Label Set

Make sure you have a Labeling Behavior set on your Content Type; otherwise, you will not have access to the "Labels" tab.

  1. Set the Data Field's Value Extractor property to a Labeled Value.
  2. Leave the Label Extractor and Value Extractor for the Labeled Value blank for now.
  3. In your Node Tree, select your Content Type.
  4. Go to the "Labels" tab (visible if a Labeling Behavior is defined).
  5. Make sure each document in your Batch is classified.
  6. For each Data Field, use one of three methods to collect the label:
    • Type in the label text from the document into the Label text box in the Labels panel.
    • Make sure your cursor is inside the Label text box and double-click the label on the document.
    • Make sure your cursor is inside the Label text box, click the Rubberband icon at the top of the Labels panel, and draw a box around the label you want to collect.
      • The third option is recommended because it avoids typos and missing parts of the label.
      • If there is an error in your labels, a red thumbs down icon will appear next to the collected label.
  7. Repeat for all Data Fields you want to include in the Label Set.
  8. Save the Label Set.
  9. Repeat for each document in your Batch.

Step-by-step: Configuring a Data Field to use a Label Set with Labeled Value

Once your labels are collected, configure the Labeled Value the same way you would any Labeled Value extractor, with one difference: you do not have to set a Label Extractor. The Label Set already supplies the labels.

  1. Select the Data Field you want to configure.
  2. Set the "Value Extractor" property to Labeled Value if not already set.
    • This should have previously been set when collecting the labels, but everything in this section can be done before the labels are collected.
  3. Leave the "Label Extractor" property blank. This tells Grooper to use the Label Set for label detection.
  4. Set a Value Extractor on the Labeled Value to return the type of data you want extracted.
    • Optionally, you can leave the "Value Extractor" blank if you want to extract all content to the right of or below the label. However, setting a Value Extractor is recommended whenever possible.
  5. Save and test extraction.
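The difference between leaving the Value Extractor blank and setting one can be illustrated with a small Python sketch. It models only the "to the right of the label" case on a single line of text. The function name, the sample line, and the currency pattern are assumptions for the example, not Grooper behavior.

```python
import re

def extract_right_of_label(line, label, value_pattern=None):
    """Return the text to the right of the label, optionally narrowed by a value pattern."""
    start = line.find(label)
    if start == -1:
        return None
    # Drop the label itself and any separator characters that follow it.
    rest = line[start + len(label):].lstrip(" :\t")
    if value_pattern is None:
        # No value pattern: keep everything to the right of the label.
        return rest.strip() or None
    match = re.search(value_pattern, rest)
    return match.group() if match else None

line = "Total Due: $1,234.56 USD (net 30)"
everything = extract_right_of_label(line, "Total Due")
amount_only = extract_right_of_label(line, "Total Due", r"\$[\d,]+\.\d{2}")
print(everything)   # $1,234.56 USD (net 30)
print(amount_only)  # $1,234.56
```

Without a pattern, the result includes trailing text that is not part of the amount, which is why a Value Extractor is the safer choice when the expected format is known.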

Labeled Value summary

Labeled Values in Grooper provide a powerful way to extract data by associating labels with their corresponding values. They are flexible, support advanced options for handling complex layouts, and integrate seamlessly with Label Sets for rapid onboarding and template management.
