Labeled Value (Value Extractor)

From Grooper Wiki

Revision as of 10:12, 15 October 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.


Labeled Value is a Value Extractor that identifies and extracts a value next to a label. It is one of the most commonly used extractors for pulling data from structured documents (such as a standardized form) and static values from semi-structured documents (such as the header details on an invoice).

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

Introduction

A Labeled Value in Grooper is a method for extracting data from documents where a specific piece of information (the value) is identified by a nearby descriptive text (the label). This approach is common in forms, invoices, and other semi-structured documents, where fields such as "Invoice Total: $1,234.56" or "Date of Birth: 01/01/2000" appear.

A Label is the descriptive text that gives context to the data being extracted (for example, "Invoice Total" or "Date of Birth"). A Value is the actual data associated with that label (such as "$1,234.56" or "01/01/2000"). Labels are essential because they clarify what the value means, ensuring that extracted data is accurate and meaningful.

We use Labeled Values in Grooper to:

  • Increase accuracy by associating data with its context.
  • Simplify configuration for documents with similar data but different layouts or terminology.
  • Speed up onboarding of new document types by focusing on label-value relationships.

However, there are some drawbacks:

  • Labeled Value extraction may not work well if labels are missing or inconsistent.
  • Complex layouts or noisy documents may require additional configuration.

How to

Labeled Values work by pairing a label with its corresponding value based on their spatial relationship in the document. The Labeled Value extractor identifies candidate labels and values, then determines which pairs belong together by analyzing their proximity and layout.

For example, in a Data Field for "Invoice Total", the extractor can be configured to recognize label variants like "Invoice Amount", "Total Due", or "Amount Owed", and match them with the correct value (such as a currency amount).
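The pairing idea described above can be illustrated with a small conceptual sketch. This is not Grooper's implementation or API: the `Word` class, the coordinates, the label list, and the nearest-pair rule are all assumptions made for illustration only.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Word:
    """A piece of text with a position on the page (hypothetical model)."""
    text: str
    x: float  # left edge, in inches
    y: float  # top edge, in inches


# Label variants for an "Invoice Total" field, taken from the example above.
LABEL_VARIANTS = {"invoice total", "invoice amount", "total due", "amount owed"}

# Assumed value format: a US-style currency amount such as $1,234.56.
CURRENCY = re.compile(r"^\$\d{1,3}(,\d{3})*\.\d{2}$")


def pair_label_with_value(words):
    """Return the (label, value) texts of the closest label/value candidates."""
    labels = [w for w in words if w.text.lower().rstrip(":") in LABEL_VARIANTS]
    values = [w for w in words if CURRENCY.match(w.text)]
    if not labels or not values:
        return None
    # Pick the candidate pair with the smallest straight-line distance.
    label, value = min(
        ((lab, val) for lab in labels for val in values),
        key=lambda pair: math.hypot(pair[1].x - pair[0].x, pair[1].y - pair[0].y),
    )
    return label.text, value.text


page = [
    Word("Invoice Date:", 0.5, 1.0),
    Word("03/14/2024", 2.0, 1.0),
    Word("Total Due:", 0.5, 1.5),
    Word("$1,234.56", 2.0, 1.5),
    Word("$99.00", 6.0, 9.0),
]
print(pair_label_with_value(page))
```

Here the amount on the same line as "Total Due:" wins because it is the nearest currency value; the stray "$99.00" elsewhere on the page is ignored.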

Labeled Values are used in many contexts within Grooper, including:

  • Data Fields in a Data Model (e.g., extracting patient names, dates, or totals)
  • Batch Processes that require field-level data extraction
  • Automated classification and data validation

How to Configure a Labeled Value Extractor

You can use the Labeled Value extractor anywhere you can select an Extractor, but this example uses a Data Field.

  1. In Grooper Design Studio, navigate to your Data Model and select the desired Data Field.
  2. In the Data Field's property grid, find the "Value Extractor" property.
  3. Click the "☰" to the right of the property to open the drop-down menu.
  4. Choose Labeled Value from the list of available extractors.
  5. Set the "Label Extractor" property to an extractor that matches all possible label variants for your field (e.g., "Invoice Total", "Total Due"), then configure that extractor to collect the labels.
    • A List Match extractor is one of the more common choices for extracting the labels for a Labeled Value.
  6. Set the "Value Extractor" property to an extractor that matches the expected value format (e.g., a Pattern Match extractor for currency amounts), then configure that extractor to collect the values.
  7. Save your changes and test extraction on sample documents.

Advanced options

The Labeled Value extractor includes advanced properties to fine-tune extraction:

Maximum Distance
Sets how far the value can be from the label (in units like inches or millimeters). This helps control which values are considered "close enough" to a label to be paired.
Maximum Noise
Sets how many unrelated characters are allowed in the region between the label and value. This helps prevent incorrect pairings in cluttered layouts.
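As a rough mental model of how these two limits filter candidate pairs, consider the sketch below. It is a simplification under stated assumptions, not Grooper's actual logic: the function name, the single left-to-right gap, the default limits, and the choice to count only non-whitespace characters as noise are all hypothetical.

```python
def is_valid_pair(gap_inches, text_between, max_distance_in=3.0, max_noise=5):
    """Accept a label/value pair only if it is close enough and clean enough.

    gap_inches   -- horizontal gap between the label and the value
    text_between -- any text found in the region between the two
    """
    # Assumption: whitespace does not count as noise, every other character does.
    noise = sum(1 for ch in text_between if not ch.isspace())
    return gap_inches <= max_distance_in and noise <= max_noise


# A dotted leader of five characters is within both limits.
print(is_valid_pair(1.2, " ..... "))

# The value is farther away than the distance limit allows.
print(is_valid_pair(4.0, ""))

# Too much unrelated text sits between the label and the value.
print(is_valid_pair(1.2, "see note 7 below"))
```

Raising `max_distance_in` or `max_noise` in this model admits more candidate pairs, which mirrors the trade-off between missed values and incorrect pairings.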

Adjusting Maximum Distance and Maximum Noise

  1. Select your object in the Node Tree with the Labeled Value extractor set on it.
  2. Expand the Labeled Value extractor's sub properties.
  3. Find the "Maximum Distance" property and expand its sub properties.
    • Edit the values for the left, top, right, and bottom distances as needed for your extraction. Enter each distance in inches (e.g., 3in for 3 inches).
  4. Find the "Maximum Noise" property. Edit the maximum number of noise characters allowed as needed for your extraction (e.g., 5).
  5. Save and test extraction. If values are missed or incorrect values are paired with a label, try adjusting these two properties.

Guidance:

  • Use smaller distances and lower noise limits for tightly grouped, clean documents.
  • Increase these settings for documents with more variation or clutter.

The example below walks through several ways to configure the Maximum Distance and Maximum Noise properties so that the expected values are returned.

Labeled Value with Label Sets

What are Label Sets?

A Label Set is a mapping between Data Elements (such as Data Fields) and the text labels used to identify them on a specific document type. Label Sets enable rapid onboarding of new document types by allowing you to define which labels correspond to which fields.
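Conceptually, a Label Set can be pictured as a per-document-type lookup table. The sketch below is only an analogy: the document type names, field names, labels, and the `label_for` helper are hypothetical and do not reflect how Grooper stores Label Sets.

```python
# One mapping of Data Field name -> label text per document type (all hypothetical).
LABEL_SETS = {
    "Vendor A Invoice": {
        "Invoice Total": "Total Due",
        "Invoice Date": "Date",
    },
    "Vendor B Invoice": {
        "Invoice Total": "Amount Owed",
        "Invoice Date": "Invoice Dt.",
    },
}


def label_for(document_type, field_name):
    """Look up the label text a field uses on a given document type."""
    return LABEL_SETS.get(document_type, {}).get(field_name)


print(label_for("Vendor A Invoice", "Invoice Total"))
print(label_for("Vendor B Invoice", "Invoice Total"))
```

In this picture, onboarding a new document type means adding one more entry to the table rather than editing the extractor on every field.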

Pros and cons of using Label Sets with Labeled Values

Pros:

  • Centralizes label management for each document type.
  • Makes it easy to update or add new label variants without editing extractors.
  • Supports rapid onboarding and template creation.

Cons:

  • Requires initial setup of Label Sets for each document type.
  • May not work for fields without labels or with highly variable layouts.

Step-by-step: Collecting Data Field labels for a Label Set

Make sure a Labeling Behavior is set on your Content Type; otherwise, you will not have access to the "Labels" tab.

  1. Set the Data Field's Value Extractor property to a Labeled Value.
  2. Leave the Label Extractor and Value Extractor for the Labeled Value blank for now.
  3. In your Node Tree, select your Content Type.
  4. Go to the "Labels" tab (visible if a Labeling Behavior is defined).
  5. Make sure each document in your Batch is classified.
  6. For each Data Field, use one of three methods to collect the label:
    • Type in the label text from the document into the Label text box in the Labels panel.
    • Make sure your cursor is inside the Label text box, then double-click the label on the document.
    • Make sure your cursor is inside the Label text box, click the Rubberband icon at the top of the Labels panel, and draw a box around the label you want to collect.
  7. Repeat for all fields you want to include in the Label Set.
  8. Save the Label Set.
  9. Repeat for each document in your Batch.

Step-by-step: Configuring a Data Field to use a Label Set with Labeled Value

  1. Select the Data Field you want to configure.
  2. Set the "Value Extractor" property to Labeled Value if not already set.
    • This should have previously been set when collecting the labels, but everything in this section can be done before the labels are collected.
  3. Leave the "Label Extractor" property blank. This tells Grooper to use the Label Set for label detection.
  4. Set a Value Extractor on the Labeled Value to return the type of data you want extracted.
    • Optionally, you can leave the "Value Extractor" blank if you want to extract all content to the right of or below the label. However, using a Value Extractor whenever possible is recommended.
  5. Save and test extraction.
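The difference between leaving the Value Extractor blank and supplying one can be illustrated on a single line of text. This is a simplified sketch, not Grooper's behavior: the `value_after_label` helper and the currency pattern are hypothetical, and only the "to the right of the label" case is modeled.

```python
import re


def value_after_label(line, label, value_pattern=None):
    """Return the text following a label, optionally narrowed by a pattern."""
    _, found, rest = line.partition(label)
    if not found:
        return None  # the label does not appear on this line
    rest = rest.lstrip(" :\t")
    if value_pattern is None:
        # No value pattern: take everything to the right of the label.
        return rest.strip() or None
    match = re.search(value_pattern, rest)
    return match.group(0) if match else None


line = "Total Due: $1,234.56 (net 30)"

# Without a value pattern, trailing text comes along with the amount.
print(value_after_label(line, "Total Due"))

# With a value pattern, only the currency amount is returned.
print(value_after_label(line, "Total Due", r"\$\d[\d,]*\.\d{2}"))
```

The second call shows why a value pattern is usually preferable: it returns only the data of the expected type and drops the surrounding text.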

Labeled Value summary

Labeled Values in Grooper provide a powerful way to extract data by associating labels with their corresponding values. They are flexible, support advanced options for handling complex layouts, and integrate seamlessly with Label Sets for rapid onboarding and template management.

See Also: