Label Sets (Functionality): Difference between revisions

From Grooper Wiki



Latest revision as of 07:35, 31 October 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

The 2025 Label Sets articles are still in progress. For a more comprehensive view of Label Sets and the Labeling Behavior, see the 2023 version of the Labeling Behavior wiki article.

Label Sets are collections of label definitions used in Grooper to identify and extract information from documents. A Label Set maps document text (such as field names, headers, or column titles) to the corresponding Data Field, Data Section, or Data Table elements in the Data Model. Label Sets are essential for automating extraction and classification, especially in environments where document layouts and terminology vary.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2025). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

What are Label Sets?

A Label Set is a group of labels associated with a specific Document Type. Each label represents a possible way a data element might be named or presented in a document. For example, a Label Set for invoices might include "Invoice Number", "Inv #", and "Bill No.", all mapped to the same Data Field.
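
The idea can be pictured with a small conceptual sketch. This is not Grooper's API or internal implementation. The Document Type names, the `LABEL_SETS` dictionary, and both helper functions are hypothetical, and the "value follows the label on the same line" rule is a simplifying assumption made only for illustration.

```python
from typing import Optional

# Hypothetical Label Sets: one per Document Type, mapping a Data Field
# to the label text collected for that Document Type.
LABEL_SETS = {
    "Vendor A Invoice": {"Invoice Number": "Invoice Number"},
    "Vendor B Invoice": {"Invoice Number": "Inv #"},
    "Vendor C Invoice": {"Invoice Number": "Bill No."},
}


def find_label(document_type: str, data_field: str) -> Optional[str]:
    """Return the label collected for a Data Field on a Document Type."""
    return LABEL_SETS.get(document_type, {}).get(data_field)


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the text that follows the label on the same line, if any.

    Assumption for illustration only: the value sits to the right of
    its label on the same line.
    """
    for line in text.splitlines():
        idx = line.find(label)
        if idx != -1:
            value = line[idx + len(label):].strip(" :\t")
            return value or None
    return None


# Three different label texts all feed the same Data Field.
label = find_label("Vendor B Invoice", "Invoice Number")
print(value_after_label("Date: 1/1/2025\nInv #: 10045", label))
```

The extraction logic stays the same for every vendor in this sketch. Only the label text stored per Document Type changes.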

Label Sets are managed using the "Labels" tab on the Design Page for any Content Type with Labeling Behavior enabled.


Label Sets are best for structured and semi-structured documents, where data elements are consistently labeled, but layouts may vary. For example, invoices from different vendors may use different terms for the same field ("Invoice Number", "Inv #", "Bill No."), but all can be mapped to a single Data Field using a Label Set.

Label Sets are not suitable for unstructured documents.

FYI

The term "Label Sets" does not refer to a specific node or property in Grooper. It is an umbrella term for the different uses of the Labeling Behavior in Grooper.

Why use Label Sets?

Label Sets provide several important benefits:

  • Rapid onboarding: New document types can be supported quickly by creating a new Label Set on a new Document Type, without changing extraction logic.
  • Consistency: Ensures uniform extraction and classification across documents, even when layouts or terminology differ.
  • Flexibility: Supports multiple label variations for the same data element, accommodating differences between vendors or formats.
  • Scalability: Enables a single designer to create hundreds of templates efficiently.

However, there are some drawbacks:

  • Maintenance: Label Sets must be updated as document layouts or business requirements change.
  • Limitations with unstructured documents: Label Sets do not work with unstructured documents, where data is present without identifiable labels. In these cases, custom extraction logic is required.

Examples

  • An accounts payable process may use a Label Set containing "Vendor Name", "Invoice Date", and "Amount Due" for field extraction.
  • A medical record extraction may use a Label Set with "Patient Name", "Date of Birth", and "Diagnosis".
  • Table extraction might use column headers like "Item", "Quantity", and "Price" mapped to a Data Table.

What can we use Label Sets for?

Label Sets are used for a variety of extraction and classification tasks:

  • Field extraction
  • Table extraction
  • Section extraction
  • Document classification

Each of these use cases leverages Label Sets to improve accuracy and reduce manual configuration.

How to collect labels

To use Label Sets, you must first collect labels for your Data Model. This process involves selecting the text on sample documents that corresponds to each Data Field, Data Table, or Data Section. You must also use a "Label Set Aware" extractor on your Data Element. "Label Set Aware" means the extractor recognizes Label Sets and can use them in its extraction.

Examples of "Label Set Aware" extractors include Tabular Layout and Row Match, both of which are used for Data Tables later in this article.

To access the "Labels" tab of the Content Type, you must first configure a Labeling Behavior on that Content Type. For more information on how to do this, visit the Labeling Behavior wiki page.

Collecting Labels for Data Fields

When collecting labels for Data Fields, you will generally collect a single label for each Data Field in your Data Model. Sometimes you may need to collect additional labels, called Custom Labels, to help narrow down your extraction. How to collect these labels is detailed below.

  1. Navigate to the Data Field you want to use Label Sets with.
  2. Set a "Label Set Aware" extractor on the Value Extractor property.
    • You must do this first in order to collect labels for the Data Field.
  3. In your Node Tree, navigate to the Content Type with the Labeling Behavior applied.
  4. Go to the "Labels" tab (visible when the Labeling Behavior is enabled).
  5. If not already classified, assign a Document Type to each document in your Batch.
  6. Select the first document in your Batch.
  7. Select the Data Field for which you want to collect a label and use one of three methods to collect it:
    • Type the text of the label into the Data Field's text box.
    • Click inside the Data Field's text box so the cursor appears, then double-click the label on the document. Correct the captured text or add any missing parts of the label.
    • Click inside the Data Field's text box so the cursor appears. Click the Rubberband icon at the top of the Labels panel. Draw a box around the label on the document. (This is the recommended method for collecting labels.)
  8. Repeat for each document with a different Document Type.

Custom Labels

Custom Labels are user-defined labels that do not necessarily map directly to a Data Element, but are needed for extraction, classification, or to handle special cases. They are helpful for providing additional context or narrowing the scope of where Grooper should look to find labels and data.

Custom Labels as Parent Labels

A Parent Label defines a hierarchical relationship between labels: a label that has a Parent Label set will only match where it occurs inside the text of that Parent Label. The restriction works in both directions. Only labels that have the Parent Label set may match inside the Parent's text. A label with no Parent Label set will only match outside of the Parent Label's text.

FYI

Custom Labels can also be used to help with classification by providing alternate or synthetic labels for Document Types. This is useful when documents do not have consistent labels, or when you need to support multiple naming conventions. For more information, visit the Labelset-Based Classification wiki article.

When to use Parent Labels:

  • When you have two or more labels that use the same text and need to give more context to return the correct label, and thus the correct value.
  • When the text used as the label appears identically multiple times in the document. Context must be given as to which label should be returned.
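
The scoping rule can be sketched in a few lines. This is a conceptual model based only on the description above, not Grooper's matching engine. It assumes plain substring matching on character offsets, and the sample text, the `PARENT` value, and every function name are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) character offsets into the text


def find_all(text: str, label: str) -> List[Span]:
    """Return the span of every occurrence of label in text."""
    spans = []
    start = text.find(label)
    while start != -1:
        spans.append((start, start + len(label)))
        start = text.find(label, start + 1)
    return spans


def inside(inner: Span, outer: Span) -> bool:
    """True when inner lies completely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def match_label(
    text: str,
    label: str,
    parent: Optional[str] = None,
    all_parents: Sequence[str] = (),
) -> List[Span]:
    """Apply the Parent Label rule to the matches of one label.

    With a parent: keep only matches inside the parent's text.
    Without a parent: keep only matches outside every parent's text.
    """
    parent_spans = find_all(text, parent) if parent else []
    blocked = [s for p in all_parents for s in find_all(text, p)]
    matches = []
    for span in find_all(text, label):
        if parent:
            if any(inside(span, p) for p in parent_spans):
                matches.append(span)
        elif not any(inside(span, p) for p in blocked):
            matches.append(span)
    return matches


# "Phone" appears twice; the Custom Label picks out the billing one.
TEXT = "Shipping Contact Phone: 555-0100\nBilling Contact Phone: 555-0199"
PARENT = "Billing Contact Phone"  # hypothetical Custom Label text

billing = match_label(TEXT, "Phone", parent=PARENT, all_parents=[PARENT])
other = match_label(TEXT, "Phone", all_parents=[PARENT])
```

Here `billing` holds only the occurrence of "Phone" on the second line, and `other` holds only the occurrence on the first line.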

How to assign a Parent Label:

  1. Navigate to the Content Type in your Node Tree that has the Labeling Behavior configured on it.
  2. Click over to the "Labels" tab.
  3. Assuming Data Field Labels have already been collected, click inside the Data Model text box in the Labels panel, making sure your cursor is inside the box.
  4. Click the Add Label icon at the top of the Labels panel.
  5. In the popup, enter a name for your label and click the "Add Custom" button.
    • It is advised to use a name that describes what the Custom Label will be used for.
  6. Use your preferred method to collect the custom label from the document. The Rubber Band method is recommended for accuracy.
  7. Locate the label that should use the Custom Label as its Parent. Click the blue thumbs up icon next to that label.
  8. In the popup, locate the Parent property.
  9. Click the "☰" to the right of the Parent property to access the drop down and select the desired Custom Label.
  10. Click "Save".
  11. Click the save icon at the top of the Labels panel to save the changes made to the Label Set.
  12. Test your extraction to determine if your labels are collecting the right values.


Static Labels

A Static Label is a label that collects a static value for a Data Element. Static Labels are helpful for structured documents containing a piece of information that is always the same for every document assigned that Document Type. Examples include a company name or phone number.

How to assign a Static Label:

  1. Select a Content Type that has been configured with a Labeling Behavior.
  2. Click on the "Labels" tab.
  3. Navigate to the Data Element you want to assign the static label to and click inside its text box in the Labels panel.
  4. Click on the "Add a Label" icon at the top of the Labels panel.
  5. On the pop up, click "Add Static".
  6. A new label will appear under the Data Element called "Static".
  7. Make sure your cursor is inside the Static label text box and collect the value on the document you want to be extracted for the Data Element using your preferred method.
    • It is recommended to use the rubber band method of collecting the label.
  8. The text box directly to the right of the Data Element should remain blank.
  9. Click the save icon at the top of the Labels panel to save your changes.
  10. Test your extraction.

Collecting Labels for Data Tables

Collecting labels for Data Tables is similar to collecting labels for Data Fields, but there are some differences to keep in mind for Data Tables and Data Columns. First, while not strictly necessary, it is considered best practice to collect a full Header Label for your Data Table and also collect the header for each Data Column individually.

The Header Label on the Data Table acts as a parent label to the Data Columns. This helps minimize false positives for your table extraction.
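
As a rough illustration of why the full Header Label helps, the sketch below first finds the line that contains the whole header and only then looks for each column label inside it. This is a conceptual model, not how Tabular Layout or Row Match work internally. The function name, the sample lines, and the column names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def locate_columns(
    lines: List[str],
    header_label: str,
    column_labels: Dict[str, str],
) -> Optional[Tuple[int, Dict[str, int]]]:
    """Find the header line, then each column label inside the header.

    Returns (line index, {Data Column name: character offset}) or None
    when the header, or a column label within it, cannot be found.
    """
    for index, line in enumerate(lines):
        start = line.find(header_label)
        if start == -1:
            continue  # the full header is not on this line
        positions = {}
        for column, label in column_labels.items():
            offset = header_label.find(label)
            if offset == -1:
                return None  # column label is not part of the header
            positions[column] = start + offset
        return index, positions
    return None


LINES = [
    "Total due: 15.00",          # stray "Total" outside the table
    "Qty Description Total",     # the full Header Label
    "2 Widget 10.00",
    "1 Gadget 5.00",
]
COLUMNS = {"Quantity": "Qty", "Description": "Description", "Line Total": "Total"}

result = locate_columns(LINES, "Qty Description Total", COLUMNS)
```

Because the column labels are only searched for inside the header text, the stray "Total" on the first line is never considered a column header in this sketch.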

  1. Navigate to the Data Table node in your Node Tree.
  2. Set the Value Extractor property to a Label Set Aware extractor such as Tabular Layout or Row Match.
  3. Navigate to the Content Type the Labeling Behavior is set on.
  4. Click on the "Labels" tab.
  5. Make sure the documents in your Batch have been assigned a Document Type.
  6. Click inside the text box next to the Data Table node in the Labels Panel so the cursor is inside the text box.
  7. Click the Rubberband icon at the top of the Labels panel, and draw a box on the document around the full header label of the table. This should include all the headers for your columns.
  8. Next, click inside the text box next to the first Data Column so the cursor is inside the text box.
  9. Click the Rubberband icon at the top of the Labels panel, and draw a box on the document around the individual column header for the Data Column.
  10. Repeat the process for the rest of the Data Columns.

Footer Labels

When extracting Data Tables with Label Sets, Grooper might detect extra rows past the point where the table actually ends. A Footer Label tells Grooper where to stop looking for rows, which reduces the potential for false positive results.

  1. Navigate to the Content Type where your Labeling Behavior is configured.
  2. Click over to the "Labels" tab.
  3. To add a Footer Label to a Data Table, click inside the text box of the Data Table label.
  4. Click on the Add a New Label icon at the top of the Labels panel.
  5. Click on "Add Footer".
  6. Click inside the text box for the Footer Label.
  7. Use your preferred method to collect the Footer Label on the document. It is recommended to use the rubber band method.
    • You can use any text segment on the document for a Footer Label. Choose something that appears shortly after the end of the table.
  8. Save your changes.
  9. Test your extraction.
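
The effect of a Footer Label can be pictured with a short sketch. It is a simplified model that treats every non-blank line after the header as a candidate row, which is an assumption for illustration and not Grooper's actual table extraction. The function name and sample lines are hypothetical.

```python
from typing import List, Optional


def table_rows(
    lines: List[str],
    header_label: str,
    footer_label: Optional[str] = None,
) -> List[str]:
    """Collect the lines between the header and the footer.

    Without a footer label, every non-blank line after the header is
    kept, including text that follows the real end of the table.
    """
    rows = []
    in_table = False
    for line in lines:
        if not in_table:
            if header_label in line:
                in_table = True  # rows start on the next line
            continue
        if footer_label and footer_label in line:
            break  # the footer marks the end of the table
        if line.strip():
            rows.append(line)
    return rows


LINES = [
    "Invoice 10045",
    "Qty Description Total",
    "2 Widget 10.00",
    "1 Gadget 5.00",
    "Subtotal 15.00",
    "Thank you for your business",
]

without_footer = table_rows(LINES, "Qty Description Total")
with_footer = table_rows(LINES, "Qty Description Total", footer_label="Subtotal")
```

In this example `without_footer` contains four lines, two of which are not table rows, while `with_footer` contains only the two real rows.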

There is more to consider when setting up table extraction with Label Sets. For more information on extracting Data Tables with Label Sets, visit the Tabular Layout and Row Match articles.

Additional how-to guides