Label Sets (Functionality)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

The 2025 Label Sets articles are still in progress. For a more comprehensive view of Label Sets and the Labeling Behavior, see the 2023 version of the Labeling Behavior wiki article.

Label Sets are collections of label definitions used in Grooper to identify and extract information from documents. A Label Set maps document text (such as field names, headers, or column titles) to the corresponding Data Field, Data Section, or Data Table elements in the Data Model. Label Sets are essential for automating extraction and classification, especially when document layouts and terminology vary.

What are Label Sets?

A Label Set is a group of labels associated with a specific Document Type. Each label represents a possible way a data element might be named or presented in a document. For example, a Label Set for invoices might include "Invoice Number", "Inv #", and "Bill No.", all mapped to the same Data Field.
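
As a conceptual illustration only, the relationship described above can be sketched in Python. This is not Grooper's API or storage format: the LabelSet class, its fields, and the element_for method are hypothetical names invented for this sketch, and case-insensitive exact matching is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LabelSet:
    """Hypothetical stand-in for the labels collected on one Document Type."""

    document_type: str
    # Data element name -> every label variant collected for that element.
    labels: Dict[str, List[str]] = field(default_factory=dict)

    def element_for(self, text: str) -> Optional[str]:
        """Return the data element that a piece of document text labels."""
        wanted = text.strip().lower()
        for element, variants in self.labels.items():
            if any(variant.lower() == wanted for variant in variants):
                return element
        return None


# The invoice example from the text: three variants, one Data Field.
invoice_labels = LabelSet(
    document_type="Invoice",
    labels={"Invoice Number": ["Invoice Number", "Inv #", "Bill No."]},
)

print(invoice_labels.element_for("Inv #"))    # Invoice Number
print(invoice_labels.element_for("Ship To"))  # None
```

The point of the sketch is that the variants live in the Label Set, so adding a new way of writing "Invoice Number" does not require changing the lookup logic.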

Label Sets are managed using the "Labels" tab on the Design Page for any Content Type with Labeling Behavior enabled.


Label Sets work best for structured and semi-structured documents, where data elements are consistently labeled but layouts may vary. Invoices from different vendors, for example, may use different terms for the same field ("Invoice Number", "Inv #", "Bill No."), and a Label Set lets all of them map to a single Data Field.

Label Sets are not suitable for unstructured documents.

FYI

The term "Label Sets" does not refer to a specific node or property in Grooper. It is an umbrella term for the different uses of the Labeling Behavior.

Why use Label Sets?

Label Sets provide several important benefits:

  • Rapid onboarding: New document types can be supported quickly by creating a new Label Set on a new Document Type, without changing extraction logic.
  • Consistency: Ensures uniform extraction and classification across documents, even when layouts or terminology differ.
  • Flexibility: Supports multiple label variations for the same data element, accommodating differences between vendors or formats.
  • Scalability: Enables a single designer to create hundreds of templates efficiently.

However, there are some drawbacks:

  • Maintenance: Label Sets must be updated as document layouts or business requirements change.
  • Limitations with unstructured documents: Label Sets do not work with unstructured documents, where data is present without identifiable labels. In these cases, custom extraction logic is required.

Examples

  • An accounts payable process may use a Label Set containing "Vendor Name", "Invoice Date", and "Amount Due" for field extraction.
  • A medical record extraction may use a Label Set with "Patient Name", "Date of Birth", and "Diagnosis".
  • Table extraction might use column headers like "Item", "Quantity", and "Price" mapped to a Data Table.

What can we use Label Sets for?

Label Sets are used for a variety of extraction and classification tasks:

  • Field extraction
  • Table extraction
  • Section extraction
  • Document classification

Each of these use cases leverages Label Sets to improve accuracy and reduce manual configuration.
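
To make the classification use case more concrete, here is a minimal sketch of one way labels could drive classification: score each Document Type by how many of its labels appear in the document text and pick the highest score. This is an assumption made for illustration, not a description of how Grooper classifies documents. The classify function and the two Document Type names are hypothetical; the labels come from the examples earlier in this article.

```python
from typing import Dict, List, Optional

# Hypothetical Document Types, using the example labels from this article.
LABEL_SETS: Dict[str, List[str]] = {
    "AP Invoice": ["Vendor Name", "Invoice Date", "Amount Due"],
    "Medical Record": ["Patient Name", "Date of Birth", "Diagnosis"],
}


def classify(text: str, label_sets: Dict[str, List[str]]) -> Optional[str]:
    """Pick the Document Type whose labels occur most often in the text."""
    lowered = text.lower()
    scores = {
        doc_type: sum(label.lower() in lowered for label in labels)
        for doc_type, labels in label_sets.items()
    }
    best = max(scores, key=scores.get, default=None)
    # No label matched at all: leave the document unclassified.
    if best is None or scores[best] == 0:
        return None
    return best


sample = "Patient Name: Jane Doe\nDate of Birth: 01/02/1990"
print(classify(sample, LABEL_SETS))  # Medical Record
```

A simple substring count like this ignores label position and OCR errors; it is only meant to show why distinctive labels make a document easier to classify.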

How to collect labels

To use Label Sets, you must first collect labels for your Data Model. This process involves selecting the text on sample documents that corresponds to each Data Field, Data Table, or Data Section. Each Data Element must also use a "Label Set Aware" extractor, meaning an extractor that recognizes Label Sets and can use them during extraction.

Examples of "Label Set Aware" extractors include:

  • Labeled Value
  • Labeled OMR
  • Tabular Layout
  • Row Match
  • Transaction Detection
  • Nested Tables
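
As a rough illustration of what "Label Set Aware" means, the sketch below takes a list of collected labels and returns the text that follows the first label it finds. It is a simplified stand-in written for this article, not the implementation of Labeled Value or any other Grooper extractor. The labeled_value function is a hypothetical name, and working on plain text instead of page layout is an assumption.

```python
import re
from typing import List, Optional


def labeled_value(text: str, labels: List[str]) -> Optional[str]:
    """Return the text following the first collected label found in `text`."""
    for label in labels:
        # Label, optional colon, then the rest of the line as the value.
        pattern = re.compile(re.escape(label) + r"\s*:?\s*(.+)", re.IGNORECASE)
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None


page_text = "Inv #: 10045\nAmount Due: 250.00"
print(labeled_value(page_text, ["Invoice Number", "Inv #", "Bill No."]))  # 10045
```

The extraction logic never mentions "Inv #" directly. It only receives whatever labels were collected, which is why the same extractor can serve many Document Types.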

Collecting Labels for Data Fields

  1. Navigate to the Data Field you want to use Label Sets with.
  2. Set a "Label Set Aware" extractor on the Value Extractor property.
    • You must do this first in order to collect labels for the Data Field.
  3. In your Node Tree, navigate to the Content Type with the Labeling Behavior applied.
  4. Go to the "Labels" tab (visible when the Labeling Behavior is enabled).
  5. If not already classified, assign a Document Type to each document in your Batch.
  6. Select the first document in your Batch.
  7. Select the Data Field for which you want to collect a label and use one of three methods to collect the label:
    • Type in the text of the label into the text box of the Data Field.
    • Click inside the Data Field's text box so the cursor appears, then double-click the label on the document. Correct the captured text or add any missing parts of the label.
    • Click inside the Data Field's text box so the cursor appears. Click the Rubberband icon at the top of the Labels panel, then draw a box around the label on the document. (This is the recommended method for collecting labels.)
  8. Repeat for each document with a different Document Type.
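
The rubberband method in step 7 can be pictured as selecting every recognized word that falls inside the drawn box. The sketch below shows that idea under stated assumptions: the Word class, its coordinate fields, and text_in_box are hypothetical, the coordinates are made up, and sorting by top and then left is a simplification of reading order. It does not describe how Grooper stores or processes page text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    """Hypothetical recognized word with its bounding box on the page."""

    text: str
    left: float
    top: float
    right: float
    bottom: float


def text_in_box(words: List[Word], left: float, top: float,
                right: float, bottom: float) -> str:
    """Join the words that lie fully inside the drawn rectangle."""
    inside = [
        w for w in words
        if w.left >= left and w.right <= right
        and w.top >= top and w.bottom <= bottom
    ]
    # Simplified reading order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


words = [
    Word("Invoice", 10, 10, 60, 20),
    Word("Number", 65, 10, 115, 20),
    Word("10045", 130, 10, 170, 20),
]
# A box drawn around the label only, leaving out the value to its right.
print(text_in_box(words, 5, 5, 120, 25))  # Invoice Number
```

Drawing the box tightly around the label, and not the value, is what keeps the collected label reusable across documents.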

Collecting Labels for Data Tables

  1. Navigate to the Data Table node in your Node Tree.
  2. Set the Value Extractor property to a Label Set Aware extractor such as Tabular Layout or Row Match.
  3. Navigate to the Content Type the Labeling Behavior is set on.
  4. Click on the "Labels" tab.
  5. Make sure the documents in your Batch have been assigned a Document Type.
  6. In the Labels panel, click inside the text box next to the Data Table node.
  7. Click the Rubberband icon at the top of the Labels panel, and draw a box on the document around the table's full header row. The box should include the headers for all of your columns.
  8. Click inside the text box next to the first Data Column.
  9. Click the Rubberband icon again, and draw a box on the document around that Data Column's individual column header.
  10. Repeat the process for the rest of the Data Columns.
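
The reason both a full header label and individual column labels are collected can be sketched as follows: the header label locates the table, and each column label marks where its column begins. This is a plain-text simplification written for this article, assuming fixed-width columns. The split_row function is hypothetical, and the sketch is not how Tabular Layout or Row Match work internally. The column names come from the table example earlier in this article.

```python
from typing import Dict, List

COLUMN_LABELS = ["Item", "Quantity", "Price"]

# Build fixed-width sample lines so the column offsets are unambiguous.
header = f"{'Item':<10}{'Quantity':<10}{'Price'}"
rows = [
    f"{'Widget':<10}{'2':<10}{'9.99'}",
    f"{'Gadget':<10}{'10':<10}{'24.50'}",
]


def split_row(header: str, column_labels: List[str], row: str) -> Dict[str, str]:
    """Slice a row into cells using each column label's offset in the header."""
    starts = [header.index(label) for label in column_labels]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return {
        label: row[start:end].strip()
        for label, start, end in zip(column_labels, starts, ends)
    }


for row in rows:
    print(split_row(header, COLUMN_LABELS, row))
```

If a column label is missing from the header, header.index raises a ValueError, which mirrors the practical point that every Data Column needs its own collected label.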


How to guides

For more information on how to use Label Sets, see the linked How To articles below.