Label Sets (Functionality)

Revision as of 10:51, 15 October 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.


⚠	The 2025 Label Sets articles are still in progress. For a more comprehensive view of Label Sets and Labeling Behavior, see the 2023 version of the Labeling Behavior wiki article.

Label Sets are collections of label definitions used in Grooper to identify and extract information from documents. A label set maps document text—such as field names, headers, or column titles—to corresponding Data Field, Data Section, or Data Table elements in the Data Model. Label sets are essential for automating extraction and classification, especially in environments where document layouts and terminology may vary.

What are Label Sets?

A Label Set is a group of labels associated with a specific Document Type. Each label represents a possible way a data element might be named or presented in a document. For example, a Label Set for invoices might include "Invoice Number", "Inv #", and "Bill No.", all mapped to the same Data Field. Label Sets are managed using the "Labels" tab on the Design Page for any Content Type with Labeling Behavior enabled.
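As a rough mental model only (Label Sets are configured on the "Labels" tab, not written in code), a Label Set can be pictured as a lookup from label variants to Data Model elements. The Python sketch below is hypothetical: the `LabelSet` class and the `normalize` and `element_for` helpers are illustrative names and are not part of Grooper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'Inv #' and 'INV  #' compare equal."""
    return " ".join(text.lower().split())


@dataclass
class LabelSet:
    """Hypothetical model: one Document Type plus its label-to-element lookup."""
    document_type: str
    # normalized label text -> name of the Data Model element it maps to
    labels: Dict[str, str] = field(default_factory=dict)

    def add(self, element: str, *variants: str) -> None:
        """Register every variant as a label for the same element."""
        for variant in variants:
            self.labels[normalize(variant)] = element

    def element_for(self, label_text: str) -> Optional[str]:
        """Return the mapped element name, or None if the label is unknown."""
        return self.labels.get(normalize(label_text))


invoice_labels = LabelSet("Invoice")
invoice_labels.add("Invoice Number", "Invoice Number", "Inv #", "Bill No.")
```

With this sketch, `invoice_labels.element_for("INV #")` returns `"Invoice Number"`, while a label that was never registered, such as `"Total"`, returns `None`.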

Label Sets work best for structured and semi-structured documents, where data elements are consistently labeled even though layouts vary. Invoices from different vendors, for example, may label the same field "Invoice Number", "Inv #", or "Bill No.", and a Label Set maps all of these to a single Data Field.

Label Sets are not suitable for unstructured documents, where data appears without identifiable labels.

Why use Label Sets?

Label Sets provide several important benefits:

Rapid onboarding: New document types can be supported quickly by creating a new Label Set on a new Document Type, without changing extraction logic.
Consistency: Ensures uniform extraction and classification across documents, even when layouts or terminology differ.
Flexibility: Supports multiple label variations for the same data element, accommodating differences between vendors or formats.
Scalability: Enables a single designer to create hundreds of templates efficiently.

However, there are some drawbacks:

Maintenance: Label Sets must be updated as document layouts or business requirements change.
Limitations with unstructured documents: Label Sets do not work with unstructured documents, where data is present without identifiable labels. In these cases, custom extraction logic is required.
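The trade-off described above can be illustrated with a small Python sketch. It is a simplification and not how Grooper extracts data: it assumes each value sits on a `label: value` line, and `LABEL_SETS` and `extract_fields` are hypothetical names. What it shows is that the extraction function stays the same for every Document Type, that only the label data differs, and that text without a recognizable label yields nothing.

```python
from typing import Dict, List

# Hypothetical label sets: Document Type -> {field name -> label variants}
LABEL_SETS: Dict[str, Dict[str, List[str]]] = {
    "Vendor A Invoice": {"Invoice Number": ["Invoice Number"]},
    "Vendor B Invoice": {"Invoice Number": ["Inv #", "Bill No."]},
}


def extract_fields(document_type: str, text: str) -> Dict[str, str]:
    """Return {field: value} for each 'label: value' line whose label is in the set."""
    label_set = LABEL_SETS[document_type]
    found: Dict[str, str] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # no label on this line, so there is nothing to anchor on
        for field_name, variants in label_set.items():
            if label.strip().lower() in (v.lower() for v in variants):
                found[field_name] = value.strip()
    return found


# Onboarding a new vendor only adds data; extract_fields is unchanged.
LABEL_SETS["Vendor C Invoice"] = {"Invoice Number": ["Reference No."]}
```

Calling `extract_fields("Vendor B Invoice", "Inv #: 1047")` returns `{"Invoice Number": "1047"}`, whereas unlabeled text such as `"Please pay 1047 soon"` returns an empty dictionary, which is the situation where custom extraction logic would be needed.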

Examples

An accounts payable process may use a Label Set containing "Vendor Name", "Invoice Date", and "Amount Due" for field extraction.
A medical record extraction may use a Label Set with "Patient Name", "Date of Birth", and "Diagnosis".
Table extraction might use column headers like "Item", "Quantity", and "Price" mapped to a Data Table.
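The table case can be pictured the same way. In the hypothetical Python sketch below, column header labels are mapped to the columns of a Data Table and unknown headers are skipped. The variants "Qty" and "Unit Price" and the function names are made up for illustration, and the sketch assumes the table has already been split into cells, which is not something Label Sets do on their own.

```python
from typing import Dict, List

# Hypothetical column labels: Data Table column -> header variants
COLUMN_LABELS: Dict[str, List[str]] = {
    "Item": ["Item"],
    "Quantity": ["Quantity", "Qty"],
    "Price": ["Price", "Unit Price"],
}


def map_header(header_cells: List[str]) -> Dict[int, str]:
    """Map each header cell position to a column name, skipping unknown headers."""
    lookup = {
        variant.lower(): column
        for column, variants in COLUMN_LABELS.items()
        for variant in variants
    }
    return {
        index: lookup[cell.strip().lower()]
        for index, cell in enumerate(header_cells)
        if cell.strip().lower() in lookup
    }


def rows_to_records(header_cells: List[str], rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn raw table rows into records keyed by Data Table column name."""
    columns = map_header(header_cells)
    return [{name: row[index] for index, name in columns.items()} for row in rows]
```

For a header row of `["Item", "Qty", "Notes", "Unit Price"]` and a data row of `["Widget", "2", "n/a", "9.50"]`, `rows_to_records` produces one record with the keys `Item`, `Quantity`, and `Price`; the unmapped "Notes" column is ignored.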

What can we use Label Sets for?

Label Sets are used for a variety of extraction and classification tasks:

Field extraction
Table extraction
Section extraction
Document classification

Each of these use cases leverages Label Sets to improve accuracy and reduce manual configuration. For more details, see the linked articles.
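For the classification use case, one plausible way to picture the idea is to score each Document Type by how many of its labels appear in a document. The Python sketch below is an assumption made for illustration, not Grooper's classification algorithm; `LABELS_BY_TYPE`, `classify`, and the `min_score` threshold are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical label sets per Document Type, using the example labels above
LABELS_BY_TYPE: Dict[str, List[str]] = {
    "Invoice": ["Vendor Name", "Invoice Date", "Amount Due"],
    "Medical Record": ["Patient Name", "Date of Birth", "Diagnosis"],
}


def classify(text: str, min_score: float = 0.5) -> Optional[str]:
    """Pick the Document Type whose labels cover the largest share of the text.

    Returns None when no type reaches min_score (fraction of labels found).
    """
    lowered = text.lower()
    best_type: Optional[str] = None
    best_score = 0.0
    for doc_type, labels in LABELS_BY_TYPE.items():
        hits = sum(1 for label in labels if label.lower() in lowered)
        score = hits / len(labels)
        if score > best_score:
            best_type, best_score = doc_type, score
    return best_type if best_score >= min_score else None
```

A document containing "Patient Name" and "Diagnosis" matches two of the three Medical Record labels and is classified as `"Medical Record"`; a document with only one matching label falls below the default threshold and is left unclassified.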

How to guides

Labeling Behavior
Labeled Value with Label Sets (in Labeled Value (Value Extractor))
Labeled OMR with Label Sets
Tabular Layout with Label Sets