2023:Labeling Behavior (Behavior)

The Labeling Behavior is a Content Type Behavior that collects a document's field labels and uses them for classification and data extraction.

The Labeling Behavior lets Grooper users quickly onboard new Document Types for structured and semi-structured forms, using labels as a thumbprint for classification and data extraction. Once the Labeling Behavior is enabled, labels are identified and collected on the "Labels" tab of each Document Type. The resulting "Label Sets" can then be used for the following purposes:

Document classification - Using the Labelset-Based Classification Method
Field-based data extraction - Primarily using the Labeled Value Extractor Type
Tabular data extraction - Primarily using a Data Table object's Tabular Layout Extract Method
Sectional data extraction - Primarily using a Data Section object's Transaction Detection Extract Method

FYI

The Labeling Behavior and its functionality discussed in this article are often referred to as "Label Set Behavior" or simply "Label Sets".

About

Labels serve an important function on documents. They give the reader the context needed to understand where data is located and what it means. How do you tell the date an invoice was sent from the date it must be paid? By the labels. The labels are what distinguish one type of date from another: for example, "Invoice Date" for the date the invoice was sent and "Due Date" for the date you need to pay by.
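To make the idea concrete, here is a minimal Python sketch of label-anchored extraction. It is not Grooper's Labeled Value Extractor Type or any Grooper API; the function name `extract_labeled_value`, the date pattern, and the sample invoice text are all hypothetical and exist only to show how a label tells two otherwise identical-looking dates apart.

```python
import re

# Hypothetical helper: finds a value that directly follows a given label.
def extract_labeled_value(text, label, value_pattern):
    """Return the first value matching value_pattern that follows label, or None."""
    pattern = re.compile(
        re.escape(label) + r"\s*:?\s*(" + value_pattern + r")",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None

# Assumed date format for this sketch: MM/DD/YYYY.
DATE = r"\d{1,2}/\d{1,2}/\d{4}"

# Made-up sample text standing in for a document's OCR text.
invoice_text = "Invoice Date: 03/01/2023\nDue Date: 03/31/2023\n"

invoice_date = extract_labeled_value(invoice_text, "Invoice Date", DATE)
due_date = extract_labeled_value(invoice_text, "Due Date", DATE)
```

Both values match the same date pattern; only the label in front of each one determines which field it belongs to.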

Labels can also be used to classify documents. A single label may not tell you much about a document. Taken together, however, a document's labels say quite a bit about what kind of document you're looking at. For example, a W-4 employee withholding form uses different labels than an employee healthcare enrollment form. These are two very different documents collecting very different information, so the labels used to collect that information differ as well.

Labels can even distinguish two closely related documents. For example, invoices from two different vendors may share some of the labels they use to detail information, but there will be differences as well. These differences are useful identifiers for telling one from the other. Taken together, labels act as a thumbprint Grooper can use to classify a document as one Document Type or another.

Even though these two invoices share some labels (highlighted in blue), there are others that are unique to each one (highlighted in yellow). This awareness of how one kind of invoice from one vendor uses labels differently from another can give you a method of classifying these documents using their label sets.
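The thumbprint idea can be illustrated with a short Python sketch that scores a document against each Document Type's label set by simple set overlap. This is an assumption-laden illustration, not how Grooper's Labelset-Based Classification Method is implemented: the scoring formula, the 0.6 threshold, the function names, and the vendor label sets are all hypothetical.

```python
# Hypothetical scoring: fraction of a Document Type's labels found on the document.
def label_overlap_score(document_labels, label_set):
    if not label_set:
        return 0.0
    return len(document_labels & label_set) / len(label_set)

# Hypothetical classifier: pick the best-scoring type, or None if below threshold.
def classify_by_labels(document_labels, label_sets, threshold=0.6):
    best_type, best_score = None, 0.0
    for doc_type, label_set in label_sets.items():
        score = label_overlap_score(document_labels, label_set)
        if score > best_score:
            best_type, best_score = doc_type, score
    if best_score >= threshold:
        return best_type, best_score
    return None, best_score

# Made-up label sets: two labels are shared, two are unique to each vendor.
label_sets = {
    "Vendor A Invoice": {"Invoice Date", "Due Date", "PO Number", "Remit To"},
    "Vendor B Invoice": {"Invoice Date", "Due Date", "Customer ID", "Terms"},
}

# Labels found on an incoming document (also made up).
found = {"Invoice Date", "Due Date", "Customer ID", "Terms", "Total"}

result = classify_by_labels(found, label_sets)
```

The shared labels ("Invoice Date", "Due Date") contribute equally to both scores, so it is the labels unique to each vendor that decide the outcome.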

The Labeling Behavior is built on these concepts, collecting and utilizing labels for Document Types in a Content Model for classification and data extraction purposes.

As a Behavior, the Labeling Behavior is enabled on a Content Type object in Grooper.