2021:Labeling Behavior (Behavior): Difference between revisions

From Grooper Wiki
No edit summary


The ''Labeling Behavior'' is built on these concepts, collecting and utilizing labels for '''Document Types''' in a '''Content Model''' for classification and data extraction purposes.
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
As a '''''Behavior''''', the ''Labeling Behavior'' is enabled on a '''Content Type''' object in '''Grooper'''. 
{|cellpadding="10" cellspacing="5"
|-style="background-color:#f89420; color:white"
|style="font-size:22pt"|'''⚠'''||While you ''can'' enable ''Labeling Behavior'' on any '''Content Type''', in almost all cases, you will want to enable this '''''Behavior''''' on the '''Content Model'''.
|}
# Here, we have selected a '''Content Model''' in the Node Tree.
# To add a '''''Behavior''''', select the '''''Behaviors''''' property and press the ellipsis button at the end.
# This will bring up a dialog window to add various behaviors to the '''Content Model''', including the ''Labeling Behavior''.
# Add the ''Labeling Behavior'' using the "Add" button.
# Select ''Labeling Behavior'' from the listed options.
|
[[File:Labeling-behavior-about-04.png]]
|-
|valign=top|
# Once added, you will see a ''Labeling Behavior'' item added to the '''''Behaviors''''' list.
# Selecting the ''Labeling Behavior'' in the list, you will see property configuration options in the right panel.
#* The configuration options in the property panel pertain to [[Fuzzy RegEx|fuzzy matching]] collected labels as well as constrained and vertical wrapping capabilities to target stacked labels.
#* By default, '''Grooper''' presumes you will want to use some fuzzy matching and enable constrained and vertical wrapping.  These defaults work well for most use cases.  However, you can adjust these properties here as needed.
# Press the "OK" button to finish adding the ''Labeling Behavior'' and exit this window.
|
[[File:Labeling-behavior-about-05.png]]
|-
|valign=top|
Once the ''Labeling Behavior'' is enabled, the next big step is collecting label sets for the various '''Document Types''' in your '''Content Model'''.
# With the ''Labeling Behavior'' enabled, you will now see a "Labels" tab present for the '''Content Model'''.
#* This tab is now also present for each individual '''Document Type'''.
# Label sets are collected in this tab for each '''Document Type''' in the '''Content Model'''.
Each '''Document Type''' has its own set of labels used to define information on the document.  For example, the "Factura" '''Document Type''' in this '''Content Model''' uses the label "PO Number" to call out the purchase order number on this invoice document.  A different '''Document Type''', corresponding to a different invoice format, might use a different label such as "Purchase Order Number" or "PO #".
#<li value=3> Ultimately, this is the data we want to collect using the '''Content Model's''' '''Data Model'''.
# We use the "Labels" tab to collect labels corresponding to the various '''Data Elements''' ('''Data Fields''', '''Data Tables''', and '''Data Sections''') of the '''Data Model'''.
#* This provides a user interface to enter a label identifying the value you wish to collect for the '''Data Elements'''.
# For example, the label "PO Number" identifies the purchase order number for this invoice.
# Therefore, the label "PO Number" is collected for the "Purchase Order Number" '''Data Field''' in the '''Data Model'''.
For more information on collecting label sets for the '''Document Types''' in your '''Content Model''' see the [[#Collect Label Sets|How To]] section of this article.
|
[[File:Labeling-behavior-about-06.png]]
|-
|valign=top|
Once label sets are collected for each '''Document Type''', they can be used for classification and data extraction purposes.
For example, labels were used in this case to:
# Classify the document, assigning it the "Factura" '''Document Type'''.
# Extract all the '''Data Fields''' seen here, collecting field-based data from the document.
# Extract the "Line Items" '''Data Table''', collecting the tabular data seen here.
For more information on how to use labels for these purposes, see the [[#How To|how to section of this article]].
|
[[File:Labeling-behavior-about-07.png]]
|}


== How To ==


=== Enable the Labeling Behavior and Collect Label Sets ===
=== Collect Label Sets ===


=== Use Label Sets for Classification ===

Revision as of 11:30, 25 March 2021

2021

This article is in development for the upcoming version of Grooper, Grooper 2021. Labeling Behavior is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release.

The Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for classification and data extraction.

The Labeling Behavior functionality allows Grooper users to quickly onboard new Document Types for structured and semi-structured forms, utilizing labels as a thumbprint for classification and data extraction purposes. Once the Labeling Behavior is enabled, labels are identified and collected using the "Labels" tab of Document Types. These "Label Sets" can then be used for the following purposes:

  • Document classification - Using the Labelset-Based Classification Method
  • Field based data extraction - Using the Labeled Value Extractor Type
  • Tabular data extraction - Using a Data Table object's Tabular Layout Extract Method
  • Sectional data extraction - Using a Data Section object's Transaction Detection Extract Method


About

Labels serve an important function on documents. They give the reader critical context to understand where data is located and what it means. How do you know the difference between the date on an invoice document indicating when the invoice was sent and the date indicating when you should pay the invoice? It's the labels. The labels are what distinguishes one type of date from another. For example, "Invoice Date" for the date the invoice was sent and "Due Date" for the date you need to pay by.

Labels can be a way of classifying documents as well. What does one individual label tell you about a document? Well, maybe not much. However, if you take them all together, they can tell you quite a bit about the kind of document you're looking at. For example, a W-4 employee withholding form is going to use different labels than an employee healthcare enrollment form. These are two very different documents collecting very different information. The labels used to collect this information are thus different as well.

Furthermore, labels can even tell two very closely related documents apart. For example, two invoices from two different vendors may share some of the labels they use to detail information, but there will be some differences as well. These differences can be useful identifiers to distinguish one from the other. Put all together, labels can act as a thumbprint Grooper can use to classify a document as one Document Type or another.

Even though these two invoices share some labels (highlighted in blue), there are others that are unique to each one (highlighted in yellow). This awareness of how one kind of invoice from one vendor uses labels differently from another can give you a method of classifying these documents using their label sets.
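
To make the "thumbprint" idea concrete, here is a minimal Python sketch that scores a document against each Document Type's label set by counting how many of that set's labels appear on the page. This is only an illustration of the concept, not Grooper's Labelset-Based classification logic. The function names, Document Type names, and sample labels are all assumptions invented for the example.

```python
def label_overlap(found_labels, label_set):
    """Fraction of the label set's labels that appear among the found labels."""
    found = {label.lower() for label in found_labels}
    expected = {label.lower() for label in label_set}
    if not expected:
        return 0.0
    return len(found & expected) / len(expected)


def classify(found_labels, label_sets):
    """Return (document type name, score) for the best-matching label set."""
    scores = {
        name: label_overlap(found_labels, labels)
        for name, labels in label_sets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical label sets for two invoice formats (made-up sample data).
label_sets = {
    "Vendor A Invoice": ["Invoice Date", "Due Date", "PO Number", "Bill To"],
    "Vendor B Invoice": ["Invoice Date", "Due Date", "Purchase Order Number", "Remit To"],
}

# Labels found on an incoming page (also made up).
page_labels = ["Invoice Date", "Due Date", "PO Number", "Bill To", "Total"]

best_type, best_score = classify(page_labels, label_sets)
print(best_type, best_score)
```

Both label sets share "Invoice Date" and "Due Date", so the shared labels alone cannot separate them. The labels unique to each format ("PO Number" and "Bill To" versus "Purchase Order Number" and "Remit To") are what tip the score toward one Document Type.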


The Labeling Behavior is built on these concepts, collecting and utilizing labels for Document Types in a Content Model for classification and data extraction purposes.

As a Behavior, the Labeling Behavior is enabled on a Content Type object in Grooper.

While you can enable Labeling Behavior on any Content Type, in almost all cases, you will want to enable this Behavior on the Content Model.
  1. Here, we have selected a Content Model in the Node Tree.
  2. To add a Behavior, select the Behaviors property and press the ellipsis button at the end.
  3. This will bring up a dialog window to add various behaviors to the Content Model, including the Labeling Behavior.
  4. Add the Labeling Behavior using the "Add" button.
  5. Select Labeling Behavior from the listed options.

  1. Once added, you will see a Labeling Behavior item added to the Behaviors list.
  2. Selecting the Labeling Behavior in the list, you will see property configuration options in the right panel.
    • The configuration options in the property panel pertain to fuzzy matching collected labels as well as constrained and vertical wrapping capabilities to target stacked labels.
    • By default, Grooper presumes you will want to use some fuzzy matching and enable constrained and vertical wrapping. These defaults work well for most use cases. However, you can adjust these properties here as needed.
  3. Press the "OK" button to finish adding the Labeling Behavior and exit this window.
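
As a rough illustration of why fuzzy matching helps with collected labels, the sketch below tolerates small OCR errors when looking for a label in recognized text. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. This is not how Grooper's Fuzzy RegEx works internally, and the function name `fuzzy_find_label` and the `min_similarity` threshold are assumptions made for the example, not Grooper properties.

```python
from difflib import SequenceMatcher


def fuzzy_find_label(label, candidates, min_similarity=0.8):
    """Return the candidate text most similar to `label`, or None.

    A candidate is accepted only if its similarity ratio (0.0 to 1.0)
    is at least `min_similarity`. Comparison ignores letter case.
    """
    best_text, best_ratio = None, 0.0
    for text in candidates:
        ratio = SequenceMatcher(None, label.lower(), text.lower()).ratio()
        if ratio > best_ratio:
            best_text, best_ratio = text, ratio
    return best_text if best_ratio >= min_similarity else None


# OCR read the letter "O" as the digit "0" (made-up sample text).
ocr_segments = ["Invoice Date", "P0 Number", "Bill To"]

match = fuzzy_find_label("PO Number", ocr_segments)
print(match)
```

Raising the threshold makes matching stricter and lowering it makes matching more forgiving, which is the same trade-off you are making when you adjust fuzzy matching settings.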

Once the Labeling Behavior is enabled, the next big step is collecting label sets for the various Document Types in your Content Model.

  1. With the Labeling Behavior enabled, you will now see a "Labels" tab present for the Content Model.
    • This tab is now also present for each individual Document Type.
  2. Label sets are collected in this tab for each Document Type in the Content Model.

Each Document Type has its own set of labels used to define information on the document. For example, the "Factura" Document Type in this Content Model uses the label "PO Number" to call out the purchase order number on this invoice document. A different Document Type, corresponding to a different invoice format, might use a different label such as "Purchase Order Number" or "PO #".

  3. Ultimately, this is the data we want to collect using the Content Model's Data Model.
  4. We use the "Labels" tab to collect labels corresponding to the various Data Elements (Data Fields, Data Tables, and Data Sections) of the Data Model.
    • This provides a user interface to enter a label identifying the value you wish to collect for the Data Elements.
  5. For example, the label "PO Number" identifies the purchase order number for this invoice.
  6. Therefore, the label "PO Number" is collected for the "Purchase Order Number" Data Field in the Data Model.
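
The relationship between a Data Field and its collected label can be pictured as a simple lookup: each field name maps to the label that identifies its value on a given Document Type. The Python sketch below is a conceptual illustration only, assuming the value sits to the right of its label on the same line. It does not reflect how the Labeled Value Extractor Type is implemented, and the function name, field-to-label mapping, and sample text are invented for the example.

```python
import re


def extract_labeled_value(text, label):
    """Return the text that follows `label` on the same line, or None."""
    pattern = re.compile(
        re.escape(label) + r"[ \t]*:?[ \t]*(\S.*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None


# Hypothetical label set for one Document Type: Data Field name -> label.
factura_labels = {
    "Purchase Order Number": "PO Number",
    "Invoice Date": "Invoice Date",
}

# Made-up page text standing in for a recognized document.
page_text = "Invoice Date: 03/25/2021\nPO Number: 12345\n"

extracted = {
    field: extract_labeled_value(page_text, label)
    for field, label in factura_labels.items()
}
print(extracted)
```

A second Document Type would carry its own mapping, for example pointing the same "Purchase Order Number" field at the label "PO #", which is why label sets are collected per Document Type.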

For more information on collecting label sets for the Document Types in your Content Model see the How To section of this article.

Once label sets are collected for each Document Type, they can be used for classification and data extraction purposes.

For example, labels were used in this case to:

  1. Classify the document, assigning it the "Factura" Document Type.
  2. Extract all the Data Fields seen here, collecting field-based data from the document.
  3. Extract the "Line Items" Data Table, collecting the tabular data seen here.

For more information on how to use labels for these purposes, see the How To section of this article.


How To

Collect Label Sets

Use Label Sets for Classification

Use Label Sets for Field Based Extraction

Use Label Sets for Tabular Extraction

Use Label Sets for Sectional Extraction

Additional Information

Include information in this section on the following topics if not able to flesh it out in the About or How To sections. And probably this section will be helpful even if you do talk about it earlier. There's space in Design Studio to detail this information in a help panel.

Header, Footer, and Static Labels

Layout Options

Version Differences

2021

The Labeling Behavior is brand new functionality in Grooper version 2021. Prior to this version, some of its functionality may have been approximated by other objects and their properties (for example, a Data Type using the Key-Value Pair collation is, at least in some ways, similar to how the Labeled Value Extractor Type works). However, creating label sets using Document Types and implementing them as described above was not available prior to version 2021.