2021:Labeling Behavior (Behavior): Difference between revisions
Dgreenwood (talk | contribs) No edit summary |
Dgreenwood (talk | contribs) No edit summary |
||
| Line 103: | Line 103: | ||
=== Collect Label Sets === | === Collect Label Sets === | ||
<tabs style="margin:20px"> | |||
<tab name="Navigate to the Labels UI" style="margin:20px"> | |||
=== Navigate to the Labels UI === | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Collecting labels for the '''Document Types''' in your '''Content Model''' will be the first thing you want to do after enabling the ''Labeling Behavior''. Labels for each '''Data Element''' in the '''Document Type's''' '''Data Model''' are defined using the "Labels" tab of the '''Content Model'''. | |||
# Navigate to the "Labels" tab of the '''Content Model'''. | |||
# With a '''Batch''' selected in the "Batch Selector" window panel, select a document folder. | |||
# Press the "Set Type..." button to set the '''Document Type''' whose labels you wish to collect. | |||
# This will bring up the "Set Content Type" window. | |||
# From this window, select the '''Document Type''' for the selected document folder whose labels you wish to collect. | |||
#* In this case, this document is an invoice from "Factura Technology Corp". We have selected the "Factura" '''Document Type'''. | |||
# Press "OK" to finish. | |||
{|cellpadding="10" cellspacing="5" | |||
|-style="background-color:#36b0a7; color:white" | |||
|style="font-size:14pt"|'''FYI'''||If you haven't added a '''Document Type''' for the selected document folder yet, you can use the "Create Type" button instead to both create a new '''Document Type''' and set it. | |||
|} | |||
| | |||
[[File:Labeling-behavior-about-08.png]] | |||
|- | |||
|valign=top| | |||
# Upon setting the '''Document Type''', the document folder is assigned the selected '''Document Type'''. | |||
#* In other words, this document is now classified as a "Factura" document. | |||
# Upon setting a '''Document Type''', that '''Document Type's''' '''Data Model''' and its child '''Data Elements''' will appear in the label collection UI. | |||
#* Labels are primarily collected for the '''Data Elements''' in a '''Data Model'''. However, by the end of this tutorial we will also see how to add custom labels that don't correspond to a '''Data Element'''. Custom labels are often used as additional features for document classification. | |||
| | |||
[[File:Labeling-behavior-about-09.png]] | |||
|} | |||
</tab> | |||
<tab name="Collect Labels" style="margin:20px"> | |||
=== Collect Labels === | |||
Now that this document has been classified (assigned a '''Document Type''' from our '''Content Model'''), we can collect labels for its '''Document Type'''. This can be done in one of two ways: | |||
# Lassoing text in the "Document Viewer" | |||
# Typing them in manually. | |||
{|cellpadding="10" cellspacing="5" | |||
|-style="background-color:#ed2330; color:white" | |||
|style="font-size:14pt"|'''❕'''||Going forward, this tutorial presumes you have obtained machine readable text from these documents, either OCR'd text or native text, via the '''Recognize''' activity. | |||
|} | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Generally the quickest way is by simply lassoing the label in the "Document Viewer". | |||
# Select the '''Data Element''' whose label you wish to collect. | |||
#* Here, we are selecting the "Invoice Number" '''Data Field'''. | |||
# Press the "Select Region" button. | |||
# With your cursor, lasso around the text label on the document. | |||
| | |||
[[File:Labeling-behavior-about-10.png]] | |||
|- | |||
|valign=top| | |||
# Upon lassoing the label in the Document Viewer, the OCR'd or native text behind the selected region will be used to populate the '''Data Element's''' label. | |||
#* At this point, the label for the "Invoice Number" '''Data Field''' is now "Invoice Number" because that's the text data we selected. Whatever text characters you lasso with your cursor will be assigned as the label. | |||
# Notice this label also now appears in the "Header" tab below. That's because we had the Header tab selected when we lassoed the label. | |||
#* The text collected here ("Invoice Number") is the Header label for the "Invoice Number" '''Data Field'''. | |||
#* We'll talk about the difference between Header, Footer, and Static labels later. This will be important when using labels for data extraction purposes. | |||
| | |||
[[File:Labeling-behavior-about-11.png]] | |||
|- | |||
|valign=top| | |||
If you choose, you may also manually enter a label for a '''Data Element''' by simply typing it into the text box. | |||
# Here we've selected the "Purchase Order Number" '''Data Field''' and entered "PO Number". | |||
# This will correspond to the label "PO Number" on the document itself. | |||
| | |||
[[File:Labeling-behavior-about-12.png]] | |||
|- | |||
|valign=top| | |||
# Upon entering the label into the text box, you'll see the label in the Header tab, just like we saw when we collected a label by lassoing the text in the Document Viewer. | |||
# Notice as well, there is a green checkmark next to the "Header" tab (and the box below is highlighted green). | |||
#* This means the text label is matching something on the document. If it did not, you would see a red "X" next to the Header tab and the box below would be highlighted red. | |||
# Also note, since this label is being returned on this document, we can verify it in the Document Viewer. The selected '''Data Field''' ("Purchase Order Number") and its text label are highlighted green on the document, indicating 1) it was successfully located on the document and 2) where it was located. | |||
| | |||
[[File:Labeling-behavior-about-13.png]] | |||
|} | |||
</tab> | |||
</tabs> | |||
=== Use Label Sets for Classification === | === Use Label Sets for Classification === | ||
| Line 117: | Line 202: | ||
=== Header, Footer, and Static Labels === | === Header, Footer, and Static Labels === | ||
=== Custom Labels === | |||
=== Layout Options === | === Layout Options === | ||
Revision as of 14:18, 25 March 2021
|
2021 |
This article is in development for the upcoming version of Grooper, Grooper 2021. Labeling Behavior is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release. |
The Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for classification and data extraction.
The Labeling Behavior functionality allows Grooper users to quickly onboard new Document Types for structured and semi-structured forms, utilizing labels as a thumbprint for classification and data extraction purposes. Once the Labeling Behavior is enabled, labels are identified and collected using the "Labels" tab of Document Types. These "Label Sets" can then be used for the following purposes:
- Document classification - Using the Labelset-Based Classification Method
- Field based data extraction - Using the Labeled Value Extractor Type
- Tabular data extraction - Using a Data Table object's Tabular Layout Extract Method
- Sectional data extraction - Using a Data Section object's Transaction Detection Extract Method
About

Labels serve an important function on documents. They give the reader critical context to understand where data is located and what it means. How do you know the difference between the date an invoice was sent and the date it should be paid? It's the labels. The labels are what distinguish one type of date from another: "Invoice Date" marks the date the invoice was sent, and "Due Date" marks the date you need to pay by.
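To make the idea concrete, here is a minimal Python sketch of using a label as the anchor for a value. It is only an illustration of the concept, not how Grooper's Labeled Value extraction is implemented. The function name `value_after_label` and the sample text are hypothetical, and the sketch assumes the value sits to the right of its label on the same line.

```python
import re
from typing import Optional


def value_after_label(text: str, label: str) -> Optional[str]:
    """Return the text that follows `label` on the same line, or None if absent."""
    # Escape the label so characters like "#" or "." are matched literally.
    pattern = re.compile(re.escape(label) + r"\s*:?\s*(.+)", re.IGNORECASE)
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1).strip()
    return None


# Hypothetical recognized text from an invoice.
sample = "Invoice Date: 03/01/2021\nDue Date: 03/31/2021"
print(value_after_label(sample, "Invoice Date"))
print(value_after_label(sample, "Due Date"))
```

Both values are dates in the same format. Only the label tells one from the other.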
Labels can be a way of classifying documents as well. What does one individual label tell you about a document? Well, maybe not much. However, if you take them all together, they can tell you quite a bit about the kind of document you're looking at. For example, a W-4 employee withholding form is going to use different labels than an employee healthcare enrollment form. These are two very different documents collecting very different information. The labels used to collect this information are thus different as well.
Furthermore, labels can even tell apart two very closely related documents. For example, invoices from two different vendors may share some of the labels they use to detail information, but there will be some differences as well. These differences can be useful identifiers to distinguish one from the other. Taken together, labels can act as a thumbprint Grooper can use to classify a document as one Document Type or another.
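The thumbprint idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Grooper's Labelset-Based classification algorithm: it scores each Document Type by the fraction of its collected labels found in the document text and picks the highest score. The names `label_score` and `classify`, the "Other Vendor" Document Type, and all sample labels and text are hypothetical.

```python
from typing import Dict, List


def label_score(text: str, labels: List[str]) -> float:
    """Fraction of a Document Type's labels that appear in the text."""
    lowered = text.lower()
    found = sum(1 for label in labels if label.lower() in lowered)
    return found / len(labels) if labels else 0.0


def classify(text: str, label_sets: Dict[str, List[str]]) -> str:
    """Return the Document Type whose label set best matches the text."""
    return max(label_sets, key=lambda name: label_score(text, label_sets[name]))


# Hypothetical label sets for two invoice formats.
label_sets = {
    "Factura": ["Invoice Number", "PO Number", "Due Date"],
    "Other Vendor": ["Invoice #", "Purchase Order Number", "Due Date"],
}

document_text = "Invoice Number 1001\nPO Number 77\nDue Date 03/31/2021"
print(classify(document_text, label_sets))
```

Both label sets share "Due Date", so that label alone cannot separate the two formats. The labels that differ are what decide the result.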
The Labeling Behavior is built on these concepts, collecting and utilizing labels for Document Types in a Content Model for classification and data extraction purposes.
|
As a Behavior, the Labeling Behavior is enabled on a Content Type object in Grooper.
Once the Labeling Behavior is enabled, the next big step is collecting label sets for the various Document Types in your Content Model.
Each Document Type has its own set of labels used to define information on the document. For example, the "Factura" Document Type in this Content Model uses the label "PO Number" to call out the purchase order number on this invoice document. A different Document Type, corresponding to a different invoice format, might use a different label such as "Purchase Order Number" or "PO #".
For more information on collecting label sets for the Document Types in your Content Model see the How To section of this article. |
Once label sets are collected for each Document Type, they can be used for classification and data extraction purposes. For example, labels were used in this case to:
For more information on how to use labels for these purposes, see the How To section of this article.
How To
Collect Label Sets
|
Collecting labels for the Document Types in your Content Model will be the first thing you want to do after enabling the Labeling Behavior. Labels for each Data Element in the Document Type's Data Model are defined using the "Labels" tab of the Content Model.
Collect Labels
Now that this document has been classified (assigned a Document Type from our Content Model), we can collect labels for its Document Type. This can be done in one of two ways:
- Lassoing text in the "Document Viewer"
- Typing them in manually.
| ❕ | Going forward, this tutorial presumes you have obtained machine readable text from these documents, either OCR'd text or native text, via the Recognize activity. |
|
Generally the quickest way is by simply lassoing the label in the "Document Viewer".
If you choose, you may also manually enter a label for a Data Element by simply typing it into the text box.
Use Label Sets for Classification
Use Label Sets for Field Based Extraction
Use Label Sets for Tabular Extraction
Use Label Sets for Sectional Extraction
Additional Information
Include information in this section on the following topics if not able to flesh it out in the About or How To sections. And probably this section will be helpful even if you do talk about it earlier. There's space in Design Studio to detail this information in a help panel.
Custom Labels
Layout Options
Version Differences
2021
The Labeling Behavior is brand new functionality in Grooper version 2021. Prior to this version, some of its functionality could be approximated by other objects and their properties. For example, a Data Type using the Key-Value Pair collation is in some ways similar to how the Labeled Value Extractor Type works. However, creating label sets using Document Types and using them as described above was not available prior to version 2021.











