2021:Labeling Behavior (Behavior)
Revision as of 10:24, 29 April 2021
|
2021 |
This article is in development for the upcoming version of Grooper, Grooper 2021. Labeling Behavior is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release. |
The Labeling Behavior is a Content Type Behavior designed to collect and utilize a document's field labels in a variety of ways. This includes functionality for classification and data extraction.
The Labeling Behavior functionality allows Grooper users to quickly onboard new Document Types for structured and semi-structured forms, utilizing labels as a thumbprint for classification and data extraction purposes. Once the Labeling Behavior is enabled, labels are identified and collected using the "Labels" tab of Document Types. These "Label Sets" can then be used for the following purposes:
- Document classification - Using the Labelset-Based Classification Method
- Field based data extraction - Using the Labeled Value Extractor Type
- Tabular data extraction - Using a Data Table object's Tabular Layout Extract Method
- Sectional data extraction - Using a Data Section object's Transaction Detection Extract Method
FYI: The Labeling Behavior and its functionality discussed in this article are often referred to as "Label Set Behavior" or simply "Label Sets".
About
| ⍗ |
You may download and import the file below into your own Grooper environment (version 2021). This contains the Batch(es) with the example document(s) discussed in this article and the Content Model(s) configured according to the How To section's instructions. |

Labels serve an important function on documents. They give the reader critical context to understand where data is located and what it means. How do you know the difference between the date on an invoice indicating when the invoice was sent and the date indicating when you should pay it? It's the labels. The labels are what distinguish one type of date from another. For example, "Invoice Date" marks the date the invoice was sent and "Due Date" marks the date you need to pay by.
Labels can be a way of classifying documents as well. What does one individual label tell you about a document? Well, maybe not much. However, if you take them all together, they can tell you quite a bit about the kind of document you're looking at. For example, a W-4 employee withholding form is going to use different labels than an employee healthcare enrollment form. These are two very different documents collecting very different information. The labels used to collect this information are thus different as well.
Furthermore, you can even tell the difference between two very closely related documents using labels. For example, two different invoices from two different vendors may share some similarity in the labels they use to detail information. But there will be some differences as well. These differences can be useful identifiers to distinguish one from the other. Put all together, labels can act as a thumbprint Grooper can use to classify a document as one Document Type or another.
The Labeling Behavior is built on these concepts, collecting and utilizing labels for Document Types in a Content Model for classification and data extraction purposes.
|
As a Behavior, the Labeling Behavior is enabled on a Content Type object in Grooper.
|
|||
|
|||
|
Once the Labeling Behavior is enabled, the next big step is collecting label sets for the various Document Types in your Content Model.
Each Document Type has its own set of labels used to define information on the document. For example, the "Factura" Document Type in this Content Model uses the label "PO Number" to call out the purchase order number on this invoice document. A different Document Type, corresponding to a different invoice format, might use a different label such as "Purchase Order Number" or "PO #".
For more information on collecting label sets for the Document Types in your Content Model see the How To section of this article. |
|||
|
Once label sets are collected for each Document Type, they can be used for classification and data extraction purposes. For example, labels were used in this case to:
For more information on how to use labels for these purposes, see the How To section of this article. |
How To
The Labeling Behavior (often referred to as "Label Set Behavior" or just "Label Sets") is well suited for structured and semi-structured document sets. Label Sets are particularly useful for situations where you have multiple variations of one kind of document. While the information you want to extract from the document set may be the same from variation to variation, how the data is laid out and labeled may be very different from one variation of the document to another. Label Sets allow you to quickly onboard new Document Types to capture new form structures.
|
We will use invoices for the document set in the following tutorials. In a perfect world, you'd create a Content Model with a single "Invoice" Document Type whose Data Model would successfully extract all Data Elements for all invoices from all vendors every time no matter what. This is often not the case. You may find you need to add multiple Document Types to account for variations of an invoice from multiple vendors. Label Sets give you a method of quickly adding Document Types to model new variations. In our case, we will presume we need to create one Document Type for each vendor. We will start with five Document Types for invoices from five vendors.
|
| ⍗ |
You may download and import the file below into your own Grooper environment (version 2021). This contains the Batch(es) with the example document(s) discussed in this tutorial and the Content Model(s) configured according to the instructions. |
Collect Label Sets
|
Collecting labels for the Document Types in your Content Model will be the first thing you want to do after enabling the Labeling Behavior. Labels for each Data Element in the Document Type's Data Model are defined using the "Labels" tab of the Content Model.
|
|||
|
Collect Field Labels
Now that this document has been classified (assigned a Document Type from our Content Model), we can collect labels for its Document Type. This can be done in one of three ways:
- Lassoing the label in the "Document Viewer".
- Double-clicking the label in the Document Viewer.
- Typing the label in manually.
| ‼ | Going forward, this tutorial presumes you have obtained machine readable text from these documents, either OCR'd text or native text, via the Recognize activity. |
|
Generally the quickest way is by simply lassoing the label in the "Document Viewer".
|
|||
|
|||
|
If you choose, you may also manually enter a label for a Data Element by simply typing it into the text box.
|
|||
|
|||
|
Collect Table and Column Labels
|
Table and column labels can be used for tabular data extraction as well, by setting a Data Table object to use the Tabular Layout Extract Method. When collecting labels for this method of table extraction, keep in mind you must collect the individual column headers, and may optionally collect the full row of column header labels as well. While it is optional, it is generally regarded as best practice to capture the full row of column header labels. This will generally increase the accuracy of your column label extraction. We will do both in this tutorial.
This may seem like you are duplicating your efforts but it is often critical to do both in order for the Tabular Layout Extract Method to map the table's structure and ultimately collect the table's data.
|
|
|
|
|
|
|
Auto Map Labels
|
As you add labels for each Document Type, you may find some documents have labels in common. For example, there are only so many ways to label an invoice number. It might be "Invoice Number", "Invoice No", "Invoice #" or even just "Invoice". Some invoices are going to use one label, others another. When collecting labels for multiple Document Types you can use the "Auto Map" feature to automatically add labels you've previously collected on another Document Type.
|
|
|
Grooper will search the document's text for labels matching those previously collected on other Document Types.
If a match is not found, the Data Element's label is left blank.
As you keep collecting labels for more and more Document Types, the Auto Map feature will pick up more and more labels, allowing you to quickly onboard new Document Types. |
|
|
Be aware, you may still need to validate the auto mapped values and make adjustments.
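The Auto Map idea described above can be sketched in a few lines of Python. This is only an illustration of the concept (search the new document's text for labels already collected elsewhere, and leave the label blank when nothing matches), not Grooper's implementation. The function name `auto_map`, the label lists and the sample text are all hypothetical.

```python
def auto_map(document_text, known_labels):
    """For each Data Element, return the first previously collected label
    found in the document text, or "" when none is found."""
    text = document_text.lower()
    mapped = {}
    for element, candidates in known_labels.items():
        mapped[element] = next(
            (label for label in candidates if label.lower() in text), ""
        )
    return mapped


# Hypothetical labels collected earlier on other Document Types.
known_labels = {
    "Invoice Number": ["Invoice Number", "Invoice No", "Invoice #"],
    "Purchase Order Number": ["PO Number", "Purchase Order Number", "PO #"],
    "Due Date": ["Due Date"],
}

# Hypothetical text from a new vendor's invoice.
new_document_text = "Invoice #: 10042   PO Number: 7731   Terms: Net 30"

result = auto_map(new_document_text, known_labels)
print(result)
```

In this sketch the "Due Date" entry comes back blank because no known label appears in the text, which is the case where you would collect the label by hand.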
|
Collect Custom Labels
It's important to keep in mind labels are collected for corresponding Data Elements in a Data Model. You collect one label per Data Element (Data Field, Data Section, Data Table or Data Column). What if you want to collect a label that is distinct from a Data Element, one that doesn't necessarily have to do with a value collected by your Data Model? And why would you even want to?
That's what "Custom Labels" are for. Custom labels serve two primary functions:
- Providing additional labels for classification purposes.
- Providing context labels when a Data Element's label matches multiple points on a document.
|
Custom Labels may only be added to Data Model, Data Section or Data Table objects' labels. Put another way, any Data Element in the Data Model's hierarchy that can have child Data Elements can have custom labels. When used for classification purposes, custom labels are typically added to the Data Model itself.
|
|
|
|
|
You may add more Custom Labels to the selected Data Element by repeating the process described above.
|
Custom Labels as Context Labels
|
Some labels are more specific than others. The label "Invoice Date" is more specific than the label "Date". If you see the label "Invoice Date" you know the date you're looking at is the date the invoice was generated. The label "Date" may refer to the invoice's generation date or it could be part of another label like "Due Date". However, some invoice formats will label the invoice date as simply "Date".
This can present a challenge for data extraction. The possibilities for false-positive results tend to crop up the more generic the label used to identify a desired value. There are three separate date values identified by the word "Date" (in full or in part) on this document. |
This is the second reason Custom Labels are typically added for a Document Type, to provide extra context for generic labels, especially when they produce multiple results on a document, leading to false-positive data extraction.
There are two steps to adding and using a Custom Label for this purpose:
- Add the Custom Label.
- Marry the Custom Label with the Data Element's label.
We will refer to this type of a Custom Label as a "Context Label" from here out.
|
The only "trick" to this is adding the Context Label to the appropriate level of the Data Model's hierarchy. Remember, a Custom Label may only be added to a Data Model, Data Section or Data Table object. We cannot add a Custom Label to a Data Field, such as the "Invoice Number" Data Field. To add a Context Label a Data Field can use, we must add the Custom Label to its direct parent Data Element.
|
|
|
|
|
Now that we've added the label, we need to marry the Custom Label with the Data Field it's giving extra context to. This is done with the Parent property of a Data Field label.
|
|
|
Use Label Sets for Classification
Label Sets can be used for classifying documents using the Label Match Classification Method. For structured and semi-structured forms, labels end up being a way of identifying a document. Without the field data entered, the labels are really what define the document. You know what kind of document you're looking at based on what kind of information is presented and, in the case of Label Match classification, how that data is labeled. Even when those labels are very similar from one variant to the next, they end up being a thumbprint of that variant. For example, you might use Label Match classification to create Document Types for different variations of invoices from different vendors. The information presented on each variant from each vendor will be more or less the same, and some labels will be used by many vendors (such as "Invoice Number"). However, if there is enough variation in the set of labels, you can easily differentiate an invoice from one vendor versus another just based on the variation in labels.
|
Take these four "documents". Each one is collecting the same information:
So we might have five Data Fields in our Data Model, one for each piece of information. We'd also collect one label for each Data Field. While the data we want from these documents is the same, there is some variation in the labels used for each document type. That variation is what lets us distinguish these four documents from each other when classifying with the Label Match Classification Method. This is all done by measuring the similarity between the collected label sets for each Document Type.
How is Document Type "B" different from Document Type "A"?
- It uses the label "SSN:" instead of "Social Security Number:".
How is Document Type "C" different from Document Type "A"?
- It uses the labels "SSN:" instead of "Social Security Number:" and "DOB:" instead of "Date of Birth:".
How is Document Type "D" different from Document Type "A"?
- It uses the labels "SSN:" instead of "Social Security Number:", "DOB:" instead of "Date of Birth:", and "Phone #:" instead of "Phone Number".
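To make the idea of measuring similarity between label sets concrete, here is a minimal Python sketch. It is not Grooper's scoring method, which this article does not spell out. It assumes a simple score (the fraction of a Document Type's collected labels found on the document) and illustrative label sets for four Document Types. The "Name:" and "Address:" labels are made up to round each set out to five.

```python
def label_similarity(found_labels, label_set):
    """Fraction of a Document Type's collected labels found on the document."""
    return len(found_labels & label_set) / len(label_set)


def classify(found_labels, label_sets):
    """Score every Document Type and return the best one with all scores."""
    scores = {
        name: label_similarity(found_labels, labels)
        for name, labels in label_sets.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative label sets; "Name:" and "Address:" are hypothetical.
label_sets = {
    "A": {"Name:", "Address:", "Social Security Number:", "Date of Birth:", "Phone Number"},
    "B": {"Name:", "Address:", "SSN:", "Date of Birth:", "Phone Number"},
    "C": {"Name:", "Address:", "SSN:", "DOB:", "Phone Number"},
    "D": {"Name:", "Address:", "SSN:", "DOB:", "Phone #:"},
}

# Labels found on an incoming document.
found = {"Name:", "Address:", "SSN:", "DOB:", "Phone Number"}

best, scores = classify(found, label_sets)
print(best, scores)
```

Even though every set shares most of its labels with the others, only one set is matched in full, so a small difference in labels is enough to separate the variants.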
|
|
Use Label Sets for Field Based Extraction
Using the Labeled Value Extractor Type with Label Sets
Intro to The Labeled Value Extractor
For most static field based extraction, the Labeling Behavior leverages the Labeled Value Extractor Type. Let's first briefly examine how Labeled Value works outside of the Labeling Behavior functionality.
As the name implies, the Labeled Value extractor is designed to return labeled values. A common feature of structured forms is to divide information across a series of fields. But it's not as if you just have a bunch of data randomly strewn throughout the document. Typically, the field's value will be identified by some kind of label. These labels provide the critical context for what the data refers to.
Labeled Value relies on the spatial relationship between the label and the value. Most often labels and their corresponding values are aligned in one of two ways.
|
1. The value will be to the right of the label. |
|
|
2. The value will be below the label. |
Labeled Value uses two extractors itself, one to find the label and another for the value. If the two extractors' results are aligned horizontally or vertically within a certain amount of space (according to how the Labeled Value extractor is configured), the value's result is returned.
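The alignment test can be pictured with a small geometric sketch in Python. It is a simplified illustration under stated assumptions, not the extractor's actual logic: boxes are pixel rectangles, "aligned" means the ranges overlap on one axis, and a single `max_distance` limit stands in for however the real extractor is configured. The `Box` class, the function names and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A piece of text and its bounding rectangle on the page, in pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def ranges_overlap(a_min, a_max, b_min, b_max):
    """True when two 1-D ranges share any extent."""
    return a_min < b_max and b_min < a_max


def find_labeled_value(label, candidates, max_distance):
    """Return the text of the first candidate aligned with the label."""
    for value in candidates:
        # Layout 1: the value sits to the right of the label on the same line.
        same_line = ranges_overlap(label.top, label.bottom, value.top, value.bottom)
        if same_line and 0 <= value.left - label.right <= max_distance:
            return value.text
        # Layout 2: the value sits below the label in the same column.
        same_column = ranges_overlap(label.left, label.right, value.left, value.right)
        if same_column and 0 <= value.top - label.bottom <= max_distance:
            return value.text
    return None


# Hypothetical results from a label extractor and a value extractor.
invoice_date_label = Box("Invoice Date", 100, 100, 200, 120)
due_date_label = Box("Due Date", 100, 200, 180, 220)
date_hits = [
    Box("03/15/2021", 220, 100, 300, 120),  # to the right of "Invoice Date"
    Box("04/14/2021", 100, 230, 190, 250),  # below "Due Date"
]

invoice_date = find_labeled_value(invoice_date_label, date_hits, max_distance=50)
due_date = find_labeled_value(due_date_label, date_hits, max_distance=50)
print(invoice_date, due_date)
```

Both dates match the same value pattern. It is the position relative to each label that decides which date belongs to which field.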
|
However, the Labeled Value extractor's set up is a little different when combining it with the Labeling Behavior. The end result is a simpler configuration, utilizing collected labels for the Label Extractor.
Label Sets and Labeled Value
|
Since this Content Model utilizes the Labeling Behavior, at least part of the setup described in the previous tab was unnecessary. If you've collected a label for the Data Field and that Data Field's Value Extractor is set to Labeled Value, there is no need to configure a Label Extractor. Instead, Grooper will pass through the collected label to the Labeled Value extractor.
|
|
|
| ⚠ |
While you can get a result without configuring the Labeled Value extractor's Value Extractor, that doesn't mean you should. It is considered best practice to always configure the Value Extractor. |
Best Practice Considerations
While you can get a result without configuring the Labeled Value extractor's Value Extractor, that doesn't mean you should. It is considered best practice to always configure the Value Extractor.
So, why is it considered best practice to do so? The short answer is to increase the accuracy of your data extraction. A simple segment could be anything. If you know the data you're trying to extract has a certain pattern to it, you should target that data according to its pattern. Dates, for example, follow a few different patterns. Maybe it's "07/20/1969" or "07-20-69" or "July 20, 1969", but you know it's a date because it has a specific syntax or pattern to it. To increase the accuracy of your extraction, you should configure the Value Extractor with an extractor that returns the kind of data you're attempting to return.
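As a rough illustration of targeting data by its pattern, the Python sketch below matches the three date formats mentioned above using the standard `re` module. It is only an analogy for what a pattern-based Value Extractor does. The pattern is an assumption written for this example, not a pattern taken from the example Content Model, and the sample text is made up.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

DATE_PATTERN = re.compile(
    r"\b(?:"
    r"\d{2}[/-]\d{2}[/-]\d{2}(?:\d{2})?"   # 07/20/1969 or 07-20-69
    rf"|(?:{MONTHS}) \d{{1,2}}, \d{{4}}"    # July 20, 1969
    r")\b"
)

# Hypothetical line of document text mixing dates with other segments.
text = "Order Date 07/20/1969 Ship Via Ground Due July 20, 1969 Ref 07-20-69 Acct 12345"

dates = DATE_PATTERN.findall(text)
print(dates)
```

Segments such as "Ground" or "12345" sit on the same line but never match, which is the accuracy gain a configured Value Extractor provides over returning whatever segment is nearby.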
|
We can see fairly quickly why leaving the Labeled Value extractor's Value Extractor unconfigured is not ideal.
|
|
|
|
|
Configuring the Labeled Value extractor's Value Extractor also gives you access to the many capabilities available to extractors. For example, Fuzzy RegEx is one of the main ways Grooper gets around poor OCR data at the time of extraction. When the text data is just a couple of characters off from the extractor's regex pattern, Fuzzy RegEx can not only match the imperfect data but "swap" the wrong characters for the right ones, effectively cleansing your result.
|
|
|
However, that's just a single character off from being the right result. We could build an extractor to return currency values looking to make fuzzy swaps like this, both matching text that is slightly off and reformatting the result to match a valid currency format. If we used that extractor as the Labeled Value extractor's Value Extractor it would not only find the segment but also reformat the result, swapping the mis-OCR'd period for what it should be, a comma. And we've done just that.
|
Additional Considerations When Using Labeled Value with Label Sets
Custom Labels to Exclude Results
|
Continuing the discussion above of a Labeled Value extractor whose Value Extractor is left unconfigured, let's examine the results of the "Purchase Order Number" Data Field.
This is obviously not what we want. We want the purchase order number listed below it. Ultimately, we will follow best practice and configure the Labeled Value extractor's Value Extractor property. However, before we do, this gives us an opportunity to demonstrate some additional functionality of the Labeling Behavior. This data "Order Date Customer No. Salesperson Order No. Ship Via" is itself composed of labels pointing to various values on the document. Even though we haven't set up Data Fields in this Data Model to capture the values they point to, we know this is data we don't want. In general, you don't want to use Grooper to extract labels, you want to extract values. |
What's happening here is Grooper is returning all the text on this single line until a collected label in this Document Type's label set is located. In this case, the label Terms was collected for the "Payment Terms" Data Field. None of the text between the label PO Number and the label Terms has been collected in the label set. So, the Labeled Value extractor returns all the text between the "PO Number" Data Field's label (PO Number) and the next encountered label (Terms), resulting in "Order Date Customer No. Salesperson Order Number Ship Via".
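The behavior just described can be mimicked with plain string handling. The Python sketch below is only a model of the described outcome, working on one line of text, and says nothing about how Grooper finds labels on the page. The function name and the "Ship Via" Custom Label are hypothetical.

```python
def text_until_next_label(line, field_label, label_set):
    """Return the text after field_label up to the next label in label_set."""
    start = line.find(field_label)
    if start == -1:
        return None
    rest = line[start + len(field_label):]
    # Positions of every other collected label that appears later on the line.
    stops = [
        rest.find(label)
        for label in label_set
        if label != field_label and rest.find(label) != -1
    ]
    end = min(stops) if stops else len(rest)
    return rest[:end].strip()


line = "PO Number Order Date Customer No. Salesperson Order Number Ship Via Terms"

# Only the two Data Field labels have been collected.
collected = {"PO Number", "Terms"}
result = text_until_next_label(line, "PO Number", collected)
print(result)

# Adding a hypothetical Custom Label makes the returned text stop sooner.
with_custom = collected | {"Ship Via"}
shorter = text_until_next_label(line, "PO Number", with_custom)
print(shorter)
```

Each additional label in the set acts as a stop point, so the returned text shrinks as more of the line is accounted for by collected labels.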
|
⚠ |
This is very specific functionality to the Labeled Value extractor and its interaction with label sets. It will only behave this way if you:
|
|
This may be clearer if we add a Custom Label to the label set.
|
|||
|
|||
|
If we were to go one step further and add Custom Labels until there is no text left between the Data Field's label and another label in the label set, the Labeled Value extractor will return absolutely nothing at all.
|
|||
|
HOWEVER, this was not the right solution for this problem. This was only an educational exercise to make you aware of how labels in a label set interact with the Labeled Value extractor when its Value Extractor is left unconfigured.
|
Maximum Noise
|
The Maximum Noise property of the Labeled Value extractor controls the maximum number of "noise characters" allowed in the "bounding-region" of a label-value pair. Now, what does that mean? Let's look at an example, using the "Remit Address" Data Field of our example Data Model.
What gives? It has to do with these "noise characters" mentioned above. |
|
|
Noise characters are any letters and digits falling within the bounding region defined by a label-value pair. For our example, the bounding region looks like this.
|
|
|
The noise characters are any letters or numbers within this rectangle other than the label or the value. The highlighted characters in the image would be the noise characters for our example. The Maximum Noise property sets how many of these non-label, non-value characters may exist in the bounding region. You don't typically expect to find much text between a label and its value, so the Maximum Noise property acts as an additional filter to avoid returning results too far away from the label. Where Maximum Distance filters out results that are physically more than a set distance from the label, Maximum Noise filters out results that have too much text between them and the label. With the default of 5, there can be at most 5 letter or digit characters between the label and value. In our case, we have more than 5. We have 15 ("FacturaTechnolo").
|
| FYI |
Noise characters are only letters and digits. Spaces, punctuation marks, and control characters are NOT considered noise characters, even if present in the bounding region. |
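Counting noise characters is easy to picture in code. The Python sketch below treats the bounding region as a string, removes the label and the value, and counts what is left using `str.isalnum`, so spaces and punctuation are ignored. It is a simplified model, not the extractor's implementation. The label "Remit To:" and the address value are hypothetical. Only the 15 noise characters "FacturaTechnolo" and the default of 5 come from the example above.

```python
def count_noise(region_text, label, value):
    """Count letters and digits in the region that belong to neither
    the label nor the value."""
    remaining = region_text.replace(label, "", 1).replace(value, "", 1)
    return sum(ch.isalnum() for ch in remaining)


def passes_noise_filter(region_text, label, value, maximum_noise=5):
    """True when the noise count does not exceed the allowed maximum."""
    return count_noise(region_text, label, value) <= maximum_noise


# Hypothetical label and value; the stray text between them is the noise.
label = "Remit To:"
value = "PO Box 1234, Springfield"
region_text = "Remit To: Factura Technolo\nPO Box 1234, Springfield"

noise = count_noise(region_text, label, value)
print(noise)
print(passes_noise_filter(region_text, label, value))
print(passes_noise_filter(region_text, label, value, maximum_noise=15))
```

With the default limit the pair is rejected. Raising the limit to at least the noise count lets the same pair through.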
|
With this in mind, all we need to do to the "Remit Address" Data Field to successfully collect the result at time of extraction is increase the number of allowable noise characters.
|
Using Static Labels for Data Field Extraction
Collecting Static Labels
|
The Data Field elements have a unique label option, the Static label. This label option is useful for situations where the label itself is what you want to extract.
|
|
|
What we really want to do is collect a piece of information that is the same for every single document of one Document Type. We expect the vendor's name "Factura Technology Corp" to be present for every document assigned the "Factura" Document Type during classification. Furthermore, we always expect it to be "Factura Technology Corp" and not something else. Therefore, the vendor's name is "static" for the Document Type. It's present on every document of that Document Type and has the same value on each one. You know what else is static on structured and semi-structured forms? Labels! Just in this case the label "Factura Technology Corp" is itself the value we want to return. This is what a Static label is for.
|
Returning the Static Label
|
Now that the Static label is collected, how does Grooper know to return it during extraction when the Extract activity runs? The short answer is the Labeled Value extractor type will do this for us. With "Factura Technology Corp" collected as a Static label, and the "Vendor Name" Data Field configured to utilize the Labeled Value extractor, it will return the Static label itself as the result.
|
Use Label Sets for Tabular Extraction
Label Sets and Tabular Layout
Many tables label the columns so the reader knows what the data in that column corresponds to. How do you know the unit price for an item on an invoice? Typically, that item is in a table and one of the columns of that table is labeled "Unit Price" or something similar. Once you read the labels for each column (also called "column headers"), you the reader know where the table begins (below the column headers) and can identify the data in each row (by understanding what the column headers refer to).
This is also the basic idea behind the Tabular Layout Extract Method. It too utilizes column header labels to "read" tables on documents, or at least uses them as step one in modeling the table's structure so that Grooper can extract data from each cell in the table.
Furthermore, with the Tabular Layout method, label sets collected using the Labeling Behavior can be used to extract data from tables on documents. In this case, the labels collected for the Data Column children of a Data Table are utilized to help model the table's structure.
Once the column header locations are established, the next requirement is a way to understand how many rows are in the table. This is done by configuring at least one Data Column's Value Extractor property. Generally, there is at least one column in a table that is always present for every row in the table. If you can use an extractor to locate that data below its corresponding column header, that gives you a way of finding each row in the table.
Lastly, there are a few other considerations you might need to make. Is every row in the table a single line or are the rows "multiline"? Do you need to clean up the data the Tabular Layout initially extracts for a column by normalizing it with an extractor? Do you need to establish a table "footer" to limit the number of rows extracted?
This tutorial will cover the basic configuration of the Tabular Layout Extraction Method using collected Label Sets and address a few of these considerations.
|
The basic steps will be as follows:
In a perfect world, you're done at that point. As you can see in this example, we've populated a table. Data is collected for all four Data Columns for each row on the document. However, the world is rarely perfect. We will discuss some further configuration considerations to help you get the most out of this table extraction method in the "Additional Considerations" section below. |
Collect Labels
See the above how to (Collect Label Sets) for a full explanation of how to collect labels for Document Types in a Content Model. The following tutorial will presume you have general familiarity with collecting labels.
|
As far as strict requirements for collecting labels for tabular data extraction go, you must at minimum collect a label for each Data Column you wish to extract. For this "Stuff and Things" Document Type, one column header label has been collected for each of the four Data Column children of the "Line Items" Data Table.
|
|
|
You may optionally collect a label for the entire row of column header labels. This label is collected for the parent Data Table object's label.
It is generally considered best practice to capture a header row label for the Data Table. But if it's optional, why do it? What is the benefit of this label? |
The answer has to do with imperfect OCR text data and Fuzzy RegEx. Fuzzy RegEx provides a way for regular expression patterns to match in Grooper when the text data doesn't strictly match the pattern. The regex pattern Grooper and the character string "Gro0per" differ by just a single character. An OCR engine misreading an "o" character for a zero is not uncommon by any means, but a standard regex pattern of Grooper will not match the string "Gro0per". The pattern expects there to be an "o" where there is a zero.
Using Fuzzy RegEx instead of regular regex, Grooper will evaluate the difference between the regex pattern and the string. If it's similar enough (if it falls within a percentage similarity threshold) Grooper will return it as a match.
- FYI "similarity" may also be referred to as "confidence" when evaluating (or scoring) fuzzy match results. Grooper is more or less confident the result matches the regex pattern based on the fuzzy regex similarity between the pattern and the imperfect text data. A similarity of 90% and a confidence score of 90% are functionally the same thing (One could argue there is a difference between these two terms when Fuzzy Match Weightings come into play, but that's a whole different topic. And you may encounter Grooper users who use the terms "similarity" and "confidence" interchangeably regardless. Visit the Fuzzy RegEx article if you would like to learn more).
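The "Grooper" versus "Gro0per" example can be worked through with a short Python sketch. It assumes similarity is scored as one minus the edit (Levenshtein) distance divided by the longer length, which is a common way to score fuzzy matches. Grooper's own Fuzzy RegEx scoring, including Fuzzy Match Weightings, may differ. The function names and the threshold values are hypothetical.

```python
import re


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(pattern, text):
    """Similarity between 0 and 1; 1 means the strings are identical."""
    longest = max(len(pattern), len(text))
    return 1 - edit_distance(pattern, text) / longest


def fuzzy_match(pattern, text, threshold):
    """True when the text is at least `threshold` similar to the pattern."""
    return similarity(pattern, text) >= threshold


exact = re.search("Grooper", "Gro0per")   # a standard regex finds nothing
score = similarity("Grooper", "Gro0per")  # one wrong character out of seven
print(exact, round(score * 100, 1))
print(fuzzy_match("Grooper", "Gro0per", threshold=0.80))
print(fuzzy_match("Grooper", "Gro0per", threshold=0.90))
```

Under this scoring a single misread character in a seven-character string costs about 14 points of similarity, so whether it matches depends entirely on where the threshold is set.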
|
So how does this apply to the Data Table's column header row label? The short answer is it provides a way to increase the accuracy of Data Column column header labels by "boosting" the similarity of the label to imperfect OCR results.
As we will see, capturing the full row of column header labels will boost the similarity, allowing the label to match without altering the Labeling Behavior's fuzzy match settings. |
|
|
First, notice what's happened when we lassoed the row of column header labels.
|
|
Not magic. Just math. The Data Table's column header row label is much longer than a single Data Column's column header label. There are simply more characters in "Qty. Qty. Item Number Description Unit Price Extended Price\r\nOrd. Shp." than in "Description" (70 vs 11). Where the "Description" Data Column's label is roughly 82% similar to the text data (9 out of 11 characters), the "Line Item" Data Table's label, composed of the whole row of column labels, is roughly 96% similar to the text data (67 out of 70 characters). Utilizing a Data Table label allows you to hijack the whole row's similarity score when a single Data Column's label falls below the similarity threshold on its own. If the label can be matched as part of the larger whole, its confidence score is much higher than it would be by itself. The Data Table's larger label of the full row of column labels gives extra context to the "Description" Data Column's label, providing more information about what is and is not an appropriate match. So why is it considered best practice to capture a label for the Data Table? OCR errors are unpredictable. The set of examples you worked with when architecting this solution may have been fairly clean with good OCR reads. That may not always be the case. Capturing a Data Table label for the column label row will act as a safety net to avoid unforeseen problems in the future. |
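The arithmetic above is easy to check. The Python sketch below recomputes the two percentages from the character counts given in this example. The 90% threshold is a hypothetical value chosen only to show one label falling below a threshold while the other clears it. It is not a documented default.

```python
def percent_similarity(matching_chars, label_length):
    """Share of a label's characters that agree with the OCR text."""
    return 100 * matching_chars / label_length


column_label = "Description"
table_label = (
    "Qty. Qty. Item Number Description Unit Price Extended Price\r\nOrd. Shp."
)

# Matching character counts are taken from the example: 9 of 11 and 67 of 70.
column_score = percent_similarity(9, len(column_label))
table_score = percent_similarity(67, len(table_label))

THRESHOLD = 90  # hypothetical similarity threshold, in percent

print(len(column_label), len(table_label))
print(round(column_score), round(table_score))
print(column_score >= THRESHOLD, table_score >= THRESHOLD)
```

A few wrong characters weigh heavily on a short label and only lightly on a long one, which is why the full header row can still match when a single column header on its own would not.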
Assign a Data Column's Value Extractor
Step 1 is done. We've collected labels for the "Line Item" Data Table and its Data Columns for each Document Type in this Content Model. Step 2 is configuring and assigning a Value Extractor for at least one Data Column.
Why is this necessary? Think about what we've done so far. We've collected labels for the Data Columns. Grooper now has a way to figure out where the columns are on the document. But what does it know about the rows?
Rows come under columns. We know that much. So, Grooper at least knows to look for rows underneath the collected Data Column labels. But that's about it. It doesn't know the size of each row. It doesn't know the spacing between the rows. Probably most importantly, it doesn't know how many rows there are. Tables tend to be dynamic. They may have 3 rows on one document and 300 on the next. Grooper needs a way of detecting this.
|
Indeed, if we were to test extraction with just labels collected, we would not get any result whatsoever.
|
|
|
This is why we need a Data Column's Value Extractor property configured, to give the Extract activity an awareness of the rows beneath the column labels. The key thing to keep in mind is this data must be present on every row. You'll want to pick a column whose data is always present for every row, where it would be considered invalid if the information wasn't in that cell for a given row. In our case, we will choose the "Quantity" Data Column. We always expect there to be a quantity listed for the line item on the invoice, even if that quantity is just "1".
|
|
|
This is the pattern we will use for the "Quantity" Data Column's Value Extractor.
We get a bunch of other hits as well. This is a very generic extractor matching very generic numerical data.
|
For fairly simple table structures we now have the two things the Tabular Layout method needs to extract data:
- Collected labels for the Data Columns (and optionally the whole row of column labels for the Data Table)
- At least one Data Column with its Value Extractor configured
Now, all we need to do is tell the Data Table object we want to use the Tabular Layout method. We do this by setting its Extract Method property to Tabular Layout.
Set Extract Method to Tabular Layout and Test
|
A Data Table's extraction method is set using the Extract Method property. To enable the Tabular Layout method, do the following.
|
|
|
Now, let's test out what we have and see what we get!
For the Tabular Layout method, the Data Table is populated using primarily two pieces of information.
|
|
|
With these pieces of information, the Tabular Layout method can start to determine the table's structure. If you know where the columns are and how big they are, and you know how many rows there are, you pretty much know what the table looks like. This allows Grooper to create data instances for each cell in the table.
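The way these two pieces of information combine can be sketched in Python. This is a conceptual model only, not how the Tabular Layout method is implemented. It assumes each column is a horizontal range taken from its header label, each row starts at the vertical position of a hit from the one configured Value Extractor, and every word on the page carries simple x and y coordinates. All names, coordinates and values are hypothetical.

```python
def build_table(columns, row_tops, words):
    """Assign each word to a cell by row anchor and column range.

    columns:  {column name: (left, right)} from the header label positions
    row_tops: sorted y positions of the row anchors (e.g. quantity hits)
    words:    list of (text, x, y) tuples from the page
    """
    rows = [{name: [] for name in columns} for _ in row_tops]
    for text, x, y in words:
        # The word belongs to the last row anchor at or above it.
        row_index = None
        for i, top in enumerate(row_tops):
            if y >= top:
                row_index = i
        if row_index is None:
            continue  # above the first row, such as the header itself
        for name, (left, right) in columns.items():
            if left <= x < right:
                rows[row_index][name].append(text)
    return [
        {name: " ".join(parts) for name, parts in row.items()}
        for row in rows
    ]


# Hypothetical column ranges derived from collected column header labels.
columns = {
    "Quantity": (0, 50),
    "Description": (50, 200),
    "Unit Price": (200, 260),
    "Extended Price": (260, 330),
}
# Hypothetical y positions of the "Quantity" Value Extractor's hits.
row_tops = [100, 120]
words = [
    ("Qty.", 10, 80),  # header text, ignored because it is above the first row
    ("2", 10, 100), ("Widget", 60, 100), ("5.00", 210, 100), ("10.00", 270, 100),
    ("1", 10, 120), ("Gadget", 60, 120), ("Cover", 110, 120),
    ("7.50", 210, 120), ("7.50", 270, 120),
]

table = build_table(columns, row_tops, words)
for row in table:
    print(row)
```

Note that this sketch has no notion of a table footer, so any text below the last row anchor would be swept into the last row. That is one reason the additional considerations discussed later matter.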
|
Additional Considerations
Multiline Rows
Data Column Value Extractors
Data Element Overrides
Use Label Sets for Sectional Extraction
Additional Information
Include information in this section on the following topics if not able to flesh it out in the About or How To sections. And probably this section will be helpful even if you do talk about it earlier. There's no space in Design Studio to detail this information in a help panel.
Custom Labels
Layout Options
Version Differences
2021
The Labeling Behavior is brand new functionality in Grooper version 2021. Prior to this version, some of its functionality could be approximated by other objects and their properties (for example, a Data Type using the Key-Value Pair collation is at least in some ways similar to how the Labeled Value Extractor Type works). However, the creation of label sets using Document Types and their implementation described above was not available prior to version 2021.