Key-Value Pair (Collation Provider): Difference between revisions
Dgreenwood (talk | contribs) No edit summary |
Dgreenwood (talk | contribs) |
Revision as of 09:20, 27 August 2020

Key-Value Pair is a Collation Provider for Data Type extractors. It uses the layout relationship between a key and a value on a document to return a result.
Key-Value Pair collation is one of the most commonly used Collation Providers. It provides an excellent way to extract data when a value exists next to a label on a document, whether next to it horizontally, vertically, or even in a "left-to-right & top-to-bottom" text flow.
About
The Key-Value Pair Collation Provider utilizes the spatial relationship between two related extractor results to return a single result, typically looking for a piece of data (the value) next to a label (the key).
For structured documents, it is common for a piece of data to be identified by some sort of label, usually to the left of it or above it. In these images, the field label, highlighted in blue, identifies the field's value, highlighted in yellow. We use this kind of labeling relationship to identify data on documents all the time, and the Key-Value Pair Collation Provider is well suited to it.
Key-Value Pair collated Data Types (often just referred to as Key-Value Pairs) collate the results of two extractors, a "key extractor" and a "value extractor". The "key extractor" locates the label (or whatever context is being used to find the data you want). The "value extractor" returns all possible values matching the data you want to return. Once collated, the Key-Value Pair returns the value closest to the key, according to the assigned Layout settings. The top image uses a Horizontal Layout because each label sits horizontally next to its value. The bottom image uses a Vertical Layout.
Key-Value Pair collation also has applications in unstructured document processing. Unstructured documents convey information in paragraphs and sentences more than they do with structured fields. Because of this, the value may not be horizontally or vertically aligned with the key, but somewhere before or after it in the text flow. For these situations, the Flow Layout can be used, which relies on the relationship between the key and the value in the text data's left-to-right and top-to-bottom text flow. A Key-Value Pair could be built to extract the driver name (highlighted here in yellow), using the phrase "driver's name" in the text flow before it.
How To
Create A Key-Value Pair Extractor
Create a Data Type
Data Type extractors use Collation Providers to combine, filter, or otherwise manipulate extraction results. Collation Providers are set using the Data Type's Collation property.
So, the very first thing to do is create a Data Type. Here, we are creating the Data Type in the Local Resources folder of a Content Model.
Create the Key Extractor
Key-Value Pair extractors must have exactly two extractors, a "Key Extractor" and a "Value Extractor". The "Value Extractor" ultimately supplies the value the Key-Value Pair returns. The "Key Extractor" is how you find the value: its result is used as a positional anchor. Our goal with the document seen here is to differentiate the various "home phone" numbers from the "cell phone" numbers. So, our key extractor simply needs to find the label "home phone".
Create the Value Extractor
For Key-Value Pair extractors, the "Value Extractor" is the extractor looking for the data you ultimately want to return. Here, we're looking for a home phone number. So, we simply need an extractor that finds phone numbers.
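As a rough illustration of the two extractors, the sketch below uses Python regular expressions rather than Grooper's own extractor configuration. The sample text, both patterns, and the function names are assumptions made up for this example.

```python
import re
from typing import List

# Hypothetical sample text; the labels and numbers are invented for illustration.
SAMPLE_TEXT = (
    "Home Phone: (405) 555-0101\n"
    "Cell Phone: (405) 555-0102\n"
)

# "Key Extractor" stand-in: finds the label that anchors the search.
KEY_PATTERN = re.compile(r"home\s+phone", re.IGNORECASE)

# "Value Extractor" stand-in: finds every phone number, whatever its label.
VALUE_PATTERN = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[.-]\d{4}")


def find_keys(text: str) -> List[str]:
    """Return every label match in the text."""
    return [m.group() for m in KEY_PATTERN.finditer(text)]


def find_values(text: str) -> List[str]:
    """Return every candidate phone number in the text."""
    return [m.group() for m in VALUE_PATTERN.finditer(text)]


print(find_keys(SAMPLE_TEXT))
print(find_values(SAMPLE_TEXT))
```

Note that the value pattern returns both phone numbers. On its own it cannot tell a home number from a cell number, which is why the key's position is needed to pick the right one.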
Set the Collation Provider
Once you select Key-Value Pair, you will not see the results list change. It will still appear as if the two child extractors' results are being returned one by one (like the Individual Collation Provider). Some Collation Providers, such as Key-Value Pair, require some configuration before their results are collated. Specifically, you must choose which Layout method is used.
Configure the Layout Settings
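The Layout method defines how the "Key Extractor" and "Value Extractor" results are spatially related to each other. Is the key above the value? Is it to the left of it? The right? Configuring these settings dictates where you expect to find the value in relation to the key.
All Collation Providers have their own set of configurable properties, including Key-Value Pair.
- To view the Key-Value Pair properties, either double-click the Collation property or single-click the arrow to the left of the property.
- To select and configure a Layout, enable one of the three Layout Providers. The Layout may be:
  - Horizontal
  - Vertical
  - Flow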
FYI: It is possible to enable multiple Layouts on a single Data Type, but do so with caution.
If the "Key Extractor" finds only a single key (this is often the case), only a single result is returned, using a single Layout. When multiple Layouts are enabled, they are tried in a fixed order:
- First, the Horizontal Layout's result is returned.
- If there is no result from that layout, the Flow Layout's result is returned.
- Last, if no other layout produces a result, the Vertical Layout's result is returned.
If there are multiple keys, this can get even more complicated, with each key using its own Layout according to this order of operations.
You may find it easier or more prudent to create a separate Key-Value Pair collated Data Type for each Layout as children of a parent Data Type. You then have more control over which result is returned via the Order By property.
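The order of operations described in the FYI can be sketched as a simple fallback chain. This is only an illustration of the stated order (Horizontal, then Flow, then Vertical) for a single key. The function name and the dictionary of per-layout results are hypothetical and not part of Grooper.

```python
from typing import Dict, Optional

# Order taken from the FYI text: Horizontal first, then Flow, then Vertical.
LAYOUT_ORDER = ("Horizontal", "Flow", "Vertical")


def pick_result(results_by_layout: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the first available result, following the layout order.

    `results_by_layout` maps a layout name to the value that layout found
    for one key, or None (or a missing entry) if it found nothing.
    """
    for layout in LAYOUT_ORDER:
        result = results_by_layout.get(layout)
        if result is not None:
            return result
    return None


# The Vertical result is ignored because Flow produced a result first.
print(pick_result({"Vertical": "555-0199", "Flow": "555-0101"}))
```

With several keys, this choice would be made per key, which is why a separate Data Type per Layout can be easier to reason about.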
The Horizontal Layout
- View the Key-Value Pair configuration properties by expanding the Collation property.
- Select Horizontal Layout.
This will look for results of the "Value Extractor" and only return the one closest horizontally to the right of the "Key Extractor's" result. None of the other phone numbers on the page are horizontally aligned with the key "home phone". So, only a single result is returned.
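The horizontal search described above can be approximated with bounding boxes. The sketch below is an assumption about how such a search could work, not Grooper's implementation: the `Box` type, the coordinate units, and the overlap rule are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """A piece of text with a hypothetical bounding box (page coordinates)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def rows_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share some vertical extent (same row)."""
    return a.top < b.bottom and b.top < a.bottom


def closest_right(key: Box, values: List[Box]) -> Optional[Box]:
    """Return the value nearest to the right of the key on the same row."""
    candidates = [
        v for v in values
        if rows_overlap(key, v) and v.left >= key.right
    ]
    return min(candidates, key=lambda v: v.left - key.right, default=None)


key = Box("home phone", 10, 100, 80, 112)
values = [
    Box("555-0103", 200, 100, 260, 112),  # same row, farther right
    Box("555-0101", 90, 100, 150, 112),   # same row, nearest
    Box("555-0102", 90, 140, 150, 152),   # different row, ignored
]
match = closest_right(key, values)
print(match.text if match else None)
```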
The Vertical Layout
- View the Key-Value Pair configuration properties by expanding the Collation property.
- Select Vertical Layout.
This will look for results of the "Value Extractor" and only return the one closest vertically below the "Key Extractor's" result. But something is not right here. What we want to extract is the phone number directly underneath the "home phone" label, as seen in the box labeled "Vertical Layout" on the document.
The "Key Extractor" is set up to return "home phone", which appears three times on this document. The first time it appears is at the top of the page. If you draw a vertical line down from that label, sure enough, there is a phone number there. It is just paired with the wrong label.
Resolving Common Problems
Coincidentally, this could be the right value you're looking for, but more often than not, this will produce false positive results. Typically, keys are much closer to their values than what we see here. We need a way to restrict the space between the two extractor results to toss out these false positives. There are two very common ways to do this for Key-Value Pair Collation: the Maximum Distance and the Enforce Line Boundaries properties.
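To make the false positive and the distance restriction concrete, here is a minimal sketch of a vertical search with an optional distance cap. It is not Grooper's code: the `Box` type, the `max_distance` parameter, and its units are assumptions chosen for the example, and the real Maximum Distance property may measure distance differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """A piece of text with a hypothetical bounding box (page coordinates)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def columns_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share some horizontal extent (same column)."""
    return a.left < b.right and b.left < a.right


def closest_below(key: Box, values: List[Box],
                  max_distance: Optional[float] = None) -> Optional[Box]:
    """Return the nearest value below the key, optionally within a distance cap."""
    candidates = []
    for v in values:
        gap = v.top - key.bottom
        if gap < 0 or not columns_overlap(key, v):
            continue  # above the key, or not in the same column
        if max_distance is not None and gap > max_distance:
            continue  # too far away to be this key's value
        candidates.append((gap, v))
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


top_key = Box("home phone", 10, 20, 80, 32)       # label at the top of the page
near_key = Box("home phone", 300, 380, 370, 392)  # label directly above its value
numbers = [
    Box("555-0199", 10, 400, 80, 412),    # far below top_key, under another label
    Box("555-0101", 300, 400, 370, 412),  # directly below near_key
]

print(closest_below(top_key, numbers))                    # false positive
print(closest_below(top_key, numbers, max_distance=50))   # rejected
print(closest_below(near_key, numbers, max_distance=50))  # wanted value
```

Without a cap, the label at the top of the page pairs with a number far below it. With the cap, only the label sitting just above its value produces a result.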
The other solution to this problem will only be possible if your document utilizes lines to break up sections or fields. For documents that do this, information is visually divided from other information by putting it in a box. This example also does that. The values horizontally aligned are put in one box. The values vertically aligned are put in another box. Because of this fact, we can take advantage of the Enforce Line Boundaries property.
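The idea behind enforcing line boundaries can be sketched as a geometric check: discard a value when a ruled line lies between it and the key. The sketch below handles only the vertical case (key above value, horizontal rule between them). The `Box` and `HLine` types and the test itself are assumptions for illustration and say nothing about how Grooper detects or applies lines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """A piece of text with a hypothetical bounding box (page coordinates)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class HLine:
    """A hypothetical horizontal rule detected on the page."""
    y: float
    left: float
    right: float


def line_separates(key: Box, value: Box, lines: List[HLine]) -> bool:
    """True when a horizontal line sits between the key and a value below it."""
    for line in lines:
        between = key.bottom <= line.y <= value.top
        # The line must cover the horizontal region the two boxes share.
        spans = (line.left <= max(key.left, value.left)
                 and line.right >= min(key.right, value.right))
        if between and spans:
            return True
    return False


rule = HLine(y=200, left=0, right=600)
top_key = Box("home phone", 10, 20, 80, 32)
near_key = Box("home phone", 10, 380, 80, 392)
value = Box("555-0199", 10, 400, 90, 412)

print(line_separates(top_key, value, [rule]))   # a rule divides them
print(line_separates(near_key, value, [rule]))  # same box, nothing between
```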
The Flow Layout
The Flow Layout is a little different. Instead of using the linear horizontal or vertical relationship between the key and value data instances, it uses their relationship in the "flow" of the text data. This method travels from the key looking for the value as an English reader would, starting at the key, going character-by-character (typically) left to right and line-by-line top to bottom.
- View the Key-Value Pair configuration properties by expanding the Collation property.
- Select Flow Layout.
- Change the property from Disabled to Enabled.
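The reading-order search that the Flow Layout is described as performing can be sketched on plain text. This is a simplified stand-in, assuming the page text is already in left-to-right, top-to-bottom order and that only text after the key is searched. The function name, the sample sentence, and the name pattern are hypothetical.

```python
import re
from typing import Optional

# Hypothetical sample text; the names are invented for illustration.
TEXT = "Officer John Smith reported that the driver's name is\nJane Doe."

# Simplistic stand-in for a "Value Extractor" that finds two capitalized words.
NAME_PATTERN = r"[A-Z][a-z]+ [A-Z][a-z]+"


def flow_value(text: str, key_pattern: str, value_pattern: str) -> Optional[str]:
    """Return the first value match that follows the key in reading order."""
    key = re.search(key_pattern, text, flags=re.IGNORECASE)
    if key is None:
        return None
    # Start scanning where the key ends, so earlier matches are skipped.
    value = re.compile(value_pattern).search(text, key.end())
    return value.group() if value else None


print(flow_value(TEXT, r"driver's name", NAME_PATTERN))
```

The value pattern alone would match "Officer John" first. Anchoring the search at the key is what selects the name that follows "driver's name", even though it sits on the next line.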










