2023:Labeled Value (Value Extractor): Difference between revisions

From Grooper Wiki
Revision as of 15:48, 14 February 2023

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

A Labeled Value is an extractor type that can be used when configuring several data extraction tools such as a Value Reader or Data Type. It is designed to return text segments that have a spatial relationship to a defined label.


About

A Labeled Value is configured using two extractors: the Label Extractor and the Value Extractor. Once the Label Extractor finds a label, Grooper uses the label's position on the page (its spatial context) to determine which value to return (for more on spatial context, see the Data Context wiki article).

The Labeled Value extracts information similarly to collating a Key-Value Pair. Unlike a Key-Value Pair, the extractor is self-contained in one object. There is no need to set one object as a "Key" and another object as a "Value". Instead, both the "Label" and the "Value" can be set on one object.


How To

Selecting the Extractor

Configuring on a Value Reader

  1. In your Node Tree, create or select a Value Reader.
    • Visit the Value Reader Wiki Page for instructions on how to create a Value Reader.
  2. Select the "Value Reader" tab.
  3. Click the drop-down list next to Extractor and select Labeled Value.

Configuring on a Data Type

  1. In your Node Tree, create or select a Data Type.
    • Visit the Data Type Wiki Page for instructions on how to create a Data Type.
  2. Select the "Data Type" tab.
  3. Click the drop-down list next to Local Extractor and select Labeled Value.

Configuring on Other Object Types

The Labeled Value extractor can be used on many other object types. For each one, the configuration process is similar to the process for the Value Reader and Data Type objects. You may have to select a specific tab to find the extractor property.


Basic Setup

Once you select Labeled Value as your extractor, you will need to set both your label and your value to be extracted.

Configuring the Label Extractor

  1. Once Labeled Value is set as the extractor on the object you are configuring, click on the "Tester" tab.
  2. Click the drop-down list next to the Label Extractor property and select an extractor to use.
    • For the purposes of this example, we are going to use a List Match extractor. However, any extractor can be used to capture the desired label.
  3. Configure your chosen extractor to collect the desired label.

Configuring the Value Extractor

  1. Once Labeled Value is set as the extractor on the object you are configuring, click on the "Tester" tab.
  2. Click the drop-down list next to the Value Extractor property and select an extractor to use.
    • For the purposes of this example, we are going to use a Reference to another extractor that has been previously configured. However, any extractor can be used to capture the desired values.
  3. Configure your chosen extractor to collect the desired values.
    • When both the Label Extractor and Value Extractor are set up properly, the label will be outlined in blue while the extracted value will be highlighted in green.

The goal when setting up the Value Extractor is not to collect one individual text segment, but rather to collect every instance of the type of information you are trying to extract. In this example, we are referencing an extractor designed to collect all dates on the document. The Label Extractor then tells Grooper which of those text segments to return.
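The division of labor between the two extractors can be illustrated with a short Python sketch. It is not how Grooper works internally: the function names, the date pattern, and the sample text are assumptions made for this example, and it measures distance in characters on a single line of text rather than by position on the page.

```python
import re

# Assumed date format for this example only (MM/DD/YYYY).
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_dates(text):
    """Play the role of the Value Extractor: collect every date, not just one."""
    return [(m.start(), m.group()) for m in DATE_PATTERN.finditer(text)]


def value_after_label(text, label):
    """Play the role of the Label Extractor: pick the date nearest after the label."""
    label_start = text.find(label)
    if label_start == -1:
        return None  # label not found, so nothing is returned
    label_end = label_start + len(label)
    following = [
        (start - label_end, value)
        for start, value in find_dates(text)
        if start >= label_end
    ]
    return min(following)[1] if following else None


sample = "Invoice Date: 01/15/2023   Order Date: 01/10/2023   Ship Date: 01/20/2023"
all_dates = find_dates(sample)                      # three candidate dates
order_date = value_after_label(sample, "Order Date")
```

Here the value pattern matches all three dates, and the label alone decides which one is returned.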

The Final Result

Once your Label Extractor and Value Extractor are set up, you should see some results. In this example, because the Label Extractor matches the "Order Date" label, the date that is returned is the one closest to that label.

If your Label Extractor and Value Extractor are set up properly and Grooper is still not returning the results you want, see the Advanced Setup section below. There you will find information on how to increase the accuracy of your results.



Advanced Setup


Maximum Distance

Once the Label Extractor and the Value Extractor have been set, check to see if the correct text segment is being extracted. Sometimes Grooper will produce an undesirable result because of the layout of the document. When this happens, adjusting the Maximum Distance property can improve your results.

By default, the Maximum Distance is set at 2in to the right and 2in to the bottom. This setting tells Grooper to look for a value located within two inches to the right of the label and within two inches below it.

If Grooper finds a result within that 2in x 2in region, it will return that result as the value. If Grooper finds multiple results within that region, it will return the closest result as the value.
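The selection rule described above can be modeled in a few lines of Python. This is only an illustrative sketch, not Grooper's implementation: the `Segment` class, the `pick_value` function, the use of top-left corners, and straight-line distance are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Segment:
    text: str
    x: float  # left edge, in inches from the left of the page
    y: float  # top edge, in inches from the top of the page


def pick_value(label, candidates, max_right=2.0, max_bottom=2.0):
    """Return the closest candidate inside the zone right of and below the label."""
    in_zone = [
        c for c in candidates
        if 0 <= c.x - label.x <= max_right and 0 <= c.y - label.y <= max_bottom
    ]
    if not in_zone:
        return None  # nothing inside the zone, so no value is returned
    return min(in_zone, key=lambda c: hypot(c.x - label.x, c.y - label.y))


label = Segment("Order Date", 1.0, 1.0)
candidates = [
    Segment("01/10/2023", 2.2, 1.0),  # 1.2in to the right: inside the zone
    Segment("01/20/2023", 1.0, 2.5),  # 1.5in below: inside the zone
    Segment("02/01/2023", 5.0, 1.0),  # 4in to the right: outside the zone
]
result = pick_value(label, candidates)
```

With the assumed positions, two candidates fall inside the default zone and the nearer one, 1.2in to the right of the label, is selected.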

In the picture to the right, we can see that the extractor is working appropriately. With the layout of the Order Date and Invoice Date, Grooper is able to pick out the Order Date with the default settings.

However, in this picture, Grooper is extracting the wrong value. The spatial layout of a document often requires additional configuration. So, why is Grooper grabbing the wrong value?


With the default Maximum Distance selected, all values within this zone are considered. Both the Invoice Date and the Order Date are within the range of the Maximum Distance. Since the Invoice Date is closer to the "Order Date" label, Grooper is returning this as the value.

If we change the Maximum Distance to just "2in" to the right, we see that the zone changes. Only the correct date is within the range of the Maximum Distance.

With the Maximum Distance properly configured, we see that Grooper now extracts the correct value.
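The effect of narrowing the zone can be reproduced with a small Python sketch. The coordinates, the `closest_in_zone` helper, and the choice to model "2in to the right only" as a bottom distance of zero are assumptions for illustration; they are not taken from Grooper itself.

```python
from math import hypot


def closest_in_zone(label_xy, candidates, max_right, max_bottom):
    """Return the name of the nearest candidate right of and below the label."""
    lx, ly = label_xy
    best = None
    for text, (x, y) in candidates.items():
        dx, dy = x - lx, y - ly
        if 0 <= dx <= max_right and 0 <= dy <= max_bottom:
            distance = hypot(dx, dy)
            if best is None or distance < best[0]:
                best = (distance, text)
    return None if best is None else best[1]


# Hypothetical layout, in inches: the Invoice Date sits just below the
# "Order Date" label, while the Order Date sits further away to its right.
label = (1.0, 1.0)
dates = {
    "Invoice Date value": (1.0, 1.4),
    "Order Date value": (2.5, 1.0),
}

default_pick = closest_in_zone(label, dates, max_right=2.0, max_bottom=2.0)
right_only_pick = closest_in_zone(label, dates, max_right=2.0, max_bottom=0.0)
```

In this layout the default zone contains both dates and the nearer Invoice Date wins, while the right-only zone contains just the Order Date.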


Maximum Noise




See Also