2023:List Match (Value Extractor)

From Grooper Wiki
Latest revision as of 16:00, 27 August 2025

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


List Match is a Value Extractor designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions, but it can be configured to use regular expression syntax.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

  • 2023 Wiki List-Match Batch.zip
  • 2023 Wiki List-Match Project.zip

About

The List Match is one of the simplest extractors used in Grooper. It is designed to return values matching one or more items in a defined list. This can be used to extract specific words or full phrases contained within a document. A List Match extractor returns an exact match of the list item, including any spaces, numbers, punctuation, or special characters.
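The sketch below makes "exact match" concrete. It approximates literal list matching with Python's standard re module. It is not Grooper's implementation or API. The function name list_match is hypothetical, and matching here is case-sensitive for simplicity; Grooper's own case handling is controlled by its properties and is not modeled. Each list item is escaped, so spaces, digits, and punctuation must appear in the text exactly as entered.

```python
import re
from typing import List


def list_match(text: str, items: List[str]) -> List[str]:
    """Hypothetical sketch (not Grooper code): return each occurrence of any list item, in document order."""
    if not items:
        return []
    # Longer items are tried first, so "Acme Corp." wins over "Acme" at the same position.
    ordered = sorted(items, key=len, reverse=True)
    # re.escape makes every item literal: ".", "-", "(" and spaces carry no regex meaning.
    pattern = re.compile("|".join(re.escape(item) for item in ordered))
    return [m.group(0) for m in pattern.finditer(text)]


text = "Bill to: Acme Corp. (Acme) Invoice No. 1001-A"
print(list_match(text, ["Acme", "Acme Corp.", "1001-A"]))  # ['Acme Corp.', 'Acme', '1001-A']
print(list_match(text, ["1001A"]))  # [] because the hyphen is part of the value
```

This sketch does not check word boundaries, so an item such as "Acme" would also match inside a longer word like "Acmeco".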

To configure a List Match, you can enter the values you want to extract as Local Entries or reference a pre-configured Lexicon.

Unlike a Pattern Match, the List Match extractor does not use or require regular expressions by default, but regex can be enabled in its properties. Like a Pattern Match, it supports Prefix and Suffix Patterns, which anchor the list item to the text around it and reduce the number of false positives extracted.
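The Python sketch below extends the literal list match with two options that mirror the behavior described above. A regex_enabled flag treats list items as regular expressions instead of literal text. Optional prefix and suffix patterns must surround the item but are not returned with it. The function and parameter names are hypothetical illustrations, not Grooper property names or code.

```python
import re
from typing import List, Optional


def list_match(text: str, items: List[str], regex_enabled: bool = False,
               prefix: Optional[str] = None, suffix: Optional[str] = None) -> List[str]:
    """Hypothetical sketch (not Grooper code) of a list match with optional regex and anchors."""
    if not items:
        return []
    # Literal mode escapes each item; regex mode uses each item as a pattern.
    alternatives = items if regex_enabled else [re.escape(item) for item in items]
    pattern = "(?P<value>" + "|".join(alternatives) + ")"
    # Anchors must match around the item, but only the item itself is returned.
    if prefix:
        pattern = f"(?:{prefix})" + pattern
    if suffix:
        pattern = pattern + f"(?:{suffix})"
    return [m.group("value") for m in re.finditer(pattern, text)]


text = "Ship to: Texas\nBill to: Oklahoma\nMade in Texas"
states = ["Texas", "Oklahoma"]
print(list_match(text, states))                         # ['Texas', 'Oklahoma', 'Texas']
print(list_match(text, states, prefix=r"Ship to:\s*"))  # ['Texas']
print(list_match("Invoice 12, Inv 13", [r"Inv(?:oice)?"], regex_enabled=True))  # ['Invoice', 'Inv']
```

With regex disabled, the same item Inv(?:oice)? would only match that exact string of characters, so it finds nothing in the sample text.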

How To

A List Match is most commonly used when configuring objects such as Value Readers or Data Types. It is well suited to extracting text such as:

  • Specific company names
  • Field labels
  • Headers and Footers
  • Full phrases
  • Exact numbers

If the information you need to extract follows a specific pattern, such as a date or social security number, then it may be better to consider a different extractor like a Pattern Match.
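A short Python sketch shows the difference. It is an illustration only, not Grooper code. A list can only return dates that were entered ahead of time, while one pattern covers every date written in the same format. The MM/DD/YYYY format and the sample text are assumptions made for this example.

```python
import re

# A list only knows the values entered ahead of time...
known_dates = ["01/15/2023", "02/01/2023"]
# ...while a single pattern describes every date in the same MM/DD/YYYY format.
date_pattern = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")

text = "Signed 01/15/2023, amended 03/09/2023."
from_list = [d for d in known_dates if d in text]
from_pattern = date_pattern.findall(text)
print(from_list)     # ['01/15/2023']
print(from_pattern)  # ['01/15/2023', '03/09/2023']
```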

Configuring by Object Type

Configuring on a Value Reader

  1. In your Node Tree, create or select a Value Reader.
    • Visit the Value Reader Wiki Page for instructions on how to create a Value Reader.
  2. Select the "Value Reader" tab.
  3. Click the drop down list next to Extractor and select List Match.

Configuring on a Data Type

  1. In your Node Tree, create or select a Data Type.
    • Visit the Data Type Wiki Page for instructions on how to create a Data Type.
  2. Select the "Data Type" tab.
  3. Click the drop down list next to Local Extractor and select List Match.

Configuring on Other Object Types

The List Match extractor can be used on many object types. Any object that has an extractor property can be configured with a List Match.

The configuration process on other objects is the same as for Value Readers and Data Types: simply select List Match as the extractor.


Examples where you can use a List Match include:

  • A Data Type's Value Extractor property
  • A Document Type's Positive Extractor property
  • The Labeled Value extractor's Label Extractor property
  • The Pattern-Based Separation Provider's Value Extractor property


Local Entries vs Lexicons

A List Match can be configured using Local Entries or a Lexicon. Local Entries are quick to set up and work well when you only need a few entries. If your list is large, or you plan to build multiple extractors that use the same list, it is often more efficient to create a Lexicon first and reference it.
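One way to picture this trade-off is a shared list that several extractors reference. In the Python sketch below, STATE_LEXICON stands in for a Lexicon and make_list_extractor is a hypothetical helper; neither is Grooper's API. The extractors read the shared list each time they run, so a single edit to it is picked up by every extractor that references it. Local entries, by contrast, live inside one extractor.

```python
import re
from typing import Callable, List, Sequence

# Hypothetical stand-in for a Lexicon: one list maintained in a single place.
STATE_LEXICON: List[str] = ["Oklahoma", "Texas", "Kansas"]


def make_list_extractor(local_entries: Sequence[str] = (),
                        included_lexicons: Sequence[List[str]] = ()) -> Callable[[str], List[str]]:
    """Hypothetical helper (not Grooper code): build an extractor from local entries and referenced lists."""
    def extract(text: str) -> List[str]:
        # Referenced lists are read on every call, so later edits to them are picked up.
        items = list(local_entries) + [e for lexicon in included_lexicons for e in lexicon]
        if not items:
            return []
        ordered = sorted(items, key=len, reverse=True)
        return re.findall("|".join(re.escape(i) for i in ordered), text)
    return extract


local_only = make_list_extractor(local_entries=["N/A"])
ship_to = make_list_extractor(included_lexicons=[STATE_LEXICON])
bill_to = make_list_extractor(included_lexicons=[STATE_LEXICON])

STATE_LEXICON.append("Nebraska")       # one edit to the shared list...
print(ship_to("Ship to Nebraska"))     # ['Nebraska']  ...is seen here
print(bill_to("Bill to Nebraska"))     # ['Nebraska']  ...and here
print(local_only("Bill to Nebraska"))  # []
```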

Configuring Local Entries

  1. For Value Readers, select the object you wish to configure and click the "Tester" tab.
    • When configuring a Data Type, first click the ellipsis button at the end of the Local Extractor property with List Match selected to bring up the editing window.
  2. Make sure the "Expressions" sub-tab is selected.
  3. Under LOCAL ENTRIES, type the desired text to be extracted.
    • Press Enter after each entry to extract multiple list items with one List Match.
  4. If needed, add a Prefix Pattern and/or Suffix Pattern (regex) to anchor your extraction.
    • When using a tab character (\t) as an anchor, make sure Tab Marking is set to Enabled under Preprocessing in the "Properties" tab.
  5. Save and test your extraction.
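The Tab Marking note above matters because a \t suffix can only match if the text actually contains tab characters. The Python sketch below illustrates this with a hypothetical mark_tabs function that simply turns runs of three or more spaces into a tab. That rule is an assumption for illustration only, not how Grooper's Tab Marking decides where tabs go.

```python
import re
from typing import List


def mark_tabs(line: str, min_gap: int = 3) -> str:
    """Hypothetical stand-in for Tab Marking: a run of min_gap or more spaces becomes a tab."""
    return re.sub(" {%d,}" % min_gap, "\t", line)


def match_with_suffix(text: str, items: List[str], suffix: str) -> List[str]:
    """Hypothetical helper: return list items that are immediately followed by the suffix pattern."""
    core = "|".join(re.escape(item) for item in items)
    return [m.group("value") for m in re.finditer(f"(?P<value>{core})(?:{suffix})", text)]


line = "Policy Number      PN-4471"
print(match_with_suffix(line, ["Policy Number"], r"\t"))             # [] (no tab in the text)
print(match_with_suffix(mark_tabs(line), ["Policy Number"], r"\t"))  # ['Policy Number']
```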

Referencing Lexicons

  1. For Value Readers, select the object you wish to configure and click the "Tester" tab.
    • When configuring a Data Type, click the ellipsis button at the end of the Local Extractor property with List Match selected to bring up the editing window before continuing to the next step.
  2. Select the "Properties" tab.
  3. Click the arrow next to the Vocabulary property to expand its sub-properties.
  4. Click the ellipsis button at the end of the Included Lexicons property. This opens a new window where you can add pre-configured Lexicons.
  5. In the new window, browse through the Projects and Folders until you find the desired Lexicon, then select the check box next to each Lexicon you want to include.
  6. Click OK to apply the Lexicons.
  7. Save and test your extraction.



See Also

  • Value Reader
  • Pattern Match