2023:List Match (Value Extractor)

From Grooper Wiki
Revision as of 12:18, 25 January 2023

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

A List Match is an extractor type that can be used when configuring several data extraction tools such as a Value Reader or Data Type. It is designed to return values matching one or more items in a defined list. By default, the List Match extractor does not use or require regular expressions (regex).

About

The List Match is one of the simplest extractors used in Grooper. It is designed to return values matching one or more items in a defined list. This can be used to extract numbers, specific words, or full phrases contained within a document. A List Match extractor returns an exact match, including any spaces, numbers, punctuation, or special characters.
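The idea of exact list matching can be illustrated outside of Grooper with a short Python sketch. This is not Grooper's implementation or API; the function name `list_match` and the sample text are hypothetical, and the sketch assumes case-sensitive matching.

```python
import re

def list_match(text, entries):
    """Return (entry, start) pairs for every literal occurrence of any entry."""
    if not entries:
        return []
    # Try longer entries first so "Acme Corp." wins over a shorter "Acme".
    ordered = sorted(entries, key=len, reverse=True)
    # re.escape treats every character literally, so spaces and
    # punctuation must appear in the text exactly as typed in the list.
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]

text = "Invoice No. 1042 from Acme Corp. Ship to: Acme Corp, Tulsa"
hits = list_match(text, ["Acme Corp.", "Invoice No."])
print(hits)
```

In this sample, "Acme Corp," (with a comma) is not returned because the list entry ends in a period, which mirrors the exact-match behavior described above.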

To configure a List Match, you can input the desired extracted value as a Local Entry or reference a pre-configured Lexicon.

Unlike a Pattern Match, the List Match extractor does not use or require regular expressions by default, but regex can be enabled in the properties menu. As with a Pattern Match, Prefix and Suffix Patterns can be added to anchor the expression and limit the number of false positives extracted.
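The effect of anchoring can be sketched in Python as follows. This is only a conceptual illustration, assuming the prefix and suffix are regular expressions that must match immediately before and after a list entry; the function `anchored_list_match` and the sample text are hypothetical and do not represent Grooper's internals.

```python
import re

def anchored_list_match(text, entries, prefix="", suffix=""):
    """Return list entries found in text, optionally anchored by regex patterns."""
    if not entries:
        return []
    ordered = sorted(entries, key=len, reverse=True)
    body = "|".join(re.escape(e) for e in ordered)
    # The prefix and suffix must match, but only the list entry is returned.
    pattern = re.compile(f"(?:{prefix})(?P<value>{body})(?:{suffix})")
    return [m.group("value") for m in pattern.finditer(text)]

text = "Ship To: Tulsa, OK\nBill To: Norman, OK\nRemit To: Tulsa, OK"
cities = ["Tulsa", "Norman"]

unanchored = anchored_list_match(text, cities)
billing_only = anchored_list_match(text, cities, prefix=r"Bill To:\s*")
print(unanchored)
print(billing_only)
```

Without an anchor, every city in the text is returned. With the prefix, only the city that follows the "Bill To:" label is returned, which is how an anchor cuts down on false positives.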

How To

A List Match is most commonly used when configuring objects such as Value Readers or Data Types. It is great for extracting text information such as:

  • Specific company names
  • Field labels
  • Headers and Footers
  • Full phrases
  • Exact numbers

If the information you need to extract follows a specific pattern, such as a date or social security number, then it may be better to consider a different extractor like a Pattern Match.
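To see why pattern-shaped data suits a different extractor, consider dates: a list would need one entry per possible date, while a single regular expression covers them all. The Python sketch below is a generic regex illustration, not a Grooper Pattern Match configuration; the date format shown is an assumption chosen for the example.

```python
import re

# One pattern describes the shape of the value instead of listing every value.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

text = "Issued 01/25/2023, due 2/24/2023."
dates = DATE_PATTERN.findall(text)
print(dates)
```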

Configuring by Object Type

Value Reader

  1. In your Node Tree, create or select a Value Reader.
    • Visit the Value Reader Wiki Page for instructions on how to create a Value Reader.
  2. Select the "Value Reader" tab.
  3. Click the drop-down list next to Extractor and select List Match.

Data Type

  1. In your Node Tree, create or select a Data Type.
    • Visit the Data Type Wiki Page for instructions on how to create a Data Type.
  2. Select the "Data Type" tab.
  3. Click the drop-down list next to Local Extractor and select List Match.

Local Entries vs Lexicons

A List Match can be configured using a Local Entry or a Lexicon. Local Entries are simple and easy to set up, especially if you only need to add a few entries. If you plan to extract a large amount of information multiple times across different objects, it might be more efficient to set up a Lexicon to reference first.
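The trade-off can be pictured with a small Python sketch in which a shared list stands in for a Lexicon and is reused by several vocabularies. This assumes that local entries and included lexicons are combined into one vocabulary, which the article does not confirm; `build_vocabulary` and `STATE_NAMES` are hypothetical names used only for illustration.

```python
def build_vocabulary(local_entries, lexicons):
    """Merge local entries with every included lexicon, dropping duplicates."""
    merged = {}  # dict keys keep insertion order and ignore repeats
    for entry in local_entries:
        merged.setdefault(entry, None)
    for lexicon in lexicons:
        for entry in lexicon:
            merged.setdefault(entry, None)
    return list(merged)

# A shared list maintained in one place (stand-in for a Lexicon).
STATE_NAMES = ["Oklahoma", "Texas", "Kansas"]

# A few one-off values are easiest to type in as local entries.
label_vocab = build_vocabulary(["State:"], [])

# Larger lists are defined once and referenced wherever they are needed.
state_vocab = build_vocabulary(["Okla."], [STATE_NAMES])
address_vocab = build_vocabulary([], [STATE_NAMES])
print(state_vocab)
```

Updating `STATE_NAMES` once changes every vocabulary that references it, which is the efficiency gain of referencing a shared list instead of retyping entries on each object.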

Local Entries

  1. Select the object you wish to configure and click the "Tester" tab.
    • When configuring a Data Type, first click the ellipsis button next to your List Match selection to bring up the editing window.
  2. Make sure the "Expressions" sub-tab is selected.
  3. Under LOCAL ENTRIES, type the desired text to be extracted.
    • Press Enter after each entry to extract multiple text segments under one List Match.
  4. If needed, add a Prefix and Suffix Pattern to anchor your extraction.
    • When using tabs as an anchor (\t), make sure Tab Marking is set to Enabled under Preprocessing in your "Properties" tab.
  5. Save and test your extraction.

Lexicons

  1. Select the object you wish to configure and click the "Tester" tab.
    • When configuring a Data Type, click the ellipsis button next to your List Match selection to bring up the editing window before continuing to the next step.
  2. Select the "Properties" tab and open the Vocabulary property.
  3. Click the ellipsis button next to the Included Lexicons property. This should open a new window where you can add pre-configured Lexicons.
  4. In the new window, click through the Projects and Folders until you find the desired Lexicon. Check the boxes next to the desired Lexicons.
  5. Click OK to apply the Lexicon.
  6. Save and test your extraction.



See Also