2023.1:Lexicon (Node Type)
STUB
This article is a stub. It contains minimal information on the topic and should be expanded.
A lexicon is a list of words, phrases, or other information. Grooper Lexicons are also text-based lists referenced in various ways by other objects.
About
Lexicons are divided into two parts, Type and Language.
Type specifies how data entered into the Lexicon will be interpreted. There are three Types:
The Uses of a Lexicon
Lexicons can be used to:
- Look up values during data extraction
- For example, an extractor could be set up to return first or last names from a Lexicon of common first or last names.
- Translate extracted values from one value to another
- For example, an extractor could be set up to look up the full name of a company (ACME Document Corporation) in a Lexicon and translate the result to an abbreviated version (ADC).
- Assign weighting values for fuzzy matching
- Determine the frequency of values within a document set
- and more.
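The first two uses above, lookup and translation, can be sketched in plain Python. This is only a conceptual illustration, not Grooper's API: the one-entry-per-line layout, the `key=value` translation format, and every function name here are assumptions made for the example.

```python
# Illustrative only: plain Python, not Grooper's API.
# A "lexicon" here is just text with one entry per line.
FIRST_NAMES = """\
Alice
Bob
Carol
"""

# Hypothetical key=value layout for translation entries.
COMPANY_ABBREVIATIONS = """\
ACME Document Corporation=ADC
"""


def load_list(text):
    """Return the non-empty lines of a lexicon as a set of lowercase entries."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_pairs(text, sep="="):
    """Return a dict mapping each lowercase key to its translated value."""
    pairs = {}
    for line in text.splitlines():
        if sep in line:
            key, value = line.split(sep, 1)
            pairs[key.strip().lower()] = value.strip()
    return pairs


def lookup_names(words, names):
    """Keep only the words that appear in the names lexicon."""
    return [w for w in words if w.lower() in names]


def translate(value, pairs):
    """Return the translated value, or the input unchanged if no entry exists."""
    return pairs.get(value.strip().lower(), value)


names = load_list(FIRST_NAMES)
pairs = load_pairs(COMPANY_ABBREVIATIONS)
found = lookup_names("Invoice prepared by Carol for Bob".split(), names)
short = translate("ACME Document Corporation", pairs)
```

Here `found` holds the words that matched the names list, and `short` holds the abbreviation for the company name. A value with no entry in the translation list is returned unchanged.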
Lexicons in Classification
One of the places where Lexicons shine most is Classification. After all, making use of a document's language is a key point of classification. Lexicons are a great way to help ease the process.
As you can see here, we have two Lexicons we're using for Classification: English Words and English Stop Words. English Stop Words contains words that would be considered unimportant and could impede Classification, such as the articles "a", "an", and "the". These words appear frequently just by virtue of their role in the English language. While they're vital for sentence construction, they can interfere with Classification. As for the English Words Lexicon, it contains the words most commonly used throughout the English language: words that, unlike the Stop Words, could aid in Classification.
Of course, a Lexicon's only job is to be the dictionary where words are stored. How do they aid in Classification?
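As a general illustration of why a stop-word list matters, the sketch below drops stop words before counting terms, so the counts reflect words that actually distinguish one document from another. It is plain Python, not a description of how Grooper classifies documents internally; the tiny `STOP_WORDS` set and the function names are hypothetical stand-ins.

```python
import re
from collections import Counter

# Hypothetical stand-in for a stop-word Lexicon (a real one is much longer).
STOP_WORDS = {"a", "an", "the", "of", "and", "to", "is"}


def content_terms(text, stop_words):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def term_counts(text, stop_words):
    """Count how often each remaining term occurs."""
    return Counter(content_terms(text, stop_words))


counts = term_counts(
    "The invoice is due and the invoice total is final", STOP_WORDS
)
```

Without the filter, "the" and "is" would tie with "invoice" as the most frequent terms. With it, "invoice" stands out, which is the kind of signal a classifier can use. The same counting also illustrates determining the frequency of values within a document set.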
Lexicons in Data Extraction
Another area where Lexicons are useful is Word and List Matching. For example, if you have a specific list of numerous names, words, or phrases that you want to capture without making several different Data Types, then a Lexicon can come in handy. Just enter your list of string data, line by line, and reference your Lexicon for extraction, and Grooper will do the heavy lifting for you, as shown below.
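To make the idea of list matching concrete, here is a minimal sketch in plain Python that turns a line-by-line list into a single matcher. It is not Grooper's implementation; the sample entries, the `build_matcher` name, and the choice of a case-insensitive regular expression are all assumptions for illustration.

```python
import re

# Hypothetical lexicon content: one phrase per line.
LEXICON_TEXT = """\
New York
New York City
Oklahoma City
Tulsa
"""


def build_matcher(lexicon_text):
    """Compile one case-insensitive pattern that matches any lexicon entry."""
    entries = [line.strip() for line in lexicon_text.splitlines() if line.strip()]
    # Sort longest first so "New York City" is tried before "New York".
    entries.sort(key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


matcher = build_matcher(LEXICON_TEXT)
hits = matcher.findall("Offices in Tulsa, new york city, and Oklahoma City.")
```

Sorting entries longest first is a deliberate choice: regular expression alternation takes the first branch that matches, so a shorter entry listed earlier would otherwise hide a longer phrase that begins with it.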