2.90:Fuzzy RegEx (Concept): Difference between revisions
Both revisions by Dgreenwood.
== About ==
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
''Fuzzy RegEx'' is a '''''Match Mode''''' option for an extractor's regular expression pattern. Any time you can get to a "Pattern Editor" in Grooper you can take advantage of ''Fuzzy RegEx'' including:
* When configuring a '''Data Format'''
* When configuring the '''''Pattern''''' property of a '''Data Type'''
* When choosing the ''Internal'' option for various extractors in a property panel and configuring its '''''Pattern''''' property
* When choosing the ''Text Pattern'' option for various extractors in a property panel and configuring its '''''Pattern''''' property
In the "Pattern Editor", you can enable ''Fuzzy RegEx'' in two steps:
# Navigate to the "Properties" tab.
# Select the '''''Mode''''' property and choose ''Fuzzy RegEx''
|
[[File:Fuzzy-regex-about-01.png]]
|}
Revision as of 11:57, 12 November 2020
Fuzzy RegEx (also referred to as "fuzzy matching" or "fuzzy mode" or even just "fuzzy") allows regular expression patterns to match text within a set percentage of similarity. This can allow Grooper users to overcome unpredictable OCR errors when extracting data from documents.
Typically, a regular expression either matches a string of text or it doesn't. If you're trying to match a word and the regex pattern is even a single character off from the text data, no result is returned.
Fuzzy RegEx uses Levenshtein distance to measure the difference between the regular expression and potential text matches. The similarity between the regex pattern and the matched text is expressed as a percentage "confidence score". If the confidence is above a set threshold, the result is returned. If it is below the threshold, it is discarded.
For example, a text string that is 95% similar to the regex pattern may be off by just a single character. If the Minimum Similarity threshold is set to 90%, the result would be returned, even though the pattern doesn't match the text exactly.
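The scoring idea described above can be sketched in a few lines of Python. This is only an illustration, not Grooper's implementation. It compares two literal strings rather than a full regex pattern, and it assumes the confidence is computed as 1 minus the Levenshtein distance divided by the longer string's length. The function names and the choice to let a score equal to the minimum pass are also assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or exact match)
            ))
        previous = current
    return previous[-1]


def similarity(expected: str, candidate: str) -> float:
    """Assumed scoring: 1 - distance / length of the longer string."""
    longest = max(len(expected), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(expected, candidate) / longest


def fuzzy_match(expected: str, candidate: str,
                minimum_similarity: float = 0.90) -> bool:
    """Return the candidate only if its score reaches the threshold.
    Treating an equal score as a pass is an assumption of this sketch."""
    return similarity(expected, candidate) >= minimum_similarity


# A 20-character phrase where OCR misread the letter "l" as the digit "1".
expected = "Total Amount Due Now"
ocr_text = "Tota1 Amount Due Now"
print(levenshtein(expected, ocr_text))   # one substitution
print(fuzzy_match(expected, ocr_text))   # passes a 90% minimum
```

Under these assumptions, one wrong character in a 20-character string gives a similarity of about 95%, so the text clears a 90% Minimum Similarity even though an exact match would fail.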
