2.90:Fuzzy RegEx (Concept): Difference between revisions

From Grooper Wiki



Revision as of 11:20, 12 November 2020

Fuzzy RegEx (also referred to as "fuzzy matching" or "fuzzy mode" or even just "fuzzy") allows regular expression patterns to match text within a set percentage of similarity. This can allow Grooper users to overcome unpredictable OCR errors when extracting data from documents.

Typically, a regular expression will either match a string of text or it won't. If you're trying to match a word and the regex pattern is even a single character off from the text data, no result is returned.
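As a rough illustration of this all-or-nothing behavior, here is a minimal sketch using Python's standard `re` module rather than Grooper itself. The sample strings are made up for the example and are not taken from any real document.

```python
import re

# A strict (non-fuzzy) pattern: it matches only this exact character sequence.
pattern = re.compile(r"Invoice")

# Hypothetical sample text: one clean line and one with a typical OCR misread.
clean_text = "Invoice Number: 1234"
ocr_text = "lnvoice Number: 1234"  # OCR read the capital "I" as a lowercase "l"

clean_match = pattern.search(clean_text)  # a match object
ocr_match = pattern.search(ocr_text)      # None: one wrong character, no result

print("clean text matched:", clean_match is not None)
print("OCR text matched:", ocr_match is not None)
```

A single misread character is enough to lose the result entirely, which is the problem fuzzy matching is meant to address.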

Fuzzy RegEx uses the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another) to measure the difference between the regular expression and potential text matches. The similarity between the regex pattern and the matched text is expressed as a "confidence score", given as a percentage. If the confidence is above a set threshold, the result is returned. If it is below the threshold, it is discarded.

For example, a text string that is 95% similar to the regex pattern may be off by just a single character. If the Minimum Similarity threshold is set to 90%, the result would be returned, even though the pattern doesn't match the text exactly.
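The idea can be sketched in Python as follows. This is an illustration under stated assumptions, not Grooper's implementation: it compares two literal strings rather than a full regex pattern, it assumes the similarity is computed as 1 minus the Levenshtein distance divided by the longer string's length, and it treats a score exactly equal to the threshold as passing. The function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(pattern_text: str, candidate: str) -> float:
    """Assumed scoring: 1 - distance / length of the longer string."""
    longest = max(len(pattern_text), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pattern_text, candidate) / longest


def fuzzy_accept(pattern_text: str, candidate: str, minimum_similarity: float) -> bool:
    """Return the result only if its confidence meets the threshold (assumed inclusive)."""
    return similarity(pattern_text, candidate) >= minimum_similarity


expected = "Purchase Order Total"  # 20 characters
ocr_text = "Purchase 0rder Total"  # OCR read the letter "O" as the digit "0"

score = similarity(expected, ocr_text)
print(f"confidence: {score:.0%}")
print("returned at 90% threshold:", fuzzy_accept(expected, ocr_text, 0.90))
print("returned at 99% threshold:", fuzzy_accept(expected, ocr_text, 0.99))
```

With a 20-character string, one wrong character gives a distance of 1 and a confidence of about 95%, so the result passes a 90% threshold but would be discarded at 99%.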
