2023:Pattern Match (Value Extractor): Difference between revisions
# Notice that only the first date was returned.

[[File:2023 Pattern Match Date Match Steps 1 and 2 redux Copy.png]]
|-
|valign=top style="width:40%"|
#<li value=3> Now try:
#*<code>\d{2}/\d{2}/\d{2}</code>
# This picks up both dates, except that the last two digits of the first date's four-digit year aren't returned. So, this regex pattern won't work either.
|-
|valign=top style="width:40%"|
#<li value=5> So, how are we going to return both dates completely? Keep in mind that you can specify a range of repetitions within the curly braces. Hence:
#*<code>\d{2}/\d{2}/\d{2,4}</code>
#** <code>\d{2,4}</code> tells Grooper to look for anywhere from two to four digits for the year. Since both YY and YYYY fall within that range, the regex pattern will extract both.
Revision as of 09:48, 30 January 2023
WIP: This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.
Pattern Match is an Extractor Type found in Grooper. This extractor primarily uses regular expressions (regex) for general data extraction.
About
Pattern Match is one of the most commonly used extractors for general data. As its name suggests, it extracts data from a document that matches the regex pattern entered in the Value Pattern.
This extractor is useful when you want to extract text that follows a particular pattern across a document, such as dates or social security numbers. For example, dates in the MM/DD/YYYY format can be matched with the regex pattern \d{2}/\d{2}/\d{4}.
For more information on regex, see RegexOne.
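To see what this pattern does outside Grooper, here is a minimal sketch using Python's `re` module. It assumes Python's regex engine treats these simple constructs (`\d`, `{n}`, and a literal `/`) the same way Grooper's engine does, which holds for basic patterns like this one but may not for every construct. The sample text is made up for illustration.

```python
import re

# The MM/DD/YYYY pattern from the text: two digits, a slash,
# two digits, a slash, then exactly four digits.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")

# Hypothetical sample text; the second "date" is deliberately invalid.
sample = "Invoice dated 01/15/2023, due 99/99/9999."

# findall returns every non-overlapping match, scanning left to right.
matches = DATE_PATTERN.findall(sample)
print(matches)
```

Note that the second match, 99/99/9999, is not a real date: the pattern checks only the shape of the text, not whether the month and day values are valid.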
How To
Pattern Match can be configured on both Data Type and Value Reader objects.
Configuring by Object Type
Configuring on a Value Reader
Configuring on a Data Type
Selecting Pattern Match on a Data Type takes a few more steps.
This will bring up the Extractor Editor window.
Regex Examples for Pattern Match
Dates
Take note of the format of the date(s) on the document. The document here has dates in both the MM/DD/YYYY and MM/DD/YY formats, so we will write a regex pattern that extracts both dates.
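The numbered steps for this example move from a pattern that matches only one of the two dates, to \d{2}/\d{2}/\d{2}, and finally to \d{2}/\d{2}/\d{2,4}. The sketch below reproduces that progression with Python's `re` module against a hypothetical sample line containing one MM/DD/YYYY date and one MM/DD/YY date. The sample text is an assumption, and Grooper's own regex engine may differ from Python's in edge cases.

```python
import re

# Hypothetical sample text: one MM/DD/YYYY date and one MM/DD/YY date.
sample = "Signed 01/15/2023. Amended 02/20/23."

patterns = {
    "four-digit year": r"\d{2}/\d{2}/\d{4}",      # misses the MM/DD/YY date
    "two-digit year": r"\d{2}/\d{2}/\d{2}",       # truncates the four-digit year
    "two to four digits": r"\d{2}/\d{2}/\d{2,4}", # returns both dates whole
}

# Run each pattern over the same text and collect what it returns.
results = {name: re.findall(p, sample) for name, p in patterns.items()}
for name, found in results.items():
    print(f"{name:20} {found}")
```

Because `{2,4}` is greedy, the engine takes all four year digits when they are present instead of stopping at two. The same range would also accept a three-digit year, which is harmless for this document but worth knowing.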
Currency
For this example, the pattern provided will match all currency data listed.
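The pattern used for this example is not reproduced in this revision of the article. As a purely hypothetical illustration, the Python sketch below uses a pattern written for this sketch (not the article's): a required dollar sign, one to three digits, optional comma-separated thousands groups, and optional cents. Adjust it to the currency formats that actually appear on your documents.

```python
import re

# Hypothetical currency pattern (not the one from the original example):
#   \$              a literal dollar sign
#   \d{1,3}         one to three leading digits
#   (?:,\d{3})*     zero or more ",ddd" thousands groups
#   (?:\.\d{2})?    optional cents
#   \b              stop at a word boundary so "$1234" is not cut to "$123"
CURRENCY = re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?\b")

# Hypothetical sample text.
sample = "Subtotal: $1,250.00 Tax: $87.50 Fee: $5"

found = CURRENCY.findall(sample)
print(found)
```

The groups are non-capturing (`(?:...)`), so `findall` returns the full matched amounts rather than just the contents of the groups.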
Prefix and Suffix Patterns
Prefix and Suffix Patterns act as anchors that tether the data you want to extract. A Prefix Pattern matches the text that must come before the value matched by the Value Pattern, and a Suffix Pattern matches the text that must come after it.
For example, say you want to extract data that sits on its own line, such as the title of a section. You could enter just the title, but you might get false positives if the word(s) that make up the title appear elsewhere in the document. To require the title to stand on its own line, your Prefix and Suffix Patterns will be:
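Prefix Pattern: [\n\t]
Suffix Pattern: [\r\t]
Here \n is a line feed, \r is a carriage return, and \t is a tab, so the value must be preceded by a line feed or tab and followed by a carriage return or tab.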
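The effect of these anchors can be approximated outside Grooper with a lookbehind and a lookahead, which require the surrounding characters to be present without including them in the returned value. This Python sketch assumes that Grooper's document text uses Windows-style CRLF line endings (which is why the prefix looks for \n and the suffix for \r), and that Prefix and Suffix Pattern matches are not part of the extracted value. Both are assumptions, and "Summary" is a hypothetical section title.

```python
import re

# Hypothetical document text with CRLF line endings. "Summary" appears twice:
# once inside a sentence (a false positive) and once as a title on its own line.
text = "Intro mentions the Summary section.\r\nSummary\r\nBody text here.\r\n"

# Value Pattern alone: matches every occurrence of the word.
plain = re.findall(r"Summary", text)

# Approximation of Prefix [\n\t] and Suffix [\r\t]: the lookbehind (?<=...)
# and lookahead (?=...) must match, but are not included in the result.
anchored = [m.start() for m in re.finditer(r"(?<=[\n\t])Summary(?=[\r\t])", text)]

print(plain)     # both occurrences
print(anchored)  # start offset of the title occurrence only
```

Only the occurrence on its own line survives the anchors, which is the false-positive filtering described above. A title on the very first line of a document has no preceding line feed, so it would need a different prefix.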