2023:Pattern Match (Value Extractor)
Revision as of 14:29, 17 January 2023
WIP
This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly. This tag will be removed upon draft completion.
Pattern Match is an Extractor Type in Grooper. It primarily uses regular expressions (regex) for general data extraction.
About
Pattern Match is one of the most commonly used extractors for general data. As its name suggests, it extracts data from a document that matches the regex pattern entered in the Value Pattern.
This extractor is useful when you want to extract text matching a particular pattern anywhere in a document, such as dates or Social Security numbers. For example, dates in the format MM/DD/YYYY can be matched with the regex pattern \d{2}/\d{2}/\d{4}.
For more information on regex, see the following link: [1]
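Outside Grooper, the core idea can be sketched in a few lines. This is a rough analogy, not Grooper's implementation: it uses Python's `re` module in place of Grooper's regex engine, a hard-coded string in place of a document's text, and a hypothetical `extract_values` helper. It returns every substring that matches the pattern, loosely mirroring how Pattern Match finds text matching the Value Pattern across a document.

```python
import re

def extract_values(value_pattern: str, document_text: str) -> list[str]:
    """Hypothetical helper: return every non-overlapping match, in document order."""
    return re.findall(value_pattern, document_text)

# Hypothetical sample text standing in for a document's text.
sample_text = "Invoice dated 01/15/2023, due 02/14/2023. Ref 12/34."

# The MM/DD/YYYY pattern from the example above.
print(extract_values(r"\d{2}/\d{2}/\d{4}", sample_text))
# ['01/15/2023', '02/14/2023']
```

Note that "12/34" is not returned, because it lacks the second slash and the four-digit year the pattern requires.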
How To
Pattern Match can be configured on both Data Type and Value Reader objects.
Configuring by Object Type
Configuring on a Value Reader
When you create a Value Reader, you will see three tabs: Value Reader, Tester, and Advanced. To make the Value Reader use Pattern Match, select the Value Reader tab. On the Extractor property, click the icon on the far right and select Pattern Match from the drop-down menu. Then click the Tester tab and, in the Value Pattern box, enter the literal text or the regex pattern for the text you want to extract.
Configuring on a Data Type
Selecting Pattern Match on a Data Type takes a few more steps. Create your Data Type, then click the ellipsis icon to the far right of Local Extractor. Select Pattern Match from the drop-down menu, save, and click the Tester tab. Notice that the extractor selection has carried over. Next, click the ellipsis to the right of the extractor type. This opens a window containing the Value Pattern, where you can enter a pattern for the text you want to extract.
Extracting Data
Dates
Now that the extractor is set up, how do you make it work? That depends on the data you want. Dates, for example, come in a variety of formats, but for simplicity, suppose a date is written in the format MM/DD/YYYY. A simple expression for extracting it is \d{2}/\d{2}/\d{4}. For MM/DD/YY, use \d{2}/\d{2}/\d{2}. (TIP: \d and [0-9] both match digits. \d is preferable to [0-9] for dates because it keeps the regex shorter and simpler.) For dates written as MM-DD-YY (or MM-DD-YYYY), replace the forward slashes (/) with hyphens (-).
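The snippet below sketches these date variants using Python's `re` module rather than Grooper itself; the sample text and the `DATE_PATTERNS` and `find_dates` names are hypothetical. It adds one detail not covered above: a trailing `\b` (word boundary) on the two-digit-year patterns. Without it, \d{2}/\d{2}/\d{2} also matches the first eight characters of an MM/DD/YYYY date. Word boundaries are supported by most regex flavors, but it is worth confirming in the Tester that a Value Pattern behaves the same way before relying on it.

```python
import re

# Hypothetical patterns for the date formats described above.
DATE_PATTERNS = {
    "MM/DD/YYYY": r"\d{2}/\d{2}/\d{4}",
    # Trailing \b (word boundary): without it, this pattern would also match
    # "03/07/20", the first eight characters of "03/07/2021".
    "MM/DD/YY": r"\d{2}/\d{2}/\d{2}\b",
    # Hyphenated variants: the slashes are simply replaced with hyphens.
    "MM-DD-YYYY": r"\d{2}-\d{2}-\d{4}",
    "MM-DD-YY": r"\d{2}-\d{2}-\d{2}\b",
}

def find_dates(text: str) -> dict[str, list[str]]:
    """Map each format name to the matches its pattern finds in text."""
    return {name: re.findall(pattern, text) for name, pattern in DATE_PATTERNS.items()}

# Hypothetical sample text mixing the formats.
sample = "Signed 03/07/2021; filed 03-08-21; amended 12-31-2022; see 04/05/22."
for name, found in find_dates(sample).items():
    print(name, found)
```

Each date in the sample is found by exactly one pattern; dropping the `\b` from the MM/DD/YY pattern would make it return "03/07/20" as well.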
Social Security Numbers (SSN)
Social Security Numbers are similar to dates, but simpler, since SSNs have few (if any) format variations compared to dates. An SSN consists of nine digits, formatted ###-##-####. Thus, the regex to use in Pattern Match is:
\d{3}[-]\d{2}[-]\d{4}
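As a rough check of this pattern outside Grooper, the sketch below runs it with Python's `re` module; the `find_ssns` helper and the sample numbers are hypothetical and made up for illustration. Note that `[-]` is a character class containing only a hyphen, so it matches exactly what a bare `-` would.

```python
import re

# The SSN pattern from above. "[-]" matches a single hyphen, same as "-".
SSN_PATTERN = re.compile(r"\d{3}[-]\d{2}[-]\d{4}")

def find_ssns(text: str) -> list[str]:
    """Hypothetical helper: return every ###-##-#### match, in order of appearance."""
    return SSN_PATTERN.findall(text)

# Hypothetical sample text; all numbers are made up.
sample = "Employee SSN: 123-45-6789. Phone: 555-123-4567. Alt: 987-65-4321"
print(find_ssns(sample))
# ['123-45-6789', '987-65-4321']
```

The phone number is not matched because its digit groups are 3-3-4 rather than 3-2-4. Like the date patterns, though, this one can match inside a longer run of digits (for example, 234-56-7890 inside 1234-56-7890); wrapping the pattern in `\b` on both sides is one common way to prevent that.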