Regular Expression (Concept)
STUB
This article is a stub. It contains minimal information on the topic and should be expanded.
Regular expression (or regex) is a standard syntax for matching patterns in text strings. In short, it is a way of finding information in text. It is the primary method by which Grooper extracts and returns data from documents.
Using this syntax, you write a sequence of characters to match a string of characters in the text. This sequence is called a "pattern". A pattern can return multiple strings, not just one value: it returns every string of text that matches.
This syntax can be used to match very specific strings of characters, or it can be written more generally to match many variations of a pattern. For example, one can write a regular expression pattern to match one specific date, or to match any date in a text block.
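The difference between a specific and a general pattern can be sketched in a few lines. This example uses Python's built-in re module purely for illustration; the sample text and both patterns are made up for this sketch, and syntax details may differ slightly in the regex engine Grooper uses.

```python
import re

# Hypothetical sample text, not taken from any real document.
text = "Invoice dated 03/15/2021, due 04/14/2021."

# Very specific: matches only this one exact date string.
specific = re.findall(r"03/15/2021", text)

# More general: two digits, a slash, two digits, a slash, four digits.
general = re.findall(r"\d{2}/\d{2}/\d{4}", text)

print(specific)  # ['03/15/2021']
print(general)   # ['03/15/2021', '04/14/2021']
```

The general pattern returns both dates because it describes the shape of a date rather than one particular value.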
Literal String Matching
The most basic regex patterns literally match a string of characters, such as a word or phrase. If you want to find the word "cat" in a block of text, the regex pattern is simply cat. Notice as well that what is matched is the string "cat". Even if that string exists in the middle of a word, like "concatenate", the pattern still matches.
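As a rough illustration of literal matching, the sketch below searches a made-up sentence for the pattern cat. It uses Python's re module as a stand-in regex engine, which is an assumption for this example and not a description of how Grooper runs patterns.

```python
import re

# Hypothetical sample sentence containing "cat" as a word and inside a word.
text = "The cat sat while I tried to concatenate two strings."

# finditer yields one match object per occurrence of the literal string "cat".
matches = [(m.start(), m.group()) for m in re.finditer(r"cat", text)]

print(matches)  # [(4, 'cat'), (32, 'cat')]
```

The second match starts inside "concatenate", showing that a literal pattern has no notion of word edges on its own.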
Regex patterns execute sequentially from left to right (just like the left-to-right reading order of English). If you break the pattern down character by character, it becomes a little clearer what's happening. First, the "c" matches every "c" in the text.
But the pattern continues. The "a" must come immediately after the "c" for the match to continue. Notice that as the pattern gets more specific, the number of matches decreases. The single letter "c" is more general, producing four results, whereas the two letters "ca" are more specific, producing three.
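The narrowing effect can be reproduced with a small sketch. The sample text here is hypothetical, chosen only so that the counts echo the ones described above; it is not the text from the article's original illustrations. Python's re module is again assumed as the regex engine.

```python
import re

# Hypothetical sample text chosen to contain four "c", three "ca", two "cat".
text = "the cat can concatenate"

# Build the pattern up one character at a time and count the matches.
counts = {pattern: len(re.findall(pattern, text)) for pattern in ("c", "ca", "cat")}

for pattern, count in counts.items():
    print(f"{pattern!r}: {count} match(es)")
# 'c': 4 match(es)
# 'ca': 3 match(es)
# 'cat': 2 match(es)
```

Each added character removes candidates that matched the shorter pattern but fail on the next character, such as the "c" that begins "concatenate".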
By the time you get to the full pattern, "cat", only strings containing all three characters in sequence are matched. If you want to return just the word "cat" and not the segment "cat" in "concatenate", you'd need to adjust your pattern to be even more specific (we will discuss methods to do this later).
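One common way to make the pattern more specific, shown here only as a preview, is the word boundary anchor \b, which many regex flavors support. This is a sketch in Python's re module with a made-up sentence; it is an assumption that this matches the approach covered later, and other techniques exist.

```python
import re

# Hypothetical sample sentence.
text = "The cat sat while I tried to concatenate two strings."

# Plain literal pattern: also matches inside "concatenate".
loose = re.findall(r"cat", text)

# \b requires a word boundary on each side, so only the standalone word matches.
strict = re.findall(r"\bcat\b", text)

print(len(loose))   # 2
print(len(strict))  # 1
```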