Pattern-Based (Collation Provider)

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Pattern-Based is a Collation Provider option for Data Type extractors. It uses regular expressions to sequence returned results into a final result set.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Pattern-Based Collation is a collation method for Data Type extractors that allows you to write a "wrapper" expression that can reference other extractors' results as variables.

Think of it as putting multiple extractors inside one RegEx pattern. When a Data Type that is set to Pattern-Based Collation has at least one child (or referenced) extractor, you can reference that extractor as a variable by preceding its name with an "@" in the pattern. (Typing "@" also brings up the IntelliSense prompt, which lists the child and referenced extractors available.)

Pattern-Based Collation is well-suited to unstructured "natural language" documents. Since extractors are included as inline variables, you can define a more complex context (such as a sentence) surrounding the data you wish to extract.

Consider the following example:

Let's say we wanted to collect the highlighted text:

"entered into this ___ day of _____________ _____"

Using Pattern-Based Collation with the appropriate child or referenced extractors, you could write one single "wrapper" pattern like:

entered into this @Day day of @Month @Year
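As a rough analogy outside of Grooper, the wrapper can be thought of as a template whose "@" references are swapped for the referenced extractors' patterns. The Python sketch below illustrates that idea only; it is not how Grooper implements collation. The child patterns, the `expand_wrapper` helper, and the sample sentence are all assumptions made for illustration.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Assumed stand-ins for the child extractors. The keys mirror the
# @Day, @Month and @Year references used in the wrapper pattern.
CHILD_PATTERNS = {
    "Day": r"\d{1,2}th",
    "Month": "(?:" + "|".join(MONTHS) + ")",
    "Year": r"\d{4}",
}

def expand_wrapper(wrapper, children):
    """Replace each @Name token with a named group holding that child's pattern."""
    def substitute(token):
        name = token.group(1)
        return "(?P<{0}>{1})".format(name, children[name])
    return re.sub(r"@(\w+)", substitute, wrapper)

expanded = expand_wrapper("entered into this @Day day of @Month @Year", CHILD_PATTERNS)

# Hypothetical sample sentence.
sample = "This Agreement is entered into this 6th day of November 2016 by the parties."
match = re.search(expanded, sample)

print(match.group(0))     # entered into this 6th day of November 2016
print(match.groupdict())  # the Day, Month and Year values as a dict
```

The named groups keep each referenced value addressable on its own, which loosely mirrors how the wrapper's result is built from the individual extractors' results.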

Pattern-Based Collation is especially useful in contexts where the expressions for the referenced extractors are subject to change. Using the above example, say we were working on a collection of documents that contained ten unique Document Types that all presented the date in a different verbal format, but always in a way that contained the day, month, and year. We could build ten different "wrapper" extractors (one for each Document Type) and set them to Pattern-Based Collation. Each one has "Day," "Month," and "Year" selected under "Referenced Extractors." This way, our ten different contexts (our "wrappers") all rely on the same handful of extractors to pull the same data elements, and a change to one of those extractors carries through to every wrapper.
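The benefit of sharing extractors across wrappers can be sketched in Python as well. Again, this is an analogy under stated assumptions rather than Grooper's implementation: the Document Type names, the wrapper phrasings, and the `extract_date` helper are hypothetical. The point is that the day, month, and year patterns are defined once, so editing one of them changes the behavior of every wrapper.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# One shared set of child patterns (assumed for illustration).
SHARED_CHILDREN = {
    "Day": r"\d{1,2}(?:st|nd|rd|th)",
    "Month": "(?:" + "|".join(MONTHS) + ")",
    "Year": r"\d{4}",
}

# Hypothetical Document Types, each with its own wrapper phrasing.
WRAPPERS = {
    "Lease": "entered into this @Day day of @Month @Year",
    "Deed": "dated the @Day of @Month, @Year",
    "Assignment": "effective as of @Month @Day, @Year",
}

def compile_wrapper(wrapper, children):
    """Swap every @Name token for a named group built from the shared patterns."""
    expanded = re.sub(
        r"@(\w+)",
        lambda t: "(?P<{0}>{1})".format(t.group(1), children[t.group(1)]),
        wrapper,
    )
    return re.compile(expanded)

def extract_date(doc_type, text):
    """Return the Day/Month/Year values found by that Document Type's wrapper."""
    found = compile_wrapper(WRAPPERS[doc_type], SHARED_CHILDREN).search(text)
    return found.groupdict() if found else None

print(extract_date("Lease", "entered into this 6th day of November 2016"))
print(extract_date("Deed", "dated the 3rd of March, 1999"))
print(extract_date("Assignment", "effective as of November 6th, 2016"))
```

If the day format needed to change (for example, to allow spelled-out ordinals), only the "Day" entry would be edited and all three wrappers would pick up the change.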

How To

In this example, using Pattern-Based Collation, we are going to extract the phrase "entered into this X day of Y Z" where "X" is the day, "Y" is the month, and "Z" is the year.

Creating the Parent and Child Objects

  1. Make a Data Type with child objects that extract different parts of the text segment you wish to return.
    • In this case we have three child objects that extract the Day, Month, and Year.
  2. Alternatively, you can reference other extractors in your project rather than having child objects. Just use the Referenced Extractors property to do so.


  1. The first child object in our example is extracting the day in our pattern.
  2. The Value Reader has been set to a pattern match and the pattern \d{1,2}th has been entered to collect "Xth" where X is a 1 or 2 digit number.
  3. On the page this Value Reader is returning "6th".


  1. The second child object is set to a List Match collecting the month.


  1. The last child object is set to a Pattern Match to collect 4 digit numbers, so it should capture the year.
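To see what each child extractor contributes on its own, the three can be approximated with Python's `re` module. This is only a sketch: the day pattern is the one given above, while the month list, the `\d{4}` year pattern, and the sample page text are assumptions, since the article does not show them.

```python
import re

# Hypothetical page text; "Suite 1200" is included to show a stray hit.
page_text = "Suite 1200. This Lease is entered into this 6th day of November 2016."

# Day: the pattern match from the first child object.
day_hits = re.findall(r"\d{1,2}th", page_text)

# Month: a simple stand-in for a List Match over month names (assumed list).
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
month_hits = [m for m in MONTHS if re.search(r"\b" + m + r"\b", page_text)]

# Year: any run of four digits (assumed pattern for "4 digit numbers").
year_hits = re.findall(r"\d{4}", page_text)

print(day_hits)    # ['6th']
print(month_hits)  # ['November']
print(year_hits)   # ['1200', '2016']
```

Taken individually, the year pattern also returns the unrelated "1200". Only when the three are placed in the context of the wrapper pattern does the result narrow to the intended date phrase. Note also that `\d{1,2}th` as written would not return days such as "1st", "2nd", or "3rd".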


Setting the Pattern-Based Collation Property

  1. Click on the parent Data Type.
  2. Click on the hamburger icon to the right of the Collation property.
  3. Select Pattern-Based from the drop-down.


Entering in the Value Pattern

  1. Open up the Collation property and then click the ellipsis icon to the right of the Value Pattern property.


  1. Start writing your pattern in the "Value Pattern" window. When you get to the place where you need to use one of your child extractors, type in the @ symbol.
  2. An IntelliSense drop-down will appear listing the extractors within the scope of the Data Type. Select the desired extractor from the drop-down or finish typing its name.


  1. Finish writing your pattern, adding each child or referenced extractor using the @ symbol.
  2. Click "OK" in the top right corner of the window to save.


  1. Now the text segment "entered into this 6th day of November 2016" is being returned.