2021:Data Rule (Node Type)

From Grooper Wiki

2021

This article is in development for the upcoming version of Grooper, Grooper 2021. The Value Reader is a new data extraction object in 2021. This information is incomplete and/or may change by the time of release.

The Data Rule object allows for complex validation and manipulation of a Data Model's Data Elements (Data Fields, Data Sections, and Data Tables) in Grooper.

This allows users to create a conditional hierarchy of actions to take if certain conditions met. These conditions are configured using .NET, LINQ and/or lambda expressions. When the expression is "triggered", either evaluating to "true" or "false", certain actions can be made. These include:

  • Calculate Value - This action sets the value of a Data Field or cells a Data Column, using calculate expressions to perform mathematical or concatenation operations of Data Elements.
  • Clear Item - This action clears the value of a Data Element.
  • Copy Item - This action copies or moves the value of a Data Element.
  • Parse Value - This action uses a regular expression pattern to return part of a Data Field's value or cell in a Data Column's value.
  • Raise Issue - This action adds an issue to the issue log, used for validating a Data Element. This action can also be used to flag the Data Element.

These trigger conditions and subsequent actions set on the Data Rules objects are executed through the Apply Rules activity after data is extracted from an Extract activity.

About

Some Basics About Expressions

Grooper makes use of expressions to validate extracted data and use extracted data to populate fields in a Data Model. Traditionally, this is configured on a Data Field object in a Data Model (or in the case of validating or calculating cells in a table, the Data Column object), using the Default Value, Calculated Value, or Is Valid properties.

For example, let's say we have several documents in a Batch. Each one contains W-2 wage reporting forms for various individuals and we want to do some basic tax filing calculation. In order to find someone's total income, it may not be quite as simple as pulling the listed wages from a single W-2. An individual might have multiple W-2s from multiple employers.

If an individual worked for three different employers over the course of a year, their total income would be the wages from all three W-2s added together. This is where expressions come in handy in Grooper. There is no extractible "Total Wages" field on the document. It's just three pages, each page a different W-2. There is no text data for an extractor to return that corresponds to all three W-2's wages field added together.

But we could create a Data Section in our Data Model to return the wages from each individual W-2 form, by adding a "Wages" Data Field and configuring its extraction. Then, we could create a "Total Wages" Data Field and use a Calculated Value expression to add up the results of each "Wages" Data Field in each section (each W-2 in this case).


Here, we have a simple Content Model set up to solve the problem described above.

  1. It has a Data Section named "W2 Info"
    • It has two child Data Fields: "Fed Wages" and "Employee SSN".
  2. The Data Section is configured to create one section for each W2 or each page in this case.
    • This document has three W2s (or three pages). Hence, we have three sections.
  3. The "W2 Info" Data Section's child "Fed Wages" Data Field is configured to return the "Wages, tips, other compensation" field for the W-2s.
  4. This "Total Income" Data Field will be configured to add up the results of the "Fed Wages" Data Field from each section.

  1. Here, we've selected the "Total Income" Data Field.
  2. Calculated Value property is configured with the expression W2_Info.SumOf("Fed Wages")
    • This will add the results of each "Fed Wages" Data Field in the "W2 Info" Data Section together.
  3. The Calculate Mode is set to Always Set.
    • This will always run the expression regardless of whether the Data Field is populated by an extractor (It's not in this case. The Value Extractor property is blank.).


FYI If you were to press the "Test Extraction" button at this point, the Calculated Value expression would appear to fail. However, it's actually configured correctly to do what we want.

This has to do with the scope we're testing extraction in our Data Model. With the "Total Wages" Data Field selected in the node tree, we've narrowed down the scope of our Data Model to just test within this Data Field's scope. But we're actually looking for information outside the "Total Wages" Data Field's scope. We're using information in the "W2 Info" Data Section's scope to add up the total wages for each W2 in its child "Fed Wages" Data Fields.

Once we select the parent Data Model in the node tree, we will be at a higher scope in the data hierarchy. At that point, the expression will run against the full scope and successfully execute.

  1. Upon extraction...
  2. The expression executes, adding up the "Fed Wages" Data Field values for each section.

Conditional Expressions and Data Rules

You can do a lot with expressions, even applying some conditional logic to their execution. If the condition is met, the expression executes. If not, it doesn't or something other expression executes.

In our example of documents containing W-2 forms we make some assumptions about the document. We assume each document contains a W-2 for a single individual. Each individual should only have one social security number. It would be problematic if their were multiple social security numbers extracted from the W-2 forms. This could indicate there are multiple W-2s for multiple individuals in a single document.

To account for this, you could use a more complex Calculated Value expression to only add up the "Fed Wages" Data Fields if the social security number was the same for each document. If the condition of their only being one social security number for each W-2 is met, the expression to add up the wages would execute. If not, it wouldn't. This is basic conditional logic. If "x" condition is met, then do "y". If there's only one social security number, then add up all the wages. Otherwise, do nothing (or something else). You could go another step further and add an Is Valid expression to flag the document if the social security numbers didn't match, as well.

However, the more complex a Data Model's data hierarchy, generally the more complex these conditions tend to be. The more conditions you add for an expression to execute, the more complex the expression becomes. This can result in very cumbersome expressions that are difficult to form and manage.

This is where the Data Rule object really shines. Data Rules allow you to use Trigger expressions to determine one or multiple subsequent Actions to take if that expressions evaluates to true (or false). This is also basic conditional logic. If the trigger expression is true, do the action. Otherwise, do nothing (or a different action). Furthermore, you can more easily create a complex hierarchy of conditions by adding child Data Rules to parent Data Rules. If the trigger expression evaluates to true, the child Data Rules will execute, with their own triggers and even own child Data Rules. This allows for simpler set up, execution, and management of more complex conditional expressions as well as some actions that fall outside normal expressions you can set up in a Data Field or Data Column.

  1. Here, we've added a Data Rule object to our Content Model's Local Resources folder.
    • As it is configured now, this Data Rule (when executed) will add up "W2 Info" Data Section's "Fed Wages" Data Field values for each document, populating the "Total Income" Data Field. As is currently, it does essentially the exact same thing as the Calculated Value property configuration of the "Total Income" Data Field discussed above.
  2. The first thing you'll always configure for a Data Rule is the Scope property. This determines at what level in a Content Model's Data Model hierarchy the Data Rule executes. This will determine which Data Elements are accessible.
    • In this case, we set the Scope to the full Data Model. This gives the most access to the most Data Elements in the Data Model. If we instead selected the "W2 Info" Data Section as the Scope instead, we would only have access to the Data Elements in that Data Section. This would not include the "Total Income" Data Field, which is what we're trying to populate here.
  3. The Trigger property controls the condition determining whether or not the action executed by the Data Rule is taken or not.
    • The Trigger can be a .NET, LINQ or lambda expression.
    • If left blank, as is the case here, the Trigger defaults to "true". In effect, the Data Rule will always execute its action.
    • We will revisit this Trigger soon.
  4. The True Action property determines what happens if the Trigger evaluates to "true". Or put another way, if the Data Rule's condition is met.
    • There are many possible actions detailed in the Actions section of this article. Each action type has its own set of configuration properties.
    • In this case, we have selected Calculate Value. This action type's configuration is very similar to how a Data Field's Calculated Value property is configured.
  5. The Target Field property controls what Data Field (or Data Columns in the case of cells in a Data Table) is affected by the action.
  6. The Value Expression property controls what populates the Target Field's Data Field. It is what is calculated (or maybe better put how it's calculated).
    • In this case we're using the exact same expression to add up the "Fed Wages" Data Fields in each of the "W2 Info" Data Section's sections as discussed above.

The Trigger

The Trigger property serves the purpose of establishing the condition that must be met in order for the Data Rule's action to be taken. These triggering conditions are also set using expressions. These expressions must return a Boolean "true" or "false" value. If the Trigger expression evaluates to "true", the True Action configuration is executed. If the Trigger expression returns "false", the False Action configured is executed (If it is configured. If left blank, no further action will be taken.)

In our case, something is wrong with our documents if the W2 forms have more than one social security number. Individuals should only have one social security number. If we added up all the wages for multiple W-2s with mismatched social security numbers, we would not be adding up the total income for an individual correctly. We'd end up with inaccurate data.

Luckily, there's an expression we could use to determine if our "W2 Info" Data Section has multiple social security numbers in its sections. This is a good opportunity for a LINQ expression. LINQ (or Language INtegrated Query) expressions are particularly helpful when navigating a Data Model's hierarchical structure to pull information from Data Sections and Data Tables.

Writer's Note: You aren't limited to just LINQ expressions for the Trigger. You can use any expression that returns a Boolean value, including standard.NET expressions, LINQ expressions, and lambda expressions. A LINQ expression just works well for this particular example.

(From sec In W2_Info Select sec.Employee_SSN).Distinct().Count() = 1

Let's break down this expression to understand what's going on.

(From sec In W2_Info Select sec.Employee_SSN).Distinct().Count() = 1

This is the "LINQ-iest" part of the expression. It's querying the extracted instances of our Data Model's objects, to return multiple results.

LINQ expressions always start with From (This indicates the data source from where are you querying the data). Next we declare a type variable we've named sec. We'll use later in the expression to return multiple instances of a Data Field in a Data Section (What you name it doesn't matter, just that you use the same name when you reference the variable later on in the query). The In clause determines the query's scope. We're looking for the "Employee SSN" Data Field in the "W2 Info" Data Section. The "W2 Info" Data Section is our scope. In W2_Info will only query instances (the results of its children Data Fields) in the sections produced by the "W2 Info" Data Section. The Select clause determines what values the query returns (or "selects"). We want information about the "Employee SSN" Data Fields. So, we've entered sec.Employee_SSN. Note, we've referenced the variable we declared at the start of the query, sec to do this.

Now the expression has some information it can work with. In this case, the social security numbers for each W-2 in the document (as returned by the "Employee SSN" Data Field for each section produced by the "W2 Info" Data Section).

(From sec In W2_Info Select sec.Employee_SSN).Distinct().Count() = 1

This part of the expression is counting the number of distinct values returned by the query. If the social security number is the same for each W2, this should evaluate to "1". If not, it will be larger.

(From sec In W2_Info Select sec.Employee_SSN).Distinct().Count() = 1

This is just an equivalency argument to give us a Boolean "true" or "false" value. If the left side of the argument (the expression (From sec In W2_Info Select sec.Employee_SSN).Distinct().Count()) counts a single unique social security number in each section is equivalent to the right side of the argument (i.e. "1 = 1") it will return "true", otherwise "false".

Actions

Once a Data Rule is triggered, what happens next is determined by the True Action and False Action properties. When the Trigger expression evaluates to true, the True Action is executed. When the Trigger expression returns false, the False Action is executed. This determines what action is taken once the trigger condition is met or not met.

This can be one of six choices:

  • Calculate Value
  • Raise Issue
  • Clear Item
  • Copy Item
  • Parse Value
  • Action List

How To

Create a Hierarchy of Data Rules Using Multiple Triggers and Actions