2021:Data Rule (Node Type): Difference between revisions
Dgreenwood (talk | contribs) |
| Line 242: | Line 242: | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
# | # Press the "Test Rule" button to test the '''Data Rule's''' execution. | ||
# The ''Calculate Value'' action's '''''Value Expression''''' configuration executes. | # The ''Calculate Value'' action's '''''Value Expression''''' configuration executes. | ||
# In this case, the expression adds up all the values in the "Total" column. | # In this case, the expression adds up all the values in the "Total" column. | ||
| Line 266: | Line 266: | ||
|- | |- | ||
|valign=top| | |valign=top| | ||
# | # Press the "Test Rule" button to test the '''Data Rule's''' execution. | ||
# See that the extracted value is replaced by the '''''Value Expression's''''' result. | # See that the extracted value is replaced by the '''''Value Expression's''''' result. | ||
| | | | ||
| Line 356: | Line 356: | ||
<tab name="Clear Item" style="margin:20px"> | <tab name="Clear Item" style="margin:20px"> | ||
=== Clear Item === | === Clear Item === | ||
The ''Clear Item'' action will clear the data in a '''Data Field''' if the '''''Trigger''''' condition is met. ''Clear Item'' will also clear a '''Data Column's''' data if a '''Data Table''' is selected as the '''''Scope'''''. This gives Grooper users a method of removing data from a field or table column if certain conditions are met. For instance, if you know the data is invalid based on the '''''Trigger''''' expression's true/false evaluation, you may prefer to remove the index data rather than keep the invalid data. This could also be a method of redacting sensitive index data after it is exported to a secure database. (Note: For complete redaction, you would probably also want to use the '''Redact''' activity to black-bar or white out the document's image and a '''Correct''' activity to remove the data from the document's text data as well.) | |||
{|cellspacing=10 cellpadding=5 | |||
|style="width:40%" valign=top| | |||
For instance, we've already seen situations where this intangibles table's "Grand Total" field on the document does not actually match the sum of the "Total" column. We could use a '''''Trigger''''' expression to check if the "Total" column adds up to the "Grand Total" field and clear the extracted "Grand Total" '''Data Field''' if it does not add up correctly. | |||
# For this document, the extracted values in the "Total" column are themselves accurate. | |||
# However, the extracted "Grand Total" is inaccurate. It does not add up to the summation of all the values in the "Total" '''Data Column'''. | |||
# We will configure this '''Data Rule''' to check if the values in the "Total" '''Data Column''' add up to the value in the "Grand Total" '''Data Field''' and clear it if it does not. | |||
| | |||
[[File:Data-rules-actions-12.png]] | |||
|- | |||
|valign=top| | |||
# We have the '''''Scope''''' set to the '''Content Model's''' '''Data Model''' level. | |||
#* We need access to both the "Intangibles" '''Data Table''' and the "Grand Total" '''Data Field''', both of which are children in the same level of the '''Data Model'''. | |||
# For our '''''Trigger''''' condition, we've used an expression to check if the sum of the "Total" '''Data Column's''' values is not equal to the "Grand Total" '''Data Field'''. | |||
#* <code>Intangibles_Table.SumOf("Total") <> Grand_Total</code> | |||
#* FYI: <code><nowiki><></nowiki></code>
# We've set the '''''True Action''''' property to ''Clear Item''. | |||
# All you need to configure for the ''Clear Item'' action is what field (or '''Data Column''' if scoped at a '''Data Table''' level) should be cleared. | |||
# In this case, we've selected the "Grand Total" '''Data Field''' using the '''''Element''''' property. | |||
| | |||
[[File:Data-rules-actions-13.png]] | |||
|- | |||
|valign=top| | |||
# Press the "Test Rule" button to test the '''Data Rule's''' execution. | |||
# In this case, the '''''Trigger''''' expression evaluates to true. | |||
# As the '''''True Action''''', the ''Clear Item'' action is applied, and the "Grand Total" '''Data Field''' is cleared. | |||
| | |||
[[File:Data-rules-actions-14.png]] | |||
|} | |||
</tab> | </tab> | ||
<tab name="Copy Item" style="margin:20px"> | <tab name="Copy Item" style="margin:20px"> | ||
=== Copy Item === | === Copy Item === | ||
The ''Copy Item'' action will copy a field's value and paste it into another field. Optionally, the value can be moved as well (like a cut and paste operation). This can be useful for situations when you need to move data around in a '''Data Model's''' hierarchy. This provides an easy way to move a '''Data Field's''' value into a '''Data Section's''' single or multiple section instances. | |||
{|cellpadding=10 cellspacing=5 | |||
|style="width:40%" valign=top| | |||
For example, let's say you need to move the "Grand Total" '''Data Field''' into each section instance of a multi-section '''Data Section'''. | |||
# We've configured this '''Data Section''' to return 11 section instances. | |||
# Using the ''Copy Item'' action, we can copy a single '''Data Field'''. | |||
# And then paste it into a '''Data Field''' for each of these 11 sections. | |||
# We will configure this '''Data Rule''' to do this. | |||
| | |||
[[File:Data-rules-actions-15.png]] | |||
|- | |||
|valign=top| | |||
# We've set the '''Data Rule's''' '''''Scope''''' property to the '''Content Model's''' '''Data Model.''' | |||
#* We need access to both the '''Data Model's''' child '''Data Field''' "Grand Total" and its child '''Data Section's''' own child '''Data Field''' "Grand Total Copy". The '''Content Model's''' '''Data Model''' includes all of these '''Data Elements'''. So it is the appropriate data hierarchy scope to use. | |||
# We're leaving the '''''Trigger''''' property blank and going straight to the '''''True Action''''' property. We've set this to ''Copy Item''. | |||
All you need to configure for the ''Copy Item'' action is the '''Data Field''' you're copying and what '''Data Field''' you're pasting the value to. | |||
#<li value=3> The '''''Source Element''''' property is the '''Data Field''' whose value you are copying. | |||
#* In this case the "Grand Total" '''Data Field'''. | |||
# The '''''Target Element''''' property is for the '''Data Field''' you are pasting the value into, populating it with the copied value. | |||
#* In this case the "Grand Total Copy" '''Data Field''' of the "Copy Section" '''Data Section'''. | |||
| | |||
[[File:Data-rules-actions-16.png]] | |||
|- | |||
|valign=top| | |||
# Press the "Test Rule" button to test the '''Data Rule's''' execution. | |||
# The '''''Source Element''''' is copied. | |||
# And it is pasted into the '''''Target Element'''''. | |||
# If copying into a '''Data Section''', as is the case here, the copied '''Data Field''' will populate the targeted '''Data Field''' in each section instance established by the '''Data Section'''. | |||
| | |||
[[File:Data-rules-actions-17.png]] | |||
|} | |||
</tab> | </tab> | ||
<tab name="Parse Value" style="margin:20px"> | <tab name="Parse Value" style="margin:20px"> | ||
=== | === Parse Value === | ||
</tab> | </tab> | ||
<tab name="Action List" style="margin:20px"> | <tab name="Action List" style="margin:20px"> | ||
=== | === Action List === | ||
</tab> | </tab> | ||
</tabs> | </tabs> | ||
== Data Rule Hierarchy == | |||
== How To == | == How To == | ||
=== Create a Hierarchy of Data Rules Using Multiple Triggers and Actions === | === Create a Hierarchy of Data Rules Using Multiple Triggers and Actions === | ||
Revision as of 10:48, 29 January 2021
|
2021 |
This article is in development for the upcoming version of Grooper, Grooper 2021. The Value Reader is a new data extraction object in 2021. This information is incomplete and/or may change by the time of release. |

The Data Rule object allows for complex validation and manipulation of a Data Model's Data Elements (Data Fields, Data Sections, and Data Tables) in Grooper.
This allows users to create a conditional hierarchy of actions to take if certain conditions are met. These conditions are configured using .NET, LINQ and/or lambda expressions. When the expression is "triggered", evaluating to either "true" or "false", certain actions can be taken. These include:
- Calculate Value - This action sets the value of a Data Field or the cells of a Data Column, using calculate expressions to perform mathematical or concatenation operations on Data Elements.
- Clear Item - This action clears the value of a Data Element.
- Copy Item - This action copies or moves the value of a Data Element.
- Parse Value - This action uses a regular expression pattern to return part of a Data Field's value or part of a cell's value in a Data Column.
- Raise Issue - This action adds an issue to the issue log, used for validating a Data Element. This action can also be used to flag the Data Element.
These trigger conditions and subsequent actions set on Data Rule objects are executed through the Apply Rules activity after data is extracted by an Extract activity.
About
Some Basics About Expressions

Grooper makes use of expressions to validate extracted data and use extracted data to populate fields in a Data Model. Traditionally, this is configured on a Data Field object in a Data Model (or in the case of validating or calculating cells in a table, the Data Column object), using the Default Value, Calculated Value, or Is Valid properties.
For example, let's say we have several documents in a Batch. Each one contains W-2 wage reporting forms for various individuals and we want to do some basic tax filing calculation. In order to find someone's total income, it may not be quite as simple as pulling the listed wages from a single W-2. An individual might have multiple W-2s from multiple employers.
If an individual worked for three different employers over the course of a year, their total income would be the wages from all three W-2s added together. This is where expressions come in handy in Grooper. There is no extractible "Total Wages" field on the document. It's just three pages, each page a different W-2. There is no text data for an extractor to return that corresponds to the wages fields of all three W-2s added together.
But we could create a Data Section in our Data Model to return the wages from each individual W-2 form, by adding a "Wages" Data Field and configuring its extraction. Then, we could create a "Total Wages" Data Field and use a Calculated Value expression to add up the results of each "Wages" Data Field in each section (each W-2 in this case).
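As a rough illustration of the calculation described above, the following Python sketch adds up a "Wages" value from each section instance. It is a plain-Python model of the idea, not Grooper's Calculated Value expression syntax, and the wage amounts and the `w2_sections` structure are hypothetical.

```python
from decimal import Decimal

# Hypothetical extraction results: one dict per "W2 Info" section instance
# (one per W-2 in the document). The amounts are made up for illustration.
w2_sections = [
    {"Wages": Decimal("21500.00")},
    {"Wages": Decimal("8250.50")},
    {"Wages": Decimal("12000.00")},
]


def total_wages(sections):
    """Add up the "Wages" value from every section instance."""
    return sum((section["Wages"] for section in sections), Decimal("0"))


print(total_wages(w2_sections))
```

Decimal is used instead of float so currency amounts are added without binary rounding error.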
|
Here, we have a simple Content Model set up to solve the problem described above.
Conditional Expressions and Data Rules
You can do a lot with expressions, even applying some conditional logic to their execution. If the condition is met, the expression executes. If not, either nothing happens or some other expression executes.
In our example of documents containing W-2 forms, we make some assumptions about the document. We assume each document contains W-2s for a single individual. Each individual should only have one social security number. It would be problematic if there were multiple social security numbers extracted from the W-2 forms. This could indicate there are multiple W-2s for multiple individuals in a single document.
To account for this, you could use a more complex Calculated Value expression to only add up the "Fed Wages" Data Fields if the social security number was the same for each W-2. If the condition of there only being one social security number across the W-2s is met, the expression to add up the wages would execute. If not, it wouldn't.
This is basic conditional logic. If "x" condition is met, then do "y". If there's only one social security number, then add up all the wages. Otherwise, do nothing (or something else). You could go another step further and add an Is Valid expression to flag the document if the social security numbers didn't match, as well.
However, the more complex a Data Model's data hierarchy (the more Data Sections and Data Tables it has), generally the more complex these conditions tend to be. The more conditions you add for an expression to execute, the more complex the expression becomes. This can result in very cumbersome expressions that are difficult to form and manage.
This is where the Data Rule object really shines. Data Rules allow you to use Trigger expressions to determine one or multiple subsequent Actions to take if that expression evaluates to true (or false). This is also basic conditional logic. If the trigger expression is true, do the action. Otherwise, do nothing (or a different action). Furthermore, you can more easily create a complex hierarchy of conditions by adding child Data Rules to parent Data Rules. If the trigger expression evaluates to true, the child Data Rules will execute, with their own triggers and even their own child Data Rules. This allows for simpler setup, execution, and management of more complex conditional expressions, as well as some actions that fall outside the normal expressions you can set up in a Data Field or Data Column.
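The trigger/action/child structure described above can be modeled in a few lines of Python. This is only a conceptual sketch: the `DataRule` class, its attribute names, and the helper functions are hypothetical and do not reflect Grooper's object model. It also assumes that a blank trigger counts as true and that child rules run after the parent's true action.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DataRule:
    """Hypothetical model of a rule: a trigger, two optional actions, child rules."""
    name: str
    trigger: Optional[Callable[[dict], bool]] = None  # None models a blank trigger
    true_action: Optional[Callable[[dict], None]] = None
    false_action: Optional[Callable[[dict], None]] = None
    children: List["DataRule"] = field(default_factory=list)

    def apply(self, data: dict) -> None:
        # Assumption: a blank trigger is treated as true.
        triggered = True if self.trigger is None else self.trigger(data)
        if triggered:
            if self.true_action:
                self.true_action(data)
            # Child rules only run when the parent's trigger is true.
            for child in self.children:
                child.apply(data)
        elif self.false_action:
            self.false_action(data)


def sum_wages(data):
    data["Total Wages"] = sum(data["Wages"])


def flag_document(data):
    data.setdefault("Issues", []).append("Multiple social security numbers found")


rule = DataRule(
    name="One SSN?",
    trigger=lambda d: len(set(d["Employee SSN"])) == 1,
    false_action=flag_document,
    children=[DataRule(name="Sum wages", true_action=sum_wages)],
)

# Made-up values: two W-2s with the same (fake) social security number.
doc = {"Employee SSN": ["000-00-0001", "000-00-0001"], "Wages": [100, 250]}
rule.apply(doc)
```

Keeping each condition in its own small rule, rather than one long expression, is the design point: each trigger stays short, and the nesting carries the "only if the parent was true" logic.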
A Basic Example Data Rule
Data Rules are executed by the Apply Rules activity. After data is extracted by the Extract activity, any Data Rule referenced by the Apply Rules activity will alter the document's index data according to the Data Rule's configuration. You can test the Data Rule's results in Grooper Design Studio when the Data Rule is selected in the node tree. This will help you verify its configuration, giving you a preview of what would happen if that Data Rule was executed by the Apply Rules activity.
|
The Trigger
The Trigger property establishes the condition that must be met for the Data Rule's action to be taken. These triggering conditions are also set using expressions, which must return a Boolean "true" or "false" value. If the Trigger expression evaluates to "true", the True Action configuration is executed. If the Trigger expression returns "false", the False Action is executed, if one is configured. If the False Action is left blank, no further action is taken.
|
In our case, something is wrong with our documents if the W-2 forms have more than one social security number. Individuals should only have one social security number. If we added up all the wages for multiple W-2s with mismatched social security numbers, we would not be adding up the total income for an individual correctly. We'd end up with inaccurate data.
|
Luckily, there's an expression we could use to determine if our "W2 Info" Data Section has multiple social security numbers in its sections. This is a good opportunity for a LINQ expression. LINQ (or Language INtegrated Query) expressions are particularly helpful when navigating a Data Model's hierarchical structure to pull information from Data Sections and Data Tables.
Writer's Note: You aren't limited to just LINQ expressions for the Trigger. You can use any expression that returns a Boolean value, including standard .NET expressions, LINQ expressions, and lambda expressions. A LINQ expression just works well for this particular example.
(From sec In W2_Info Select sec.Employee_SSN).Distinct().Count() = 1
Let's break down this expression to understand what's going on.
(From sec In W2_Info Select sec.Employee_SSN)
This is the "LINQ-iest" part of the expression. It queries the extracted instances of our Data Model's objects to return multiple results.
LINQ query expressions like this one start with From, which indicates the data source you are querying. Next, we declare a range variable, which we've named sec. We'll use it later in the expression to return multiple instances of a Data Field in a Data Section. (What you name it doesn't matter, as long as you use the same name when you reference the variable later in the query.) The In clause determines the query's scope. We're looking for the "Employee SSN" Data Field in the "W2 Info" Data Section, so the "W2 Info" Data Section is our scope. In W2_Info will only query instances (the results of its child Data Fields) in the sections produced by the "W2 Info" Data Section. The Select clause determines what values the query returns (or "selects"). We want information about the "Employee SSN" Data Fields, so we've entered sec.Employee_SSN. Note that we've referenced the variable we declared at the start of the query, sec, to do this.
Now the expression has some information it can work with: in this case, the social security numbers for each W-2 in the document (as returned by the "Employee SSN" Data Field for each section produced by the "W2 Info" Data Section).
.Distinct().Count()
This part of the expression counts the number of distinct values returned by the query. (Technically, the .Distinct() expression returns a subset of distinct values in the query's results. Then, the .Count() expression counts the values in that subset.) If the social security number is the same for each W-2, there's only one distinct value, and this evaluates to "1". If not, it will be a larger number.
= 1
This is just an equality comparison to give us a Boolean "true" or "false" value. If the left side of the comparison (the expression (From sec In W2_Info Select sec.Employee_SSN).Distinct().Count()) counts a single unique social security number across the sections, it equals the right side (i.e. "1 = 1") and the expression returns "true". Otherwise, it returns "false".
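For readers more familiar with Python than LINQ, the three parts of the breakdown above map onto the following sketch. It is an analogy only, not something Grooper executes, and the `w2_info` list and its (fake) social security numbers are hypothetical stand-ins for the section instances.

```python
# Hypothetical "W2 Info" section instances for one document.
w2_info = [
    {"Employee_SSN": "000-00-0001"},
    {"Employee_SSN": "000-00-0001"},
    {"Employee_SSN": "000-00-0001"},
]


def single_ssn(sections):
    """Return True when every section instance carries the same SSN."""
    # From sec In W2_Info Select sec.Employee_SSN
    ssns = [sec["Employee_SSN"] for sec in sections]
    # .Distinct()
    distinct = set(ssns)
    # .Count() = 1
    return len(distinct) == 1


print(single_ssn(w2_info))
```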
|
If we use this expression as the Data Rule's Trigger, it will execute the True Action configured above only if it evaluates to true. Effectively, it will add up the wages for each W-2 only when the social security numbers for each W-2 are the same.
Actions
Once a Data Rule is triggered, what happens next is determined by the True Action and False Action properties. When the Trigger expression evaluates to true, the True Action is executed. When the Trigger expression returns false, the False Action is executed. This determines what action is taken once the trigger condition is met or not met.
This can be one of six choices:
- Calculate Value
- Raise Issue
- Clear Item
- Copy Item
- Parse Value
- Action List
Each action has its own configuration to execute the action, detailed below.
Calculate Value
The Calculate Value action will use a .NET, LINQ or lambda expression to populate a field with the expression's result. The possibilities here are as endless as the capabilities of these expressions. We can perform mathematical operations on numerical data. We can concatenate multiple string fields. We can perform incremental additions to date values. The Calculate Value action allows you to use any configurable expression to manipulate extracted data into a desired result. We've already seen one example of the Calculate Value action in the section of this article above. But let's look at another one.
|
In this example, we have a fairly simple report detailing costs of intangible services related to an oil drilling operation.
A Word of Caution: Overwriting Results
|
There is one important thing to note about the Calculate Value action. The Value Expression's calculated value will overwrite any existing data in a field. For example, take this document.
However, this isn't actually an accurate total. The document is wrong. The values in the "Total" column should add up to "$1,048,050.00" and not what we see here, "$1,111,000.00".
Now, this may or may not be what you want to do. What if you don't want to overwrite the "Grand Total" Data Field if it's already populated? What if you only want to use the Calculate Value action to populate the field if it's blank? That's a great opportunity for a Trigger expression! If this were the case, we would only want to execute this Data Rule if the "Grand Total" Data Field is blank. We could use that as the condition to execute the True Action. All we need to do is figure out an expression that would evaluate to true or false if that's the case. The expression
|
Raise Issue
The Raise Issue action is useful for data validation. You may want to ensure two fields add up to a third field. You may want to ensure a date on the document is a date in the past or within a day range in the future. You may want to check if two fields are equal to each other. This is the realm of data validation. The Raise Issue action can log information in an issue log if conditions like these are not met.
The Raise Issue action works in concert with the Trigger expression to log issues. If the Trigger expression returns true and the Raise Issue action is selected as the True Action, it will log a defined message in an issue log. You can optionally add a message category for the issue message as well.
|
In this example, we have a fairly simple report detailing costs of intangible services related to an oil drilling operation. We expect the "Total" column to be the cells in the "Dry Hole" and "Completion" columns added together for each row. We will use the Raise Issue action to verify this.
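The row-by-row check described above can be sketched in Python as follows. This only models the validation logic. The function name, the row values, and the detailed message format are hypothetical; only the column names and the "Wrong Total" wording come from this article.

```python
from decimal import Decimal


def find_wrong_totals(rows):
    """Return one issue message per row whose "Total" is not Dry Hole + Completion."""
    issues = []
    for index, row in enumerate(rows, start=1):
        expected = row["Dry Hole"] + row["Completion"]
        if row["Total"] != expected:  # the trigger condition for this row
            # Hypothetical message format; a real log message would be configured in Grooper.
            issues.append(
                f"Wrong Total in row {index}: expected {expected}, found {row['Total']}"
            )
    return issues


# Made-up rows: the second row's total does not add up.
rows = [
    {"Dry Hole": Decimal("1000.00"), "Completion": Decimal("500.00"), "Total": Decimal("1500.00")},
    {"Dry Hole": Decimal("2000.00"), "Completion": Decimal("250.00"), "Total": Decimal("2500.00")},
]

for issue in find_wrong_totals(rows):
    print(issue)
```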
When configuring a Data Rule, the "Diagnostics" tab will give you some more information on what's going on.
You may notice the message "Wrong Total" is a little generic. It doesn't give us much information about why the issue was raised. This is why the Log Message property is expression-based. It allows you to access some additional information to populate the error message.
|
Note: The Log Message must evaluate to a string value. This is why we've used the |
Clear Item
The Clear Item action will clear the data in a Data Field if the Trigger condition is met. Clear Item will also clear a Data Column's data if a Data Table is selected as the Scope. This gives Grooper users a method of removing data from a field or table column if certain conditions are met. For instance, if you know the data is invalid based on the Trigger expression's true/false evaluation, you may prefer to remove the index data rather than keep the invalid data. This could also be a method of redacting sensitive index data after it is exported to a secure database. (Note: For complete redaction, you would probably also want to use the Redact activity to black-bar or white out the document's image and a Correct activity to remove the data from the document's text data as well.)
|
For instance, we've already seen situations where this intangibles table's "Grand Total" field on the document does not actually match the sum of the "Total" column. We could use a Trigger expression to check if the "Total" column adds up to the "Grand Total" field and clear the extracted "Grand Total" Data Field if it does not add up correctly.
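A minimal Python sketch of this check-then-clear behavior is shown below. It is not Grooper configuration. The dictionary layout and function name are hypothetical, and the two row amounts are made up so that they sum to the "$1,048,050.00" figure mentioned earlier, while the extracted grand total is the inaccurate "$1,111,000.00".

```python
from decimal import Decimal


def clear_if_mismatched(data):
    """Clear "Grand Total" when it does not equal the sum of the "Total" column."""
    column_sum = sum((row["Total"] for row in data["Intangibles Table"]), Decimal("0"))
    if column_sum != data["Grand Total"]:  # models the trigger condition
        data["Grand Total"] = None         # models the Clear Item action
    return data


doc = {
    # Hypothetical rows chosen to add up to 1,048,050.00.
    "Intangibles Table": [
        {"Total": Decimal("600000.00")},
        {"Total": Decimal("448050.00")},
    ],
    "Grand Total": Decimal("1111000.00"),
}

clear_if_mismatched(doc)
print(doc["Grand Total"])
```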
Copy Item
The Copy Item action will copy a field's value and paste it into another field. Optionally, the value can be moved as well (like a cut and paste operation). This can be useful for situations when you need to move data around in a Data Model's hierarchy. This provides an easy way to move a Data Field's value into a Data Section's single or multiple section instances.
|
For example, let's say you need to move the "Grand Total" Data Field into each section instance of a multi-section Data Section.
|
|
All you need to configure for the Copy Item action is the Data Field you're copying and what Data Field you're pasting the value to.
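The copy-versus-move behavior can be pictured with this small Python sketch. It is a conceptual model only. The `copy_item` function and its parameters are hypothetical, and it assumes that "moving" means the source field is emptied after the copy. The field and section names and the 11 section instances follow this article's example.

```python
def copy_item(data, source, section, target, move=False):
    """Copy data[source] into the target field of every section instance."""
    value = data[source]
    for instance in data[section]:
        instance[target] = value
    if move:
        # Assumption: a move behaves like cut and paste, emptying the source.
        data[source] = None
    return data


doc = {
    "Grand Total": "$1,048,050.00",
    # One dict per section instance; the example Data Section returns 11.
    "Copy Section": [{"Grand Total Copy": None} for _ in range(11)],
}

copy_item(doc, "Grand Total", "Copy Section", "Grand Total Copy")
```

Calling the function with `move=True` would populate the same 11 target fields and then leave the source field empty.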