Data Model Expressions

From Grooper Wiki

"Data Model Expressions" are a subset of code expressions in Grooper.

  • Code expressions are snippets of .NET code that are evaluated at runtime to compute values, determine workflow logic, or validate data.
  • Data Model expressions refer to expressions added to Data Fields and Data Columns that are used to calculate and validate field values.

FYI

For brevity's sake, we will refer to "Data Fields" and "Data Column" cells as "fields" throughout this article.

Data Model expressions are configured using the following Data Field and Data Column properties:

  • "Default Value" - Generates a default value for the field.
  • "Calculated Value" - Calculates an expected value for the field based on other field values. This can be used to validate or automatically populate the field.
  • "Is Valid" - A Boolean expression that determines if the field's value is valid.
  • "Is Required" - A Boolean expression that determines if the field is required (meaning it cannot be empty).


Default Value Expressions

"Default Value" expressions determine the default value for a Data Field or Data Column cell.

  • Default values will be overridden by any subsequent data collection event: Value Extractor results, Fill Method (AI Extract) results, or Lookup Specification results
  • You cannot reference Variable Definitions or other Data Field or Data Column values in a Default Value expression. Use "Calculated Value" expressions instead.

Default Value expressions are useful for pre-populating fields with:

  • Static values
  • System-derived values
  • Document metadata

Return Type

Varies - Determined by the field's Value Type

Default Value expressions must produce a value compatible with the Data Field/Data Column's "Value Type". Otherwise, the field will throw an error or produce no value on extraction.

  • Ex: The Data Field's "Value Type" is set to String. The expression must produce a string value.
  • Ex: The Data Field's "Value Type" is set to DateTime. The expression must produce a datetime value.

Examples

Purpose Expression
Default to a literal string value. Value must be enclosed in quotes.
"Literal value"
Default to a literal numeric value.
25.00
Global/System Variables
Default to the current date and time
Now
Default to the current date and time, formatted.
Now.ToString("yyyy-MM-dd")
Default to date and time 30 days from now.
DateAdd("d", 30, Now)
Default to date and time 30 days from now, formatted.
DateAdd("d", 30, Now).ToString("yyyy-MM-dd")
Default to the name of the current user.
My.User.Name
Generate a unique identifier
Guid.NewGuid
Document metadata - links
Returns the name of the link (e.g. "Import" or "Export") from a Batch Folder's content link.
Link.LinkName
Returns the attached file's path, including the file's name from a Batch Folder's content link.
Link.FullPath
Returns the attached file's path, not including the file's name from a Batch Folder's content link.
Link.Path

Returns the first segment in a file path from a Batch Folder's content link.

For example, "servername" in "servername\folder\subfolder\file.pdf"
Link.PathSegments(0)
Returns the linked file's full filename, extension included (the "linked object") from a Batch Folder's content link.
Link.ObjectName
Returns the linked file's filename without the extension from a Batch Folder's content link.
IO.Path.GetFileNameWithoutExtension(Link.ObjectName)

Return metadata associated with a known Content Link type (FileSystemLink, MailLink, SftpLink, Cmis.CmisLink, etc).

Replace "FileSystemLink" with the Content Link type
DirectCast(Link,FileSystemLink).Filename
DirectCast(Link,FileSystemLink).CreatedBy
DirectCast(Link,FileSystemLink).CreatedTime
DirectCast(Link,FileSystemLink).LastModifiedTime
Document metadata - Batch Folder attachment
Returns the attachment's filename, with extension.
Folder.AttachmentFileName
Returns the attachment's filename, without extension.
IO.Path.GetFileNameWithoutExtension(Folder.AttachmentFileName)
Returns the attachment's MIME type
Folder.AttachmentMimeType
Returns the attachment's file extension
Folder.AttachmentFileExtension
Document metadata - MIME type specific data

Return metadata associated with email messages

DirectCast(Handler,MailMimeTypeHandler).Subject
DirectCast(Handler,MailMimeTypeHandler).To
DirectCast(Handler,MailMimeTypeHandler).From
DirectCast(Handler,MailMimeTypeHandler).Date

Return metadata associated with PDF files

DirectCast(Handler, PdfMimeTypeHandler).Author
DirectCast(Handler, PdfMimeTypeHandler).Creator
DirectCast(Handler, PdfMimeTypeHandler).CreationDate
DirectCast(Handler, PdfMimeTypeHandler).Title
DirectCast(Handler, PdfMimeTypeHandler).Subject
Document metadata - more Batch Folder info
Default to the Content Type (Document Type) assigned to the document (Batch Folder)
ContentTypeName

Default to the parent Content Type of the Content Type (Document Type) assigned to the document (Batch Folder).

This is helpful for users trying to populate a value for a Document Type's parent Content Category.
Folder.ContentType.ParentNode.DisplayName
Default to the current document's Batch Folder ID (GUID).
Folder.Id
Default to the current document's Batch ID (GUID).
Folder.Batch.Id


Calculated Value Expressions

"Calculated Value" expressions computes a value based on the values of other Data Fields/Data Column cells, similar to a formula in a spreadsheet. These expressions are commonly used to perform mathematical operations and string manipulations.

These expressions can inspect the Grooper node tree, current Batch, document metadata (such as linked file paths and attachment information) and system data to calculate the desired value.

These expressions can be used in two ways:

  • To validate existing field values
  • To populate empty fields with calculated (or manipulated) values
    • Calculated Value expressions can also conditionally do both, validating an extracted field if present, and populating the field if not. See Calculate Modes for more information.

Return Type

Varies - Determined by the field's Value Type

Calculated Value expressions must produce a value compatible with the Data Field/Data Column's "Value Type". Otherwise, the field will throw an error or produce no value on extraction.

  • Ex: The Data Field's "Value Type" is set to String. The expression must produce a string value.
  • Ex: The Data Field's "Value Type" is set to DateTime. The expression must produce a datetime value.

Calculate Modes

Calculated Value expressions can execute using one of three modes, set by configuring the "Calculate Mode" property:

  • Validate - Used for validation-only scenarios. The expression will validate the field's value.
  • Always Set - Used for population-only scenarios. The expression will populate the field.
  • Set If Empty - Used for mixed scenarios. The expression will validate the field if a value is present, otherwise populated it.
Validate

Validate mode will check that the field's value mathematically satisfies the Calculated Value expression. If it does not, puts the field in an error state.

  • The error message on the field will show the difference between the field’s value and the expected result of the expression.
    • Ex: FieldName Calculation Error - Expected=10, Difference=3
  • As this mode pertains to mathematical validation only, it only works with numerical data types (Int, Decimal, Double).
  • For non-mathematical validation, use an "Is Valid" expression.
Always Set

"Always Set" will populate the field with the Calculated Value expression's result (unless the expression fails to produce a result).

  • Calculated Value expressions can work when the Data Field/Data Column's Value Extractor is configured, and when it is unconfigured.
    • Value Extractor unconfigured - The Calculated Value expression is typically the primary mode of generating the fields value, using system values, document/Batch Folder metadata, or calculating a value based on the results of other fields in the Data Model.
    • Value Extractor configured - The Calculated Value expression is typically used to manipulate the extracted value or normalize it to some desired format.
  • When the Calculated Value expression references other fields in the Data Model, if any of the component fields are modified, the Calculated Value expression recalculates and updates its field value automatically.
    • Ex: FieldA + FieldB This expression just adds the values of two fields together. If the value of "FieldA" changes (say a Review user edits it in the Data Viewer), the expression automatically recalculates its field is updated with the new sum.
Set If Empty

"Set If Empty" mode acts as a mix between field validation and population. If a value is present, it will validate it. If not, it will populate it'

  • If the field was extracted, the Calculated Value expression will simply verify it satisfies the expression.
  • If it the extractor fails and it's still blank, the Calculated Value expression will run and fill the field with its result.
  • In other words, if there is a value present, it behaves like "Validate" mode. If no value is present, it behaves like "Always Set" mode.

Examples

Math related calculations
Addition of multiple fields IntegerField1 + IntegerField2
DecimalField1 + DecimalField2 + DecimalField3
Rounding Math.Round(DecimalField1, 4)
Math.Round(DecimalField1 * DecimalField2, 2)
Non-integer addition (e.g. of date values) DateAdd("d", 30, DateField1)
DateAdd("yyyy", 1, DateField1)
DateAdd("m", -3, DateField1)
String concatenation and manipulation
Concatenation of multiple fields String.Concat(StringField1, StringField2)
String.Concat(StringField1, StringField2, StringField3)
String.Concat(StringField1, StringField2, StringField3, StringField4)
Reformatting / Normalization of values StringField1.Replace("\", "_")
StringField1.Replace("\", "")
Substring calculation

Given the string ABC123456XXXX654321YYY:

StringField1.Substring(0, 3) returns ABC
StringField1.Substring(3, 6) returns 123456
StringField1.Substring(9, 4) returns XXXX
StringField1.Substring(StringField1.Length - 3) returns YYY
CMIS Content Links

Get properties of a CMIS Content Link.

  • Use this to return property values for a document linked in a CMIS Repository.
  • Replace propertyName with the property's name. Example: The Subject property for an email in an Exchange CMIS Repository.
CurrentDocument.ContentLink.GetCustomValue("propertyName").ToString
Misc expressions

Getting the location coordinates of a field on the document

  • This could be used to determine the coordinates and size of an extracted value on a document.
  • Note: This returns a logical rectangle's location in inches.
GetFieldInstance("Field Name").Location.ToString
Examples that could be used in Validate mode
Verify a "Total" field adds up to the sum of the "Subtotal" and "Sales Tax" fields. Subtotal + Sales_Tax
Verify a "Total" field adds up to the sum of all "Line Total" cell values in a "Line Items" Data Table Line_Items.SumOf("Line Total")
Verify a "Total Hours" field adds up to all the "Earned Hours" values in a multi-instance "Semester" Data Section Semester.SumOf("Earned Hours")


Is Valid Expressions

"Is Valid" expressions determine whether a field value meets validation criteria. These expressions help ensure extracted data is formatted appropriately, is balanced appropriately, and meets other business requirements.

Invalid fields are placed in an error state, alerting Review users to the issue in the Data Viewer.

Return Type

Boolean

Is Valid expressions must return a Boolean (True/False) value:

  • When the expression evaluates to "True" the field is valid.
  • When the expression evaluates to "False" the field is invalid. Invalid fields are set to an error state.
    • The default error message is "Validation Expression failed".
    • You may write a custom error message by configuring the "Validate Message" property.

Examples

Purpose Expression
Date in past / future
DateField1 < Now
DateField1 >= DateAdd("d", 30, Now)
Equality / inequality of two fields
StringField1 = StringField2
IntegerField1.Equals(IntegerField2)
IntegerField1 <> DecimalField1
Not DecimalField1.Equals(DecimalField2)
Summing fields and comparing to another field
IntegerField1 + IntegerField2 = IntegerField3
DecimalField1 + DecimalField2 = DecimalField3
DecimalField1 = SumFieldInstance("Table1\AmountColumn")
Running regular expression against field
Regex.IsMatch(StringField1, "[0-9]{6}")
Inspecting field-level confidence scores
Instance.Confidence > 0.8


Is Required Expressions

In Grooper, "required" fields are flagged for human review when the field is empty (i.e. no value was collected during Extract). The field is set to an error state, alerting Review users to manually input the value in the Data Viewer.

Data Fields and Data Columns can be made required in two ways:

  • By enabling the "Required" property if the field always needs to be collected from the document in all cases.
  • By configuring an "Is Required" expression to conditionally determine if the field is required.

"Is Required" expressions dynamically specify whether a value is required at runtime. These expressions allow users to implement conditional requirements, making a field required based on the value of another field.

Return Type

Boolean

Is Required expressions must return a Boolean (True/False) value.

  • When the expression evaluates to true the field becomes required. The field will be placed in an error state if empty/blank.
  • When the expression evaluates to false the field remains optional.

Example Is Required Expression

In the following example, there are two fields.

  • "Marital Status" is a restricted-choice string field. It can be either:
    • married
    • single
  • "Spouse Name" is a string field.
    • This would be conditionally required, based on the value of "Marital Status".
    • If they're married ("Marital Status" is married), the "Spouse Name" should be filled.
    • If they're single ("Marital Status" is single), it should be empty.


The following "Is Required" expression would implement this logic for the "Spouse Name" field.

Marital_Status = "married"

Referencing field values in expressions

Field values (Data Field and Data Column cell values) can be referenced in the following Data Model expressions:

  • Calculated Value
  • Is Valid
  • Is Required

Referencing fields by name (or GUID)

Fields are referenced by Data Field/Data Column name.

  • Names are normalized to a code-friendly version.
  • Spaces are replaced with underscores (Ex: "Invoice Number" becomes Invoice_Number)
  • Special characters are replaced with underscores (Ex: "Yes/No" becomes Yes_No)
  • Multiple sequential spaces and special characters are replaced with a single underscore (Ex: "Yes / No" becomes Yes_No)

Fields may also be referenced by Id (GUID)

  • Instead of using its code friendly name, you can opt to use its Id value instead.
  • Use the GetFieldValue() or GetFieldTypedValue() functions to do so (Ex: GetFieldValue("12345678-90ab-cdef-1234-567890abcdef"))
    • GetFieldValue() always returns a string value.
    • GetFieldTypedValue() returns the fields .NET value type (Ex: If a Data Field's Value Type is a datetime value, this will return its value as a datetime value )
  • There are some pros and cons to referencing fields this way:
    • PRO: If the Data Field/Data Column's name changes, you don't need to update the expression.
    • CON: The field's GUID does not show up in IntelliSense. You have to track it down before entering the expression.
      • You can find a Data Field/Data Column's GUID in their "Advanced" tabs under "ID".
    • CON: The expression is less human-readable. Backtracking to find out what Data Field corresponds to a GUID in an expression may be challenging in larger Data Models.

Referencing sibling, descendent and ancestor fields

Sibling fields and fields in Data Sections are accessed differently.

  • Sibling fields (in the same branch a Data Model hierarchy) are accessed by name only (Ex: Field_Name).
  • Fields in Data Sections are accessed with dot-notation (Ex: Section_Name.Field_Name)
    • Be aware: Only fields in single-instance Data Sections can be accessed this way.

Descendent fields can be accessed through dot-notation.

  • For Data Models with several nested Data Sections, you can access each subsequent child Data Section's fields with dot-notation.
  • Example: Imagine you have a series of nested Data Sections like the Data Model below:
    data_table Data Model
    variables Field L0 Data Field
    insert_page_break Sec L1 Data Section
    variables Field L1 Data Field
    insert_page_break Sec L2 Data Section
    variables Field L2 Data Field
    • Expressions written for "Field L0" could access "Field L2" with the following expression snippet: Sec_L1.Sec_L2.Field_L2
    • Starting at the sibling Data Section, you access each subsequent Data Section's branch with dot-notation until you reach the field desired.
    • Be aware: Only fields in single-instance Data Sections can be accessed this way.

Ancestor fields can be accessed by starting at the Data Model root.

  • This includes Data Fields at the first level in the Data Model.
  • This includes Data Fields in ancestor single-instance Data Sections.
  • The root Data Model takes its name after its parent Content Type.
  • Example: Imagine the following Content Model and its and Data Model.
    stacks Invoices Content Model
    data_table Data Model
    insert_page_break Header Details Data Section
    variables Invoice Number Data Field
    variables Invoice Date Data Field
    insert_page_break Payment Details Data Section
    variables Invoice Total Data Field
    variables Early Payment Discount Data Field
    • A Calculated Value expression on the "Early Payment Discount" field could access the "Invoice Date" value with the following expression Invoices.Header_Details.Invoice_Date

LINQ: Accessing values in collections

Field values in collections can be accessed using LINQ expressions.

"Collections" are multi-instance Data Sections and Data Tables. Values in collections are not normally easily accessible in Grooper expressions except through a few functions such as SumOf("Data Column")

LINQ expressions take the following general format: From item In Collection Where condition Select item.Field OrderBy key

  • item is a variable name used to iterate through each item in the collection.
    • An "item" is a section instance for Data Sections
    • An "item" is a row for Data Tables.
  • Collection is the Data Section or Data Table.
  • Fieldis the Data Field in that Data Section or Data Column in that Data Table.
  • The Where and OrderBy components are optional.

Examples:

(From row in Line_Items Select row.Line_Total).Sum
  • This returns the sum of the "Line Total" Data Column values in a "Line Items" Data Table.
  • From row in Line_Items Select row.Line_Total selects all "Line Total" Data Column values for each row in the "Line Items" Data Table.
  • .Sum adds all the values returned by the LINQ expression.
(From sem In Semester Select sem.Earned_Hours).Max
  • This returns the maximum of the "Earned Hours" Data Fields from all section instances collected by a "Semester" Data Sections.
  • From sem In Semester Select sem.Earned_Hours selects all "Earned Hours" Data Field values for each instance in the "Semester" Data Section.
  • .Max selects the largest value from the LINQ expression.
(From sem In Semester Where sem.Year < DateTime.Now.Year Select sem.Earned_Hours).Sum
  • This returns the total "Earned Hours" for semesters prior to the current year.
  • From sem In Semester Where sem.Year < DateTime.Now.Year Select sem.Earned_Hours selects all "Earned Hours" Data Field values for each instance in the "Semester" Data Section that match the Where condition.
  • Where sem.Year < DateTime.Now.Year restricts values to the condition set by its expression. In this case, where the "Year" value in the section instance (sem.Year) is less than (<) the current year (DateTime.Now.Year).
  • .Sum adds all the values returned by the LINQ expression.

A larger article on LINQ can be found here.

Other tips

IntelliSense in the expression editor will help you.

  • As you enter expressions in an expression editor, the IntelliSense code-completion window will provide code suggestions and documentation.
  • Hover over an item in the completion list to view its documentation.
  • Type Me. to list all accessible instance fields, properties, and methods in the IntelliSense menu. This is the quickest way to determine what Data Elements are accessible if you are unsure.