Data Model Expressions
"Data Model Expressions" are a subset of code expressions in Grooper.
- Code expressions are snippets of .NET code that are evaluated at runtime to compute values, determine workflow logic, or validate data.
- Data Model expressions refer to expressions added to Data Fields and Data Columns that are used to calculate and validate field values.
|
FYI |
For brevity's sake, we will refer to "Data Fields" and "Data Column" cells as "fields" throughout this article. |
Data Model expressions are configured using the following Data Field and Data Column properties:
- "Default Value" - Generates a default value for the field.
- "Calculated Value" - Calculates an expected value for the field based on other field values. This can be used to validate or automatically populate the field.
- "Is Valid" - A Boolean expression that determines if the field's value is valid.
- "Is Required" - A Boolean expression that determines if the field is required (meaning it cannot be empty).
Default Value Expressions
"Default Value" expressions determine the default value for a Data Field or Data Column cell.
- Default values will be overridden by any subsequent data collection event: Value Extractor results, Fill Method (AI Extract) results, or Lookup Specification results
- You cannot reference Variable Definitions or other Data Field or Data Column values in a Default Value expression. Use "Calculated Value" expressions instead.
Default Value expressions are useful for pre-populating fields with:
- Static values
- System-derived values
- Document metadata
Return Type
Default Value expressions must produce a value compatible with the Data Field/Data Column's "Value Type". Otherwise, the field will throw an error or produce no value on extraction.
- Ex: The Data Field's "Value Type" is set to String. The expression must produce a string value.
- Ex: The Data Field's "Value Type" is set to DateTime. The expression must produce a datetime value.
Examples
| Purpose | Expression |
| Default to a literal string value. Value must be enclosed in quotes. | "Literal value" |
| Default to a literal numeric value. | 25.00 |
Global/System Variables | |
|---|---|
| Default to the current date and time | Now |
| Default to the current date and time, formatted. | Now.ToString("yyyy-MM-dd")
|
| Default to date and time 30 days from now. | DateAdd("d", 30, Now)
|
| Default to date and time 30 days from now, formatted. | DateAdd("d", 30, Now).ToString("yyyy-MM-dd")
|
| Default to the name of the current user. | My.User.Name |
| Generate a unique identifier | Guid.NewGuid |
Document metadata - links | |
| Returns the name of the link (e.g. "Import" or "Export") from a Batch Folder's content link. | Link.LinkName |
| Returns the attached file's path, including the file's name from a Batch Folder's content link. | Link.FullPath |
| Returns the attached file's path, not including the file's name from a Batch Folder's content link. | Link.Path |
|
Returns the first segment in a file path from a Batch Folder's content link.
|
Link.PathSegments(0) |
| Returns the linked file's full filename, extension included (the "linked object") from a Batch Folder's content link. | Link.ObjectName |
| Returns the linked file's filename without the extension from a Batch Folder's content link. | IO.Path.GetFileNameWithoutExtension(Link.ObjectName) |
|
Return metadata associated with a known Content Link type (FileSystemLink, MailLink, SftpLink, Cmis.CmisLink, etc).
|
DirectCast(Link,FileSystemLink).Filename DirectCast(Link,FileSystemLink).CreatedBy DirectCast(Link,FileSystemLink).CreatedTime DirectCast(Link,FileSystemLink).LastModifiedTime |
Document metadata - Batch Folder attachment | |
| Returns the attachment's filename, with extension. | Folder.AttachmentFileName |
| Returns the attachment's filename, without extension. | IO.Path.GetFileNameWithoutExtension(Folder.AttachmentFileName) |
| Returns the attachment's MIME type | Folder.AttachmentMimeType |
| Returns the attachment's file extension | Folder.AttachmentFileExtension |
Document metadata - MIME type specific data | |
|
Return metadata associated with email messages |
DirectCast(Handler,MailMimeTypeHandler).Subject DirectCast(Handler,MailMimeTypeHandler).To DirectCast(Handler,MailMimeTypeHandler).From DirectCast(Handler,MailMimeTypeHandler).Date |
|
Return metadata associated with PDF files |
DirectCast(Handler, PdfMimeTypeHandler).Author DirectCast(Handler, PdfMimeTypeHandler).Creator DirectCast(Handler, PdfMimeTypeHandler).CreationDate DirectCast(Handler, PdfMimeTypeHandler).Title DirectCast(Handler, PdfMimeTypeHandler).Subject |
Document metadata - more Batch Folder info | |
| Default to the Content Type (Document Type) assigned to the document (Batch Folder) | ContentTypeName |
|
Default to the parent Content Type of the Content Type (Document Type) assigned to the document (Batch Folder).
|
Folder.ContentType.ParentNode.DisplayName |
| Default to the current document's Batch Folder ID (GUID). | Folder.Id |
| Default to the current document's Batch ID (GUID). | Folder.Batch.Id |
Calculated Value Expressions
A Calculated Value Expression is a VB.Net code snippet that calculates a value of a field based on the values of other fields, much like how a formula defines a relationship between various cells in a spreadsheet. In addition to mathematical operations and text string manipulation, these expressions can inspect the Grooper node tree, batch, and environment variables or file paths to calculate the desired value.
These expressions can be used in two ways:
- To populate empty fields with calculated (or manipulated) values
- To validate existing field values
Return Type
Calculated Value Expressions must return a result that is compatible with the Data Field/Data Column's data type (set using their Data Type property).
- Ex: If a Data Field's data type is set to Integer, its Calculated Value Expression must evaluate to an integer, or else an error will occur.
Calculate Modes
Calculated Value Expressions can execute using one of three modes, set by configuring the Calculate Mode property:
- Validate
- Set If Empty
- Always Set
|
Mode |
Description |
|
Validate |
Validate will check that the field's value mathematically satisfies the Calculated Value Expression.
|
|
Set If Empty |
Set If Empty will only populate the field with the Calculated Value Expression's result if no value was collected during the Extract step of a Batch Process.
|
|
Always Set |
Always Set will always populate the field with the Calculated Value Expression's result (unless the expression fails to produce a result).
|
Examples
|
| |
|---|---|
| Addition of multiple fields | IntegerField1 + IntegerField2DecimalField1 + DecimalField2 + DecimalField3
|
| Rounding | Math.Round(DecimalField1, 4)
|
| Non-integer addition (e.g. of date values) | DateAdd("d", 30, DateField1)DateAdd("yyyy", 1, DateField1)DateAdd("m", -3, DateField1)
|
String concatenation and manipulation | |
| Concatenation of multiple fields | String.Concat(StringField1, StringField2)String.Concat(StringField1, StringField2, StringField3)String.Concat(StringField1, StringField2, StringField3, StringField4)
|
| Reformatting / Normalization of values | StringField1.Replace("\", "_")StringField1.Replace("\", "")
|
| Substring calculation |
Given the string
|
CMIS Content Links | |
|
Get properties of a CMIS Content Link.
|
CurrentDocument.ContentLink.GetCustomValue("propertyName").ToString
|
Misc expressions | |
|
Getting the location coordinates of a field on the document
|
GetFieldInstance("Field Name").Location.ToString
|
Example Calculated Value expressions used in Validate mode | |
| Date in past / future | DateField1 < NowDateField1 >= DateAdd("d", 30, Now)
|
| Equality / inequality of two fields | StringField1 = StringField2IntegerField1.Equals(IntegerField2)IntegerField1 <> DecimalField1Not DecimalField1.Equals(DecimalField2)
|
| Summing fields and comparing to another field | IntegerField1 + IntegerField2 = IntegerField3DecimalField1 + DecimalField2 = DecimalField3DecimalField1 = SumFieldInstance("Table1\AmountColumn")
|
| Running regular expression against field | Text.RegularExpressions.Regex.IsMatch(StringField1, "[0-9]{6}")
|
| Inspecting field-level confidence scores | Instance.Confidence > 0.8
|
Is Valid Expression
An Is Valid Expression is a snippet of VB.Net code that determines whether a Data Field or Data Column’s value is valid.
Return Type
Is Valid Expressions must return a Boolean (True/False) value:
- The expression must evaluate to “True” to be considered valid.
- If the expression returns “False,” an error is thrown stating "Validation Expression failed". The field's background color also changes to red to visually indicate the error.
- You may write a custom error message by configuring the Validate Message property.
Example Is Valid Expression
!!SCREENSHOTS FORTHCOMING!!
For this example, we’ll be using the Regex.IsMatch method. This method compares a string input to a regular expression, returning “True” if they match.
The field “OKDL” is meant to capture Oklahoma Driver’s License Numbers.
- OK driver license have a strict pattern, consisting of one capital letter followed by 9 numerical digits.
- We could use the following Is Valid Expression to verify the result matches the driver license pattern (or in other words, is valid data).
Regex.IsMatch(OKDL, "[A-Z][0-9]{9}")
If the field’s value does not match the regular expression, Grooper will mark it as invalid and the field turns red:
Change the value to a valid ID number, and viola! The field is no longer in error:
Is Required Expressions
You can set a field to be required in Grooper by setting the Required property to true. If the field is not extracted during the Extract step (in other words, "blank"), the field will be flagged as invalid. However, what if whether or not a field is required is based on the value of some other field? You would want that field to be conditionally required, based on some set criteria or parameter.
That's what Is Required Expressions are for. An Is Required Expression is a snippet of VB.Net code that sets a field’s “Required” status conditionally.
Return Type
Is Required Expressions must return a Boolean (True/False) value:
- If the expression returns “True,” the field becomes required.
- If the expression returns “False,” it remains optional.
Example Is Required Expression
In the following example, there are two fields.
- “Marital Status” is a Boolean field with a true value of “Married” and a false value of “Single.”
- “Spouse Name” is a string field. It has the the following Is Required Expression:
Marital_Status = True
When the "Marital Status" field is set to "Single" (therefore "False"), the "Spouse Name" field is not required.
- The condition set by the Is Required Expression
Marital_Status = Trueis met. The "Marital Status" field's value equates to "False". So, the expression returns "False" - The field remains an optional field. No value is required to be entered.
If “Marital Status” is “Married” (therefore "True"), the “Spouse Name” field then becomes required.
- The condition set by the Is Required Expression
Marital_Status = Trueis met. The "Marital Status" field's value equates to "True". So, the expression returns "True" - The field will be in error and turn red until a value is entered.
Referencing field values in expressions
Field values (Data Field and Data Column cell values) can be referenced in the following expressions:
- Calculated Value
- Is Valid
- Is Required
Fields are referenced by Data Field/Data Column name.
- Names are normalized to a code-friendly version.
- Spaces are replaced with underscores (Ex: "Invoice Number" becomes
Invoice_Number) - Special characters are replaced with underscores (Ex: "Yes/No" becomes
Yes_No) - Multiple sequential spaces and special characters are replaced with a single underscore (Ex: "Yes / No" becomes
Yes_No)
Peer fields and fields in Data Sections are accessed differently.
- Peer fields (in the same branch a Data Model hierarchy) are accessed by name only (Ex:
Field_Name. - Fields in Data Sections are accessed with dot-notation (Ex:
Section_Name.Field_Name)- Be aware: Only fields in single-instance Data Sections can be accessed this way.
Ancestor fields can be accessed by starting at the Data Model root.
- This includes Data Fields at the first level in the Data Model.
- This includes Data Fields in ancestor single-instance Data Sections.
- The root Data Model takes its name after its parent Content Type.
- Example: Imagine the following Content Model and its and Data Model.
- stacks Invoices Content Model
- data_table Data Model
- insert_page_break Header Details Data Section
- variables Invoice Number Data Field
- variables Invoice Date Data Field
- insert_page_break Payment Details Data Section
- variables Invoice Total Data Field
- variables Early Payment Discount Data Field
- insert_page_break Header Details Data Section
- data_table Data Model
- A Calculated Value expression on the "Early Payment Discount" field could access the "Invoice Date" value with the following expression
Invoices.Header_Details.Invoice_Date
- stacks Invoices Content Model
- Be aware: This method will not work to crawl Data Elements inherited from a parent Content Type's Data Model.
Field values in collections can be accessed using LINQ expressions.
- Collections are multi-instance Data Sections and Data Tables.
- LINQ expressions take the following general format:
From item In Collection Where condition Select item.Field OrderBy keyitemis a variable name used to iterate through each item in the collection.- An "item" is a section instance for Data Sections
- An "item" is a row for Data Tables.
Collectionis the Data Section or Data Table.Fieldis the Data Field in that Data Section or Data Column in that Data Table.- The
WhereandOrderBycomponents are optional.
- Examples:
(From row in Line_Items Select row.Line_Total).Sum- This returns the sum of the "Line Total" Data Column values in a "Line Items" Data Table.
From row in Line_Items Select row.Line_Totalselects all "Line Total" Data Column values for each row in the "Line Items" Data Table..Sumadds all the values returned by the LINQ expression.
(From sem In Semester Select sem.Earned_Hours).Max- This returns the maximum of the "Earned Hours" Data Fields from all section instances collected by a "Semester" Data Sections.
From sem In Semester Select sem.Earned_Hoursselects all "Earned Hours" Data Field values for each instance in the "Semester" Data Section..Maxselects the largest value from the LINQ expression.
(From sem In Semester Where sem.Year < DateTime.Now.Year Select sem.Earned_Hours).Sum- This returns the total "Earned Hours" for semesters prior to the current year.
From sem In Semester Where sem.Year < DateTime.Now.Year Select sem.Earned_Hoursselects all "Earned Hours" Data Field values for each instance in the "Semester" Data Section that match theWherecondition.Where sem.Year < DateTime.Now.Yearrestricts values to the condition set by its expression. In this case, where the "Year" value in the section instance (sem.Year) is less than (<) the current year (DateTime.Now.Year)..Sumadds all the values returned by the LINQ expression.
IntelliSense in the expression editor will help you.
- As you enter expressions in an expression editor, the IntelliSense code-completion window will provide code suggestions and documentation.
- Hover over an item in the completion list to view its documentation.
- Type
Me.to list all accessible instance fields, properties, and methods in the IntelliSense menu. This is the quickest way to determine what Data Elements are accessible if you are unsure.