2.80:Data Type (Node Type)

Data Types are Data Extractors that use regular expressions to match text on a document, returning and collating the results.
The text matched by the pattern (or patterns) is returned as a list of values. These values can be further manipulated, isolated, and adjusted by configuring the Data Type's properties.
About
Data Type extractors are the primary way information is found and used on a document. Say you want to use the form number on the document below to separate a document.
You need a Data Type! A Data Type will find the form number, and the Separate activity will use that Data Type to separate this page into a new folder.
Say you want to classify this contract as a "Lease" document type using its header title, "Oil and Gas Lease". Or say you want to grab all the highlighted information from this form. You need a Data Type for these jobs, too.
How Data Types Locate Data
Data Types return information from a page using regular expression pattern matching. A regular expression (or regex) uses a standard syntax to describe patterns of text to find within a larger block of text. For example, the regular expression "ball" matches every occurrence of the characters "ball" in the string "ball, football, baseball, 8-ball ball in hand, balloon", including the "ball" inside longer words such as "football" and "balloon".
| Regex Pattern | Text | Matches (shown in bold) |
|---|---|---|
| ball | ball, football, baseball, 8-ball ball in hand, balloon | **ball**, foot**ball**, base**ball**, 8-**ball** **ball** in hand, **ball**oon |
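The same matching can be reproduced with Python's standard `re` module. This is only an illustration of the regex concept: Python's regex dialect is not guaranteed to behave identically to the engine Grooper uses. The second pattern, `\bball\b`, is added here purely for contrast; it uses word boundaries so that "ball" inside longer words such as "football" is no longer matched.

```python
import re

# The sample string from the table above.
text = "ball, football, baseball, 8-ball ball in hand, balloon"

# A literal pattern matches every occurrence of the characters "ball",
# including occurrences inside longer words.
literal = [m.start() for m in re.finditer(r"ball", text)]
print(len(literal), literal)  # 6 [0, 10, 20, 28, 33, 47]

# Word boundaries (\b) restrict matches to "ball" as a standalone word.
standalone = [m.start() for m in re.finditer(r"\bball\b", text)]
print(len(standalone), standalone)  # 3 [0, 28, 33]
```

The numbers printed are the character offsets where each match starts; the literal pattern finds all six highlighted occurrences, while the word-boundary version keeps only "ball", "8-ball", and the "ball" in "ball in hand".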
This is a very specific pattern: it literally matches only the string of characters "ball". Regex also provides special syntax for matching more general patterns. For example, the "\d" token matches any single digit character, 0 through 9.
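A minimal Python sketch of the `\d` token, using a made-up sample string. Note that in Python's `re` module, `\d` on text strings also matches non-ASCII decimal digits (for example Arabic-Indic digits) unless the `re.ASCII` flag is set; other regex engines have their own rules, so check the behavior of the engine you are actually using.

```python
import re

# Made-up sample text resembling a form header.
text = "Form 1040, page 2"

# \d matches exactly one digit, so each digit is a separate match.
digits = re.findall(r"\d", text)
print(digits)  # ['1', '0', '4', '0', '2']

# Four \d tokens in a row match a run of four consecutive digits.
four_digits = re.findall(r"\d\d\d\d", text)
print(four_digits)  # ['1040']

# Python-specific: \d also matches non-ASCII digits such as "٣" (Arabic-Indic three)
# unless re.ASCII restricts it to 0-9.
print(re.findall(r"\d", "٣"))                  # ['٣']
print(re.findall(r"\d", "٣", flags=re.ASCII))  # []
```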
For more information on regular expression pattern matching, visit the Regular Expression article.
Note: Before regex can match text on a document, you have to extract machine-readable text from the page! Data Type extractors return no results if there is no raw text data to match. You must first obtain text from your documents with the Recognize activity, which extracts machine-readable text from images through OCR and extracts native text from digital PDFs.
Once a document's text has been extracted by the Recognize activity (through OCR for image-based documents, or native text extraction for digital PDFs), Data Type extractors can use regular expressions to match that text however you need. The simplest Data Type configuration uses a single regular expression pattern (written in the "Pattern" property with the Pattern Editor) to match text on a document and return each match as an individual result.
Data Types are, however, much more capable than simple regex pattern matching. While regular expressions are a large part of how Data Types return data from a document, they are only the beginning. Two other concepts are critical to understanding how Data Types work: Inheritance and Collation.
Inheritance
A Data Type inherits the values returned by any child extractor created beneath it (as well as by any extractor it references). This allows a single extractor to return multiple values using multiple patterns and extractor configurations.
Data Types can have both Data Format and Data Type extractors as children.
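To make inheritance concrete, here is a toy Python model. Everything in it (the `Extractor` class, its fields, and the sample patterns) is hypothetical and is not Grooper's API. It assumes a parent simply pools its own matches with its children's matches, in order; it does not model referenced extractors or Collation, which in Grooper determines how those pooled results are actually used.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Extractor:
    """Hypothetical toy stand-in for a Data Type or Data Format node (not Grooper's API)."""

    name: str
    pattern: Optional[str] = None  # None: the node only pools its children's results
    children: List[Extractor] = field(default_factory=list)

    def extract(self, text: str) -> List[str]:
        results: List[str] = []
        # Values matched by this node's own pattern, if it has one.
        # re.findall returns whole matches here because the patterns have no groups.
        if self.pattern is not None:
            results.extend(re.findall(self.pattern, text))
        # Values inherited from each child, in child order (recursively).
        for child in self.children:
            results.extend(child.extract(text))
        return results


# A parent with no pattern of its own and two hypothetical children.
form_number = Extractor(
    name="Form Number",
    children=[
        Extractor(name="Tax Form", pattern=r"Form \d\d\d\d"),
        Extractor(name="Lease Form", pattern=r"Lease No\. \d+"),
    ],
)

print(form_number.extract("Form 1040 attached to Lease No. 77"))
# ['Form 1040', 'Lease No. 77']
```

Here the parent has no pattern of its own, so every value it returns is inherited from its two children.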

Data Formats are very simple extractors using only
Collation
How a Data Type uses the results it matches and inherits is configured in its properties, specifically by the "Collation" property.
Use Cases
The uses for Data Types are innumerable, but they fall into three main categories.



