2.80:Data Type (Node Type)

Data Types are Data Extractors that use regular expressions to match text on a document, returning and collating the results.
The matches for the pattern or patterns are returned as a list of values. These values can be further manipulated, isolated, and adjusted by configuring the Data Type's properties.
About
Data Type extractors are the main way information is found and used on a document. Say you want to use the form number on the document below to separate the document.
You need a Data Type! The Data Type will find the form number. The Separate activity will use that Data Type to separate this page into a new folder.
Or say you want to classify this contract as a "Lease" document type, using its header title, "Oil and Gas Lease".
Or say you want to extract all of the highlighted information from this form. In each case, you need a Data Type.
How Data Types Locate Data
Data Types return information on a page by using regular expression pattern matching. Regular expressions (or regex) use a standard syntax to match patterns within a block of text. For example, the regular expression "ball" matches every occurrence of the characters "ball" in the text "ball, football, baseball, 8-ball ball in hand, balloon", including the occurrences inside "football", "baseball", and "balloon".
| Regex Pattern | Text | Matches |
|---|---|---|
| ball | ball, football, baseball, 8-ball ball in hand, balloon | **ball**, foot**ball**, base**ball**, 8-**ball** **ball** in hand, **ball**oon |
This is a very specific pattern that literally matches only the string of characters "ball". Regex also has a special syntax for matching more general patterns. For example, the "\d" token matches any single digit character, 0 through 9.
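The matching behavior described above can be reproduced with Python's standard `re` module. This is only an illustration of regex semantics, not of how Data Types are implemented; the regex engine behind Data Types may differ in advanced features, although literal characters, `\d`, and `\b` behave this way in common regex engines. The `\b` (word boundary) pattern is an addition not mentioned above, shown as one way to match "ball" only as a standalone word.

```python
import re

text = "ball, football, baseball, 8-ball ball in hand, balloon"

# A literal pattern matches the characters "ball" wherever they occur,
# including inside longer words such as "football" and "balloon".
literal_matches = re.findall(r"ball", text)
print(len(literal_matches))  # 6

# "\d" matches any digit character; the only digit here is the 8 in "8-ball".
digit_matches = re.findall(r"\d", text)
print(digit_matches)  # ['8']

# "\b" marks a word boundary, so "\bball\b" skips the matches embedded in
# longer words ("football", "baseball", "balloon").
word_matches = re.findall(r"\bball\b", text)
print(len(word_matches))  # 3
```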
For more information on regular expression pattern matching, visit the Regular Expression article.
Important: Before regex can match text on a document, you have to extract machine-readable text from the page! Data Type extractors return no results if there is no raw text data to match. You must first obtain text from your documents with the Recognize activity, which extracts machine-readable text from images through OCR and extracts native text from digital PDFs.
Once you have extracted text for a document with the Recognize activity (either through OCR for image-based documents or native text extraction for digital PDFs), Data Type extractors can use regular expressions to match that text in whatever way you need. The simplest configuration of a Data Type extractor uses a regular expression pattern (written using the "Pattern" property and the Pattern Editor) to match text on a document and return the matches as individual results.
However, Data Types do much more than simple regex pattern matching. While regular expressions are a large part of how Data Types return data from a document, they are only the beginning. Two other concepts are critical to understanding how Data Types work: Inheritance and Collation.
Inheritance
Data Types inherit the values returned by any child extractor created under them (as well as by any extractor they reference). This allows a single extractor to return multiple values using multiple patterns and extractor configurations.
Data Types can have both Data Format and Data Type extractors as children.

For example, the extractor below has two "Data Format" children. One finds the word "HELLO". The other finds the word "WORLD!". Both results are returned by the parent Data Type.
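Inheritance can be pictured as a parent that gathers the results of its children. The Python sketch below is a hypothetical model, not Grooper's API: `data_format` and `data_type` are illustrative helper names, and results are simply concatenated in child order, since how a real Data Type combines them depends on its Collation property.

```python
import re

def data_format(pattern):
    """Hypothetical Data Format: returns every match of a single regex pattern."""
    def extract(text):
        return [m.group(0) for m in re.finditer(pattern, text)]
    return extract

def data_type(*children, pattern=None):
    """Hypothetical Data Type: its own matches (if it has a pattern) plus
    every result inherited from its children, concatenated in order."""
    def extract(text):
        results = data_format(pattern)(text) if pattern else []
        for child in children:
            results.extend(child(text))
        return results
    return extract

# Two Data Format children, one for each word, as in the HELLO / WORLD! example.
hello_world = data_type(data_format(r"HELLO"), data_format(r"WORLD!"))
print(hello_world("HELLO WORLD!"))  # ['HELLO', 'WORLD!']
```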

Data Format Extractors
Data Formats are very simple extractors. They can only be created as children of Data Type extractors, never as free-standing objects. They are bitty baby objects that need to hold mommy's hand.
They also use regular expressions to return matches against the raw text data. They are configured using only the Pattern Editor and the properties available in it.


Data Format extractors are useful for matching the many different ways a single piece of data can be formatted. Think about the different ways a date can be written.
These are all different ways to express the same information.
- 06/12/1985
- June 12, 1985
- 12 June 1985
- 1985-06-12
- 12th day of June 1985
It would be difficult to match all five of these date formats with a single regular expression. However, it's relatively easy to match them with five different regex patterns, one per format.
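As a rough illustration in Python, each of the five formats above gets its own short pattern, and a parent loop collects every match, much as a Data Type collects results from five Data Format children. The patterns and sample text are assumptions written for this example only; for instance, they do not check that a month number falls between 01 and 12.

```python
import re

MONTH = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"

# One simple pattern per date format, like five Data Format children.
DATE_FORMATS = [
    r"\b\d{2}/\d{2}/\d{4}\b",                                   # 06/12/1985
    r"\b" + MONTH + r" \d{1,2}, \d{4}\b",                        # June 12, 1985
    r"\b\d{1,2} " + MONTH + r" \d{4}\b",                         # 12 June 1985
    r"\b\d{4}-\d{2}-\d{2}\b",                                   # 1985-06-12
    r"\b\d{1,2}(?:st|nd|rd|th) day of " + MONTH + r" \d{4}\b",   # 12th day of June 1985
]

text = "06/12/1985; June 12, 1985; 12 June 1985; 1985-06-12; 12th day of June 1985"

# The parent simply collects the matches from every format pattern.
results = []
for pattern in DATE_FORMATS:
    results.extend(m.group(0) for m in re.finditer(pattern, text))

print(results)
# ['06/12/1985', 'June 12, 1985', '12 June 1985', '1985-06-12', '12th day of June 1985']
```

Each pattern stays short and readable, which is the practical benefit of splitting the formats across separate children instead of writing one large alternation.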

Data Types as Children of Data Types
Data Type extractors can also be children of other Data Type extractors. Any result the child Data Type returns is fed to the parent Data Type, including the results of the child Data Type's own children! This lets the child take advantage of properties available to Data Types but not to Data Formats, such as collation (more on collation below).
In the example below, the parent Data Type named "Sample Data Type" has three children: two Data Formats and one Data Type. Every result returned by each of the three child extractors is returned by the parent Data Type.
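A hypothetical Python model (illustrative helper names, not Grooper's API) can show this nesting: the parent receives its Data Format children's results plus everything the child Data Type returns, including that child's own children's results. The patterns and sample text are invented for illustration, and results are simply concatenated in child order rather than collated.

```python
import re

def data_format(pattern):
    """Hypothetical Data Format: every match of one regex pattern."""
    return lambda text: [m.group(0) for m in re.finditer(pattern, text)]

def data_type(*children):
    """Hypothetical Data Type: gathers the results of all of its children."""
    def extract(text):
        results = []
        for child in children:
            # A child Data Type passes up its own children's results too.
            results.extend(child(text))
        return results
    return extract

# A child Data Type with its own Data Format child (invented pattern).
form_number = data_type(data_format(r"Form \d+"))

# The parent has three children: two Data Formats and one Data Type.
sample_data_type = data_type(
    data_format(r"HELLO"),
    data_format(r"WORLD!"),
    form_number,
)

print(sample_data_type("HELLO WORLD! Form 1040"))  # ['HELLO', 'WORLD!', 'Form 1040']
```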

Referenced Extractors
Instead of creating a Data Type as a direct child of another Data Type, you can also reference Data Types elsewhere in the Node Tree to return their results. Functionally, the parent Data Type uses the reference as if it were a child, without changing the referenced Data Type's location in the Node Tree.

This can be very helpful from an asset management perspective. When a Data Type's results need to be used by several different parent Data Types, there's no need to create a separate copy of the child Data Type for each parent. Instead, a single Data Type can be created as its own object in the Node Tree, and every parent Data Type can reference that same object as if it were a child, using the "Referenced Extractors" property.
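In a hypothetical Python model, a reference is just a second pointer to one shared extractor object: both parents below pull results from the same `form_number` extractor without a copy being made. The `referenced` parameter loosely mirrors the "Referenced Extractors" property, but it and the helper names are illustrative, not Grooper's API.

```python
import re

def data_format(pattern):
    """Hypothetical Data Format: every match of one regex pattern."""
    return lambda text: [m.group(0) for m in re.finditer(pattern, text)]

def data_type(*children, referenced=()):
    """Hypothetical Data Type: results from its children, then from any
    referenced extractors, which are used as if they were children."""
    def extract(text):
        results = []
        for extractor in (*children, *referenced):
            results.extend(extractor(text))
        return results
    return extract

# One free-standing Data Type, defined once.
form_number = data_type(data_format(r"Form \d+"))

# Two different parents reference the same object instead of copying it.
parent_a = data_type(data_format(r"HELLO"), referenced=[form_number])
parent_b = data_type(data_format(r"WORLD!"), referenced=[form_number])

text = "HELLO WORLD! Form 1040"
print(parent_a(text))  # ['HELLO', 'Form 1040']
print(parent_b(text))  # ['WORLD!', 'Form 1040']
```

Changing the pattern inside `form_number` would change what both parents return, which is the maintenance benefit described above.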
Collation
How a Data Type processes the results returned by its own pattern, its children, and its referenced extractors is determined by its "Collation" property.
Use Cases
The possible uses for Data Types are innumerable. However, they fall into three main categories.



