Grooper A.C.E. - Architect

From Grooper Wiki
Grooper A.C.E. • Architect / Consultant / Engineer

Welcome to Grooper A.C.E. - Architect!

There are four assumptions made before we move forward:

  1. You are new to Grooper
  2. You have Grooper installed and licensed
  3. You have some familiarity with regular expressions
  4. You have downloaded and unzipped the Grooper A.C.E. • Architect - Closing Disclosure Forms
    • These are the sample documents used throughout this article to build out a model in Grooper.



The goal of this article is to give you, the new Grooper user, some confidence in your ability to navigate and use the software by providing a thorough, step-by-step guide to building out an in-depth Content Model and accompanying Batch Process in Grooper 2.9.

Asset 22@4x.png

Click the link below to download the Closing Disclosure Form documents. This is a zip file containing PDF files. You will need to unzip the file first before bringing the files into Grooper.

Asset 22@4x.png

Additionally, our Grooper A.C.E. Training Tier II and III members can click here and log in to Grooper xChange to download a completed Content Model and Batch Process. Downloading it is not required to complete this article, but it can be helpful as a cross-reference to check your work as you build, or for reverse engineering.



About

The work done to build out everything will cover a myriad of topics that will touch on many different aspects of Grooper:

  • Import/Export of Electronic content via NTFS File System Import/Export
  • Document form detection via IP Profile and native text recognition
  • Rules-based document classification
  • Content Model / Extraction Techniques
    • Develop general understanding of hierarchy and complex referencing
      • Waterfall extraction
      • Collation methods
        • Key-Value Pairs
        • Split
          • Between
          • Simple
        • Ordered Arrays
    • Text Parsing
      • Input Filtering
      • Data Field Sub-Elements via named capture groups
    • Generic Text Extraction (a.k.a. the cheat code)
      • Exclusion Filtering
      • Lexicons for lists
      • Subtraction Extractor
    • Table Extraction
      • Row Matching
        • Ordered Arrays
      • Inverted Header/Row table
      • Infer Grid
    • Calculate Expressions
      • Simple LINQ syntax
      • Enumerated hierarchical referencing
    • Section Extraction
      • Simple
      • Geometric
      • Understanding Data Instances; where data begins and ends based on input
    • Data Type - Result Options
      • Box Detection/OMR
    • Developing best practice (local resource) foldering/extractor naming
  • Batch Process creation and execution
  • Activity Processing service creation


It is important to build a vocabulary, so everything will be referred to very specifically. All things that may be considered vocabulary will use the following syntax:

  • Grooper objects like Data Types (and names given) will be bolded
  • Object properties will be bolded and italicized (like the Data Type's Collation property)
  • Property settings will be italicized (like setting the Data Type's Collation property to Key-Value Pair.)

It would be beneficial to take a look at the Five Phases of Grooper article, as this will reference concepts covered in that article. Also, please take a moment to read the article about Asset Management, as this will help set a strong foundational understanding of how and why things in Grooper are organized and named the way they are.

There is a lot of work to do to accomplish the goals stated above. The work will be divided into several logical chunks, in many ways representing the final construction of the Content Model and Batch Process. Let's get to it!

How To

Our journey with Grooper starts here. We will begin simply, and develop more complex ideas as we move forward.

! Some of the tabs in this tutorial are longer than others. Please scroll to the bottom of each step's tab before moving on to the next step.

Grooper Design Studio Basic UI Overview

Opening Grooper and Understanding the UI

Working in Grooper for the first time can be a bit jarring, as the UI does not guide you toward any particular starting point and is, in general, very nonlinear. That considered, there's really no better way to get started than to dive in, head first, and open Grooper Design Studio.
There are several applications that make up the totality of Grooper, but by far the application you will do the most work in is Grooper Design Studio. As a result, and for the sake of brevity, Grooper Design Studio will be referred to simply as Grooper for the duration of this article. Clarification will be provided when the reference seems ambiguous.

Grooper, in general, is divided into two main areas: the Node Tree on the left, and the configuration panels of what is selected from the Node Tree on the right. The Node Tree is a hierarchical representation of every object in Grooper. As you select different objects from the Node Tree, you will notice the bulk of the right side of the UI change to display properties and UI elements specific to that object.

In the screenshot to the right, with Grooper just opened...

  1. ...you will notice the very top object in the Node Tree is selected, which is what is called the Grooper Root Node.
  2. Every object has a set of tabs that contain different pieces of information about, or ways to interact with, that object. The first tab is typically the most commonly used and, as a result, the most significant for either configuring or understanding that object. In this case, the Grooper Root tab gives us several UI pieces to unpack.
    • It would be unproductive and ultimately unnecessary to go over every single UI element of every object, so we will just focus on common ones that are going to allow you to get the most comfortable.
  3. Property grids, like this one, are incredibly common in Grooper as they are the chief means by which one configures the object they have selected. Any property that is set to a non-default setting will display that setting in bold font.
  4. Nearly every property and object in Grooper shows the Grooper Help information written by the development team to describe or help you understand how to configure the object or property.
  5. Every object in Grooper has an Advanced tab. This tab houses other sub-tabs that give system and configuration information about the object.
Grooper ace architect 0001.png

The sub-tabs of the Advanced tab give some important information.

  1. The first tab, General Info, houses information relevant to the identification of the selected object...
  2. ...such as the ID property, which contains a GUID (globally unique identifier) for the object. This is the programmatic way the object is recognized by the Grooper database. If you were to search for this ID against the dbo.TreeNode table in the Grooper database, it would return the row that represents this object.
Grooper ace architect 0001a.png
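A minimal Python sketch of that lookup follows. It only builds a parameterized query; it assumes a SQL Server driver that uses `?` placeholders (such as pyodbc, where you would pass the result to `cursor.execute(sql, params)`), and it assumes the GUID column of dbo.TreeNode is named `Id`. Verify both assumptions against your own Grooper database; the helper name `tree_node_lookup` is hypothetical.

```python
import uuid


def tree_node_lookup(object_id: str) -> tuple[str, tuple[str]]:
    """Build a parameterized query for the dbo.TreeNode row of a Grooper object.

    ASSUMPTION: the GUID column is named "Id"; check this in your own database.
    """
    guid = uuid.UUID(object_id)  # raises ValueError if the text is not a valid GUID
    sql = "SELECT * FROM dbo.TreeNode WHERE Id = ?"
    return sql, (str(guid),)  # canonical lowercase GUID text as the single parameter


sql, params = tree_node_lookup("0f8fad5b-d9cb-469f-a165-70867728950e")
print(sql, params)
```

Using a parameter instead of pasting the ID into the SQL text keeps the query safe even if the copied value contains stray characters; `uuid.UUID` rejects anything that is not a well-formed GUID before the database is ever touched.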
  1. The Security tab allows you to leverage your Active Directory environment to control permissions to individual objects in the Node Tree. Using this, you could "sequester" entire branches of your Grooper environment based on specific needs. For example, a company’s Human Resources department could run documents through Grooper and maintain controlled access with this functionality.
Grooper ace architect 0001b.png
  1. The Properties tab houses JSON information related to the non-default properties of an object. This would be blank if all of an object's properties were at their defaults. This information is serialized in the database. Under normal circumstances you should never need to modify this information, but it is good to know what it is.
Grooper ace architect 0001c.png
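The idea behind the Properties tab (store only what differs from the default) can be sketched in a few lines of Python. This is a conceptual illustration, not Grooper's actual serializer, and the property names and default values below are hypothetical.

```python
import json

# Hypothetical property names and defaults, for illustration only.
DEFAULTS = {"Name": "", "Threshold": 10, "Enabled": True}


def serialize_non_default(props: dict, defaults: dict) -> str:
    """Return JSON holding only the properties whose value differs from its default.

    An object whose properties are all at their defaults serializes to an empty
    string, mirroring the blank Properties tab described above.
    """
    changed = {key: value for key, value in props.items() if defaults.get(key) != value}
    return json.dumps(changed, sort_keys=True) if changed else ""


print(serialize_non_default({"Name": "", "Threshold": 5, "Enabled": True}, DEFAULTS))
```

The same rule explains why the property grid shows non-default settings in bold: only the values that differ from the default carry information worth storing.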
  1. The Files tab has a list-view of files associated with the selected object. A Page object, for example, could have images or character data text files associated with it. These files live in the Grooper filestore as .grp files. If you select a file from the list, the bottom half of the screen will adjust to a viewer to allow you to view the selected file.
Grooper ace architect 0001d.png
  1. The Values tab displays a list-view of serialized values for the currently selected object. This includes things like classification candidates, EPI info, or whether an image has been reviewed. You can also use it to write "hidden utility" fields that custom scripting can leverage to make other logic decisions.
Grooper ace architect 0001e.png
  1. The References tab has two different list-views showing if the selected object is referenced by other objects and/or what objects it may happen to be referencing. You can select items from the list and use the object command (right-click the object...) Go To Item to jump immediately to that object in the Node Tree.
Grooper ace architect 0001f.png

Exploring the Node Tree

Let's explore the different branches of the Grooper Node Tree and discuss what's housed within.

  1. The first main branch is that of Batch Processing. There are two sub-folders to consider within.
  2. The Batches folder is where Batches live in Grooper. It is divided into Production and Test folders, which have two functional differences:
    1. Batches in the Production folder are not exposed to any testing mechanisms used while designing models, extractors, profiles, etc. in Grooper.
    2. Batches in the Test folder are not observed by Activity Processing Services, so, typically, everything you do to a Test batch will be in a manual fashion.
  3. The Processes folder houses Batch Processes. A Batch Process is a sequence of individual Batch Process Step objects, each specifying an Activity to be applied. Activities may represent automated system tasks, or human-attended tasks which require operator interaction. Collectively, these steps represent a workflow process through which Batches of a particular class will travel. Once created and published, Batch Processes are assigned to production Batches at Batch creation time.
Grooper ace architect 0001g.png

The Content Models branch stores Content Models and the Content Types they contain, which define the taxonomy of a document set in terms of the Document Types they contain and the Data Elements that appear on each Document Type.

Grooper ace architect 0001h.png

The Data Extraction branch stores extractors that are meant to be available globally. Typically, extractors made within a Content Model are specific and accessible only to that model. Extractors stored in this Data Extraction area can be accessed by all models. This can be useful when you have an extractor that performs a very ubiquitous task, like extracting an address.

Consider carefully what you decide to reference from here, however. While it can be convenient to have a "global" extractor that many things point to, so that a single update to it updates everything, that same fact can cause problems if you lose track of it. For this reason, it is advised to copy extractors stored here locally into your individual models, so that changes made here do not inadvertently affect production models. A good approach is to update the extractors stored here when new discoveries are made, and to add new extractors you've built that perform common types of extraction.

Grooper ace architect 0001i.png
  1. The Global Resources branch stores an assortment of different "profiles".
  2. IP Profiles are containers of a sequence of image processing steps meant either to affect the pixel contents of an image (de-skewing an image or removing a border) or to detect something about the image (are there OMR boxes and are they checked, or are there lines on the image and where are they).
  3. Lexicons are "dictionaries" which store lists of words, phrases, field values, translations, weightings, and other information.
  4. OCR Profiles are containers of a complex set of properties which determine how optical character recognition is performed against image based content.
    • It is worth noting that OCR does not need to be performed against electronically sourced documents like MS Word files as the native, electronic text is accessible by Grooper.
  5. Scanner Profiles are stored configurations for how to run a scanner in Grooper.
    • For example, you could have one scanner, but a profile that runs it at 150dpi single sided, and another profile that runs it at 300dpi double sided.
  6. Separation Profiles are used to specify how pages will be separated into documents and folders within a Batch.
Grooper ace architect 0001j.png
  1. The Infrastructure branch houses objects that perform functions such as connecting to data sources.
  2. CMIS Connections are objects used to connect to sources like MS SharePoint, Application Xtender, or Box.
  3. Data Connections are objects used to connect to tabular data sources like databases.
  4. File Stores are objects configured to point to paths that store the files associated with objects in Grooper. The most common setup is the single File Store object created when your Grooper repository is initialized, but you can configure multiple.
    • An example of why to have multiple File Store objects would be to have "fast" and "slow" storage. Your main Grooper file store might be on "fast" SSD storage, but Batches you choose to dispose may not get deleted immediately, but instead moved to "slow" storage for archival purposes.
Grooper ace architect 0001k.png
  1. License Servers are objects configured to point your Grooper environment to machines that host the Grooper License Server service, meant to provide licensing. A Machine object configured to be pointed at the machine hosting the license must exist to allow a connection to be established.
  2. The Machines folder allows you to manage connections to other computers by creating Machine objects which represent those connections. A core function of this branch is to configure or monitor Grooper services running on other machines. Machine objects can be manually added from here, or are created automatically when creating connections to other Grooper repositories via Grooper Config.
  3. Object Libraries are objects that contain .NET code meant to augment the functionality of Grooper. You can edit this code directly in Grooper, but ideally you will want a more robust IDE like MS Visual Studio, which you can connect to from Grooper. Once written, the contents of an Object Library must be compiled in Grooper to provide their functionality.
  4. Thread Pools are objects that represent something akin to a waiting room. Activities in a Batch Process submit their tasks to a Thread Pool which is monitored by a Grooper Activity Processing service configured to "watch" that Thread Pool, and upon submission the service will pick up the task and process it.
Grooper ace architect 0001k01.png
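The "waiting room" arrangement is a classic producer/consumer queue. The minimal Python sketch below is purely conceptual: `run_thread_pool` and its parameters are hypothetical, and Grooper's Activity Processing services run as services rather than as threads inside a single script. It shows tasks being submitted to a shared queue and picked up by workers that watch it.

```python
import queue
import threading


def run_thread_pool(tasks, process, worker_count=2):
    """Submit tasks to a shared queue; worker threads 'watching' it pick them up."""
    pool = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = pool.get()          # block until something is submitted
            if task is None:           # sentinel: this worker stops watching
                pool.task_done()
                break
            result = process(task)
            with lock:                 # results list is shared between workers
                results.append(result)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:                 # the "activity" submits its tasks
        pool.put(task)
    for _ in threads:                  # one sentinel per worker, queued after all tasks
        pool.put(None)
    for thread in threads:
        thread.join()
    return sorted(results)             # completion order varies, so sort for display


print(run_thread_pool(["page 2", "page 1", "page 3"], str.upper))
```

Because the submitter never talks to a specific worker, more workers (or more machines watching the same pool) can be added without changing how tasks are submitted.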

Acquire the Documents

The next logical thing to do is to consider the first of the Five Phases of Grooper and to Acquire our documents so we can begin building and testing in Grooper.
Everything in Grooper starts by bringing documents into the system.

The First Batch

Documents in Grooper exist within Batch Folders, which belong to Batch objects. Therefore, to get our documents into Grooper, we need to create a Batch.

Given the hierarchical nature of the Grooper Node Tree, we need a syntax to describe how to get to specific branches in that tree. The syntax will be as follows (the Grooper Root node is assumed):

Level 1 > Level 2 > Level 3 > etc...
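To make the path syntax concrete, here is a small Python sketch that walks such a path through a nested tree. The tree is a hypothetical, heavily simplified stand-in built from branch names used in this article, not a real Grooper data structure, and `resolve` is an illustrative helper.

```python
# Hypothetical, simplified Node Tree built from branch names used in this article.
NODE_TREE = {
    "children": {
        "Batch Processing": {
            "children": {
                "Batches": {
                    "children": {
                        "Production": {"children": {}},
                        "Test": {"children": {}},
                    }
                },
                "Processes": {"children": {}},
            }
        },
        "Global Resources": {"children": {"IP Profiles": {"children": {}}}},
    }
}


def resolve(tree: dict, path: str) -> dict:
    """Follow a 'Level 1 > Level 2 > ...' path downward from the Grooper Root node."""
    node = tree
    for name in (part.strip() for part in path.split(">")):
        children = node.get("children", {})
        if name not in children:
            raise KeyError(f"no child named {name!r} under this node")
        node = children[name]
    return node


test_folder = resolve(NODE_TREE, "Batch Processing > Batches > Test")
```

Each `>` moves one level deeper, which is exactly how you expand the Node Tree by hand.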

Start by expanding the Node Tree to:

Batch Processing > Batches > Test

Grooper ace architect 0002.png

We will be doing a lot of testing as we move forward and build out our model, hence navigating to:

Batch Processing > Batches > Test

Instead of:

Batch Processing > Batches > Production

Batches that are created and live in the Production folder are not available to test against as you are building out models and profiles etc. in Grooper.

To create a Batch...

  1. Right-click on the Test folder. This will bring up the Context Menu which holds what are known as object commands.
  2. Choose the Add > Batch... object command.
  3. In the subsequent Add New Batch window name the Batch Closing Disclosures (this name is arbitrary.)
  4. Click the OK button to confirm the creation of the new Batch and close the Add New Batch window.
Grooper ace architect 0003.png

With that Batch created, expand it by clicking the + icon on its side. This will expose in the node tree the two child objects of a Batch, the Batch Folder and the Batch Process. The Batch Folder is empty (because nothing has been added to it yet), and the Batch Process is called Null (because upon its creation, the Batch was not associated with a published Batch Process.)

Grooper ace architect 0004.png

Adding Documents to the Batch

With the Batch created, it is now very easy to add documents to it. In Windows, navigate to the folder where you unzipped the provided documents, and copy those files to your clipboard.

In Grooper...

  1. Select the Batch just created.
  2. Click on the Batch Viewer tab.
  3. Either right-click the Batch Folder and use the Paste object command, or select the Batch Folder and press Ctrl+V to paste what is in your clipboard.
    • Documents added to a Batch in this fashion are considered "sparsely" imported.

You should now see 25 documents in the Batch. The documents would now be considered Acquired.

Grooper ace architect 0005.png

Bonus! File System Import

The steps outlined in this tab are not necessary to move forward. They are here to show you an alternate way to create a Batch and populate it with documents. Either approach is fine, and is entirely up to your preference. The following steps outline how to use the File System Import in an ad hoc fashion.

  1. Select the Test branch in the Node Tree.
  2. Above the list-view on the right, use the Batch drop-down.
  3. In the drop-down menu select Legacy Import > File System Import....
Grooper ace architect 0005a.png

There are several properties here not covered by this set of steps, but they are self-explanatory, or the Grooper Help does a good job describing what they do. What is necessary for these steps is described below.

  1. The most important property to set is the Base Directory. Navigate to the path your documents are stored in, or paste a path into the property.
  2. While the Sparse Import property is set to True, the File Disposition property can only be NoChange. This is because a sparse import does not copy files from their directory into the Grooper filestore; it simply creates document objects in Grooper that link to the files where they already exist. If the files were moved after the document objects were created, Grooper would no longer know where they are.
    • When documents are sparsely imported, you will notice a golden link icon on their document icons in a batch viewer.
  3. Anything entered in the Batch Name Prefix property will be prepended to the name of the Batch once it's made.
  4. When ready, click the Start Import button, then Close the window when it’s done.
Grooper ace architect 0005b.png
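Why moving files breaks a sparse import can be simulated with ordinary files. This Python sketch is a simplification under stated assumptions: `import_file` is a hypothetical helper, and a plain copy stands in for the Grooper filestore, which actually stores its own .grp files.

```python
import shutil
import tempfile
from pathlib import Path


def import_file(src: Path, filestore: Path, sparse: bool) -> Path:
    """Return the location an imported document will read its content from.

    sparse=True keeps only a link to the original file; sparse=False copies it
    into the (simplified, hypothetical) filestore directory.
    """
    if sparse:
        return src  # a link to where the file already lives
    dest = filestore / src.name
    shutil.copy2(src, dest)  # full import: the filestore owns its own copy
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "source").mkdir()
    (root / "filestore").mkdir()
    pdf = root / "source" / "doc1.pdf"
    pdf.write_bytes(b"%PDF-1.4 placeholder")

    sparse_link = import_file(pdf, root / "filestore", sparse=True)
    full_copy = import_file(pdf, root / "filestore", sparse=False)

    # Simulate a File Disposition that moves the source file after import.
    shutil.move(str(pdf), str(root / "moved.pdf"))

    sparse_ok = sparse_link.exists()  # False: the link now points at nothing
    full_ok = full_copy.exists()      # True: the copy is unaffected
```

This is why Grooper locks File Disposition to NoChange whenever Sparse Import is True.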
  1. Notice now in the Imported Batches branch there is a new Batch named with the prefix you gave it and the current system date/time. You can also see this represented in the list-view of the Test branch, since it is still selected.
Grooper ace architect 0005c.png

Condition the Documents

Having completed the acquisition of the documents, it is now time to prepare, or Condition, them for further processing.

Content Action • Split

These Closing Disclosures are 5-page forms. They now exist in Grooper as Document (also referred to in Grooper as Folder) objects, with their pages viewable in the Document Viewer, but no Page objects exist yet. Page objects need to be created so that certain activities can be applied, and so that we can focus on individual pages when testing extraction (which is especially useful with very generic extractors run against very large documents). The following steps outline how to apply the Content Action activity, configured to split out pages, in an ad hoc fashion.

Grooper ace architect 0006.png

Page objects can be created by applying the Content Action activity, which splits and/or merges multi-page file formats. To apply this activity to the entire Batch...

  1. Right-click on the Batch Folder.
  2. Select the Contents > Apply Activity... command.
    • This will bring up the Contents • Apply Activity window, which allows you to perform multi-threaded tasks against whatever scope you choose.
  3. Set the Activity Type property to Content Action.
Grooper ace architect 0007a.png
  1. With the Activity Type property set, click the arrow on the left side of the window to expose the properties of this activity.
  2. Set the Render Resolution property to 300.
  3. Make sure the Scope property is set to Folder and the Folder Level property is set to 1 (the Batch Folder is at a scope of "Folder Level 0", or more accurately the "Batch" scope.)
Grooper ace architect 0007b.png
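The Folder Level setting counts down from the Batch Folder. A minimal Python sketch of that counting, assuming a hypothetical, simplified Batch layout in which folders nest under `folders` and pages hang off `pages` (this is not Grooper's internal representation):

```python
# Hypothetical, simplified Batch: folders nest under "folders", pages hang off "pages".
BATCH = {
    "name": "Closing Disclosures",  # the Batch Folder: Folder Level 0 (the "Batch" scope)
    "folders": [
        {"name": "Document (1)", "folders": [], "pages": ["Page 1", "Page 2"]},
        {"name": "Document (2)", "folders": [], "pages": ["Page 1", "Page 2"]},
    ],
    "pages": [],
}


def folders_at_level(folder: dict, level: int, current: int = 0) -> list[str]:
    """Collect the names of every folder exactly `level` levels below the Batch Folder."""
    if current == level:
        return [folder["name"]]
    found = []
    for child in folder.get("folders", []):
        found.extend(folders_at_level(child, level, current + 1))
    return found


print(folders_at_level(BATCH, 1))
```

With Scope set to Folder and Folder Level set to 1, the activity runs once per document, which is what splitting each PDF into pages requires.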
  1. Press the Execute button which will bring up the Process All window.
  2. Press the Start button and you should eventually...
  3. ...see a green progress bar fill up at the bottom of the window.
Grooper ace architect 0007c.png

There should now be Page objects for each Document object (you may need to Refresh the Node Tree to see this change take effect). You will see these objects reflected in the node tree as children to the Document objects, as well as in the Batch Viewer, where you can individually select a specific page and see only it represented in the Document Viewer.

Grooper ace architect 0008.png

Image Processing - Analyzing the Pages

The documents provided were created in an electronic application, and as a result they do not consist of pixel-based images subject to the flaws of a scanning process. Image Processing for these types of documents will therefore consist of detecting features of the document's form, not "cleaning up" image artifacts. Either task requires the same thing, however: an IP Profile. An IP Profile is a linear sequence of IP Steps (which can be organized in IP Groups) designed to do one or more of the following things:

  1. To apply archival image adjustments
  2. To apply image cleanup operations to obtain better text data from OCR.
  3. To obtain image-based data, including Layout Data (such as table line locations, barcode information, OMR checkbox states, and more) as well as image features used for Visual classification.
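The "linear sequence of steps" idea can be sketched in Python. Everything here is hypothetical and deliberately tiny (a grayscale image as nested lists, two made-up steps); it is not Grooper's IP engine. It shows the two kinds of steps described above: one that changes pixels, and one that only records layout data about the image.

```python
from typing import Callable

Image = list[list[int]]
# Hypothetical step signature: takes the image plus a shared layout-data dict, returns an image.
IPStep = Callable[[Image, dict], Image]


def binarize(image: Image, layout: dict, threshold: int = 128) -> Image:
    """Pixel-changing step: grayscale (0 = black) becomes 1 = dark, 0 = light."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def detect_horizontal_lines(image: Image, layout: dict) -> Image:
    """Detection step: record rows that are dark edge to edge; pixels are untouched."""
    layout["lines"] = [y for y, row in enumerate(image) if row and all(row)]
    return image


def run_profile(image: Image, steps: list[IPStep]) -> tuple[Image, dict]:
    """Apply each step in order, the way an IP Profile applies its IP Steps."""
    layout: dict = {}
    for step in steps:
        image = step(image, layout)
    return image, layout


page = [[0, 0, 0], [255, 0, 255], [10, 20, 30]]
result, layout = run_profile(page, [binarize, detect_horizontal_lines])
print(layout)
```

Order matters: the detection step sees the output of every step before it, which is why the sequence is linear.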

Follow the steps below to create an IP Profile that will be used for detecting some form features of these documents including OMR Boxes and lines. The detection (and subsequent storage/attachment to their detected pages) of these features will allow for powerful extraction techniques to be used later. Writer's note - Keep in mind that creating an IP Profile, and even testing its functionality while building it, does not actually affect the documents. The results you see while testing are all done in memory, and as a result are temporary. Like the Content Action activity applied previously, once an IP Profile is created, it must be run as an activity to be applied to your content.

  1. Expand the Node Tree to:
    Global Resources > IP Profiles
    ...and in the object-command menu choose Add > IP Profile...
  2. In the Add New IP Profile window name it Feature Detection.
Grooper ace architect condition 001a.png
  1. With the newly created IP Profile selected, click the Add drop-down menu.
  2. From there, in the Feature Detection section, choose Box Detection.
Grooper ace architect condition 001b.png
  1. To see results, the profile needs to be run against a document. In the Batch Viewer, click the Batch drop-down and select the Closing Disclosures Batch.
Grooper ace architect condition 001c.png
  1. Select a document.
  2. With the document selected, click the button next to the Batch drop-down menu. This will save the currently selected document in the Batch Viewer, so as you move around the Grooper UI, and encounter the myriad of places a Batch Viewer exists, it will default to this document.
  3. Selecting a document will automatically execute the entirety of the IP Profile (in this case just one step for now) and show you information in the Processing Results list-view.
  4. To the left of the Document Viewer is a tree view showing not only all the steps executed, but all the intermediary steps OF each step. This is an invaluable diagnostic tool, as you can see each part of what makes an IP command work.
Grooper ace architect condition 001d.png
  1. You can step through each intermediary step to see how the Box Detection IP Step works. In this case select the "Boxes" intermediary step.
  2. There are boxes on this page that SHOULD get detected, but for some reason are not. The excellent thing about Grooper is these features are not simple on/off functions. We can determine, via this diagnostic step, that something is awry, and adjust.
Grooper ace architect condition 001e.png
  1. Here, I scrolled down the page a bit and zoomed in to show the Box Detection working, which I can tell because the boxes are highlighted either green or pink. The power of Box Detection in Grooper is not just the ability to determine the existence of the boxes, but to ascertain their "checked" status. Green indicates the box is "checked", while pink tells us the box is not checked.
    • Given this, we know the IP Step is functioning; it's just not detecting the boxes at the top of the page. With some analysis, one would determine that the boxes at the top are a bit small.
Grooper ace architect condition 001f.png
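This article does not describe how Box Detection decides whether a box is checked, so the Python sketch below swaps in a simple, commonly used technique: measure how much of the box interior (outline excluded) is dark. The function name and the 0.2 threshold are hypothetical.

```python
Box = list[list[int]]  # 1 = dark pixel, 0 = light pixel, including the box outline


def is_checked(box: Box, threshold: float = 0.2) -> bool:
    """Call a box 'checked' when enough of its interior (outline removed) is dark."""
    interior = [row[1:-1] for row in box[1:-1]]
    total = sum(len(row) for row in interior)
    dark = sum(sum(row) for row in interior)
    return total > 0 and dark / total >= threshold


empty_box = [[1] * 5] + [[1, 0, 0, 0, 1] for _ in range(3)] + [[1] * 5]
x_box = [[1] * 5, [1, 1, 0, 1, 1], [1, 0, 1, 0, 1], [1, 1, 0, 1, 1], [1] * 5]
print(is_checked(empty_box), is_checked(x_box))
```

Excluding the outline matters: otherwise every box, checked or not, would contribute a large share of dark pixels.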
  1. In the property grid of the Selected Command tab, expand the properties of the Size Range property and set the Minimum property to 5pt.
  2. After making that adjustment, click the Execute button.
  3. Notice now the smaller boxes at the top of the page are considered detected.
Grooper ace architect condition 001g.png
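Since 1 pt is 1/72 inch, a size in points converts to pixels as points × dpi / 72. At the 300 dpi Render Resolution used earlier, the new 5pt minimum is 5 × 300 / 72 ≈ 20.8 px. The Python sketch below applies a size-range filter to candidate boxes; the helper names, the box coordinates, and the 8pt and 20pt comparison values are hypothetical, chosen only to show a small box being excluded and then admitted.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch


def pt_to_px(points: float, dpi: int) -> float:
    """Convert a length in points to pixels at the given resolution."""
    return points * dpi / POINTS_PER_INCH


def boxes_in_size_range(boxes, min_pt, max_pt, dpi):
    """Keep (x, y, width, height) pixel boxes whose sides fall within [min_pt, max_pt]."""
    lo, hi = pt_to_px(min_pt, dpi), pt_to_px(max_pt, dpi)
    return [b for b in boxes if lo <= b[2] <= hi and lo <= b[3] <= hi]


candidates = [(100, 40, 22, 22), (100, 900, 40, 40), (300, 900, 60, 60)]
print(pt_to_px(5, 300))  # about 20.83 px at 300 dpi
print(len(boxes_in_size_range(candidates, 8, 20, 300)))  # small box at the top is missed
print(len(boxes_in_size_range(candidates, 5, 20, 300)))  # lowering the minimum admits it
```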
  1. Next we need to add a Line Detection IP Step, so click the Add drop-down again.
  2. In the Feature Detection section, choose Line Detection.
    • The default settings for this step should suffice.
  3. Click the Execute button.
Grooper ace architect condition 001h.png
  1. Notice now the Processing Results list-view has more information.
  2. There are also new diagnostic steps to view, and in this case you can see the lines that this step detected.
Grooper ace architect condition 001i.png

Recognizing the Text and Form Features

The IP Profile that was just created will now be leveraged during the process that will pull the native text from our electronic documents. To do this, we will run the Recognize activity.

Recognize is an activity which detects and reads the presentation elements which convey information in a document, such as text segments, barcodes, lines, check boxes, and other shapes. The resulting character data and layout information is saved on the Document/Folder or Batch Page object being processed, where it is available to subsequent activities which depend on recognition results.

The following steps outline how to apply the Recognize activity in an ad hoc fashion.

  1. Expand the Node Tree to:
    Batch Processing > Batches > Test
    ...and select the Closing Disclosures Batch.
  2. Click the Batch Viewer tab.
    • It is worth noting that all Batch Viewers are basically created equally. Any Batch Viewer will function like any other, so in the case of applying the Contents • Apply Activity command, you do not HAVE to navigate to this particular Batch Viewer.
  3. Right-click on the Batch Folder to bring up the object-command menu.
  4. Select the Contents > Apply Activity... command.
Grooper ace architect condition 002a.png
  1. In the Contents • Apply Activity window set the Activity Type property to Recognize.
  2. We won't be performing OCR, so the OCR Profile property can be left blank.
  3. Set the Native Text Extraction property to Full, as we will be relying entirely on the native text of the document set.
Grooper ace architect condition 002b.png
  1. For the Alternate IP property, select the Feature Detection IP Profile from the drop-down tree view.
  2. Set the Scope property to Page.
    • While the native text can be made available at the document level, it is advised to run Recognize at the page level as it maximizes the benefits from parallel processing, among other things.
  3. Press the Execute button.
Grooper ace architect condition 002c.png
  1. In the Process All window, check your Thread Count, and press the Start button.
  2. As the task is being processed you will see the completion bar fill.
Grooper ace architect condition 002d.png
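Why Page scope parallelizes better: with 25 five-page documents, Document scope yields 25 units of work, while Page scope yields 25 × 5 = 125, which spread more evenly across the threads set by Thread Count. This Python sketch illustrates the idea with the standard `concurrent.futures` module; `recognize_page` is a hypothetical stand-in for real recognition work.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page: str) -> str:
    """Hypothetical stand-in for per-page recognition work (just upper-cases the text)."""
    return page.upper()


def recognize_at_page_scope(documents: list[list[str]], thread_count: int = 4) -> list[str]:
    """Flatten documents into pages so each page becomes its own unit of work."""
    pages = [page for document in documents for page in document]
    with ThreadPoolExecutor(max_workers=thread_count) as executor:
        # map() returns results in input order even though pages run concurrently.
        return list(executor.map(recognize_page, pages))


batch = [[f"doc{d} page{p}" for p in range(1, 6)] for d in range(1, 26)]  # 25 five-page forms
results = recognize_at_page_scope(batch)
print(len(results))
```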
  1. With the Recognize activity completed, open the batch up further in the Node Tree to:
    Batch Processing > Batches > Test > Closing Disclosures > Closing Disclosures > Document (1)
...and select a Page object.
  1. Click the Advanced tab.
  2. Click the Files tab.
Writer's note - The following steps (13-17) are not necessary to complete the Recognize activity, as steps 1-12 will suffice. However, they are useful for understanding how Grooper objects relate to entries in the dbo.TreeNode table of the Grooper Database, and to any files associated with those objects that live in the Grooper Filestore.
Grooper ace architect condition 002e.png
  1. Select the LayoutData.json file.
  2. Notice the viewer at the bottom change to display the JSON information. This is the metadata now associated with this Page object, created by running the Feature Detection IP Profile set on the Alternate IP property earlier. You can see it has information about things like the OMR Boxes, including whether each one is checked or not.
    • You can see in the list-view there are several other files associated with this Page object including images, character data, and others. Try clicking on them and observing them in the image viewer below, as well.
Grooper ace architect condition 002f.png
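If you save a copy of this JSON, a schema-agnostic Python sketch like the one below can list every checked/unchecked flag without assuming the file's exact layout. The sample document is invented for illustration; the real LayoutData.json uses its own structure and key names, so the only assumption the walker makes is that such flags are booleans whose key contains "checked".

```python
import json


def find_checked_flags(node, path="$"):
    """Yield (json_path, value) for every boolean whose key mentions 'checked'."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if "checked" in key.lower() and isinstance(value, bool):
                yield child_path, value
            else:
                yield from find_checked_flags(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_checked_flags(item, f"{path}[{index}]")


# Invented sample; the real LayoutData.json uses its own structure and key names.
sample = json.loads('{"boxes": [{"checked": true}, {"checked": false}]}')
flags = list(find_checked_flags(sample))
print(flags)
```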

Organize the Documents

Classification is a critical step in the flow of documents through Grooper as it is how documents are identified. It is the process of logically assigning, via training or rules defined in a Content Model, a Document Type to a document/folder. The process of Classification allows Grooper to know what type of extraction to apply to a document.

A good analogy would be to think of a person needing to file papers away into different filing cabinets. In order for the person to know where to put what paper, that person would need to be able to identify each paper so he/she knew where to put it. This is essentially equivalent to Classification in Grooper.

The following steps outline how to use a Rules-Based approach to classify the Closing Disclosure documents.

Creating the Content Model

The Content Model is the most important object we will create as it contains basically everything about our project moving forward. It is also the object that will leverage supplied logic to apply the classification for its Document Types.

  1. Select the Content Models branch of the Node Tree.
  2. Right-click to bring up the command menu and select Add > Content Model...
  3. In the Add New Content Model window, name it Closing Disclosures.
Grooper ace architect organize 001.png
  1. With the Content Model created and selected, click the Create Data Model... button.
    • It is not necessary to do this right now, but as we will certainly be using it later it is worth getting out of the way.
Grooper ace architect organize 002.png
  1. Click the Create Local Resources Folder... button.
Grooper ace architect organize 003.png

Adding a Document Type

Document Types are created as children of Content Model or Content Category objects. Once created, Document Types can be assigned to Folder/Document objects in a Batch manually using the Assign Document Type command. The act of assigning a Document Type to a folder is called classification.

Classification is rarely performed manually by a user. In most cases, automated classification is used to classify documents based on their lexical content or visual appearance. To use automated classification, each document type must be trained with examples or configured with classification rules.

  1. Select the Closing Disclosures Content Model and right-click to bring up the command menu.
  2. Select Add > Document Type....
  3. In the Add New Document Type window name it Closing Disclosure.
Grooper ace architect organize 004.png
  1. Select the newly created Closing Disclosure Document Type.
  2. The Classification collection of properties has a hidden property we need to activate. This is done on the Content Model.
Grooper ace architect organize 005.png
  1. Select the Closing Disclosures Content Model.
  2. Set the Classification Method property to Rules-Based from the drop-down menu.
Grooper ace architect organize 006.png

Classifying the Documents with Rules

Having created the main Document Type and set the Content Model to Rules-Based classification, it's time to make the rule. This will be done with one of the three extractor types available in Grooper: the Data Type.

A Data Type defines extraction logic for a distinct type of data, such as a field value or a table row. Each Data Type defines one or more extractors, along with settings which control how the extractor results are transformed into a final result set.

The following steps outline how to create a Data Type, inside a folder in the Content Model's (local resources), that is configured to find a result present on every Closing Disclosure. That Data Type can then be used as the Positive Extractor for the Closing Disclosure Document Type, which allows classification to succeed.

Creating a Rules Extractor For Classification
  1. Select the (local resources) folder and right-click to bring up the command menu.
  2. Select Add > Folder....
  3. In the Add New Folder window name it Classification Extractors.
Grooper ace architect organize 007.png
  1. Select the newly created Classification Extractors folder and right-click to bring up the command menu.
  2. Select Add > Data Type....
  3. In the Add New Data Type window name it CLAS - Closing Disclosure (after the Document Type it is meant to be used for).
    • The naming convention for this object is described in detail in the Asset Management article.
Grooper ace architect organize 008.png
  1. Select the newly created CLAS - Closing Disclosure Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect organize 009.png

The Pattern Editor interface allows us to write regular expressions and apply some properties to our pattern to return specific data sets.

For this pattern, we want to return some value we know will be on every Closing Disclosure form.

  1. In the Value Pattern area type:
       Closing Disclosure
  2. Notice the results being returned in the Results list-view, as well as a value being highlighted in a green polygon in the Document Viewer.
    • At this point you would want to click through a sampling of other documents in the Batch Viewer at the bottom to make sure this pattern returns results on other documents.
Grooper ace architect organize 010.png
Setting the Rule on the Document Type
  1. Select the Closing Disclosure Document Type.
  2. Set the Positive Extractor property to Reference from the drop-down menu.
    • Earlier, when we set the Classification Method property to Rules-Based on the Closing Disclosures Content Model it exposed the Positive Extractor and Negative Extractor properties on the Closing Disclosure Document Type.
Grooper ace architect organize 011.png
  1. After setting the Positive Extractor property to Reference you can expand it to expose the Extractor property. Select it, then from the drop-down menu tree view select the CLAS - Closing Disclosure Data Type from:
       Closing Disclosures • (local resources) > Classification Extractors
Grooper ace architect organize 012.png
Applying Classification
  1. Select the Closing Disclosures Content Model.
  2. Click the Classification Testing tab.
  3. Click the Classify Batch button.
  4. In the Classification Tester Settings window make sure the Folder property is set to 1 and that the Classification Level property is set to DocType.
  5. Click the Execute button.
Grooper ace architect organize 013.png
  1. In the Operation in Progress window you will see a green progress bar move.
  2. As it does you will see the documents in the batch go from a generic name of Document to now be classified as Closing Disclosure.
    • While this is a "testing" tab, the Classify Batch button does apply a permanent change to whatever Batch you run it against. It is not a temporary, in-memory change.
  3. Notice that, based on the Rules-Based approach, the system is 100% confident this is the correct document type.
    • Again, this is because the extractor supplied to the Positive Extractor property of the Closing Disclosure Document Type returned a result.
Grooper ace architect organize 014.png
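To make the rule concrete outside of Grooper, here is a minimal Python sketch of the behavior described above, assuming a document's text is available as a plain string. The `classify` function is hypothetical and is not part of any Grooper API. It only models the idea that any result from the Positive Extractor classifies the document with full confidence.

```python
import re

def classify(text, positive_pattern, doc_type):
    """Hypothetical model of a Rules-Based Positive Extractor (not Grooper's API).

    If the pattern returns any result, the document is assigned doc_type
    with 100% confidence; otherwise it stays unclassified.
    """
    if re.search(positive_pattern, text):
        return doc_type, 1.0
    return None, 0.0

# Made-up sample text, not taken from the tutorial's documents.
print(classify("Closing Disclosure\r\nDate Issued 4/10/2013", r"Closing Disclosure", "Closing Disclosure"))
print(classify("Loan Estimate", r"Closing Disclosure", "Closing Disclosure"))
```

This is also why the Key extractors later in the article must be tested carefully: a rule like this has no notion of "how well" something matched, only whether it matched.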

Collect Data from the Documents

The goal of this section is to demonstrate extraction techniques by guiding you through new concepts. Given the scope of the documents, it would be a hindrance to walk you through creating every extractor necessary for 100% extraction. Much of what is needed to reach that goal is simply duplicating an existing extractor to target a slightly different data point.
Given that, this article will guide you through each principle once and rely on you to repeat the technique to practice and fill out your model. You by no means need to build the complete model; the point is to do as much as you feel is necessary to get enough practice.
Along the way, we will also point out tricky parts of the documents and the specific techniques needed to overcome those challenges.

Key-Value Pair Collation

It's not a stretch to think of Grooper as an A.I. because you are ultimately building a system that intelligently collects information from documents in a way much like a person would. In fact, when asking how to do something in Grooper it's best to start by thinking how a person would do something, then learn how to make Grooper perform that behavior.
Many of the data points on this set of documents consist of a label, denoting what the piece of data is, followed closely by a value. You as a person look at this relationship and realize what the information is as a result. We can tell Grooper to also understand this relationship by leveraging a Collation method on a Data Type called a Key-Value Pair.

Collecting a Date
The First Value Extractor

Starting at the beginning of the document, the first thing to collect is a date. Let's build an extractor that will collect all the dates on the document.

  1. Right-click on the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name the new folder Value Extractors.
Grooper ace architect collect key-value pairs collecting a date 001.png
  1. Right-click the newly created Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - Date.
Grooper ace architect collect key-value pairs collecting a date 002.png
  1. Right-click the newly created VAL - Date Data Type.
  2. Select Add > Data Format... from the command menu.
  3. Name it ##/##/####.



For the last Data Type we created, we wrote the pattern locally on the Data Type. For this Data Type, because I know we’re going to need more than one pattern, we’re going to handle each pattern individually in its own Data Format.
Data Format objects are children of Data Type objects and represent a distinct format for the Data Type. In our example, some dates on the documents represent the year with four digits and others with two.

Grooper ace architect collect key-value pairs collecting a date 003.png

You should notice that the Data Format's interface looks just like the pattern editor for a Data Type, because that’s basically all it is. Data Formats are just that, patterns that return a specific result. Data Types, while being able to have their own internal pattern, are more about the complex logic that sits on top and allows the manipulation of a returned result to create something more specific.

  1. In the Value Pattern area type or copy/paste the following:
       (?<Month>1[012]|0?[1-9])/
       (?<Day>3[01]|[12][0-9]|0?[1-9])/
       [012][089][0-9]{2}
  • You may notice that in Grooper you can organize your regex code by inserting line breaks. When you do, Grooper places a soft pilcrow character at the end of the line. You do not type this; Grooper handles it.
  • You will also notice the first and second lines are in named capture groups. This is mainly to contain the alternation ("or") pipe in the regex pattern. Aside from that, in Grooper you can easily make named capture groups in one of three ways:
    1. Manually type the required regex syntax for a named capture group.
    2. Highlight the block of text you want grouped and use the CTRL+G hotkey.
    3. Highlight the block of text you want grouped, then right-click to bring up the command menu and select the Create Group object command.
  2. While you may write your regex pattern with line breaks to make it easier to read, it is still passed to the system as if it were one line. Written on a single line, the pattern is noticeably harder to read, hence the usefulness of line breaks in your patterns.
  3. You can see results highlighted in the Document Viewer and in the Results list-view.
Grooper ace architect collect key-value pairs collecting a date 004.png
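If you want to experiment with this pattern outside of Grooper, the following Python sketch runs it against a made-up line of text. One assumption to be aware of: Grooper's patterns use .NET-style named groups, `(?<Month>...)`, while Python's `re` module spells them `(?P<Month>...)`. The three lines of the Value Pattern are joined into one pattern here, mirroring how Grooper treats line breaks.

```python
import re

# Grooper's .NET-style "(?<Month>...)" named groups are written "(?P<Month>...)" in Python.
# Adjacent string literals are joined, just as Grooper joins the pattern's lines into one.
DATE_4 = re.compile(
    r"(?P<Month>1[012]|0?[1-9])/"
    r"(?P<Day>3[01]|[12][0-9]|0?[1-9])/"
    r"[012][089][0-9]{2}"
)

# Made-up sample line, not taken from the tutorial's documents.
text = "Date Issued 4/10/2013 Closing Date 04/15/2013"
matches = [(m.group(0), m.group("Month"), m.group("Day")) for m in DATE_4.finditer(text)]
print(matches)
# [('4/10/2013', '4', '10'), ('04/15/2013', '04', '15')]
```

The named groups do no harm to the overall match; they simply make the Month and Day portions individually addressable.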
  1. Right-click the VAL - Date Data Type.
  2. Select Add > Data Format... from the command menu.
  3. Name it ##/##/##.
Grooper ace architect collect key-value pairs collecting a date 005.png
  1. In the Value Pattern area type or copy/paste the following:
       (?<Month>1[012]|0?[1-9])/
       (?<Day>3[01]|[12][0-9]|0?[1-9])/
       [0-9]{2}
  2. Go to page 2 of 5 in the Document Viewer.
  3. You’ll see this pattern successfully collects these shorter date formats.
    • You will probably also notice that it is returning results you may not want, as they are part of longer date formats. We’ll fix this in a moment.
Grooper ace architect collect key-value pairs collecting a date 006.png
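The unwanted results mentioned above are easy to reproduce in Python. This sketch (again using Python's `(?P<name>...)` group syntax and a made-up sample string) shows that the two-digit-year pattern also matches the first seven characters of a four-digit-year date.

```python
import re

# Python version of the ##/##/## Data Format.
DATE_2 = re.compile(
    r"(?P<Month>1[012]|0?[1-9])/"
    r"(?P<Day>3[01]|[12][0-9]|0?[1-9])/"
    r"[0-9]{2}"
)

# Made-up sample containing one short date and one long date.
text = "Signed 4/15/13 and recorded 4/15/2013"
short_matches = [m.group(0) for m in DATE_2.finditer(text)]
print(short_matches)
# ['4/15/13', '4/15/20']
```

The second result, `4/15/20`, is the overlap problem: it is a fragment of the longer date `4/15/2013`.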
  1. Select the VAL - Date Data Type, and notice it is returning the results from all its child objects.
  2. Go back to page 1 of 5 in the document viewer and zoom in on the top three dates in the upper right of the page.
  3. You can see the problem mentioned previously, in the overlapping green polygons of the returned results. We only want one result being returned for these.
Grooper ace architect collect key-value pairs collecting a date 007.png
  1. To rectify this problem, set the Deduplicate By property of the Data Type to Area.
    • The Deduplicate By property specifies the mode to be used for deduplicating overlapping results. It can be set a few different ways, but the option we chose allows the data instance occupying the largest geometric region to win. Therefore, the longer date is the one that is kept, and the shorter one is thrown away.
  2. After setting this value, click the Save button and then the Test Single button. You should now see that there are no longer overlapping green boxes, a visual indication this is doing what we want.
    • It’s worth mentioning again that the best practice, once an extractor is working well, is to verify it by clicking through several documents in the Document Viewer and seeing how it performs on them.
Grooper ace architect collect key-value pairs collecting a date 008.png
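The idea behind Deduplicate By set to Area can be sketched in Python as "largest result wins among overlapping results." This is an approximation under a stated assumption: Grooper compares geometric regions on the page, whereas this sketch uses character length on a single line of made-up text as a stand-in for area. The `dedupe_by_length` function is hypothetical.

```python
import re

DATE_4 = re.compile(r"(1[012]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/[012][089][0-9]{2}")
DATE_2 = re.compile(r"(1[012]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/[0-9]{2}")

def dedupe_by_length(spans):
    """Hypothetical stand-in for Area deduplication on (start, end) spans.

    Larger spans are considered first; a span is kept only if it does not
    overlap any span already kept.
    """
    kept = []
    for start, end in sorted(spans, key=lambda s: s[1] - s[0], reverse=True):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

# Made-up sample text; results from both "Data Formats" are pooled, as on the parent Data Type.
text = "Signed 4/15/13 and recorded 4/15/2013"
spans = [m.span() for pattern in (DATE_4, DATE_2) for m in pattern.finditer(text)]
deduped = [text[s:e] for s, e in dedupe_by_length(spans)]
print(deduped)
# ['4/15/13', '4/15/2013']
```

The fragment `4/15/20` overlaps the longer `4/15/2013` and is thrown away, while the genuine short date `4/15/13` survives because nothing overlaps it.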
Specifying a Result via Key-Value Pair

We now have an extractor that works well for accurately collecting all the dates in the documents, but we specifically need to collect the first date: Date Issued. To do this, we will narrow down to a specific result by setting the Collation property of a new Data Type to Key-Value Pair. In doing so, we'll also begin to build a foundational understanding of how and why referencing other, already-created objects in Grooper is so powerful.

We've already created something that captures the value we're after, but we need to create an extractor to collect the key. The key is the piece of information near the value (you can think of it as a label) that tells you what the value is. Again, we're looking for the Date Issued.

  1. Right-click the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Key Extractors.
Grooper ace architect collect key-value pairs collecting a date 009.png
  1. Right-click the newly created Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Date Issued.
Grooper ace architect collect key-value pairs collecting a date 010.png
  1. With the newly created KEY - Date Issued Data Type selected, select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs collecting a date 011.png
  1. In the Value Pattern area type or copy/paste the following:
       Date Issued
  2. Notice the result highlighted in the Document Viewer and listed in the Results list-view.
    • It is critical when making a "key" that it return only one result, so be sure to test your pattern against other documents to confirm this.
Grooper ace architect collect key-value pairs collecting a date 012.png



Continuing with our Key-Value Pair, let's now create the objects that will combine what we've created so far into a unit that returns a single result.

  1. Right-click the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Key-Value Pair Extractors.
Grooper ace architect collect key-value pairs collecting a date 013.png
  1. Right-click the newly created Key-Value Pair Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KVP-H - Date Issued.
Grooper ace architect collect key-value pairs collecting a date 014.png
  1. Right-click the newly created KVP-H - Date Issued Data Type.
  2. Select Add > Data Type... from the command menu.
  3. Name it KVP-H - Date Issued - Key.



Previously we created a Data Format as a child of a Data Type which served to simply return a value from a pattern. But, it is possible to make Data Types the children of Data Types and leverage their more advanced extraction properties to pass results to their parent. In this case we’re going to use the ability of a Data Type to reference other objects.

Grooper ace architect collect key-value pairs collecting a date 015.png
  1. Right-click the newly created KVP-H - Date Issued - Key Data Type.
  2. Select Clone from the command menu.
    • Notice this object-command has a hotkey of CTRL+SHIFT+C which you can use instead of the command menu.
  3. Replace the word Key with the word Value in the name of the object.



Cloning was just a shortcut that let us more quickly name the new Data Type. You could have created it like we have every other object, I’ve just found this to be a convenient and quick method.

Another thing to consider is that the names of the children of the Data Type that will ultimately be set up as the Key-Value Pair are completely arbitrary. The names used here simply follow a best-practice convention. What’s important is the order of the children: the first child is always the "key" and the second the "value".

Grooper ace architect collect key-value pairs collecting a date 016.png
  1. The newly created KVP-H - Date Issued - Value Data Type should be the currently selected object.
  2. Select the Referenced Extractors property and click the ellipsis button.
Grooper ace architect collect key-value pairs collecting a date 017.png
  1. In the Referenced Extractors window click the Add button.
    • We are going to add just one reference in this case, but you could add many. This window displays the list of references you add, and results are returned in the order of that list, from top to bottom.
Grooper ace architect collect key-value pairs collecting a date 018.png
  1. In the Select Items window from the drop-down tree view select VAL - Date from:
Closing Disclosures • (local resources) > Value Extractors



Notice here that you select the item by putting a check next to what you want, not merely clicking it, because this menu lets you select multiple items at once.

Grooper ace architect collect key-value pairs collecting a date 019.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - Date Data Type.
    • So, the question here might be, "Why make the other Data Type and have this one point to it?" The VAL - Date Data Type now serves as a powerful tool to extract all dates, and can be used anywhere we need it based on specific conditions. By referencing it, we do not need to "reinvent the wheel", per se, every time we need to extract a date. Consider that this page alone has three dates on it, each representing a different value.
Grooper ace architect collect key-value pairs collecting a date 020.png
  1. Select the KVP-H - Date Issued - Key Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window, and click the Add button in it.
  3. In the Select Items window from the drop-down tree view select KEY - Date Issued from:
 Closing Disclosures • (local resources) > Key Extractors



This is perhaps a slightly more esoteric use case than the previous reference, as you may be thinking, "Why would I reference the key somewhere else? And if I wouldn't, why make the other object and have this one reference it, instead of just having this object contain the regex and reduce the object count?" That’s not necessarily wrong, and could be an approach you take. However, the steps described here are meant to help establish a best practice. The best practice in this case is to develop your (local resources) so that objects are compartmentalized by function as much as possible, making it easier for as many people as possible to understand what is doing what, and why.

Grooper ace architect collect key-value pairs collecting a date 021.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same result from) the KEY - Date Issued Data Type.
Grooper ace architect collect key-value pairs collecting a date 022.png
  1. Select the KVP-H - Date Issued Data Type.
  2. Notice the default setting of the Collation property is Individual.
  3. This setting allows each child of the Data Type to return its results individually, but this is not how we want this Data Type to function. We need it to return one result: that of the Date Issued value.
Grooper ace architect collect key-value pairs collecting a date 023.png
  1. Select the Collation property and click the drop-down button.
  2. In the drop-down menu select Key-Value Pair.
Grooper ace architect collect key-value pairs collecting a date 024.png

Okay, so we’re almost there. We now need to consider the spatial relationship between the "key" and the "value". Go back to the idea of asking how a person knows, so we can translate that into making Grooper do it. If Date Issued is the key and the date near it is the value, how do you know? Well, the "value" in this case is to the right of the "key", which to Grooper is a Forward, Horizontal relationship.

  1. Click the drop-down arrow next to the Collation property, made available after setting it to Key-Value Pair, to expose this setting’s sub-properties.
    • These sub-properties mainly have to do with setting and fine-tuning the spatial relationship discussed previously.
  2. The only property you need to change from default (for now) is setting Horizontal Layout to Enabled.
    • Notice when you do this it too has sub-properties you can adjust, which have to do with preventing false positives by narrowing in to a specific result. Again, these can be left default for now, but feel free to play around. Also, notice the Horizontal Direction property, above it, was exposed, which again, Forward, means "to the right of".
    • You may have noticed too that these "hidden" properties are appearing more and more as you explore Grooper, and that’s true. Given that your chief means of interacting with Grooper is by manipulating property grids, it would be disadvantageous to the end user to always expose all properties. It does, however, mean that you may not know a property exists until you manipulate another. Therefore, it is highly encouraged that you:
      1. Explore Grooper to gain comfort.
      2. Lean on the Wiki for learning.
      3. Discuss with us on GrooperXChange.
Grooper ace architect collect key-value pairs collecting a date 025.png
  1. With the Collation property set to Key-Value Pair...
  2. ...and the Horizontal Layout property set to Enabled (and with a quick press of the Save and Test Single buttons)...
  3. ...you should now see that this Data Type returns only one, specific date: that of the Date Issued.
Grooper ace architect collect key-value pairs collecting a date 026.png
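Here is a rough Python sketch of what the Key-Value Pair collation is doing with Horizontal Layout enabled and a Forward direction. It is a text-only approximation under an explicit assumption: Grooper reasons about positions on the page, while this sketch only looks for a value to the right of the key on the same line of text. The `key_value_forward` function and the sample text are made up for illustration.

```python
import re

KEY_DATE_ISSUED = re.compile(r"Date Issued")
KEY_CLOSING_DATE = re.compile(r"Closing Date")
VAL_DATE = re.compile(r"(1[012]|0?[1-9])/(3[01]|[12][0-9]|0?[1-9])/[012][089][0-9]{2}")

def key_value_forward(text, key, value):
    """Hypothetical model: on the line containing the key, return the first value to its right."""
    for line in text.splitlines():
        k = key.search(line)
        if k:
            v = value.search(line, k.end())  # start searching after the key (Forward)
            if v:
                return v.group(0)
    return None

# Made-up sample; note the stray date to the LEFT of "Date Issued".
sample = (
    "Closing Information\n"
    "3/1/2013 Date Issued 4/10/2013\n"
    "Closing Date 4/15/2013\n"
)
print(key_value_forward(sample, KEY_DATE_ISSUED, VAL_DATE))   # 4/10/2013
print(key_value_forward(sample, KEY_CLOSING_DATE, VAL_DATE))  # 4/15/2013
```

The same value extractor serves both keys, which is the payoff of keeping VAL - Date separate and referencing it.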
Supplying the Result to the Data Model

At this point we've built an excellent tool for collecting the data we need, but we now need to put what was collected into a bucket. We do this by adding a Data Element to the (data model). There are four types of Data Elements in Grooper:

  1. Data Field
    • This is the most straightforward Data Element and is meant to contain a single instance of data.
  2. Data Table
    • Data Table objects are used to describe tabular data. The columns in the table are defined by adding child...
  3. Data Column
    • These work like a Data Field but represent the possibility for multiple instances of data, as a Data Table may return multiple rows of information.
  4. Data Section
    • A Data Section allows the content of a document to be subdivided into sections (single, or multiple/repeating) for further processing.



The Date Issued is a single piece of data, therefore, we will add a Data Field to our (data model).

  1. Right-click the (data model).
  2. Select Add > Data Field... from the command menu.
  3. Name it Date Issued.
Grooper ace architect collect key-value pairs collecting a date 027.png
  1. Select the newly created Date Issued Data Field.
  2. Select the Value Type property and click the drop-down arrow.
  3. From the drop-down menu select Date Time.
Grooper ace architect collect key-value pairs collecting a date 028.png
  1. With the Value Type property set, you can expose its sub-properties by clicking the arrow to the left of the property, and set the Format Specifier property to d.
  2. With the Format Specifier property selected, you can see some details about it in the Grooper Help. It outlines a few examples, and the one we want is listed along with some information about it. These are all .NET format specifiers, so you can use any specifier that conforms to that standard.



The main reason for setting the Value Type and subsequent Format Specifier is for accuracy of data, as Data Elements supplied with data not conforming to their Value Type will be flagged as invalid. You could leave every field as a String type, but this could allow for bad information, as well as not conform to whatever backend system you may be using to store the data.

Grooper ace architect collect key-value pairs collecting a date 029.png
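To illustrate why a typed Value Type helps, here is a small Python sketch that accepts extracted text only if it parses as a date and otherwise flags it as invalid. Several assumptions apply: `to_date_value` is hypothetical, the two input formats simply mirror the Data Formats built earlier, and the output format `M/d/yyyy` is what .NET's `d` specifier produces under the en-US culture (other cultures differ). This is not how Grooper implements validation.

```python
from datetime import datetime

# Assumed input formats, mirroring the ##/##/#### and ##/##/## Data Formats.
INPUT_FORMATS = ("%m/%d/%Y", "%m/%d/%y")

def to_date_value(raw):
    """Hypothetical Value Type check: return (normalized value, is_valid)."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(raw, fmt)
        except ValueError:
            continue  # try the next format
        # M/d/yyyy is what .NET's "d" specifier produces under the en-US culture.
        return f"{parsed.month}/{parsed.day}/{parsed.year}", True
    return raw, False  # would be flagged as invalid

for raw in ("4/15/13", "04/16/2013", "Date Issued"):
    print(raw, "->", to_date_value(raw))
```

A String field would have accepted "Date Issued" as a value without complaint; the date check catches it.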
  1. Right-click the selected Date Issued Data Field.
  2. Select Add > Data Type... from the command menu.
  3. Name it Value Extractor - Date Issued.
Grooper ace architect collect key-value pairs collecting a date 030.png
  1. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select KVP-H - Date Issued from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect key-value pairs collecting a date 031.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same result from) the KVP-H - Date Issued Data Type.
Grooper ace architect collect key-value pairs collecting a date 032.png
  1. Select the Date Issued Data Field.
  2. Select the Value Extractor property, click the drop-down arrow, and in the drop-down menu select Reference.
Grooper ace architect collect key-value pairs collecting a date 033.png
  1. With the Value Extractor property set, you can expose its sub-properties by clicking the arrow to the left of the property.
  2. Select the Extractor property, click the drop-down arrow, and in the drop-down menu select the Value Extractor - Date Issued Data Type from:
(data model) • Date Issued


The question at this point might be, "Why make a Data Type the child of the Data Field to reference another Data Type, then have the Data Field reference the child Data Type that's referencing another Data Type, when the Data Field itself can reference a Data Type?" And while that’s a mouthful, it is a reasonable question worth asking. The answer is the same as for the other idiosyncratic methods in this article: it is mainly to establish best practices.
Technically, a Data Field can only reference one extractor through its Value Extractor property, while a Data Type can reference many through its Referenced Extractors property. If you realize your Data Field needs to point to more than one extractor, and you’ve set it up the way just described, you can simply have the child Data Type reference as many other extractors as you deem necessary, and they’ll return results in the correct waterfall order. That order is as follows:

  1. Internal pattern
  2. Child Data Elements
  3. Referenced extractors in order from top to bottom in their list.
Grooper ace architect collect key-value pairs collecting a date 034.png
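The waterfall order above can be modeled with a small Python class. This is a sketch under simplifying assumptions: `DataTypeSketch` is a hypothetical stand-in rather than Grooper's object model, and it simply concatenates results in waterfall order, ignoring Collation and deduplication entirely. The patterns and text are made up so that each source of results is easy to spot.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DataTypeSketch:
    """Hypothetical stand-in for a Grooper Data Type; not Grooper's object model."""
    name: str
    pattern: Optional[str] = None
    children: List["DataTypeSketch"] = field(default_factory=list)
    references: List["DataTypeSketch"] = field(default_factory=list)

    def extract(self, text):
        results = []
        if self.pattern:                                  # 1. internal pattern
            results += [m.group(0) for m in re.finditer(self.pattern, text)]
        for child in self.children:                       # 2. child elements
            results += child.extract(text)
        for ref in self.references:                       # 3. references, top to bottom
            results += ref.extract(text)
        return results

ref_1 = DataTypeSketch("Reference 1", pattern=r"C\d")
ref_2 = DataTypeSketch("Reference 2", pattern=r"D\d")
child = DataTypeSketch("Child", pattern=r"B\d")
parent = DataTypeSketch("Parent", pattern=r"A\d", children=[child], references=[ref_1, ref_2])

print(parent.extract("D1 C1 B1 A1"))  # ['A1', 'B1', 'C1', 'D1']
```

Notice the results come back in waterfall order, not in the order they appear in the text.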
  1. With the Data Field setup and ready to go, click on the (data model).
  2. Click the Test Extraction button.
  3. Because you have a properly classified document selected in the Batch Viewer, you should now see the Data Field populated with data in the Data Model Test Results area, as well as a result highlighted in the Document Viewer.
Grooper ace architect collect key-value pairs collecting a date 035.png
Cloning Objects to Save Time and Reduce Clicks

Let's collect another date, the Closing Date, and use it as an opportunity to learn how to save time and clicks by cloning objects we've already made.

Replicating the Extractor
  1. Right-click the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Closing Date.
Grooper ace architect collect key-value pairs COSTC 001.png
  1. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor.
Grooper ace architect collect key-value pairs COSTC 002.png
  1. Type or copy/paste the following in the Value Pattern area:
       Closing Date
  2. Notice the result being shown in the Document Viewer and listed in the Results list-view.
Grooper ace architect collect key-value pairs COSTC 003.png
  1. Select the KVP-H - Date Issued Data Type and right-click it.
  2. Select Clone from the command menu.
  3. Name it KVP-H - Closing Date.
Grooper ace architect collect key-value pairs COSTC 004.png
  1. After cloning, be sure to rename the child Data Types to match their parent. In this case, changing the Date Issued portion to Closing Date.
Grooper ace architect collect key-value pairs COSTC 005.png
  1. Select the KVP-H - Closing Date - Key Data Type.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu, to clear the reference.
Grooper ace architect collect key-value pairs COSTC 006.png
  1. With the Referenced Extractors property selected, click the ellipsis button to bring up the Referenced Extractors window.
  2. Click the Add... button from the Referenced Extractors window to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select KEY - Closing Date from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect key-value pairs COSTC 007.png
  1. Select the parent KVP-H - Closing Date Data Type.
  2. Notice the results being shown in the Document Viewer and listed in the Results list-view.
Grooper ace architect collect key-value pairs COSTC 008.png
Replicating the Data Element
  1. Right-click the Date Issued Data Field.
  2. Select Clone from the command menu.
  3. Name it Closing Date.
Grooper ace architect collect key-value pairs COSTC 009.png
  1. Select the child Data Type of this newly cloned Data Field and be sure to rename it to Value Extractor - Closing Date.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu, to clear the reference.
Grooper ace architect collect key-value pairs COSTC 010.png
  1. With the Referenced Extractors property selected, click the ellipsis button to bring up the Referenced Extractors window.
  2. Click the Add... button from the Referenced Extractors window to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select KVP-H - Closing Date from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors



When the Data Field was cloned, its reference to its child Data Type was carried over so that it points at the clone's own new child Data Type, not at the child Data Type of the original Data Field. Because of this, having changed the reference for this Data Type, the newly cloned Data Field is ready to go (including its Value Type property!)

Grooper ace architect collect key-value pairs COSTC 011.png
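As an analogy only, Python's `copy.deepcopy` behaves in a similar way to the cloning described above: references between objects inside the copied structure are re-pointed at the new copies rather than at the originals. The `Node` class below is hypothetical and this is not how Grooper implements cloning; it just makes the "reference follows the clone" idea concrete.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """Hypothetical stand-in for a Grooper object such as a Data Field or Data Type."""
    name: str
    children: List["Node"] = field(default_factory=list)
    value_extractor: Optional["Node"] = None  # points at another Node, like a reference

# Build the original Date Issued field whose Value Extractor points at its own child.
date_issued = Node("Date Issued")
child = Node("Value Extractor - Date Issued")
date_issued.children.append(child)
date_issued.value_extractor = child

# "Clone" it; deepcopy re-points internal references at the new copies.
closing_date = copy.deepcopy(date_issued)
closing_date.name = "Closing Date"
closing_date.children[0].name = "Value Extractor - Closing Date"

print(closing_date.value_extractor.name)  # Value Extractor - Closing Date
print(date_issued.value_extractor.name)   # Value Extractor - Date Issued
```

Renaming the cloned child changes what the clone's reference points at, while the original field is untouched.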
  1. Select the (data model).
  2. Click the Test Extraction button.
  3. Because you have a properly classified document selected in the Batch Viewer, you should now see both Data Fields populated with data in the Data Model Test Results area, as well as a result highlighted in the Document Viewer.
Grooper ace architect collect key-value pairs COSTC 012.png
Generic Text Extractor

Frequently with regular expressions we focus on using syntax that finds the things we want. But like the importance of negative space in art, it can be just as important in regular expressions to use syntax that says what not to get.
One of our engineers found a very powerful pattern that has lovingly earned the nickname of the cheat code. Its function is to, essentially, find every block of logical text on a document. This is incredibly useful, especially with native text electronic documents.

A word of warning, however: While using the cheat code can be incredibly useful, be fully aware of what it is doing and know when it can get you into trouble. If you don't use it sparingly, or at least carefully plan when/where to use it, you can find yourself getting bad data, as it is simply returning whatever text is available, with no correction or validation.
Despite that, on this document set, you'll see how useful it really is.

Creating the Cheat Code

It's deceptive how powerful this extractor is once you see how simple its regular expression is. A few important components make it work, and they are worth covering, so let's get to it.

  1. Right-click the Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - Generic Text Segment.
Grooper ace architect collect key-value pairs generic text 001.png
  1. With the newly created VAL - Generic Text Segment Data Type selected ...
  2. Click the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs generic text 002.png
  1. In the Value Pattern area type or copy/paste the following:
  2. [^\n\r\t\f]+
  3. Notice in the Document Viewer and in the Results list-view the interesting results being returned.



So, starting the character set [] with the caret symbol ^ says "anything except the following...". Next are the newline feed and carriage return meta characters \n\r. This will be clearer in the next image, but essentially Grooper ends every line with the combination \r\n. The form feed character \f is inserted between pages in a multipage document. These are all special white space meta characters that are "on" by default in Grooper. The tab character \t is not on by default, but can be activated so that large spaces are read as tabs.
As written, this negated character set followed by the one-or-more quantifier + returns every logical "line" of text. We can make it more powerful, so that it returns "segments" of text instead, by enabling tab marking.
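To see what the negated character set does before tab marking is enabled, here is a minimal sketch using Python's `re` module as a stand-in for Grooper's regex engine. The sample text is hypothetical; it only assumes what the paragraph above states: lines end in \r\n and a form feed \f separates pages.

```python
import re

# The "cheat code": one or more characters that are NOT a newline feed,
# carriage return, tab, or form feed.
CHEAT_CODE = re.compile(r"[^\n\r\t\f]+")

# Hypothetical text: every line ends in \r\n and a form feed \f separates
# page 1 from page 2. Tab marking is off, so wide gaps are still spaces.
text = (
    "Closing Date   4/15/2013\r\n"
    "Settlement Agent   Epsilon Title Co.\r\n"
    "\f"
    "Page 2 text\r\n"
)

lines = CHEAT_CODE.findall(text)
print(lines)
# ['Closing Date   4/15/2013', 'Settlement Agent   Epsilon Title Co.', 'Page 2 text']
```

Because the wide gaps are still ordinary spaces, each label stays glued to its value: the pattern returns whole lines, not segments.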

Grooper ace architect collect key-value pairs generic text 003.png
  1. Click the Text tab to see a new means of viewing the text of our documents.
    • So far, we’ve mainly used the Document Viewer, which is very useful for understanding how the recognized characters relate to the form. The Text view is much better for analyzing the pure text. Although the Document Viewer shows the text seemingly laid out "behind" the images of the pages, it’s very important to understand that Grooper actually combines text that sits on the same horizontal plane into a single line.
    • This view makes it easier to understand how the current regex syntax is returning every logical line.
  2. Here you can see the aforementioned combination of the carriage return and newline feed \r\n at the end of every line. Ignore the less-than/greater-than symbols <> that surround the meta characters.
Grooper ace architect collect key-value pairs generic text 004.png

Let’s enable tab marking to see how this changes things.

  1. Click the Properties tab.
  2. Click the drop-down arrow to the left of the Preprocessing Options property to expose its sub-properties.
  3. Select the Tab Marking property and set it to Enabled.
Grooper ace architect collect key-value pairs generic text 005.png
  1. Notice now all the "tabs" that have been inserted.



Essentially, gaps that are large enough to pass a certain programmatic threshold are no longer processed as single spaces, but as the tab meta character \t instead. As a result, our "cheat code" pattern now, very usefully, returns every segment of text, not every line.
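The same pattern run against the same hypothetical lines, after tab marking has replaced each wide gap with \t, shows the change from lines to segments. This is again a Python `re` sketch, not Grooper itself.

```python
import re

CHEAT_CODE = re.compile(r"[^\n\r\t\f]+")

# The same hypothetical lines after tab marking has replaced each wide gap with \t.
text = (
    "Closing Date\t4/15/2013\r\n"
    "Settlement Agent\tEpsilon Title Co.\r\n"
)

segments = CHEAT_CODE.findall(text)
print(segments)
# ['Closing Date', '4/15/2013', 'Settlement Agent', 'Epsilon Title Co.']
```

Nothing about the pattern changed; only the text it runs against did. Labels and values now come back as separate results.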

Grooper ace architect collect key-value pairs generic text 006.png
  1. Go back to the Image tab.
    • From here you should now have a better/different visual of what this is doing.
  2. While enabling the tabs \t got us very close to our end goal, you can see that there are a couple of places where the gaps are still being read as spaces, and not tabs.
    • This is made visible by the fact that the green polygon highlight is spanning the gap and grabbing more text.
    • Thankfully, like with most things in Grooper there are properties we can adjust to compensate for this.
Grooper ace architect collect key-value pairs generic text 007.png
  1. Click the drop-down arrow next to the Tab Marking property (made available after having set it to Enabled.)
  2. Set the Minimum Tab Width property to 0.125.
    • Space characters which have a physical, horizontal width greater than this value will be converted to tab characters. Units are in inches.
  3. Set the Character Size Ratio property to 125%.
    • Space characters which have a horizontal width greater than the height of the previous character times this ratio will be converted to tab characters.



Simply put, by reducing these numbers we’ve allowed the threshold by which a gap could be considered a tab to be much smaller.
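The two property descriptions above amount to two simple comparisons. The hypothetical helper below reports each comparison separately for a single space character; it assumes nothing about how Grooper combines the two tests internally, and the gap and character sizes are made-up values in inches.

```python
def tab_tests(space_width_in, prev_char_height_in,
              minimum_tab_width_in=0.125, character_size_ratio=1.25):
    """Hypothetical helper: report which documented tab-marking test a single
    space character passes. All measurements are in inches. How Grooper
    combines the two tests internally is not described here, so each one is
    reported separately."""
    return {
        # Minimum Tab Width: wider than a fixed physical threshold.
        "minimum_tab_width": space_width_in > minimum_tab_width_in,
        # Character Size Ratio: wider than the previous character's height
        # multiplied by the ratio (125% -> 1.25).
        "character_size_ratio": space_width_in > prev_char_height_in * character_size_ratio,
    }

# A 0.13in gap after a character 0.11in tall: 0.13 > 0.125, but 0.13 < 0.11 * 1.25 = 0.1375.
print(tab_tests(0.13, 0.11))
# {'minimum_tab_width': True, 'character_size_ratio': False}

# With larger (purely illustrative) thresholds, the same gap fails both tests.
print(tab_tests(0.13, 0.11, minimum_tab_width_in=0.25, character_size_ratio=2.0))
# {'minimum_tab_width': False, 'character_size_ratio': False}
```

This is the arithmetic behind "reducing these numbers": smaller thresholds let narrower gaps qualify as tabs.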
We didn’t enable the Detect Lines property, but this is another extremely useful property. When enabled, the lines that were detected from the IP Profile we made earlier can act as tab characters. This is incredibly useful in situations like tabular data when information is very close in horizontal proximity, but one can easily differentiate one value from another because of the lines that form a data cell. Leveraging the lines allows Grooper to do the same.

Grooper ace architect collect key-value pairs generic text 008.png
Excluding Field Labels

As you can see, being able to collect every segment of text is highly valuable. However, we're concerned with the values on the documents, not the labels. Fortunately, we can create a list of what we don't need and exclude it from our extractor.

  1. Right-click the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Lexicons.
Grooper ace architect collect key-value pairs generic text 009.png
  1. Right-click the newly created Lexicons folder.
  2. Select Add > Lexicon... from the command menu.
  3. Name it Field Labels.
Grooper ace architect collect key-value pairs generic text 010.png
  1. With the newly created Field Labels Lexicon selected ...
  2. ...set the Type property to Vocabulary.
    • A Vocabulary Lexicon contains a list of values, one per line.
  3. Copy and paste the following into the Local Entries area:
Borrower
Closing Date
Closing Information
Date Issued
Disbursement Date
File #
Lender
Loan ID #
Loan Information
Loan Term
Loan Type
MIC #
Product
Property
Purpose
Sale Price
Seller
Settlement Agent
Transaction Information


Grooper ace architect collect key-value pairs generic text 011.png
  1. Right-click the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Exclusion Extractors.
Grooper ace architect collect key-value pairs generic text 012.png
  1. Right-click the newly created Exclusion Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it EXCL - Field Labels.
Grooper ace architect collect key-value pairs generic text 013.png
  1. With the newly created EXCL - Field Labels Data Type selected ...
  2. ... select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs generic text 014.png
  1. Click on the Properties tab.
  2. Select the Mode property and set it to FuzzyList from the drop-down menu.
Grooper ace architect collect key-value pairs generic text 015.png
  1. Select the Lookup Options property and click the ellipsis button to bring up the Lookup Options window.
  2. In the Lookup Options window click the drop-down arrow next to the Vocabulary property to expose its sub-properties.
  3. Click the Included Lexicons property and in the drop-down tree view select the Field Labels lexicon from:
Closing Disclosures • (local resources) > Lexicons
Grooper ace architect collect key-value pairs generic text 016.png
  1. With the Mode property set to FuzzyList and the Lookup Options properly referencing the new Lexicon...
  2. ... you can see this extractor is now returning results from the entries of the Lexicon.



This list of results does not cover every label on the document set. You could go back to the Field Labels Lexicon at any time and add more entries to update this extractor. You wouldn’t need to change anything on the extractor itself or press any update button. The link is live, so any entries you add to the Lexicon are simply added to the results.

Writer's note - Before completing this extractor there are a couple of adjustments that should be mentioned. First, in the Preprocessing Options set Tab Marking to Enabled and set the Minimum Tab Width to 0.125. Next, in the Pattern Editor, set the Prefix Pattern to \n|\t and set the Suffix Pattern to \t|\r.
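A rough picture of what the finished label extractor matches: the sketch below builds an alternation from an excerpt of the Field Labels entries and applies the writer's note prefix \n|\t and suffix \t|\r. It uses exact matching as a stand-in for FuzzyList (which also tolerates near misses), and it assumes the Prefix and Suffix Patterns constrain the match without becoming part of it, so they are written as lookarounds. The sample text is hypothetical.

```python
import re

# Excerpt of the Field Labels Vocabulary Lexicon entries.
FIELD_LABELS = ["Borrower", "Closing Date", "File #", "Loan ID #", "Settlement Agent"]

# Escape each entry (e.g. "#") and try longer labels first.
alternation = "|".join(
    re.escape(label) for label in sorted(FIELD_LABELS, key=len, reverse=True)
)

# Prefix \n|\t and Suffix \t|\r, assumed to behave like lookarounds.
LABEL_PATTERN = re.compile(rf"(?<=[\n\t])(?:{alternation})(?=[\t\r])")

text = (
    "\r\nClosing Date\t4/15/2013\r\n"
    "File #\t12-3456\r\n"
    "Settlement Agent\tEpsilon Title Co.\r\n"
)

labels = LABEL_PATTERN.findall(text)
print(labels)
# ['Closing Date', 'File #', 'Settlement Agent']
```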

Grooper ace architect collect key-value pairs generic text 017.png

The goal now is to exclude the results from the previously made extractor from our generic text extractor. That way, the only results being returned from it are values we want, and no labels.

  1. Select the VAL - Generic Text Segment Data Type.
  2. Select the Exclusion Extractor property and select Reference from the drop-down menu.
Grooper ace architect collect key-value pairs generic text 018.png
  1. Once the Exclusion Extractor property is set to Reference, click the drop-down arrow to the left to expose its sub-properties.
  2. Select the Extractor property and in the drop-down tree view select the EXCL - Field Labels Data Type from:
Closing Disclosures • (local resources) > Exclusion Extractors
Grooper ace architect collect key-value pairs generic text 019.png
  1. With the Exclusion Extractor property set (and with a quick press of the Save and Test Single buttons)...
  2. ... you’ll see that we are now getting results EXCEPT those returned by the Exclusion Extractor.
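Conceptually, the Exclusion Extractor subtracts one result set from another. The sketch below models that by dropping any generic text segment whose span overlaps a label match. Treating "exclusion" as span overlap is an assumption made for illustration; the patterns and sample text are the hypothetical ones used earlier, with exact matching standing in for FuzzyList.

```python
import re

GENERIC = re.compile(r"[^\n\r\t\f]+")
LABELS = re.compile(r"(?<=[\n\t])(?:Closing Date|File #|Settlement Agent)(?=[\t\r])")

text = (
    "\r\nClosing Date\t4/15/2013\r\n"
    "File #\t12-3456\r\n"
    "Settlement Agent\tEpsilon Title Co.\r\n"
)

# Character spans returned by the (simplified) exclusion extractor.
excluded = [m.span() for m in LABELS.finditer(text)]

def overlaps(span, others):
    """True if span shares at least one character with any span in others."""
    start, end = span
    return any(start < o_end and o_start < end for o_start, o_end in others)

values = [m.group() for m in GENERIC.finditer(text) if not overlaps(m.span(), excluded)]
print(values)
# ['4/15/2013', '12-3456', 'Epsilon Title Co.']
```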
Grooper ace architect collect key-value pairs generic text 020.png
Generic Text in a Key-Value Pair

With our generic text extractor made, we can now put it to use.
Let's collect the Settlement Agent. The value for this field is a name. Think about how names are constructed and how incredibly varied they can be. It would be quite difficult to make an extractor that accurately compensates for all that variation. This is exactly where our generic text extractor shows why it's so useful.

  1. Right-click the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Settlement Agent.
Grooper ace architect collect key-value pairs generic text 021.png
  1. With the newly created KEY - Settlement Agent Data Type selected ...
  2. ... select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs generic text 022.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Settlement Agent
  3. Notice the result being displayed in the Document Viewer (from page 1), but also notice in the Results list-view that a second result is returned.
    • We don’t want the "key" of a "key-value pair" to return more than one result, so we need to find a difference between these two results and adjust our regex syntax to eliminate the second.
Grooper ace architect collect key-value pairs generic text 023.png
  1. With the second result selected, notice in the Document Viewer that it is on page 5.
  2. You can see this result seems like it’s a column header for a table of information.
Grooper ace architect collect key-value pairs generic text 024.png
  1. Select the Text tab so we can get a deeper understanding of the actual text, separated from the image of the page view.
  2. You can see the result still highlighted, and the thing to understand about this particular result is that it has a bunch of text and spaces before it.
Grooper ace architect collect key-value pairs generic text 025.png
  1. The result we do want to act as our key is at the beginning of a new line, which we can use to our advantage.
    • Understand that although the carriage return and newline feed meta characters \r\n appear at the end of every line, Grooper still reads the text as one continuous, linear flow. This means the character JUST before the Settlement Agent we want as our key is a newline feed meta character \n. Therefore, we can use it as an anchor to differentiate our key from the second, unwanted result.
Grooper ace architect collect key-value pairs generic text 026.png
  1. In the Prefix Pattern area type or copy/paste the following:
  2. \n
  3. Notice now the single result in the Document Viewer and listed in the Results list-view.
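The effect of the \n prefix can be reproduced with a lookbehind in Python. The text below is a hypothetical two-page sample: on page 1 the key starts a new line, while on the later page it follows other column headers on the same line. Treating the Prefix Pattern as a lookbehind is an assumption for illustration.

```python
import re

text = (
    "Closing Date\t4/15/2013\r\n"
    "Settlement Agent\tEpsilon Title Co.\r\n"
    "\f"
    "Loan Costs    Borrower-Paid    Settlement Agent\r\n"
)

# Without a prefix, both occurrences match.
without_prefix = re.findall(r"Settlement Agent", text)

# With a \n prefix (as a lookbehind), only the one that starts a line matches.
with_prefix = list(re.finditer(r"(?<=\n)Settlement Agent", text))

print(len(without_prefix), len(with_prefix))
# 2 1
```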
Grooper ace architect collect key-value pairs generic text 027.png
  1. Right-click the KVP-H - Closing Date Data Type.
  2. Select Clone from the command menu.
  3. Name it KVP-H - Settlement Agent.



Either of the Data Types in this area could be cloned as they’re both using the desired Collation method.

Grooper ace architect collect key-value pairs generic text 028.png
  1. Be sure to rename the children of this newly cloned Data Type to reflect their parent’s name, then select the KVP-H - Settlement Agent - Value Data Type.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs generic text 029.png
  1. With the Referenced Extractors property selected click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window click the Add... button to bring up the Select Items window.
  3. In the Select Items window select the VAL - Generic Text Segment Data Type from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect key-value pairs generic text 030.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same result from) the VAL - Generic Text Segment Data Type.
Grooper ace architect collect key-value pairs generic text 031.png
  1. Select the KVP-H - Settlement Agent - Key Data Type.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs generic text 032.png
  1. With the Referenced Extractors property selected click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window click the Add... button to bring up the Select Items window.
  3. In the Select Items window select the KEY - Settlement Agent Data Type from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect key-value pairs generic text 033.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same result from) the KEY - Settlement Agent Data Type.
Grooper ace architect collect key-value pairs generic text 034.png
  1. Select the KVP-H - Settlement Agent Data Type.
  2. You can see this Data Type is now returning one generic text segment: the Settlement Agent.
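A heavily simplified model of what the horizontal key-value pair is doing here: for each key match, take the first generic text segment to its right on the same line. Grooper's actual collation works on layout and has more settings than this; the function name and sample text are hypothetical, and the sketch only follows text order within a single line.

```python
import re

SEGMENT = re.compile(r"[^\n\r\t\f]+")
KEY = re.compile(r"(?<=\n)Settlement Agent")

def pair_horizontally(text):
    """Hypothetical sketch: pair each key with the first segment after it on the same line."""
    pairs = []
    for key in KEY.finditer(text):
        line_end = text.find("\r", key.end())
        if line_end == -1:
            line_end = len(text)
        # Search only between the end of the key and the end of its line.
        value = SEGMENT.search(text, key.end(), line_end)
        if value is not None:
            pairs.append((key.group(), value.group()))
    return pairs

text = (
    "Closing Date\t4/15/2013\r\n"
    "Settlement Agent\tEpsilon Title Co.\r\n"
)

print(pair_horizontally(text))
# [('Settlement Agent', 'Epsilon Title Co.')]
```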
Grooper ace architect collect key-value pairs generic text 035.png
Supplying the Result to the Data Model
  1. Right-click one of the Data Fields.
  2. Select Clone from the command menu.
  3. Name it Settlement Agent.
Grooper ace architect collect key-value pairs generic text 036.png
  1. Be sure to rename, then select the child Data Type (re)named Value Extractor - Settlement Agent.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs generic text 037.png
  1. With the Referenced Extractors property selected click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window click the Add... button to bring up the Select Items window.
  3. In the Select Items window select the KVP-H - Settlement Agent Data Type from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect key-value pairs generic text 038.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same result from) the KVP-H - Settlement Agent Data Type.
Grooper ace architect collect key-value pairs generic text 039.png
  1. Select the (data model)
  2. Click the Test Extraction button.
  3. Notice the two dates extract fine, and the Settlement Agent does extract, but it’s in an error state represented by the field being a pinkish color, and the 1 issue listed beneath.
    • The problem here is that we cloned this Data Field from another Data Field whose Value Type property is set to DateTime. With that in mind, the name populating this field is an invalid value. We need to change this newly made Data Field's Value Type property.
Grooper ace architect collect key-value pairs generic text 040.png
  1. Select the Settlement Agent Data Field.
  2. Select the Value Type property and select String from the drop-down menu.
Grooper ace architect collect key-value pairs generic text 041.png
  1. Now go back and select the Data Model.
  2. Click the Test Extraction button again ...
  3. ... and now notice the data extracts fine with no errors.
Grooper ace architect collect key-value pairs generic text 042.png
Parsing Data from Generic Text Array

Collecting generic text segments is enormously useful, as we've seen. We can take this approach a step further to collect even more.

If working with Grooper does anything, it makes you analyze the structure of documents more closely. It's like a language you begin to learn. You start to recognize patterns more readily, and to understand why things are structured the way they are. Combine this with the tools Grooper puts at your disposal, and you start to understand how to manipulate the platform to get basically anything you want off of a document, simply by combining different sets of properties.

I say all of this because I want you to look at the Borrower value in the Transaction Information area at the top of these documents. What I'd like you to notice is that you can tell this block of information is a unit because it is tightly packed vertically. Compare this to disparate fields of information and you'll notice they are separated by more vertical space. We can use this to our advantage to not only collect this whole block, but then go further and parse the individual components from it as pieces of information.

Creating the Generic Text Array

To get at the individual components that make up the address we're seeking, we first need to narrow in on the address as a whole, so we can then more easily break out the individual pieces.

  1. Right-click on the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Input Filter Extractors.
Grooper ace architect collect key-value pairs PDfGTA 001.png
  1. Right-click the Input Filter Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it INPT - Generic Text Block.
Grooper ace architect collect key-value pairs PDfGTA 002.png
  1. With the newly created INPT - Generic Text Block Data Type selected...
  2. ... select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add... button to bring up the Select Items window.
  4. In the Select Items window select the VAL - Generic Text Segment Data Type from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect key-value pairs PDfGTA 003.png
  1. Select the Collation property and click the drop-down arrow.
  2. Select Array from the drop-down menu.
Grooper ace architect collect key-value pairs PDfGTA 004.png
  1. With the Collation property set to Array click the drop-down arrow to the left to expose its sub-properties.
  2. Set the Vertical Layout property to Enabled.
  3. With the Vertical Layout property set to Enabled click the drop-down arrow to the left to expose its sub-properties.
  4. Set the Maximum Distance property to 0.035in.
  5. Set the Result Separator property to \r\n.
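The Vertical Layout settings can be pictured as "keep stacking segments while the vertical gap stays small, then start a new block." The sketch below models only that idea: it assumes the segments already sit in one column, ignores the horizontal alignment Grooper also considers, and uses hypothetical names and coordinates (inches from the top of the page).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    top: float     # inches from the top of the page
    bottom: float  # inches from the top of the page

def collate_vertical(segments, maximum_distance=0.035, separator="\r\n"):
    """Hypothetical sketch: join vertically adjacent segments into blocks."""
    blocks, current = [], []
    for seg in sorted(segments, key=lambda s: s.top):
        # A gap larger than Maximum Distance closes the current block.
        if current and seg.top - current[-1].bottom > maximum_distance:
            blocks.append(separator.join(s.text for s in current))
            current = []
        current.append(seg)
    if current:
        blocks.append(separator.join(s.text for s in current))
    return blocks

segments = [
    Segment("Michael Jones and Mary Stone", 1.00, 1.12),
    Segment("123 Anywhere Street", 1.14, 1.26),
    Segment("Anytown, ST 12345", 1.28, 1.40),
    Segment("Steve Cole and Amy Doe", 1.60, 1.72),
    Segment("321 Somewhere Drive", 1.74, 1.86),
]

blocks = collate_vertical(segments)
print(blocks)
```

The 0.02in gaps inside each address fall under the 0.035in limit, while the 0.20in gap between the two addresses does not, so two blocks come back, each joined with the \r\n Result Separator.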
Grooper ace architect collect key-value pairs PDfGTA 005.png
  1. Click Save and Test Single to get results, then right-click the top result (any result is fine...)
  2. Select Inspect Instance... from the command menu.
Grooper ace architect collect key-value pairs PDfGTA 006.png
  1. Click the Text View tab.
  2. This is the "beginning of the string" of this particular data instance...
  3. ...and this is the "end of the string".


This is a very important concept that you need to grasp. When you are looking for a result, it is done against an instance of data. Normally, that instance is whatever you have selected, whether it be the entire document, or perhaps an individual page. In those circumstances the beginning of the string is at the beginning of the entirety of what you have selected, while the end is at the very end. This is critical because in your regular expressions, if you use the beginning of string meta character ^, or the end of string meta character $, it is going to anchor off the beginning or end of the entire selection.

When data is being returned from an extractor, that result is considered the instance of data, and therefore, the beginning and end of string are at the beginning and end of that particular result, not the document or page.

Understanding that, we can now leverage a result from an extractor as an Input Filter for another extractor to narrow its scope to a specific result, not the entire document or page.
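The same anchored pattern gives different answers depending on the instance it runs against. This Python sketch (hypothetical text, default flags, so ^ and $ mean start and end of the whole string) runs a beginning-of-string pattern and an end-of-string pattern against a "document" and against a single extracted "result".

```python
import re

FIRST_LINE = re.compile(r"^[^\r\n]+")  # text at the very beginning of the string
LAST_LINE = re.compile(r"[^\r\n]+$")   # text at the very end of the string

# Hypothetical page text: the whole instance when nothing narrows the scope.
document = (
    "Closing Information\r\n"
    "Michael Jones and Mary Stone\r\n"
    "123 Anywhere Street\r\n"
    "Anytown, ST 12345\r\n"
    "Page 1 of 5"
)

# One extracted result, used as its own instance (as an Input Filter would).
result = "Michael Jones and Mary Stone\r\n123 Anywhere Street\r\nAnytown, ST 12345"

print(FIRST_LINE.search(document).group(), "|", LAST_LINE.search(document).group())
# Closing Information | Page 1 of 5
print(FIRST_LINE.search(result).group(), "|", LAST_LINE.search(result).group())
# Michael Jones and Mary Stone | Anytown, ST 12345
```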

Grooper ace architect collect key-value pairs PDfGTA 007.png
Separating the Array into Sub-Elements

The goal of this section is to look within a specific instance of data and write a regular expression with named capture groups to return specific pieces of the data as individual results.

  1. Right-click on the Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - Address Block.
Grooper ace architect collect key-value pairs PDfGTA 008.png
  1. With the newly created VAL - Address Block Data Type selected, set the Input Filter property to Reference.
  2. With the Input Filter property set to Reference click the drop-down arrow to the left to expose its sub-properties.
  3. Select the Extractor property and click the drop-down arrow.
  4. From the drop-down tree view select the INPT - Generic Text Block Data Type from:
Closing Disclosures • (local resources) > Input Filter Extractors
Grooper ace architect collect key-value pairs PDfGTA 009.png
  1. With the Input Filter property set (and with a quick press of the Save and Test Single buttons) you may notice that nothing is being returned.
    • As of yet, we haven’t told the extractor what to collect. Setting the Input Filter property merely narrows the scope of what to extract against.
Grooper ace architect collect key-value pairs PDfGTA 010.png
  1. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs PDfGTA 011.png
  1. In the Value Pattern area type or copy/paste the following:
  2. (?<BorrowerName>[^\r]+)\r\n
    (?<StreetAddress>[^\r]+)\r\n
    (?<City>[^,]+),\s
    (?<State>[A-Z]{2})\s
    (?<Zip>[0-9]{5})
  • Unfortunately, the Pattern Editor window is not aware of settings from the Data Type the pattern is being written for. As a result, even though an Input Filter is being used, the Pattern Editor window is working against the entire current selection (page or document). This can make writing a pattern when using something like an Input Filter very unintuitive. Do your best to be aware of the instance of data you’re working with and write your pattern accordingly.
  1. In the Prefix Pattern area type or copy/paste the following:
  2. ^
  3. In the Suffix Pattern area type or copy/paste the following:
  4. $
Grooper ace architect collect key-value pairs PDfGTA 012.png
  1. With the Input Filter ...
  2. ...and Pattern properties set (and with a quick press of the Save and Test Single buttons)...
  3. ...you should see two results in the Document Viewer and listed in the Results list-view.
Grooper ace architect collect key-value pairs PDfGTA 013.png

These results deserve a closer look to truly understand what’s going on.

  1. Select one of the results from the Results list-view and right-click it.
  2. Select Inspect Instance... from the command menu.
Grooper ace architect collect key-value pairs PDfGTA 014.png

In the Instance Viewer window, start by clicking on VAL - Address Block and notice that this is the entire result. As you click on the other sub-results, first notice that they’re named after the named capture groups from the regular expression just written. Next, notice the individual values being returned, as well as being displayed in the Document Viewer.

Grooper ace architect collect key-value pairs PDfGTA 015.gif
Parsed Array in a Key-Value Pair

We can now use this extractor and differentiate the Borrower from the Seller by leveraging yet another key-value pair.

  1. Right-click on the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Borrower.
Grooper ace architect collect key-value pairs PDfGTA 016.png
  1. With the newly created KEY - Borrower selected, select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect key-value pairs PDfGTA 017.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Borrower
  3. Notice this is returning many results, so we need to narrow it down.
Grooper ace architect collect key-value pairs PDfGTA 018.png
  1. Click the Properties tab.
  2. Expand the Preprocessing Options property and set the Tab Marking property to Enabled.
Grooper ace architect collect key-value pairs PDfGTA 019.png
  1. Back in the Pattern Editor tab, in the Prefix Pattern area type or copy/paste the following:
  2. \t
  3. This tab anchor was enough to narrow it down to a single result.
Grooper ace architect collect key-value pairs PDfGTA 020.png
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ... you should now see this extractor returning a single result in the Document Viewer as well as in the Results list-view.
Grooper ace architect collect key-value pairs PDfGTA 021.png
  1. Right-click on the KVP-H - Settlement Agent Data Type.
  2. Select Clone from the command menu.
  3. Name it KVP-H - Borrower.
Grooper ace architect collect key-value pairs PDfGTA 022.png
  1. Be sure to rename the child Data Types, then select the KVP-H - Borrower - Value Data Type.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs PDfGTA 023.png
  1. With the Referenced Extractors property selected, click the ellipsis button to bring up the Referenced Extractors window.
  2. Click the Add... button from the Referenced Extractors window to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select VAL - Address Block from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect key-value pairs PDfGTA 024.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - Address Block Data Type.
Grooper ace architect collect key-value pairs PDfGTA 025.png
  1. Select the KVP-H - Borrower - Key Data Type.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs PDfGTA 026.png
  1. With the Referenced Extractors property selected, click the ellipsis button to bring up the Referenced Extractors window.
  2. Click the Add... button from the Referenced Extractors window to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select KEY - Borrower from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect key-value pairs PDfGTA 027.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the KEY - Borrower Data Type.
Grooper ace architect collect key-value pairs PDfGTA 028.png
  1. Select the KVP-H - Borrower Data Type.
  2. You can see this Data Type is now returning the Borrower address block.
    • If you were to inspect this instance as well, you could see that the sub-elements from the value extractor have carried through to this key-value pair.
Grooper ace architect collect key-value pairs PDfGTA 029.png
Supplying the Results to the Data Model

As usual, the final step in creating our extractor is getting the data to our model. The process is very similar to how we've done it so far, except for a special property that will allow us to leverage the parsed data.

  1. Right-click on the Settlement Agent Data Field.
  2. Select Clone... from the command menu.
  3. Name it Borrower Name(s).
Grooper ace architect collect key-value pairs PDfGTA 030.png
  1. Be sure to rename the child Data Type, then select it.
  2. Right-click the Referenced Extractors property.
  3. Select Reset... from the command menu to clear this reference.
Grooper ace architect collect key-value pairs PDfGTA 031.png
  1. With the Referenced Extractors property selected, click the ellipsis button to bring up the Referenced Extractors window.
  2. Click the Add... button from the Referenced Extractors window to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select KVP-H - Borrower from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect key-value pairs PDfGTA 032.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the KVP-H - Borrower Data Type.
Grooper ace architect collect key-value pairs PDfGTA 033.png
  1. Select the Borrower Name(s) Data Field.
  2. Select the Sub-Element Name property and click the drop-down arrow.
  3. Select BorrowerName.
    • This list is populated with all the data instances related to the extractor. Notice in this list are all the instances we inspected earlier that were named after the named capture groups we made in the regex pattern. This one extractor, with its many sub-elements, can now feed each discrete piece of information to several fields, simply by changing this one property.
Grooper ace architect collect key-value pairs PDfGTA 034.png
  1. Save and click the Test Extraction button.
  2. Notice the result from the sub-element populating the Data Field Test Results area as well as being shown in the Document Viewer.
Grooper ace architect collect key-value pairs PDfGTA 035.png
  1. Clone the Borrower Name(s) Data Field four times and name them Borrower Street Address, Borrower City, Borrower State, and Borrower Zip (be sure to rename the child Data Types as well.) From there you can simply change the Sub-Element Name property for each clone, so it retrieves the appropriate data.
  2. Select the (data model) and click the Test Extraction button.
  3. Each Data Field now receives its own individual result, all collected by one extractor and made possible by sub-elements.
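The Sub-Element Name property is essentially a lookup from each Data Field to one named group of the same match. The dictionary and function below are hypothetical stand-ins for the Data Fields and their settings; the pattern is the address pattern from earlier, adapted to Python's (?P<name>...) syntax.

```python
import re

ADDRESS_BLOCK = re.compile(r"""
    ^
    (?P<BorrowerName>[^\r]+)\r\n
    (?P<StreetAddress>[^\r]+)\r\n
    (?P<City>[^,]+),\s
    (?P<State>[A-Z]{2})\s
    (?P<Zip>[0-9]{5})
    $
""", re.VERBOSE)

# Hypothetical stand-in: Data Field name -> Sub-Element Name setting.
SUB_ELEMENT_NAMES = {
    "Borrower Name(s)": "BorrowerName",
    "Borrower Street Address": "StreetAddress",
    "Borrower City": "City",
    "Borrower State": "State",
    "Borrower Zip": "Zip",
}

def populate_fields(instance):
    """Run the extractor once and feed each field from its own named group."""
    match = ADDRESS_BLOCK.search(instance)
    if match is None:
        return {}
    return {field: match.group(sub) for field, sub in SUB_ELEMENT_NAMES.items()}

data_fields = populate_fields("Michael Jones and Mary Stone\r\n123 Anywhere Street\r\nAnytown, ST 12345")
print(data_fields)
```

One match, five fields: changing only the Sub-Element Name per field is what lets the cloned Data Fields share a single extractor.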
Grooper ace architect collect key-value pairs PDfGTA 036.png

Tabular Data Extraction

So far, the data we have collected has centered on static pieces of information that appear a single time within a document. In these cases, a unique extractor is used for each value, and the result returned is stored in a Data Field. There are also, however, pieces of information that appear a variable number of times within a document, such as tabular data. Grooper collects this information with extractors designed to find repeated instances of data, and these repeated pieces of information are stored in a different Data Element, known as a Data Table, and its child Data Column objects.

Simple Table via Ordered Array Collation

The key to building tabular extractors is defining which components make up a row of information and combining those elements in a specific way that represents the construction of that row.

Currency Extractor

The table we will be collecting data from is the B. Services Borrower Did Not Shop For table, located in the Loan Costs section on page 2. Looking at the components that make up a row of information, you should notice that it consists mainly of currency amounts. Therefore, let's start by creating an extractor that will effectively collect these data points.

  1. Right-click on the Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - Currency Positive.
Grooper ace architect collect tabular data extraction STvOAC 001.png
  1. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect tabular data extraction STvOAC 002.png
  1. In the Value Pattern area type or copy/paste the following:
    (?<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})
    [.]
    (?<Cents>[0-9]{2})
  2. In the Prefix Pattern area type or copy/paste the following:
    \s
  3. In the Suffix Pattern area type or copy/paste the following:
    \s(?!%)
  4. In the Output Format area type or copy/paste the following:
    ${Dollars}.{Cents}
    • If you type this portion instead of pasting it, you may notice that IntelliSense recognizes the named capture groups from your Value Pattern and lets you auto-complete them.
Grooper ace architect collect tabular data extraction STvOAC 003.png
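To see how the three pattern areas work together, here is a rough Python model of VAL - Currency Positive. It assumes that the Prefix and Suffix Patterns must match around the value without becoming part of it, so they are written as a lookbehind and a lookahead, and it reads the Output Format as a literal `$` followed by the Dollars and Cents groups. The three Value Pattern lines are joined onto one line, and named groups use Python's `(?P<name>...)` spelling. This is a sketch, not Grooper's own matching engine.

```python
import re

# Value Pattern joined onto one line; Prefix "\s" becomes a lookbehind and
# Suffix "\s(?!%)" becomes a lookahead (assumed: they are not part of the value).
CURRENCY_POSITIVE = re.compile(
    r"(?<=\s)"
    r"(?P<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})[.](?P<Cents>[0-9]{2})"
    r"(?=\s(?!%))"
)

def extract_positive(text):
    """Render each match with the Output Format ${Dollars}.{Cents}."""
    return [f"${m['Dollars']}.{m['Cents']}" for m in CURRENCY_POSITIVE.finditer(text)]

print(extract_positive(" 1,250.00 \t 75.00 \t 3.50 % \t 0.50 "))
# ['$1,250.00', '$75.00', '$0.50']
```

Note how "3.50 %" is skipped: the suffix's `(?!%)` rejects a value whose following whitespace is followed by a percent sign, which keeps interest rates out of the currency results.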
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning positive currency amounts.
Grooper ace architect collect tabular data extraction STvOAC 004.png
  1. Right-click on the Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - Currency Negative.
Grooper ace architect collect tabular data extraction STvOAC 005.png
  1. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect tabular data extraction STvOAC 006.png
  1. In the Value Pattern area type or copy/paste the following:
    -[$] ?
    (?<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})
    [.]
    (?<Cents>[0-9]{2})
  2. In the Output Format area type or copy/paste the following:
    -${Dollars}.{Cents}
    • If you type this portion instead of pasting it, you may notice that IntelliSense recognizes the named capture groups from your Value Pattern and lets you auto-complete them.
Grooper ace architect collect tabular data extraction STvOAC 007.png
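The negative variant can be modeled the same way. In this Python sketch the Value Pattern lines are joined onto one line (so the space after `[$]` stays a literal, optional space), and the Output Format is read as a literal `-$` followed by the two groups. It illustrates how the Output Format normalizes "-$ 20.00" to "-$20.00"; it is not Grooper's own implementation.

```python
import re

# Value Pattern from VAL - Currency Negative, joined onto one line.
CURRENCY_NEGATIVE = re.compile(
    r"-[$] ?(?P<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})[.](?P<Cents>[0-9]{2})"
)

def extract_negative(text):
    """Render each match with the Output Format -${Dollars}.{Cents}."""
    return [f"-${m['Dollars']}.{m['Cents']}" for m in CURRENCY_NEGATIVE.finditer(text)]

print(extract_negative("Credit -$1,500.00; Adjustment -$ 20.00; Years 1-7"))
# ['-$1,500.00', '-$20.00']
```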
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning negative currency amounts.
Grooper ace architect collect tabular data extraction STvOAC 008.png
  1. Right-click on the Value Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it VAL - All Currency.
Grooper ace architect collect tabular data extraction STvOAC 009.png
  1. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  3. In the Select Items window from the drop-down tree view select VAL - Currency Negative and VAL - Currency Positive from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction STvOAC 010.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning both positive and negative currency amounts, but there are overlapping results.
Grooper ace architect collect tabular data extraction STvOAC 011.png
  1. Set the Deduplicate By property to Area.
  2. Re-run extraction and notice you no longer have duplicated/overlapping values.
Grooper ace architect collect tabular data extraction STvOAC 012.png
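The overlap appears because a value such as "-$ 150.00" satisfies both referenced extractors: the negative pattern matches the whole string, and the positive pattern matches the "150.00" after the space. The Python sketch below uses character spans as a stand-in for page areas and drops any result whose span overlaps one already kept. Keeping the longer result when two overlap is an assumed tie-break for illustration, not documented Grooper behavior.

```python
import re

POSITIVE = re.compile(r"(?<=\s)(?P<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})[.](?P<Cents>[0-9]{2})(?=\s(?!%))")
NEGATIVE = re.compile(r"-[$] ?(?P<Dollars>[0-9]{1,3}(,[0-9]{3}){0,2})[.](?P<Cents>[0-9]{2})")

def all_currency(text, deduplicate_by_area=True):
    """Combine both extractors' matches and return the matched text in page order."""
    found = [(m.start(), m.end(), m.group()) for m in NEGATIVE.finditer(text)]
    found += [(m.start(), m.end(), m.group()) for m in POSITIVE.finditer(text)]
    if deduplicate_by_area:
        kept = []
        # Assumed tie-break: longer results are considered first, then earlier ones.
        for start, end, value in sorted(found, key=lambda r: (r[0] - r[1], r[0])):
            if all(end <= s or start >= e for s, e, _ in kept):
                kept.append((start, end, value))
        found = kept
    return [value for _, _, value in sorted(found)]

text = " 450.00 \t -$ 150.00 \t 25.00 "
print(all_currency(text, deduplicate_by_area=False))  # ['450.00', '-$ 150.00', '150.00', '25.00']
print(all_currency(text))                             # ['450.00', '-$ 150.00', '25.00']
```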
Subtraction Extractor to Enhance Generic Text

The remaining values that make up a row of information in the B. Services Borrower Did Not Shop For table are the names of mortgage fees, and the names of the entities to whom the fees will be delivered. These can easily be collected with our VAL - Generic Text Segment Data Type, but it needs to be enhanced slightly.

As the extractor currently functions, the gaps following the leading row line numbers and the word "to" are too small to be treated as tabs, so those characters are consumed in the returned result. While it might be possible to tweak the tab detection settings to compensate, doing so would probably hurt the extractor's results elsewhere. A better solution is to subtract what we don't want from our results.

Grooper ace architect collect tabular data extraction STvOAC 013.png
  1. Right-click on the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Subtraction Extractors.
Grooper ace architect collect tabular data extraction STvOAC 014.png
  1. Right-click on the Subtraction Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it SUB - Generic Text.
Grooper ace architect collect tabular data extraction STvOAC 015.png
  1. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect tabular data extraction STvOAC 016.png
  1. In the Value Pattern area type or copy/paste the following:
    \d{2}\s|
    to\s
Grooper ace architect collect tabular data extraction STvOAC 017.png
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning two-digit numbers and the word "to".
Grooper ace architect collect tabular data extraction STvOAC 018.png
  1. Select the VAL - Generic Text Segment Data Type.
  2. Select the Subtraction Extractor property and set it to Reference.
  3. Expose the sub-properties of the Subtraction Extractor property, and in the drop-down menu for the Extractor property select the SUB - Generic Text Data Type.
Grooper ace architect collect tabular data extraction STvOAC 019.png
  1. With the Subtraction Extractor property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning the generic text segments, but with the results from the supplied SUB - Generic Text Data Type subtracted.
Grooper ace architect collect tabular data extraction STvOAC 020.png
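As a rough mental model of the subtraction, the Python sketch below blanks out whatever the SUB - Generic Text pattern matches and then splits the line into segments wherever two or more whitespace characters remain. Two things here are assumptions: the `SEGMENT` pattern is a hypothetical stand-in for VAL - Generic Text Segment (its real pattern is not repeated in this section), and blanking the subtracted text is only an approximation of how Grooper applies a Subtraction Extractor.

```python
import re

# Hypothetical stand-in for VAL - Generic Text Segment: words separated by
# single spaces; two or more whitespace characters end a segment.
SEGMENT = re.compile(r"\S+(?: \S+)*")
# Value Pattern from SUB - Generic Text, with its two lines joined.
SUBTRACT = re.compile(r"\d{2}\s|to\s")

def segments(text, subtraction=None):
    """Return text segments, optionally after removing subtracted matches."""
    if subtraction is not None:
        # Assumed approximation: replace subtracted text with spaces so the
        # surrounding words fall into separate segments.
        text = subtraction.sub(lambda m: " " * len(m.group()), text)
    return SEGMENT.findall(text)

line = "01 Appraisal Fee to John Smith Appraisers Inc."
print(segments(line))            # ['01 Appraisal Fee to John Smith Appraisers Inc.']
print(segments(line, SUBTRACT))  # ['Appraisal Fee', 'John Smith Appraisers Inc.']
```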
Defining a Table Row via Ordered Array Collation

Having created the individual extractors that allow us to collect the elements within a row of information in the table, it’s now time to combine those elements within one extractor that defines a row. To do this we need to understand the table in question.

This is a standard table, with columns and rows laid out in a typical fashion, so its construction is easily understood by listing its columns from left to right. A row of information consists of one of each of the following, specifically in this left-to-right order:

  • Charge Description
  • Charge To
  • Borrower-Paid At Closing
  • Borrower-Paid Before Closing
  • Seller-Paid At Closing
  • Seller-Paid Before Closing
  • Paid by Others
Grooper ace architect collect tabular data extraction STvOAC 021.png

To create this extractor, we will make a Data Type using a new Collation method called Ordered Array with a Horizontal Layout. Through its Collation property, this Data Type will combine the results returned from its child Data Types in order from left to right (hence the Horizontal Layout). The child Data Types are named after the columns of information we seek, just like the list above.

It's important to name the child Data Types exactly as they appear in the list above, so feel free to copy it. We'll cover why later.
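To make the Ordered Array idea concrete, the Python sketch below models it in a deliberately simplified way: each child Data Type's results are (line, x_start, x_end, value) tuples, and a row is formed only when a line supplies one result per child, strictly left to right. The coordinates and values are made up, and the greedy "leftmost next result" rule is an assumption for illustration; Grooper's real collation also considers layout geometry and tolerances that are not modeled here.

```python
# Child Data Types of TBL - B. Services Borrower Did Not Shop For, left to right.
CHILDREN = [
    "Charge Description", "Charge To",
    "Borrower-Paid At Closing", "Borrower-Paid Before Closing",
    "Seller-Paid At Closing", "Seller-Paid Before Closing", "Paid by Others",
]

def ordered_array_horizontal(results_by_child):
    """Build rows from (line, x_start, x_end, value) results, one per child, left to right."""
    lines = sorted({r[0] for results in results_by_child.values() for r in results})
    rows = []
    for line in lines:
        row, cursor = {}, 0
        for child in CHILDREN:
            # Leftmost result for this child that starts after the previous one ended.
            candidates = sorted(r for r in results_by_child.get(child, [])
                                if r[0] == line and r[1] >= cursor)
            if not candidates:
                break  # a missing element means this line yields no row
            _, _, cursor, value = candidates[0]
            row[child] = value
        else:
            rows.append(row)
    return rows

# Made-up results: the two text children share one result list (as both reference
# VAL - Generic Text Segment), and the five amount children share another.
segments = [(1, 0, 13, "Appraisal Fee"), (1, 20, 41, "John Smith Appraisers"),
            (2, 0, 14, "Flood Cert Fee"), (2, 20, 30, "Info Co.")]
amounts = [(1, 50, 57, "$405.00"), (1, 60, 65, "$0.00"), (1, 70, 75, "$0.00"),
           (1, 80, 85, "$0.00"), (1, 90, 95, "$0.00"), (2, 50, 56, "$20.00")]
results = {name: segments if i < 2 else amounts for i, name in enumerate(CHILDREN)}
rows = ordered_array_horizontal(results)
print(rows)
```

Line 1 produces a complete row, while line 2 produces nothing because it runs out of amounts before every child is filled. This is also why overlapping child results stop mattering once the parent's Collation is set: only results that fit the ordered layout survive.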

  1. Right-click on the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Table Extractors.
Grooper ace architect collect tabular data extraction STvOAC 022.png
  1. Right-click on the Table Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it TBL - B. Services Borrower Did Not Shop For.
Grooper ace architect collect tabular data extraction STvOAC 023.png
  1. Right-click on the TBL - B. Services Borrower Did Not Shop For Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, set the Item Type property to Data Type.
  4. Click the drop-down menu for the Item Names property and paste in the column names list from above.
Grooper ace architect collect tabular data extraction STvOAC 024.png
  1. Select the Charge Description Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - Generic Text Segment from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction STvOAC 025.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - Generic Text Segment Data Type.
    • If this seems to return more information than its name suggests it should, don't worry. Duplicate and unwanted data will eventually be handled by the parent Data Type's Collation property.
Grooper ace architect collect tabular data extraction STvOAC 026.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 027.png
  1. Select the Charge To Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 028.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - Generic Text Segment Data Type.
Grooper ace architect collect tabular data extraction STvOAC 029.png
  1. Select the Borrower-Paid At Closing Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction STvOAC 030.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction STvOAC 031.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 032.png
  1. Select the Borrower-Paid Before Closing Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 033.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction STvOAC 034.png
  1. Continue pasting the copied properties onto the remaining Data Types (Seller-Paid At Closing, Seller-Paid Before Closing, and Paid by Others), since they are all meant to collect the same type of value.
  2. Having pasted the Referenced Extractors property on the remaining Data Types...
  3. ...you’ll notice these Data Types are now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction STvOAC 035.png
  1. Select the parent Data Type, TBL - B. Services Borrower Did Not Shop For.
  2. With the Collation property at its default setting, Individual, the results from every child Data Type are returned individually, so there are many overlapping results. The next steps remedy this.
Grooper ace architect collect tabular data extraction STvOAC 036.png
  1. Set the Collation property to Ordered Array.
  2. With the Collation property set, click the drop-down arrow to expose its sub-properties.
  3. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect tabular data extraction STvOAC 037.png
  1. With the Collation property (and its sub-property, Horizontal Layout) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning entire rows of information from the document that match the left to right arrangement of the results returned from the child Data Types.
Grooper ace architect collect tabular data extraction STvOAC 038.png
  1. These results deserve a closer look, so make sure Page 2 from one of the documents is selected.
  2. Right-click on the top value listed in the Results list-view.
  3. Select Inspect Instance... from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 039.png
  1. This result is very similar to what we saw with the named capture groups in the previous section. Each child Data Type returns its own result, and you can select each one in the Instance Viewer to view it individually.
Grooper ace architect collect tabular data extraction STvOAC 040.png
Supplying the Results to the Data Model

We now have an extractor that returns every row of data from the table we're after, so we can feed this data to a Data Table in our (data model). We do this by first creating a Data Table, then its child Data Columns. It is important to name the Data Columns exactly like the child Data Types of the extractor that will feed the Data Table because, unlike Data Fields, Data Columns are not given their own extractors. Instead, the Data Table is configured with a Row Extractor, and the table knows which element of data to feed each child Data Column because the names match.
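The name matching can be pictured with a small Python sketch: each row instance returned by the Row Extractor is treated as a mapping from child Data Type names to values, and each Data Column simply looks up its own name. The data structures are illustrative assumptions, not Grooper's object model. The third child below is deliberately misnamed (a missing hyphen), so in this model its column is left empty (`None`).

```python
# A few of the Data Columns of the B. Services Borrower Did Not Shop For Data Table.
COLUMNS = ["Charge Description", "Charge To", "Borrower-Paid At Closing"]

def fill_data_table(row_instances, columns):
    """Copy each row instance's child values into the columns with matching names."""
    return [{column: instance.get(column) for column in columns} for instance in row_instances]

# One row returned by the row extractor; the third child name lacks its hyphen.
instances = [{"Charge Description": "Appraisal Fee",
              "Charge To": "John Smith Appraisers",
              "Borrower Paid At Closing": "$405.00"}]
table = fill_data_table(instances, COLUMNS)
print(table)
```

A single character of difference between a child Data Type name and a Data Column name is enough to lose the value, which is why the article stresses copying the list exactly.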

  1. Right-click on the (data model).
  2. Select Add > Data Table... from the command menu.
  3. Name it B. Services Borrower Did Not Shop For.
Grooper ace architect collect tabular data extraction STvOAC 041.png
  1. Right-click on the B. Services Borrower Did Not Shop For Data Table.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the column names list below (keep in mind this is the exact list we used for the child Data Types of the table extractor):
    • Charge Description
    • Charge To
    • Borrower-Paid At Closing
    • Borrower-Paid Before Closing
    • Seller-Paid At Closing
    • Seller-Paid Before Closing
    • Paid by Others
Grooper ace architect collect tabular data extraction STvOAC 042.png

We now have several Data Columns that can be populated with information, but it’s important to make sure their Value Types are set properly. The first two columns are fine as string values, but the remaining five are specifically currency amounts.

  1. Select the Borrower-Paid At Closing Data Column.
  2. Select the Value Type property and select Decimal from the drop-down menu.
Grooper ace architect collect tabular data extraction STvOAC 043.png
  1. With the Value Type property set, click the drop-down arrow to expose its sub-properties.
  2. Set the Format Specifier property to c2.
    • Once again, feel free to consult the Grooper Help for specifics on .NET format specifiers. In this case, given we're using a Decimal Value Type, the c2 setting formats the value as currency with 2 decimal places.
Grooper ace architect collect tabular data extraction STvOAC 044.png
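A Decimal Value Type with a c2 Format Specifier amounts to two steps: parse the extracted text into a number, then display it as currency with two decimal places. The Python sketch below imitates both steps with `decimal.Decimal`. It approximates en-US output only; the real .NET "C2" result depends on the culture settings (for example, negative amounts may be shown in parentheses), so treat the formatting here as an assumption.

```python
from decimal import Decimal

def parse_currency(text):
    """Parse strings such as "$1,250.00" or "-$ 20.00" into a Decimal."""
    cleaned = text.replace("$", "").replace(",", "").replace(" ", "")
    return Decimal(cleaned)

def format_c2(value):
    """Approximate an en-US "c2" display: dollar sign, thousands separators, 2 decimals."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.2f}"

print(format_c2(parse_currency("$1,250.00")))  # $1,250.00
print(format_c2(parse_currency("-$ 20.00")))   # -$20.00
print(format_c2(Decimal("3")))                 # $3.00
```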
  1. With the Value Type property (and Format Specifier sub-property) set, right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
    • This is another situation where it will save time and clicks to simply copy the properties we just set, and paste them on the remaining Data Columns, considering they will all use the same type of information.
Grooper ace architect collect tabular data extraction STvOAC 045.png
  1. Select the Borrower-Paid Before Closing Data Column.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction STvOAC 046.png
  1. Select the remaining Data Columns one at a time...
  2. ...and repeat the process of pasting the properties to each.
Grooper ace architect collect tabular data extraction STvOAC 047.png
  1. Select the B. Services Borrower Did Not Shop For Data Table.
  2. Set the Extract Method property to Row Match.
  3. With the Extract Method property set to Row Match, click the drop-down arrow to expose its sub-properties.
  4. Expose the Row Extractor sub-properties by clicking the drop-down arrow.
  5. Set the Type property to Reference.
  6. Select the Referenced Extractor property, and in the drop-down menu select TBL - B. Services Borrower Did Not Shop For from:
Closing Disclosures • (local resources) > Table Extractors
Grooper ace architect collect tabular data extraction STvOAC 048.png
  1. Be sure to have a document selected in the Batch Viewer, not a page.
  2. With the Extract Method property (and its sub-properties) set...
  3. ...Save and Test Extraction...
  4. ...and you’ll see values from the table returned in the Data Table Test Results area. You should also see these results highlighted in the Document Viewer below. If you click a cell in the table, that individual result is displayed, and you can tab through as many cells as you like. You should also notice a pink box drawn around the bounds of this table in the Document Viewer.
    • You may notice that this extractor returns rows to this table from outside the B. Services Borrower Did Not Shop For area of the document. We will cover the fix for this later when we go over Data Sections.
Grooper ace architect collect tabular data extraction STvOAC 049.png
Inverted Table with Row Headers

The next table we’ll capture information from is the Payment Calculation table from the Projected Payments section on the first page. The interesting thing about this table is that it is inverted compared to the last table we worked with. To be more specific, it has row headers instead of column headers. With that in mind, we must think about this table slightly differently. We’ll be capturing what appears to be columns of information and supplying them to a Data Table as rows.

Grooper ace architect collect tabular data extraction ITwRH 001.png
Defining a Table Row via Ordered Array Collation
  1. Right-click on the Table Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it TBL - Payment Calculation.
Grooper ace architect collect tabular data extraction ITwRH 002.png
  1. Right-click on the TBL - Payment Calculation Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the item names list from below:
    • Year Range
    • Principal and Interest
    • Mortgage Insurance
    • Estimated Escrow
    • Estimated Total Monthly Payment
Grooper ace architect collect tabular data extraction ITwRH 003.png
  1. Select the Year Range Data Type.
  2. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect tabular data extraction ITwRH 004.png
  1. In the Value Pattern area type or copy/paste the following:
    \d{1,2}-\d{1,2}
Grooper ace architect collect tabular data extraction ITwRH 005.png
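A quick Python check shows what this pattern does and does not accept. The sample strings are made up. On its own the pattern is loose: it also matches the first part of a date written with hyphens. That is likely acceptable here because it is used as one child of the TBL - Payment Calculation Data Type rather than as a standalone extractor.

```python
import re

# Value Pattern from the Year Range Data Type.
YEAR_RANGE = re.compile(r"\d{1,2}-\d{1,2}")

print(YEAR_RANGE.findall("Years 1-7   Years 8-30"))   # ['1-7', '8-30']
print(YEAR_RANGE.findall("Closing Date 12-31-2019"))  # ['12-31']
```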
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you should now see that this Data Type is returning the year ranges from the desired table.
Grooper ace architect collect tabular data extraction ITwRH 006.png
  1. Select the Principal and Interest Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction ITwRH 007.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction ITwRH 008.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction ITwRH 009.png
  1. Select the Mortgage Insurance Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction ITwRH 010.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction ITwRH 011.png
  1. Continue pasting the copied properties on the remaining Data Types as they are all meant to collect the same type of value. Having pasted the Referenced Extractors property on the remaining Data Types you’ll notice these Data Types are now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction ITwRH 012.png
  1. Select the parent Data Type, TBL - Payment Calculation.
  2. Set the Collation property to Ordered Array.
  3. With the Collation property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Vertical Layout property to Enabled.
Grooper ace architect collect tabular data extraction ITwRH 013.png
  1. With the Collation property (and its sub-property, Vertical Layout) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning entire rows of information from the document that match the top to bottom arrangement of the results returned from the child Data Types.
Grooper ace architect collect tabular data extraction ITwRH 014.png
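The Vertical Layout is the horizontal case with the axes swapped. In the simplified Python model below, each result is a (column, y_top, y_bottom, value) tuple, and within each document column the children must appear top to bottom; each complete column becomes one row of the Data Table. The positions and amounts are made up, and the greedy matching rule is an assumption for illustration rather than Grooper's actual algorithm.

```python
# Child Data Types of TBL - Payment Calculation, top to bottom.
CHILDREN = ["Year Range", "Principal and Interest", "Mortgage Insurance",
            "Estimated Escrow", "Estimated Total Monthly Payment"]

def ordered_array_vertical(results_by_child):
    """Build rows from (column, y_top, y_bottom, value) results, one per child, top to bottom."""
    columns = sorted({r[0] for results in results_by_child.values() for r in results})
    rows = []
    for column in columns:
        row, cursor = {}, 0
        for child in CHILDREN:
            # Topmost result for this child that starts below the previous one.
            below = sorted((r for r in results_by_child.get(child, [])
                            if r[0] == column and r[1] >= cursor), key=lambda r: r[1])
            if not below:
                break
            _, _, cursor, value = below[0]
            row[child] = value
        else:
            rows.append(row)  # one document column becomes one Data Table row
    return rows

years = [(1, 0, 2, "1-7"), (2, 0, 2, "8-30")]
amounts = [(1, 10, 12, "$761.78"), (1, 20, 22, "$82.35"), (1, 30, 32, "$206.13"),
           (1, 40, 42, "$1,050.26"), (2, 10, 12, "$761.78"), (2, 20, 22, "$0.00"),
           (2, 30, 32, "$206.13"), (2, 40, 42, "$967.91")]
results = {child: years if child == "Year Range" else amounts for child in CHILDREN}
rows = ordered_array_vertical(results)
print(rows)
```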
Supplying the Results to the Data Model
  1. Right-click on the (data model).
  2. Select Add > Data Table... from the command menu.
  3. Name it Payment Calculation.
Grooper ace architect collect tabular data extraction ITwRH 015.png
  1. Right-click on the Payment Calculation Data Table.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the column names list below (keep in mind this is the exact list we used for the child Data Types of TBL - Payment Calculation):
    • Year Range
    • Principal and Interest
    • Mortgage Insurance
    • Estimated Escrow
    • Estimated Total Monthly Payment
Grooper ace architect collect tabular data extraction ITwRH 016.png
  1. Select the Principal and Interest Data Column.
  2. Select the Value Type property and select Decimal from the drop-down menu.
  3. With the Value Type property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Format Specifier property to c2.
Grooper ace architect collect tabular data extraction ITwRH 017.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction ITwRH 018.png
  1. Select the Mortgage Insurance Data Column.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction ITwRH 019.png
  1. Select the remaining Data Columns one at a time and repeat the process of pasting the properties to each.
Grooper ace architect collect tabular data extraction ITwRH 020.png
  1. Select the Payment Calculation Data Table.
  2. Set the Extract Method property to Row Match.
  3. With the Extract Method property set to Row Match, click the drop-down arrow to expose its sub-properties.
  4. Expose the Row Extractor sub-properties by clicking the drop-down arrow.
  5. Set the Type property to Reference.
  6. Select the Referenced Extractor property, and in the drop-down menu select TBL - Payment Calculation from:
Closing Disclosures • (local resources) > Table Extractors
Grooper ace architect collect tabular data extraction ITwRH 021.png
  1. With the Extract Method property (and its sub-properties) set...
  2. ...Save and Test Extraction...
  3. ...and you’ll see in the Data Table Test Results area values from the table being returned. You should also see these results highlighted in the Document Viewer below.
    • Notice as you tab through the results that information displayed as columns in the document is arranged as rows in the Data Table.
Grooper ace architect collect tabular data extraction ITwRH 022.png
Table with Missing or Optional Data

The next table we’ll capture information from is the Calculating Cash to Close table at the top of page 3. The challenge presented by this table is that some of its columns contain missing or optional data. Consider how we’ve made our row extractors so far: the Ordered Array Collation requires all elements of the extractor to be present, in the designated layout, for any result to be returned. So if a column of information is missing from a row, that row will not return a result. The solution is to make two variations of the row extractor, one with and one without the optional column. These variations can then be combined by another Data Type that references them both, and those results can be returned to a Data Table.

Grooper ace architect collect tabular data extraction TwMOD 001.png
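The variant strategy can be sketched in Python with the same simplified ordered-array model used for the earlier tables: a full variant and a shorter variant each try every line, and where both produce a row on the same line, the fuller row is kept. Which column the article's second variant leaves out is covered in its own steps; here Notes is dropped purely for illustration, and the positions and amounts are made up. Keeping the fuller row stands in for the deduplication step and is an assumption, not Grooper's documented logic.

```python
FULL = ["Description", "Loan Estimate", "Final", "Change", "Notes"]
# Hypothetical second variant: drops Notes purely for illustration.
WITHOUT_NOTES = ["Description", "Loan Estimate", "Final", "Change"]

def ordered_array(results_by_child, children):
    """Return {line: row} for lines that supply every child, left to right."""
    rows = {}
    lines = sorted({r[0] for results in results_by_child.values() for r in results})
    for line in lines:
        row, cursor = {}, 0
        for child in children:
            found = sorted(r for r in results_by_child.get(child, [])
                           if r[0] == line and r[1] >= cursor)
            if not found:
                break  # this variant cannot build a row on this line
            _, _, cursor, value = found[0]
            row[child] = value
        else:
            rows[line] = row
    return rows

def combine_variants(results_by_child, variants):
    """Merge variant results; where rows overlap on a line, keep the fuller one."""
    combined = {}
    for children in variants:
        for line, row in ordered_array(results_by_child, children).items():
            if len(row) > len(combined.get(line, {})):
                combined[line] = row
    return [combined[line] for line in sorted(combined)]

text = [(1, 0, 23, "Total Closing Costs (J)"), (1, 60, 84, "See Total Loan Costs (D)"),
        (2, 0, 22, "Closing Costs Financed")]
money = [(1, 30, 39, "$8,054.00"), (1, 45, 54, "$9,712.10"),
         (2, 30, 35, "$0.00"), (2, 45, 50, "$0.00")]
change = [(1, 56, 59, "YES"), (2, 56, 59, "YES")]
results = {"Description": text, "Loan Estimate": money, "Final": money,
           "Change": change, "Notes": text}
rows = combine_variants(results, [FULL, WITHOUT_NOTES])
print(rows)
```

With only the full variant, line 2 would be lost because it has no notes; the shorter variant recovers it, and line 1 keeps its complete five-cell row.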
First Row Variant (all data)
  1. Right-click on the (local resources) folder.
  2. Select Add > Folder... from the command menu.
  3. Name it Ordered Array Extractors.
Grooper ace architect collect tabular data extraction TwMOD 002.png
  1. Right-click on the Ordered Array Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it OA-H - Description, Loan Estimate, Final, Change, Notes.
Grooper ace architect collect tabular data extraction TwMOD 003.png
  1. Right-click on the OA-H - Description, Loan Estimate, Final, Change, Notes Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, set the Item Type property to Data Type.
  4. Click the drop-down menu for the Item Names property and copy/paste in the item names list from below:
    • Description
    • Loan Estimate
    • Final
    • Change
    • Notes
Grooper ace architect collect tabular data extraction TwMOD 004.png
  1. Select the Description Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select INPT - Generic Text Block and VAL - Generic Text Segment from:
Closing Disclosures • (local resources) > Input Filter Extractors

...and...

Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction TwMOD 005.png
  1. With the Referenced Extractors property set...
  2. ...be sure to set the Deduplicate By property to Area (and with a quick press of the Save and Test Single buttons)...
  3. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the INPT - Generic Text Block and VAL - Generic Text Segment Data Types.
Grooper ace architect collect tabular data extraction TwMOD 006.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 007.png
  1. Select the Notes Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 008.png
  1. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  2. In the Referenced Extractors window select the INPT - Generic Text Block Data Type.
  3. Click the Delete button.
Grooper ace architect collect tabular data extraction TwMOD 009.png
  1. With the Referenced Extractors property set...
  2. ...be sure to set the Deduplicate By property to the default None setting (and with a quick press of the Save and Test Single buttons)...
  3. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - Generic Text Segment Data Type.
Grooper ace architect collect tabular data extraction TwMOD 010.png
  1. Select the Loan Estimate Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect tabular data extraction TwMOD 011.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction TwMOD 012.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 013.png
  1. Select the Final Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 014.png
  1. With the Referenced Extractors property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now functioning just like (returning the same results from) the VAL - All Currency Data Type.
Grooper ace architect collect tabular data extraction TwMOD 015.png
  1. Select the Change Data Type.
  2. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect tabular data extraction TwMOD 016.png
  1. In the Value Pattern area type or copy/paste the following:
    YES
Grooper ace architect collect tabular data extraction TwMOD 017.png
  1. Select the Properties tab.
  2. Set the Case Sensitive property to True.
Grooper ace architect collect tabular data extraction TwMOD 018.png
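To see why the Case Sensitive property matters, here's a small sketch. It uses Python's `re` module as a stand-in for Grooper's pattern matching, which behaves the same way for a plain literal like this. The sample text is hypothetical: only the all-caps YES is a table value.

```python
import re

def find_yes(text: str, case_sensitive: bool = True) -> list:
    """Return every match of the Change column's Value Pattern, YES."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.findall(r"YES", text, flags)

# Hypothetical text: the first YES is a table value, the second line is prose.
sample = "Closing Costs Paid Before Closing $0 -$2,000.00 YES\nDid this change? Yes, see notes."

print(find_yes(sample))                        # ['YES']
print(find_yes(sample, case_sensitive=False))  # ['YES', 'Yes']
```

The case-insensitive call shows the kind of stray match that setting Case Sensitive to True prevents.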
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning the YES results from the desired table.
Grooper ace architect collect tabular data extraction TwMOD 019.png
  1. Select the parent Data Type, OA-H - Description, Loan Estimate, Final, Change, Notes.
  2. Set the Collation property to Ordered Array.
  3. With the Collation property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect tabular data extraction TwMOD 020.png
  1. With the Collation property (and its sub-property, Horizontal Layout) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning entire rows of information from the document that match the left to right arrangement of the results returned from the child Data Types.
Grooper ace architect collect tabular data extraction TwMOD 021.png
Second Row Variant (missing data)
  1. Right-click on the OA-H - Description, Loan Estimate, Final, Change, Notes Data Type.
  2. Select Clone from the command menu.
  3. Name it OA-H - Description, Loan Estimate, Final, Change.
Grooper ace architect collect tabular data extraction TwMOD 022.png
  1. Right-click on the Notes Data Type (the child object of the newly cloned Data Type).
  2. Select Delete from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 023.png
  1. Select the Change Data Type (the child object of the newly cloned Data Type).
  2. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect tabular data extraction TwMOD 024.png
  1. In the Value Pattern area type or copy/paste the following:
    NO
Grooper ace architect collect tabular data extraction TwMOD 025.png
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning the NO results from the desired table.
Grooper ace architect collect tabular data extraction TwMOD 026.png
  1. Select the parent Data Type, OA-H - Description, Loan Estimate, Final, Change.
  2. With the Collation property (and its sub-property, Horizontal Layout) set...
  3. ...you’ll notice this Data Type is now returning entire rows of information from the document that match the left to right arrangement of the results returned from the child Data Types.
Grooper ace architect collect tabular data extraction TwMOD 027.png
Combining and Supplying the Results to the Data Model

Each table extractor made so far has itself been an ordered array. This time, the table extractor combines the two ordered array variants that were just created, so it can return every row of the table, including rows with missing data, to the Data Table.

  1. Right-click on the Table Extractors folder.
  2. Select Add > Data Type from the command menu.
  3. Name it TBL - Calculating Cash to Close.
Grooper ace architect collect tabular data extraction TwMOD 028.png
  1. With the newly created TBL - Calculating Cash to Close Data Type selected...
  2. ...select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select both Data Types from:
Closing Disclosures • (local resources) > Ordered Array Extractors
Grooper ace architect collect tabular data extraction TwMOD 029.png
  1. With the Referenced Extractors property set, be sure to set the Deduplicate By property to Area (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning all rows from the desired table, a combination of the results from the Data Types selected in the Referenced Extractors property.
Grooper ace architect collect tabular data extraction TwMOD 030.png
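As a rough mental model of Deduplicate By Area, here's a Python sketch. It assumes that two results count as duplicates when their bounding boxes overlap and that the first one found is kept; Grooper's exact rule may differ. The `Result` class and the row data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical bounding box in inches: (left, top, right, bottom).
Box = Tuple[float, float, float, float]

@dataclass(frozen=True)
class Result:
    text: str
    area: Box

def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share any area (touching edges do not count)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def dedupe_by_area(results: List[Result]) -> List[Result]:
    """Keep the first result in each area; drop later results that overlap it."""
    kept: List[Result] = []
    for r in results:
        if not any(overlaps(r.area, k.area) for k in kept):
            kept.append(r)
    return kept

# Hypothetical results in which the same row was returned twice.
rows = [
    Result("Loan Amount $162,000.00 $162,000.00 NO", (0.5, 2.0, 8.0, 2.2)),
    Result("Loan Amount $162,000.00 $162,000.00 NO", (0.5, 2.0, 7.0, 2.2)),
    Result("Down Payment $18,000.00 $18,000.00 NO", (0.5, 2.3, 8.0, 2.5)),
]
print([r.text for r in dedupe_by_area(rows)])
```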
  1. Right-click on the (data model).
  2. Select Add > Data Table from the command menu.
  3. Name it Calculating Cash to Close.
Grooper ace architect collect tabular data extraction TwMOD 031.png
  1. Right-click the newly created Calculating Cash to Close Data Table.
  2. Select Contents > Add Multiple Items from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the column names list from below:
    • Description
    • Loan Estimate
    • Final
    • Change
    • Notes
Grooper ace architect collect tabular data extraction TwMOD 032.png
  1. Select the Loan Estimate Data Column.
  2. Select the Value Type property and select Decimal from the drop-down menu.
  3. With the Value Type property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Format Specifier property to c2.
Grooper ace architect collect tabular data extraction TwMOD 033.png
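In .NET, `c2` is the currency format specifier with two decimal places, and its output depends on the culture the machine runs under. Assuming Grooper hands the Format Specifier to .NET's standard numeric formatting, the Python sketch below approximates en-US output. The `parse_currency` helper is hypothetical, and the negative-sign placement is an assumption, since some .NET versions and cultures render negatives in parentheses instead.

```python
from decimal import Decimal

def parse_currency(text: str) -> Decimal:
    """Hypothetical helper: turn text like "-$2,000.00" into a Decimal value."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)

def format_c2(value: Decimal) -> str:
    """Approximate .NET "c2" output for the en-US culture."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.2f}"

print(format_c2(parse_currency("162000")))      # $162,000.00
print(format_c2(parse_currency("-$2,000.00")))  # -$2,000.00
```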
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 034.png
  1. Select the Final Data Column.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect tabular data extraction TwMOD 035.png
  1. Select the Calculating Cash to Close Data Table.
  2. Set the Extract Method property to Row Match.
  3. With the Extract Method property set to Row Match, click the drop-down arrow to expose its sub-properties.
  4. Expose the Row Extractor sub-properties by clicking the drop-down arrow.
  5. Set the Type property to Reference.
  6. Select the Referenced Extractor property, and in the drop-down menu select TBL - Calculating Cash to Close from:
Closing Disclosures • (local resources) > Table Extractors
Grooper ace architect collect tabular data extraction TwMOD 036.png
  1. With the Extract Method property (and its sub-properties) set...
  2. ...Save and Test Extraction...
  3. ...and you’ll see in the Data Table Test Results area values from the table being returned. You should also see these results highlighted in the Document Viewer below, including rows with missing/optional data.
Grooper ace architect collect tabular data extraction TwMOD 037.png
Infer Grid Table Extraction

The next table we’ll capture is the Contact Information table on page 5. This table gives us an opportunity to use the last type of table extraction method we’ll cover, called Infer Grid. Previously we’ve created extractors that defined the contents of the table. With this approach we’ll create extractors that define the headers. From there, a grid is "inferred" from the intersection of these extractors, and the contents of this grid are passed to the Data Table. It’s very close to how a person naturally reads a table.

Grooper ace architect collect tabular data extraction IGTE 001.png
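Here's a conceptual Python sketch of the Infer Grid idea, not Grooper's implementation. It assumes each column runs from its header to the next header (and each row likewise), and it drops each word into the cell that contains its center point. All coordinates and text are hypothetical.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, float]       # (header text, start position in inches)
Word = Tuple[str, float, float]  # (text, x center, y center) in inches

def bands(headers: List[Header], end: float) -> List[Tuple[str, float, float]]:
    """Each header's band runs from its own start to the next header's start."""
    ordered = sorted(headers, key=lambda h: h[1])
    out = []
    for i, (name, start) in enumerate(ordered):
        stop = ordered[i + 1][1] if i + 1 < len(ordered) else end
        out.append((name, start, stop))
    return out

def infer_grid(cols: List[Header], rows: List[Header], words: List[Word],
               page_w: float = 8.5, page_h: float = 11.0) -> Dict[Tuple[str, str], str]:
    """Fill each (row, column) cell with the words whose centers fall inside it."""
    grid = {}
    for row, top, bottom in bands(rows, page_h):
        for col, left, right in bands(cols, page_w):
            inside = [t for t, x, y in words if left <= x < right and top <= y < bottom]
            grid[(row, col)] = " ".join(inside)
    return grid

cols = [("Lender", 2.0), ("Mortgage Broker", 3.5)]  # X-axis (column) headers
rows = [("Name", 1.0), ("Address", 1.3)]            # Y-axis (row) headers
words = [("Example Bank", 2.6, 1.1), ("123 Main St", 2.6, 1.4), ("Name", 1.2, 1.1)]

grid = infer_grid(cols, rows, words)
print(grid[("Name", "Lender")])     # Example Bank
print(grid[("Address", "Lender")])  # 123 Main St
```

Note that the row header word itself ("Name" at x = 1.2) lands left of the first column and so ends up in no cell.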
Capturing the Column Headers a.k.a. X-Axis
  1. Right-click on the Table Extractors folder.
  2. Select Add > Data Type from the command menu.
  3. Name it TBL-IGX - Contact Information.
Grooper ace architect collect tabular data extraction IGTE 002.png
  1. Right-click on the TBL-IGX - Contact Information Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the item names list from below:
    • Lender
    • Mortgage Broker
    • Real Estate Broker (B)
    • Real Estate Broker (S)
    • Settlement Agent
Grooper ace architect collect tabular data extraction IGTE 003.png
  1. Select the Lender Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Lender
Grooper ace architect collect tabular data extraction IGTE 004.png
  1. Select the Mortgage Broker Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Mortgage Broker
Grooper ace architect collect tabular data extraction IGTE 005.png
  1. Select the Real Estate Broker (B) Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Real Estate Broker [(]B[)]
Grooper ace architect collect tabular data extraction IGTE 006.png
  1. Select the Real Estate Broker (S) Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Real Estate Broker [(]S[)]
Grooper ace architect collect tabular data extraction IGTE 007.png
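The `[(]` and `[)]` in these two patterns are one-character classes, so they match literal parentheses (the same as writing `\(` and `\)`). The Python sketch below, using `re` as a stand-in for Grooper's regex engine, shows what goes wrong without them.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True when the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"Real Estate Broker [(]B[)]", "Real Estate Broker (B)"))  # True
print(matches(r"Real Estate Broker [(]B[)]", "Real Estate Broker (S)"))  # False
# Unbracketed, (B) is a capture group that matches just "B", not "(B)".
print(matches(r"Real Estate Broker (B)", "Real Estate Broker (B)"))      # False
print(matches(r"Real Estate Broker (B)", "Real Estate Broker B"))        # True
```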
  1. Select the Settlement Agent Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Settlement Agent
Grooper ace architect collect tabular data extraction IGTE 008.png
  1. Select the parent Data Type, TBL-IGX - Contact Information.
  2. Set the Collation property to Ordered Array.
  3. With the Collation property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect tabular data extraction IGTE 009.png
  1. With the Collation property (and its sub-property, Horizontal Layout) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning the combined, ordered results returned from the child Data Formats.
Grooper ace architect collect tabular data extraction IGTE 010.png
Capturing the Row Headers a.k.a. Y-Axis
  1. Right-click on the Table Extractors folder.
  2. Select Add > Data Type from the command menu.
  3. Name it TBL-IGY - Contact Information.
Grooper ace architect collect tabular data extraction IGTE 011.png
  1. Right-click on the TBL-IGY - Contact Information Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the item names list from below:
    • Name
    • Address
    • NMLS ID
    • State License ID
    • Contact
    • Contact NMLS ID
    • Contact State License ID
    • Email
    • Phone
Grooper ace architect collect tabular data extraction IGTE 012.png
  1. Select the Name Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Name
Grooper ace architect collect tabular data extraction IGTE 013.png
  1. Select the Address Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Address
Grooper ace architect collect tabular data extraction IGTE 014.png
  1. Select the NMLS ID Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    NMLS ID
Grooper ace architect collect tabular data extraction IGTE 015.png
  1. Select the State License ID Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    [A-Z]{2}\s?License ID
Grooper ace architect collect tabular data extraction IGTE 016.png
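In the State License ID pattern, `[A-Z]{2}` stands in for a two-letter state abbreviation and `\s?` makes the space after it optional. The Python sketch below uses hypothetical labels ("TX License ID") to show what the pattern accepts. It also shows that a search, as opposed to a full match, finds the pattern inside a longer label.

```python
import re

STATE_LICENSE = re.compile(r"[A-Z]{2}\s?License ID")

# Hypothetical labels; "TX" is just an example state abbreviation.
for label in ["TX License ID", "TXLicense ID", "NMLS ID"]:
    print(label, "->", bool(STATE_LICENSE.fullmatch(label)))
# TX License ID -> True
# TXLicense ID -> True
# NMLS ID -> False

print(STATE_LICENSE.search("Contact TX License ID").group(0))  # TX License ID
```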
  1. Select the Contact Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Contact
Grooper ace architect collect tabular data extraction IGTE 017.png
  1. Select the Contact NMLS ID Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Contact NMLS ID
Grooper ace architect collect tabular data extraction IGTE 018.png
  1. Select the Contact State License ID Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Contact
Grooper ace architect collect tabular data extraction IGTE 019.png
  1. Select the Email Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Email
Grooper ace architect collect tabular data extraction IGTE 020.png
  1. Select the Phone Data Format.
  2. In the Value Pattern area type or copy/paste the following:
    Phone
Grooper ace architect collect tabular data extraction IGTE 021.png
  1. Select the parent Data Type, TBL-IGY - Contact Information.
  2. Set the Collation property to Ordered Array.
  3. With the Collation property set, click the drop-down arrow to expose its sub-properties.
  4. Set the Vertical Layout property to Enabled.
Grooper ace architect collect tabular data extraction IGTE 022.png
  1. With the Collation property (and its sub-property, Vertical Layout) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning the combined, ordered results returned from the child Data Formats.
Grooper ace architect collect tabular data extraction IGTE 023.png
Supplying the Results to the Data Model
  1. Right-click on the (data model).
  2. Select Add > Data Table from the command menu.
  3. Name it Contact Information.
Grooper ace architect collect tabular data extraction IGTE 024.png
  1. Right-click the newly created Contact Information Data Table.
  2. Select Contents > Add Multiple Items from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the column names list from below:
    • Lender
    • Mortgage Broker
    • Real Estate Broker (B)
    • Real Estate Broker (S)
    • Settlement Agent
Grooper ace architect collect tabular data extraction IGTE 025.png
  1. With the Contact Information Data Table selected, set the Extract Method property to Infer Grid and click the drop-down arrow to expose its sub-properties.
Grooper ace architect collect tabular data extraction IGTE 026.png
  1. Expose the X-Axis Extractor sub-properties by clicking the drop-down arrow.
  2. Set the Type property to Reference.
  3. Select the Referenced Extractor property, and in the drop-down menu select TBL-IGX - Contact Information from:
Closing Disclosures • (local resources) > Table Extractors
Grooper ace architect collect tabular data extraction IGTE 027.png
  1. Expose the Y-Axis Extractor sub-properties by clicking the drop-down arrow.
  2. Set the Type property to Reference.
  3. Select the Referenced Extractor property, and in the drop-down menu select TBL-IGY - Contact Information from:
Closing Disclosures • (local resources) > Table Extractors
Grooper ace architect collect tabular data extraction IGTE 028.png
  1. With the Extract Method property (and its sub-properties) set...
  2. ...Save and Test Extraction...
  3. ...and you’ll see values from the table being returned in the Data Table Test Results area. You should also see these results highlighted in the Document Viewer below. Every cell was captured simply by defining the column and row headers. Notice they’re highlighted a soft yellow in the Document Viewer.
Grooper ace architect collect tabular data extraction IGTE 029.png

OMR Extraction

OMR (Optical Mark Recognition) extraction has been part of document capture for a long time, but it has always been exceedingly tedious to set up. For years it depended entirely on zonal boxes designed to measure a pixel threshold and return a value based on it. It later evolved into zonal boxes anchored to nearby text, which at least allowed some flexibility in document structure. Even so, none of these approaches was ever anything but enormously time-consuming to set up. Grooper solves this problem. Recall that earlier in this article, in the Condition the Documents section on the Recognize the Text and Form Features tab, there was a segment showing the LayoutData.json file. That metadata file, created during the Recognize step thanks to the Box Detection in the supplied IP Profile, is a powerful tool that makes extracting OMR boxes exceptionally easy.

Following, we’ll look at the three flavors of OMR extraction:

  • CheckMulti
  • CheckOne
  • Boolean
Grooper ace architect collect omr extraction 001.png
OMR Mode - CheckMulti
  1. Right-click on the (data model).
  2. Select Add > Data Field from the command menu.
  3. Name it This Estimate Includes.
Grooper ace architect collect omr extraction 002.png
  1. Right-click on the newly created This Estimate Includes Data Field.
  2. Select Add > Data Type from the command menu.
  3. Name it Value Extractor - This Estimate Includes.
Grooper ace architect collect omr extraction 003.png
  1. Select the newly created Value Extractor - This Estimate Includes Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect omr extraction 004.png
  1. Click on the Properties tab.
  2. Expand the Preprocessing Options by clicking the drop-down arrow.
  3. Set Tab Marking to Enabled.
Grooper ace architect collect omr extraction 005.png
  1. In the Value Pattern area type or copy/paste the following:
    Property Taxes|
    Homeowner's Insurance|
    Other
  2. In the Prefix Pattern area type or copy/paste the following:
    \t
  3. In the Suffix Pattern area type or copy/paste the following:
    :|\t
Grooper ace architect collect omr extraction 006.png
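Grooper's Prefix and Suffix Patterns must match immediately before and after the value without becoming part of it. In the Python sketch below, a lookbehind and a lookahead reproduce that behavior for illustration. The three-line Value Pattern is written on one line, and the sample text is hypothetical, assuming Tab Marking has replaced the wide gaps between text segments with `\t`.

```python
import re

# Lookbehind = Prefix Pattern (\t), lookahead = Suffix Pattern (:|\t).
# Both must be present, but neither is included in the returned value.
PATTERN = re.compile(r"(?<=\t)(?:Property Taxes|Homeowner's Insurance|Other)(?=:|\t)")

# Hypothetical tab-marked text.
sample = (
    "\tProperty Taxes\tYES\n"
    "\tHomeowner's Insurance\tYES\n"
    "\tOther:\tHomeowner's Association Dues\n"
    "Estimated\tOther Costs\t$0\n"
)

print(PATTERN.findall(sample))
# ['Property Taxes', "Homeowner's Insurance", 'Other']
```

The "Other" in "Other Costs" is rejected because it is followed by a space rather than a colon or tab.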
  1. Select the Post Processing property and click the drop-down arrow.
  2. Select OMR Reader.
Grooper ace architect collect omr extraction 007.png
  1. With the Post Processing property set to OMR Reader click the drop-down arrow to expose its sub-properties.
  2. Set the Mode property to CheckMulti.
  3. In the Separator String property type or copy/paste the following:
    ,\s
Grooper ace architect collect omr extraction 008.png
  1. With the Pattern and Post Processing options (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning the combined results, separated by a comma and a space, for each Pattern value that has a checked box detected 0.25in to its left.
Grooper ace architect collect omr extraction 009.png
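Here's a conceptual Python sketch of what CheckMulti does with the boxes found in the layout data. The 0.25in distance and the comma-and-space separator come from the steps above. The `Box` and `Label` classes, the same-line tolerance, and all coordinates are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Box:
    right: float   # right edge of the detected box, inches (hypothetical)
    y: float       # vertical center, inches
    checked: bool

@dataclass
class Label:
    text: str
    left: float    # left edge of the matched value, inches
    y: float

def box_for(label: Label, boxes: List[Box], max_gap: float = 0.25,
            line_tolerance: float = 0.05) -> Optional[Box]:
    """Find a box on the same line that ends within max_gap inches left of the label."""
    for box in boxes:
        gap = label.left - box.right
        if 0 <= gap <= max_gap and abs(box.y - label.y) <= line_tolerance:
            return box
    return None

def check_multi(labels: List[Label], boxes: List[Box], separator: str = ", ") -> str:
    """Join every label whose nearby box is checked, in reading order."""
    picked = []
    for label in labels:
        box = box_for(label, boxes)
        if box is not None and box.checked:
            picked.append(label.text)
    return separator.join(picked)

labels = [Label("Property Taxes", 1.0, 3.0), Label("Homeowner's Insurance", 1.0, 3.2),
          Label("Other", 1.0, 3.4)]
boxes = [Box(0.85, 3.0, True), Box(0.85, 3.2, True), Box(0.85, 3.4, False)]
print(check_multi(labels, boxes))  # Property Taxes, Homeowner's Insurance
```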
  1. Select the This Estimate Includes Data Field.
  2. Set the Value Extractor property to Reference.
  3. With the Value Extractor property set click the drop-down arrow to expose its sub-properties.
  4. Click the drop-down arrow next to the Extractor property.
  5. Select the Value Extractor - This Estimate Includes Data Type from:
(data model) • This Estimate Includes
Grooper ace architect collect omr extraction 010.png
  1. With the Value Extractor property (and its sub-properties) set...
  2. ... and with a quick press of the Save and Test Extraction buttons...
  3. ...you’ll notice this Data Field is returning the results from its child Data Type.
Grooper ace architect collect omr extraction 011.png
OMR Mode - CheckOne
  1. Right-click on the (data model).
  2. Select Add > Data Field from the command menu.
  3. Name it Assumption.
Grooper ace architect collect omr extraction 012.png
  1. Right-click on the newly created Assumption Data Field.
  2. Select Add > Data Type from the command menu.
  3. Name it Value Extractor - Assumption.
Grooper ace architect collect omr extraction 013.png
  1. Select the newly created Value Extractor - Assumption Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect omr extraction 014.png
  1. In the Value Pattern area type or copy/paste the following:
    will allow|
    will not allow
Grooper ace architect collect omr extraction 015.png
  1. Set the Post Processing property to OMR Reader.
  2. With the Post Processing property set, click the drop-down arrow to expose its sub-properties.
  3. You can leave the Mode property at its default setting, CheckOne.
Grooper ace architect collect omr extraction 016.png
  1. With the Pattern and Post Processing options (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning whichever of the two results has a checked box 0.25in to its left.
    • You may notice that both results are highlighted and returned, but look closer and you’ll see in the Results list-view that only the correct result is returned with 100% confidence.
Grooper ace architect collect omr extraction 017.png
  1. Select the Assumption Data Field.
  2. Set the Value Extractor property to Reference.
  3. With the Value Extractor property set click the drop-down arrow to expose its sub-properties.
  4. Click the drop-down arrow next to the Extractor property.
  5. Select the Value Extractor - Assumption Data Type from:
(data model) • Assumption
Grooper ace architect collect omr extraction 018.png
  1. With the Value Extractor property (and its sub-properties) set...
  2. ... and with a quick press of the Save and Test Extraction buttons...
  3. ...you’ll notice this Data Field is returning the results from its child Data Type.
    • A Data Field has a property called Minimum Confidence that has a default setting of 20%. Given that the results from the child Data Type are returned at 100% and 0%, only one result will be an option for this Data Field.
Grooper ace architect collect omr extraction 019.png
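The interplay between CheckOne and the Data Field's Minimum Confidence can be modeled in a few lines of Python. The 100%/0% scores and the 20% default come from the steps above. Picking the highest-scoring survivor is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Result = Tuple[str, float]  # (value, confidence from 0.0 to 1.0)

def check_one(candidates: List[Tuple[str, bool]]) -> List[Result]:
    """CheckOne as described above: the checked option scores 100%, the rest 0%."""
    return [(text, 1.0 if checked else 0.0) for text, checked in candidates]

def field_value(results: List[Result], minimum_confidence: float = 0.20) -> Optional[str]:
    """Keep results at or above Minimum Confidence, then take the highest-scoring one."""
    passing = [r for r in results if r[1] >= minimum_confidence]
    if not passing:
        return None
    return max(passing, key=lambda r: r[1])[0]

results = check_one([("will allow", False), ("will not allow", True)])
print(results)               # [('will allow', 0.0), ('will not allow', 1.0)]
print(field_value(results))  # will not allow
```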
OMR Mode - Boolean
  1. Right-click on the (data model).
  2. Select Add > Data Field from the command menu.
  3. Name it Escrow Account.
Grooper ace architect collect omr extraction 020.png
  1. Right-click on the newly created Escrow Account Data Field.
  2. Select Add > Data Type from the command menu.
  3. Name it Value Extractor - Escrow Account.
Grooper ace architect collect omr extraction 021.png
  1. Select the newly created Value Extractor - Escrow Account Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect omr extraction 022.png
  1. In the Value Pattern area type or copy/paste the following:
    will have an escrow account
Grooper ace architect collect omr extraction 023.png
  1. Set the Post Processing property to OMR Reader.
  2. With the Post Processing property set, click the drop-down arrow to expose its sub-properties.
  3. Set the Mode property to Boolean.
    • This will activate further sub-properties that you can manipulate if you choose, but for this setup the defaults are fine.
Grooper ace architect collect omr extraction 024.png
  1. With the Pattern and Post Processing options (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is now returning True or False, depending on whether a checked box is detected within 0.25in to the left of the Pattern result.
Grooper ace architect collect omr extraction 025.png
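Boolean mode boils down to a single yes/no question per Pattern result. The Python sketch below uses the 0.25in distance from the steps above. Treating "no box found" as False is an assumption, and the function name is hypothetical.

```python
from typing import Optional

def omr_boolean(gap_to_box: Optional[float], box_checked: bool,
                max_gap: float = 0.25) -> bool:
    """True only when a box lies within max_gap inches left of the value and is checked.

    gap_to_box is the distance in inches from the box's right edge to the value,
    or None when no box was detected on that line (assumed to mean False).
    """
    box_nearby = gap_to_box is not None and 0 <= gap_to_box <= max_gap
    return box_nearby and box_checked

print(omr_boolean(0.10, True))   # True: checked box right next to the text
print(omr_boolean(0.10, False))  # False: box present but empty
print(omr_boolean(None, True))   # False: no box found near the text
```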
  1. Select the Escrow Account Data Field.
  2. Set the Value Extractor property to Reference.
  3. With the Value Extractor property set click the drop-down arrow to expose its sub-properties.
  4. Click the drop-down arrow next to the Extractor property.
  5. Select the Value Extractor - Escrow Account Data Type from:
(data model) • Escrow Account
Grooper ace architect collect omr extraction 026.png
  1. Set the Value Type property to Boolean.
    • Feel free to leave the sub-properties defaulted.
Grooper ace architect collect omr extraction 026a.png
  1. With the Value Extractor property (and its sub-properties) set...
  2. ... and with a quick press of the Save and Test Extraction buttons...
  3. ...you’ll notice this Data Field is returning the results from its child Data Type.
Grooper ace architect collect omr extraction 027.png

Data Sections

Data Sections let you subdivide a document to simplify extraction. Think of them as big input filters. Several Extract Methods can be used for Data Sections; here we’ll cover two of the most commonly used.

Divider Mode

Back in the Simple Table via Ordered Array section, you may recall I ended by noting that the table was capturing more than intended and promising to discuss the solution later. Now is the time! The way to limit the scope of that specific table is a Data Section. It would be possible to make the extractor more specific, but why overcomplicate things when you have a tool at your disposal that keeps things simple? A Data Section using the Divider Extract Method lets us do just that.

Grooper ace architect collect data sections 001.png
  1. Right-click on the (data model).
  2. Select Add > Data Section from the command menu.
  3. Name it B. Services Borrower Did Not Shop For Section.
Grooper ace architect collect data sections 003.png
  1. Drag and drop the B. Services Borrower Did Not Shop For Data Table onto the newly created B. Services Borrower Did Not Shop For Section Data Section.
    • This will create a hierarchical relationship, making the Data Table a child of the Data Section. It will, therefore, inherit from its parent.
Grooper ace architect collect data sections 004.png
  1. Select the B. Services Borrower Did Not Shop For Section Data Section.
  2. Set the Scope property to SingleInstance.
  3. Set the Extract Method property to Divider.
  4. Click the drop-down arrow to expose the Extract Method's sub-properties.
  5. Click the drop-down arrow to expose the Divider Extractor's sub-properties.
  6. Set the Split Position property to Between.
  7. Select the Pattern property and click the ellipsis button to open the Pattern Editor window.
Grooper ace architect collect data sections 005.png
  1. In the Value Pattern area type or copy/paste the following:
    B\.|
    C\.
Grooper ace architect collect data sections 006.png
  1. With the Extract Method (and its sub-properties) and Scope properties set...
  2. ... and with a quick press of the Save and Test Extraction buttons...
  3. ...you’ll notice this Data Section is now limiting the scope of the Data Elements that inherit from it. Rather than considering the entire document, the section considers only the instance of data created by splitting the document between the entries we just put in the Pattern Editor. As a result, the table within it will only return results from within this specific instance of data.
Grooper ace architect collect data sections 007.png
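Here's a rough Python sketch of what a Divider with Split Position set to Between does with the `B\.|C\.` pattern: it keeps only the text between consecutive divider matches. The page text is hypothetical, and whether Grooper includes the divider text itself in the section is not covered here; this sketch leaves it out.

```python
import re
from typing import List

def split_between(text: str, divider: str = r"B\.|C\.") -> List[str]:
    """Return the text between each pair of consecutive divider matches."""
    hits = list(re.finditer(divider, text))
    return [text[a.end():b.start()] for a, b in zip(hits, hits[1:])]

# Hypothetical page text.
page = (
    "A. Origination Charges $1,802.00\n"
    "B. Services Borrower Did Not Shop For $236.55\n"
    "Appraisal Fee $405.00\n"
    "C. Services Borrower Did Shop For $2,655.50\n"
    "Pest Inspection Fee $120.50\n"
)

sections = split_between(page)
print(len(sections))  # 1
print(sections[0])
```

Only the lines under section B survive, so any table nested under the Data Section can no longer pick up rows from sections A or C.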
Geometric Mode

Looking at the image on the right, you can see a unique challenge. There’s a simple ordered array extractor set up to collect a couple of dates and a currency amount. We want to use this simple extractor to get the information for each section where it appears, but how do we isolate each section? The Divider approach would work to separate K and M from L and N, but how do we isolate K from M, and L from N? It would be far too difficult, if not impossible, to define a Divider pattern that does this. This is exactly where a Data Section with a Geometric Extract Method comes in handy.

Grooper ace architect collect data sections 002.png
Create Value Extractor

First things first, we need to create an extractor for the set of values we’re seeking to collect. We’ll use several techniques we learned previously to make a simple extractor. It will ultimately return more values than we need, but we’ll use the power of our Data Section to, once again, limit the scope of the information.

  1. Right-click on the Ordered Array Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it OA-H - From, To, Amount.
Grooper ace architect collect data sections 008.png
  1. Right-click on the newly created OA-H - From, To, Amount Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, set the Item Type property to Data Type.
  4. Click the drop-down menu for the Item Names property and copy/paste in the list from below:
    • From
    • To
    • Amount
Grooper ace architect collect data sections 009.png
  1. Select the From Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - Date from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect data sections 010.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect data sections 011.png
  1. Select the To Data Type.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect data sections 012.png
  1. Select the Amount Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect data sections 013.png
  1. Select the parent Data Type, OA-H - From, To, Amount, and set the Collation property to Ordered Array.
  2. Click the drop-down arrow to expose its sub-properties.
  3. Set the Horizontal Layout property to Enabled.
  4. Click the drop-down arrow to expose the Horizontal Layout sub-properties and set the Maximum Distance property to 1.5in.
    • By default, an ordered array in horizontal mode searches the entire width of the document for matches to the array. Setting this property limits how far apart matches can be and still be considered part of the same array.
Grooper ace architect collect data sections 014.png
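To picture what Maximum Distance changes, here's a conceptual Python sketch of a horizontal ordered array. It is not Grooper's algorithm. It assumes each next element must sit on the same line and start no more than `max_distance` inches after the previous one ends. The `Match` class, line indexes, and coordinates are hypothetical. From and To both reference VAL - Date, so they share one list of date matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Match:
    text: str
    left: float   # inches
    right: float  # inches
    line: int     # hypothetical line index

def next_match(prev: Match, candidates: List[Match], max_distance: float) -> Optional[Match]:
    """Closest candidate on the same line that starts right of prev, within max_distance."""
    ok = [m for m in candidates
          if m.line == prev.line and 0 <= m.left - prev.right <= max_distance]
    return min(ok, key=lambda m: m.left) if ok else None

def ordered_array(children: List[List[Match]], max_distance: float = 1.5) -> List[Tuple[str, ...]]:
    """Chain one match per child extractor, left to right (e.g. From, To, Amount)."""
    rows = []
    for first in children[0]:
        row = [first]
        for candidates in children[1:]:
            nxt = next_match(row[-1], candidates, max_distance)
            if nxt is None:
                break
            row.append(nxt)
        if len(row) == len(children):
            rows.append(tuple(m.text for m in row))
    return rows

dates = [Match("01/01/24", 3.0, 3.7, 0), Match("06/30/24", 4.0, 4.7, 0),
         Match("01/01/24", 3.0, 3.7, 1), Match("06/30/24", 4.0, 4.7, 1)]
amounts = [Match("$1,200.00", 5.0, 5.6, 0), Match("$950.00", 7.0, 7.6, 1)]

print(ordered_array([dates, dates, amounts]))
# [('01/01/24', '06/30/24', '$1,200.00')]
```

On line 1 the amount starts 2.3in after the second date, so it falls outside the 1.5in limit and no array is formed there.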
  1. With the Collation property (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type is returning the combined results of its child extractors from left to right.
Grooper ace architect collect data sections 015.png
Create Key-Value Pair Extractor

This extractor is returning the desired values, but now we want to focus it specifically on City/Town Taxes.

  1. Right-click the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - City/Town Taxes.
Grooper ace architect collect data sections 016.png
  1. With the newly created KEY - City/Town Taxes Data Type selected...
  2. ...select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect data sections 017.png
  1. In the Value Pattern area type or copy/paste the following:
    City/Town Taxes
Grooper ace architect collect data sections 18.png
  1. Right-click the Key-Value Pair Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KVP-H - City/Town Taxes.
Grooper ace architect collect data sections 019.png
  1. Right-click on the KVP-H - City/Town Taxes Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, set the Item Type property to Data Type.
  4. Click the drop-down menu for the Item Names property and type or copy/paste the list from below:
    • KVP-H - City Town Taxes - Key
    • KVP-H - City Town Taxes - Value
Grooper ace architect collect data sections 020.png
  1. Select the KVP-H - City Town Taxes - Key Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select KEY - City/Town Taxes from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect data sections 021.png
  1. Select the KVP-H - City Town Taxes - Value Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select OA-H - From, To, Amount from:
Closing Disclosures • (local resources) > Ordered Array Extractors
Grooper ace architect collect data sections 022.png
  1. Select the parent Data Type, KVP-H - City/Town Taxes.
  2. Set the Collation property to Key-Value Pair and expose its sub-properties.
  3. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect data sections 023.png
  1. With the Collation property (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type returns the desired dates and amount, but limited specifically to City/Town Taxes.
Grooper ace architect collect data sections 024.png
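As a rough mental model of horizontal Key-Value Pair collation, the Python sketch below pairs each key with the nearest value that starts to its right and overlaps its text line. This is an illustrative approximation, not Grooper's algorithm; the Box class, coordinates, and pair_horizontal function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str
    left: float    # coordinates in inches (hypothetical)
    right: float
    top: float
    bottom: float


def pair_horizontal(keys, values):
    """Pair each key with the nearest value to its right on the same line."""
    pairs = []
    for key in keys:
        same_line = [
            v for v in values
            if v.left >= key.right                          # starts right of the key
            and v.top < key.bottom and v.bottom > key.top   # overlaps the key's line
        ]
        if same_line:
            nearest = min(same_line, key=lambda v: v.left - key.right)
            pairs.append((key.text, nearest.text))
    return pairs


key = Box("City/Town Taxes", 0.5, 1.6, 5.00, 5.15)
values = [
    Box("04/15/13 04/30/13 $365.04", 3.0, 5.5, 5.00, 5.15),    # same line
    Box("01/01/13 06/30/13 $1,200.00", 3.0, 5.5, 5.20, 5.35),  # next line down
]
print(pair_horizontal([key], values))
# [('City/Town Taxes', '04/15/13 04/30/13 $365.04')]
```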
Configuring the Parent, Geometric Data Section

With our extractor properly configured, it’s now time to configure our new Data Sections so we can focus our extractor on just the K. Due from Borrower at Closing section, which is within the Borrower's Transaction section. Keep in mind that a Data Section does not, itself, return information. It is designed to limit the scope of text that our extractor works against. The goal of the Borrower's Transaction Data Section is to limit the text to the left column of the document.

  1. Right-click on the (data model).
  2. Select Add > Data Section... from the command menu.
  3. Name it Borrower's Transaction.
Grooper ace architect collect data sections 025.png
  1. Select the newly created Borrower's Transaction Data Section.
  2. Set the Scope property to SingleInstance.
  3. Set the Extract Method property to Geometric.
  4. Expose its sub-properties by clicking the drop-down arrow.
  5. Expose the Main Extractor sub-properties by clicking the drop-down arrow.
  6. Set the Type property to Internal.
  7. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
    • The goal of the main extractor is to first limit our text to a major section. We will soon limit the text further by manipulating the properties of this Extract Method.
Grooper ace architect collect data sections 026.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Borrower's Transaction
    .*?
    CLOSING DISCLOSURE
    • Lines 1 and 3 are the beginning and the end, respectively. Line 2 is "everything in between": the . character matches "any" character in regex, and the * means "zero or more". The * quantifier is "greedy" by default, so it would grab as many characters as it can, running all the way to the last CLOSING DISCLOSURE in the text. Following the * with a ? makes it "lazy", meaning "match as few characters as possible, expanding only as far as needed to complete the pattern", so the match stops at the first CLOSING DISCLOSURE after Borrower's Transaction.
Grooper ace architect collect data sections 027.png
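The greedy versus lazy difference is easy to see in Python's re module. This sketch uses hypothetical sample text with two "CLOSING DISCLOSURE" footers and writes the pattern on one line. It passes re.DOTALL so that "." can cross line breaks in Python; that flag is an assumption made for the demonstration, not a statement about Grooper's defaults.

```python
import re

# Hypothetical text spanning two pages, each ending with the same footer.
sample = (
    "Borrower's Transaction\n"
    "K. Due from Borrower at Closing\n"
    "CLOSING DISCLOSURE\n"
    "Summaries of Transactions\n"
    "CLOSING DISCLOSURE\n"
)

# re.DOTALL lets "." match newlines so the match can span lines.
greedy = re.search(r"Borrower's Transaction.*CLOSING DISCLOSURE", sample, re.DOTALL)
lazy = re.search(r"Borrower's Transaction.*?CLOSING DISCLOSURE", sample, re.DOTALL)

print(greedy.group().count("CLOSING DISCLOSURE"))  # 2: runs to the last footer
print(lazy.group().count("CLOSING DISCLOSURE"))    # 1: stops at the first footer
```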

Having set the main area of text to capture, the goal now is to "draw a box" to limit this section further. Think of the following ... Adjustment properties as defining each side of a rectangle.

  1. Set the Left Adjustment property to Edge of Page.
    • This allows text all the way to the left of the page to be collected. The "left side" of our box.
  2. Set the Top Adjustment property to Anchor and expose its sub-properties by clicking the drop-down arrow.
    • Here we’ll use a regex pattern to set the "top side" of our box.
  3. Expose the sub-properties of the Anchor Extractor property by clicking the drop-down arrow.
  4. Make sure the Type property is set to Internal.
  5. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect data sections 028.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Borrower's Transaction
Grooper ace architect collect data sections 029.png
  1. Set the Right Adjustment property to Anchor.
    • Like before, here we’ll use a regex pattern to set the "right side" of our box.
  2. Expose the sub-properties of the Right Adjustment property by clicking the drop-down arrow.
  3. Expose the sub-properties of the Anchor Extractor property by clicking the drop-down arrow.
  4. Make sure the Type property is set to Internal.
  5. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect data sections 030.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Seller's Transaction
Grooper ace architect collect data sections 031.png
  1. Set the Method property to Exclusive.
    • This property is important because it determines whether the "right side" of our box is drawn to the right or to the left of the pattern we just wrote. Setting it to Exclusive excludes the matched pattern from the box, so the "right side" of the box is drawn at the beginning of the pattern, not the end.
  2. Set the Manual Adjustment property to -0.4in.
    • While the "right side" of our box is now to the left of the Seller's Transaction pattern, with this property we’re pushing the side slightly further to the left from there.
  3. Finally, set the Bottom Adjustment property, or the "bottom side" of our box, to Edge of Page.
Grooper ace architect collect data sections 032.png
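The four Adjustment properties above can be pictured as computing a rectangle and keeping only the text inside it. The Python sketch below is a simplified model under stated assumptions: the Phrase class, coordinates, and geometric_box function are hypothetical, the page is assumed to be US Letter, and the main extractor's match region is ignored.

```python
from dataclasses import dataclass

PAGE_HEIGHT = 11.0  # assumed US Letter page, in inches


@dataclass
class Phrase:
    text: str
    x: float  # left edge in inches (hypothetical coordinates)
    y: float  # top edge in inches


def geometric_box(phrases, top_anchor, right_anchor, manual_adjustment=-0.4):
    """Keep phrases inside a rectangle built from four 'sides'."""
    left = 0.0                                                # Left: Edge of Page
    top = next(p.y for p in phrases if p.text == top_anchor)  # Top: Anchor
    # Right: Anchor with Method = Exclusive, i.e. the anchor's *left* edge,
    # then shifted by the Manual Adjustment (negative moves it left).
    right = next(p.x for p in phrases if p.text == right_anchor) + manual_adjustment
    bottom = PAGE_HEIGHT                                      # Bottom: Edge of Page
    return [p.text for p in phrases
            if left <= p.x < right and top <= p.y <= bottom]


page = [
    Phrase("Page header", 0.5, 0.3),
    Phrase("Borrower's Transaction", 0.5, 1.0),
    Phrase("Seller's Transaction", 4.5, 1.0),
    Phrase("K. Due from Borrower at Closing", 0.5, 1.3),
    Phrase("M. Due to Seller at Closing", 4.5, 1.3),
    Phrase("$365.04", 3.5, 2.0),
]
print(geometric_box(page, "Borrower's Transaction", "Seller's Transaction"))
# ["Borrower's Transaction", 'K. Due from Borrower at Closing', '$365.04']
```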
Configuring a Divider Data Section within a Geometric Data Section

This Data Section is now set up to get the left column of the page, but it can be hard to tell it's doing this without seeing a result. To test, you could add a Data Field to this section, set its pattern to .*, and then test extraction on the Data Section. You would see the box it is drawing this way.

However, we’ll simply move forward and create another section within this section to separate the K from the L sections.

  1. Right-click on the Borrower's Transaction Data Section.
  2. Select Add > Data Section... from the command menu.
  3. Name it K. Due from Borrower at Closing.
Grooper ace architect collect data sections 033.png
  1. Select the newly created K. Due from Borrower at Closing Data Section.
  2. Set the Scope property to SingleInstance.
  3. Set the Extract Method property to Divider.
  4. Expose its sub-properties by clicking the drop-down arrow.
  5. Expose the Divider Extractor sub-properties by clicking the drop-down arrow.
  6. Set the Type property to Internal.
  7. Set the Split Position property to Between.
  8. Select the Pattern property and click the ellipsis button to bring up the Pattern Editor window.
Grooper ace architect collect data sections 034.png
  1. Click the Properties tab.
  2. Set the Case Sensitive property to True.
Grooper ace architect collect data sections 035.png
  1. Back in the Pattern Editor tab, in the Value Pattern area type or copy/paste the following:
  2. K\.|
    L\.
Grooper ace architect collect data sections 036.png
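One plausible reading of a Divider split set to Between, and of why Case Sensitive matters here, is sketched below in Python. The sample column text and the between function are hypothetical. With case-insensitive matching, the "k." at the end of "Bank." would also act as a divider and cut the K section in two.

```python
import re

# Hypothetical text of the Borrower's Transaction column.
column = (
    "Borrower's Transaction\n"
    "K. Due from Borrower at Closing\n"
    "City/Town Taxes 04/15/13 04/30/13 $365.04\n"
    "Paid to First Bank.\n"  # "k." at the end of "Bank." must not divide
    "L. Paid Already by or on Behalf of Borrower at Closing\n"
)


def between(text, divider):
    """Return the text lying between each pair of consecutive divider matches."""
    matches = list(divider.finditer(text))
    return [text[a.end():b.start()] for a, b in zip(matches, matches[1:])]


sensitive = between(column, re.compile(r"K\.|L\."))
insensitive = between(column, re.compile(r"K\.|L\.", re.IGNORECASE))

print(len(sensitive))    # 1: a single K section
print(len(insensitive))  # 2: "Bank." split the K section in two
```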

We’ve now limited the scope of text even further, so now we need to add the Data Elements that this section will collect.

  1. Right-click the K. Due from Borrower at Closing Data Section.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window, click the drop-down menu for the Item Names property and copy/paste in the item names list from below:
    • City/Town Taxes From
    • City/Town Taxes To
    • City/Town Taxes Amount
Grooper ace architect collect data sections 037.png
  1. Select the City/Town Taxes From Data Field.
  2. Set the Value Extractor property to Reference.
  3. In the Extractor drop-down tree view select KVP-H - City/Town Taxes from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect data sections 038.png
  1. Set the Sub-Element Name property to From.
    • Think back to creating the ordered array extractor from earlier. The child Data Types we made are the sub-elements you see now. This is similar to what we did very early on with the named capture groups in the regex pattern.
Grooper ace architect collect data sections 039.png
  1. Right-click anywhere in the property grid, except directly on a value.
  2. Select Copy Properties > All Properties from the command menu.
Grooper ace architect collect data sections 040.png
  1. Select the City/Town Taxes To Data Field.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect data sections 041.png
  1. Set the Sub-Element Name property to To.
Grooper ace architect collect data sections 042.png
  1. Select the City/Town Taxes Amount Data Field.
  2. Right-click anywhere in the property grid, except directly on a value.
  3. Select Paste Properties from the command menu.
Grooper ace architect collect data sections 043.png
  1. Set the Sub-Element Name property to Amount.
Grooper ace architect collect data sections 044.png
  1. Select the Borrower's Transaction Data Section.
  2. Click the Test Extraction button.
  3. Notice the three results being returned are specific to the "K" section. You can also see a box being drawn around our section.
Grooper ace architect collect data sections 045.png

Calculate Expressions

Calculate Expressions are an incredibly important aspect of Grooper that allow you to generate or verify data. A Calculate Expression is a VB.Net code snippet which calculates the value for a field based on the value of other fields, similar to the way a formula defines a relationship between various cells in a spreadsheet. Depending on the value of the Calculate Mode property, the expression can be used to validate the field or automatically populate its value.

The expression must produce a value compatible with the field’s Value Type. If the field is a Decimal type, the expression should evaluate to a Decimal value. If the field is a DateTime type, the expression should evaluate to a DateTime value. If the expression produces an invalid value which cannot be converted to the field’s type, the field will be set to an error state.

The following is not an exhaustive list of expressions and all the ways they can be used, but merely a place to get you started so you can understand, in a basic sense, how they work and what they do.

Calculation of Peer Element in Immediate Scope

This first type of Calculate Expression is the most straightforward and the easiest to use. When Data Elements are in the same hierarchical branch, they are considered peers and can reference each other directly. The expression ends up looking like simple arithmetic.

  1. Select the Estimated Total Monthly Payment Data Column.
    • It’s very important, especially now, to continue to emphasize the relevance of hierarchy and, as a result, inheritance. This Data Column we just selected, for example, is a local element to its immediate parent, the Payment Calculation Data Table. Other elements that share the same immediate parent are considered "in scope". All the elements within the (data model) inherit from it, but not all elements are immediate children of the (data model), and therefore they are not all local to one another. The most important aspect of this is knowing where to run Test Extraction from while we’re building and testing. Always Test Extraction from the parent object you expect inheritance to trickle down from.
  2. Select the Calculate Expression property and click the ellipsis button to bring up the Calculate Expression window.
Grooper ace architect collect calculate expressions 001.png

The goal of this particular expression will be to verify that the value extracted for the Estimated Total Monthly Payment Data Column is equal to the summation of the values extracted for the Principal and Interest, Mortgage Insurance, and Estimated Escrow Data Columns.

  1. Put your cursor in the blank area and type:
  2. P
    You should notice the IntelliSense menu pop up.
    • This menu has an awareness of all programmatic objects, classes, and functions available, as well as all peer objects that are in the immediate scope of the object you’re applying this calculation to. Therefore, you should see the names of Data Columns from this Data Table being listed. The only difference is the names have been normalized/sanitized for programmatic purposes (spaces replaced with underscores etc.)
  3. Select the Principal_and_Interest object from the list by double-clicking or using your arrows to navigate down to the object and pressing tab.
Grooper ace architect collect calculate expressions 002.png
  1. Continue the expression with (notice there are spaces in the following):
  2.  + 
  3. IntelliSense again has an awareness of objects available from here, so choose Mortgage_Insurance from the list.
Grooper ace architect collect calculate expressions 003.png
  1. Continue the expression with (notice there are spaces in the following):
  2.  + 
  3. IntelliSense again has an awareness of objects available from here, so choose Estimated_Escrow from the list.
Grooper ace architect collect calculate expressions 004.png
  1. The final expression should read as follows:
  2. Principal_and_Interest + Mortgage_Insurance + Estimated_Escrow
Grooper ace architect collect calculate expressions 005.png
  1. The Calculate Mode in this case will be left at Validate, as that is the form of evaluation we’re seeking. Calculate Tolerance is simply a measure of how "wrong" a value can be; 0 means the value must evaluate exactly. Different business units may be willing to accept different thresholds.
Grooper ace architect collect calculate expressions 006.png
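The Validate mode with a tolerance boils down to comparing the extracted value with the calculated one. This Python sketch mirrors that check using Decimal to avoid binary floating-point rounding. The validate function, its message wording, and the row values are illustrative assumptions, not Grooper's internals or data read from the sample documents.

```python
from decimal import Decimal


def validate(actual, expected, tolerance=Decimal("0")):
    """Return None when actual is within tolerance of expected, else a message."""
    difference = abs(actual - expected)
    if difference <= tolerance:
        return None
    return (f"Expected {expected}, got {actual}; "
            f"difference {difference} exceeds tolerance {tolerance}")


# Illustrative row values (hypothetical, not read from the sample documents).
row = {
    "Principal_and_Interest": Decimal("761.78"),
    "Mortgage_Insurance": Decimal("82.35"),
    "Estimated_Escrow": Decimal("206.13"),
}
# Same arithmetic as: Principal_and_Interest + Mortgage_Insurance + Estimated_Escrow
expected = (row["Principal_and_Interest"]
            + row["Mortgage_Insurance"]
            + row["Estimated_Escrow"])

print(validate(Decimal("1050.26"), expected))  # None: the row is valid
print(validate(Decimal("1050.62"), expected))  # an error message
```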
  1. Select the Payment Calculation Data Table.
    • Again, we want to run Test Extraction from the parent object of the entire scope. This will allow the elements that are expected to have values to populate with values so the Calculate Expression can properly evaluate.
  2. Click the Test Extraction button.
  3. Extraction will run as normal, and nothing will seem different from the first time we tested extraction on this table. However...
Grooper ace architect collect calculate expressions 007.png
  1. Select one of the values the Calculate Expression is evaluated against and change it.
  2. Notice the element with the expression is now in error and highlighted in a pinkish hue.
Grooper ace architect collect calculate expressions 008.png
  1. Select the element that is in error.
  2. Notice there is a message explaining why this error is occurring including an error message, what is expected, what the difference is, and what the tolerance for error is.
Grooper ace architect collect calculate expressions 009.png
Calculation of Non-Peer Element out of Immediate Scope

When Data Elements are not within the same hierarchical branch, they are not considered peers and require slightly different syntax to reference one another.

Creating a Key-Value Pair Extractor
  1. Right-click the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Cash to Close.
Grooper ace architect collect calculate expressions 010.png
  1. Select the newly created KEY - Cash to Close Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect calculate expressions 011.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Cash to Close
  3. In the Prefix Pattern area type or copy/paste the following:
    \n
Grooper ace architect collect calculate expressions 012.png
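A prefix pattern must match immediately before the value without becoming part of the result, so the \n prefix only accepts "Cash to Close" at the start of a line. In Python's re module, a lookbehind is a close stand-in; treating the two as equivalent is an assumption of this sketch, and the sample text is hypothetical.

```python
import re

# Hypothetical page text: the phrase appears at a line start and inside a heading.
text = (
    "Costs at Closing\n"
    "Cash to Close $16,054.00\n"
    "Calculating Cash to Close\n"
)

anywhere = re.findall(r"Cash to Close", text)
# (?<=\n) requires a newline right before the match without consuming it.
line_start = [m.start() for m in re.finditer(r"(?<=\n)Cash to Close", text)]

print(len(anywhere))    # 2
print(len(line_start))  # 1: "Calculating Cash to Close" is excluded
```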
  1. Click on the Properties tab.
  2. Select the Result Filter property and click the ellipsis button to bring up the Result Filter window.
Grooper ace architect collect calculate expressions 013.png
  1. In the Result Filter window set the Page Filter property to 3.
  2. Set the Maximum Results property to 1.
    • As with any "key", we want to return exactly one result. Taking advantage of these properties allows us to whittle the matches down to the one result we need. The "Cash to Close" value we’re looking for is the first one on page 3 (the second occurs at the bottom of the page).
Grooper ace architect collect calculate expressions 014.png
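The two Result Filter settings act like a filter followed by a cap. This Python sketch shows that order of operations on hypothetical (text, page, y-position) matches listed in reading order; the result_filter function is illustrative only and does not model every Result Filter option.

```python
def result_filter(results, page_filter=None, maximum_results=None):
    """Keep results from one page (if given), then cap how many are returned.

    `results` is a list of (text, page, y) tuples in document order.
    """
    kept = [r for r in results if page_filter is None or r[1] == page_filter]
    return kept if maximum_results is None else kept[:maximum_results]


# Hypothetical key matches in reading order: one on page 1, two on page 3.
matches = [
    ("Cash to Close", 1, 4.0),
    ("Cash to Close", 3, 1.2),  # first occurrence on page 3
    ("Cash to Close", 3, 9.8),  # second occurrence, bottom of page 3
]
print(result_filter(matches, page_filter=3, maximum_results=1))
# [('Cash to Close', 3, 1.2)]
```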
  1. With the Pattern property set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll see this extractor now returns one result that will function as a "key".
Grooper ace architect collect calculate expressions 015.png
  1. Right-click the Key-Value Pair Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KVP-H - Cash to Close.
Grooper ace architect collect calculate expressions 016.png
  1. Right-click the newly created KVP-H - Cash to Close Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window set the Item Type property to Data Type.
  4. In the Item Names drop-down type or copy/paste the following:
    • KVP-H - Cash to Close - Key
    • KVP-H - Cash to Close - Value
Grooper ace architect collect calculate expressions 017.png
  1. Select the KVP-H - Cash to Close - Key Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select KEY - Cash to Close from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect calculate expressions 018.png
  1. Select the KVP-H - Cash to Close - Value Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect calculate expressions 019.png
  1. Select the parent Data Type, KVP-H - Cash to Close.
  2. Set the Collation property to Key-Value Pair.
  3. Expose its sub-properties by clicking the drop-down arrow.
  4. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect calculate expressions 020.png
  1. With the Collation property (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type now returns the one currency value to the right of the defined key.
Grooper ace architect collect calculate expressions 021.png
Supplying the Results to the Data Model
  1. Right-click the (data model).
  2. Select Add > Data Field... from the command menu.
  3. Name it Cash to Close.
Grooper ace architect collect calculate expressions 022.png
  1. Select the newly created Cash to Close Data Field.
  2. Set the Value Type property to Decimal.
  3. Expose its sub-properties by clicking the drop-down arrow.
  4. Set the Format Specifier property to c2.
Grooper ace architect collect calculate expressions 023.png
  1. Set the Value Extractor property to Reference.
  2. Expose its sub-properties by clicking the drop-down arrow.
  3. In the drop-down tree view of the Extractor property select KVP-H - Cash to Close from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect calculate expressions 024.png
  1. Select the Calculate Expression property and click the ellipsis button to bring up the Calculate Expression window.
Grooper ace architect collect calculate expressions 025.png

The (data model) is the immediate parent of the Cash to Close Data Field; therefore, the field is not in the same immediate scope as the Data Column against which it will be evaluated. This isn’t a problem, however, as we can use an enumeration syntax to get to that element.

  1. Start the expression by typing:
  2. C
  3. Select the Calculating_Cash_to_Close object from the list.
    • Because the Data Column we wish to evaluate against is in a different scope, we start by going through its parent object, in this case the Calculating Cash to Close Data Table. This table is an immediate peer of our Cash to Close Data Field, so notice it is immediately available via IntelliSense.
Grooper ace architect collect calculate expressions 026.png
  1. Continue the expression by typing:
  2. S
  3. Select the SumOf function from the list.
Grooper ace architect collect calculate expressions 027.png
  1. Finish the expression by typing the following:
  2. ("Loan Estimate")
    • Again, because the desired element is not within scope, we cannot "see" it normally, so IntelliSense won’t pick it up. The column in question lives within the Calculating Cash to Close Data Table, so we enumerate into it via the parentheses, surrounding the name of the desired element in quotes. Because the name is evaluated as a string value, we do not need to sanitize it.
  1. The final expression should read as follows:
  2. Calculating_Cash_to_Close.SumOf("Loan Estimate")
Grooper ace architect collect calculate expressions 028.png
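Conceptually, SumOf("Loan Estimate") walks the rows of the table and adds up one column looked up by its display name, which is why the quoted name does not need sanitizing. The Python sketch below models that idea; the sum_of function, the choice to skip blank cells, and the row values are hypothetical illustrations rather than Grooper's implementation.

```python
from decimal import Decimal


def sum_of(table, column_name):
    """Sum a column by its display name; this sketch skips blank cells."""
    return sum(
        (row[column_name] for row in table if row.get(column_name) is not None),
        Decimal("0"),
    )


# Hypothetical Calculating Cash to Close rows (illustrative values only).
calculating_cash_to_close = [
    {"Description": "Total Closing Costs (J)", "Loan Estimate": Decimal("8054.00")},
    {"Description": "Down Payment/Funds from Borrower", "Loan Estimate": Decimal("18000.00")},
    {"Description": "Deposit", "Loan Estimate": Decimal("-10000.00")},
    {"Description": "Seller Credits", "Loan Estimate": None},  # blank cell
]

cash_to_close = Decimal("16054.00")
print(sum_of(calculating_cash_to_close, "Loan Estimate") == cash_to_close)  # True
```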
  1. Select the (data model).
    • We need to run extraction from this level because the Cash to Close Data Field does not have awareness of the elements it's being evaluated against. Extracting from the (data model) populates the appropriate Data Elements with extracted values, which allows the Cash to Close Data Field to evaluate properly.
  2. Click the Test Extraction button.
  3. You can see the Cash to Close Data Field extracts properly and is not in error, because the Calculate Expression evaluated correctly. If you wanted to test, you could change one of the values in the Loan Estimate Data Column to see the math not add up, and produce an expected error.
Grooper ace architect collect calculate expressions 029.png
Simple LINQ Calculation for Multiple Non-Peer Elements out of Immediate Scope

Similar to the previous expression type, once again we are considering elements that are not peers. However, the syntax required to calculate more than one element at once in this way requires exploring a bit of the new LINQ syntax in Grooper.

Creating a Key-Value Pair Extractor
  1. Right-click the Key Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KEY - Services Borrower Did Not Shop For.
Grooper ace architect collect calculate expressions 030.png
  1. Select the newly created KEY - Services Borrower Did Not Shop For Data Type.
  2. Select the Pattern property and click the ellipsis button which will open the Pattern Editor window.
Grooper ace architect collect calculate expressions 031.png
  1. In the Value Pattern area type or copy/paste the following:
  2. Services Borrower Did Not Shop For
Grooper ace architect collect calculate expressions 032.png
  1. Right-click the Key-Value Pair Extractors folder.
  2. Select Add > Data Type... from the command menu.
  3. Name it KVP-H - Services Borrower Did Not Shop For.
Grooper ace architect collect calculate expressions 033.png
  1. Right-click the newly created KVP-H - Services Borrower Did Not Shop For Data Type.
  2. Select Contents > Add Multiple Items... from the command menu.
  3. In the Contents • Add Multiple Items window set the Item Type property to Data Type.
  4. In the Item Names drop-down type or copy/paste the following:
    • KVP-H - Services Borrower Did Not Shop For - Key
    • KVP-H - Services Borrower Did Not Shop For - Value
Grooper ace architect collect calculate expressions 034.png
  1. Select the KVP-H - Services Borrower Did Not Shop For - Key Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select KEY - Services Borrower Did Not Shop For from:
Closing Disclosures • (local resources) > Key Extractors
Grooper ace architect collect calculate expressions 035.png
  1. Select the KVP-H - Services Borrower Did Not Shop For - Value Data Type.
  2. Select the Referenced Extractors property and click the ellipsis button to bring up the Referenced Extractors window.
  3. In the Referenced Extractors window click the Add button to bring up the Select Items window.
  4. In the Select Items window from the drop-down tree view select VAL - All Currency from:
Closing Disclosures • (local resources) > Value Extractors
Grooper ace architect collect calculate expressions 036.png
  1. Select the parent Data Type, KVP-H - Services Borrower Did Not Shop For.
  2. Set the Collation property to Key-Value Pair.
  3. Expose its sub-properties by clicking the drop-down arrow.
  4. Set the Horizontal Layout property to Enabled.
Grooper ace architect collect calculate expressions 037.png
  1. With the Collation property (and its sub-properties) set (and with a quick press of the Save and Test Single buttons)...
  2. ...you’ll notice this Data Type now returns the one currency value to the right of the defined key.
Grooper ace architect collect calculate expressions 038.png
Supplying the Results to the Data Model
  1. Right-click the B. Services Borrower Did Not Shop For Data Section.
  2. Select Add > Data Field... from the command menu.
  3. Name it Borrower-Paid Total.
Grooper ace architect collect calculate expressions 039.png
  1. Select the newly created Borrower-Paid Total Data Field.
  2. Set the Value Type property to Decimal.
  3. Expose its sub-properties by clicking the drop-down arrow.
  4. Set the Format Specifier property to c2.
  5. Set the Value Extractor property to Reference.
  6. Expose its sub-properties by clicking the drop-down arrow.
  7. In the drop-down tree view of the Extractor property select KVP-H - Services Borrower Did Not Shop For from:
Closing Disclosures • (local resources) > Key-Value Pair Extractors
Grooper ace architect collect calculate expressions 040.png
  1. Select the Calculate Expression property and click the ellipsis button to bring up the Calculate Expression window.
Grooper ace architect collect calculate expressions 041.png
  1. LINQ query expressions must begin with a From clause, and each item we will be working with is a Row, so start the expression with (notice there is a space at the end of the following):
  2. From Row 
  3. After the space IntelliSense will kick in. Choose the In clause and follow it with a space.
Grooper ace architect collect calculate expressions 042.png
  1. The space will once again initiate the IntelliSense and have awareness of the peer object we want.
  2. Select the B_Services_Borrower_Did_Not_Shop_For object from the list.
    • This is the Data Table that exists in this Data Section with this newly created Data Field. This statement will allow us to enumerate into the peer object.
Grooper ace architect collect calculate expressions 043.png
  1. Now that we know From what we’re getting, and In what, we state specifically what to Select.
  2. Continue the expression by adding the following (notice there are spaces in the following):
     Select 
Grooper ace architect collect calculate expressions 044.png
  1. Continue the expression by adding the following:
    Row.
  2. IntelliSense will kick in again so choose the Borrower_Paid_Before_Closing object from the menu.
    • Because we’re using LINQ syntax, and have properly enumerated to a peer object, the menu will have awareness of child objects available.
Grooper ace architect collect calculate expressions 045.png
  1. Continue the expression by adding the following (notice there are spaces in the following):
     + R
  2. IntelliSense will kick in again so choose the Row object from the menu.
Grooper ace architect collect calculate expressions 046.png
  1. Continue the expression by adding the following:
    .Borrower_Paid_At_Closing
Grooper ace architect collect calculate expressions 047.png
  1. Place an open parenthesis ( at the beginning of the expression...
  2. ...and place a close parenthesis ) at the end of the expression.
  3. Complete the expression by adding the following:
    .Sum
Grooper ace architect collect calculate expressions 048.png
  1. The final expression should read as follows:
  2. (From Row In B_Services_Borrower_Did_Not_Shop_For Select Row.Borrower_Paid_Before_Closing + Row.Borrower_Paid_At_Closing).Sum
Grooper ace architect collect calculate expressions 049.png
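For readers more familiar with general-purpose languages, the LINQ query above has the same shape as a generator expression fed into a sum: project each row to Before + At, then total the projections. The Python sketch below is only an analogy; the Row type and the table values are hypothetical and not taken from the sample documents.

```python
from decimal import Decimal
from typing import NamedTuple


class Row(NamedTuple):
    Borrower_Paid_Before_Closing: Decimal
    Borrower_Paid_At_Closing: Decimal


def borrower_paid_total(table):
    # Same shape as the VB.NET query:
    # (From Row In table Select Row.Borrower_Paid_Before_Closing + Row.Borrower_Paid_At_Closing).Sum
    return sum(
        (row.Borrower_Paid_Before_Closing + row.Borrower_Paid_At_Closing for row in table),
        Decimal("0"),
    )


# Hypothetical rows of the B. Services Borrower Did Not Shop For table.
table = [
    Row(Decimal("29.80"), Decimal("0.00")),
    Row(Decimal("0.00"), Decimal("20.00")),
    Row(Decimal("0.00"), Decimal("31.75")),
]
print(borrower_paid_total(table))  # 81.55
```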
  1. Select the parent of this entire scope, the B. Services Borrower Did Not Shop For Data Section.
  2. Click Test Extraction.
  3. You can see the Borrower-Paid Total Data Field extracts properly and is not in error, because the Calculate Expression evaluated correctly. If you wanted to test, you could change one of the values in the Borrower-Paid At Closing, or the Borrower-Paid Before Closing Data Columns to see the math not add up, and produce an expected error.
Grooper ace architect collect calculate expressions 050.png

Deliver Data from the Documents

You now have all the tools you need to flesh out the Content Model, if you want, and collect every piece of data on this document set. While doing so on your own would be a useful learning experience, it would not be efficient to cover every repeated step here.

To that end, we’ll assume that what we’ve built so far will suffice and move on to the final piece of this Grooper A.C.E. • Architect journey: delivering the collected data from the documents.

We’ve made all the requisite components to Acquire, Condition, Organize, and Collect the data. We simply need to put them all together, along with a component to Deliver, in a Batch Process so that we can run our documents through Grooper in an automated fashion.

Creating a Batch Process

A Batch Process is a sequence of individual Batch Process Step objects, each specifying an Activity to be applied. Activities may represent automated system tasks, or human-attended tasks which require operator interaction. Collectively, these steps represent a workflow process through which batches of a particular class will travel.

Once created and published, Batch Processes are assigned to production Batches at Batch creation time.

  1. Navigate to:
Batch Processing > Processes
  1. Right-click the Working folder.
  2. Select Add > Batch Process... from the command menu.
  3. Name it Closing Disclosures.
Grooper ace architect deliver 001.png
  1. Select the newly created Closing Disclosures Batch Process.
  2. Click the Add Step... button.
Grooper ace architect deliver 002.png
  1. As you build your Batch Process, the Validation Errors area displays errors that will prevent it from functioning properly.
  2. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Content Action from the menu.
    • We Acquire our documents with a File System Import, which is not part of the Batch Process. Our process therefore starts with tasks related to Conditioning our documents.
Grooper ace architect deliver 003.png
  1. As you’re adding steps and setting their Activity Type property pay attention to the Scope property.
    • The default Scope will typically be correct, but at times you may need to consider other scopes, such as Page, Batch, or a different folder level.
  2. Also pay attention to the Thread Pool property.
    • Think of a Thread Pool like a waiting room. In a moment, we will create a service that picks up and runs tasks in an automated fashion. It knows which tasks to pick up because they belong to a specific Thread Pool (set by this property). It’s like a doctor’s assistant coming into the waiting room and calling the next patient. There are many reasons to establish different Thread Pools, and they depend entirely on your workflows, infrastructure, and licensing restrictions.
  3. Click the Add Step... button.
Grooper ace architect deliver 004.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Recognize from the menu.
Grooper ace architect deliver 005.png
  1. In the Properties of Recognize Activity property grid click the drop-down arrow for the Alternate IP property and select Feature Detection from the menu.
    • Recall that this is the IP Profile we made early in this article to detect form features like lines and boxes.


With this step added we can consider the Condition phase of our Batch Process to be established.

Grooper ace architect deliver 006.png
  1. Click the Add Step... button.
Grooper ace architect deliver 007.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Classify from the menu.
Grooper ace architect deliver 008.png
  1. In the Properties of Classify Activity property grid click the drop-down arrow for the Content Model Scope property and select Closing Disclosures from the menu.
    • This is the Content Model we’ve been building throughout this article.
Grooper ace architect deliver 009.png
  1. Click the Add Step... button.
Grooper ace architect deliver 010.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Classify Review from the menu.
    • Every Activity Type we’ve set so far has been an Unattended Activity. This means a service is intended to pick up the work, so the computer performs the task in an automated fashion. Classify Review, however, is the first Attended Activity we’ve set: an activity meant for a person to interact with.
Grooper ace architect deliver 011.png
  1. In the Properties of Classify Review Activity property grid click on the Classification Viewer Settings property and click the ellipsis button to bring up the Classification Viewer Settings.
Grooper ace architect deliver 012.png
  1. In the Classification Viewer Settings window click the Content Model Scope property and click the drop-down arrow. In the drop-down tree view select Closing Disclosures.


With this step added we can consider the Organize phase of our Batch Process to be established.

Grooper ace architect deliver 013.png
  1. Click the Add Step... button.
Grooper ace architect deliver 014.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Extract from the menu.
    • You will rarely need to adjust settings for this activity, because its main function is simply to extract data from documents. The documents reaching this step have already been classified, and their Document Type belongs to a Content Model. As a result, the activity knows which Content Model to use for extraction.
Grooper ace architect deliver 015.png
  1. Click the Add Step... button.
Grooper ace architect deliver 016.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Data Review from the menu.
    • This is another Attended Activity that will rely on human interaction.

You may want to set the Allow Completion with Invalid Documents property to True. This will let you complete the Data Review module without having to correct all errors.

With this step added we can consider the Collect phase of our Batch Process to be established.

Grooper ace architect deliver 017.png
  1. Click the Add Step... button.
Grooper ace architect deliver 018.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Document Export from the menu.
    • This is the main step associated with how we Deliver the data from our documents (in this case, we’ll deliver the documents themselves as well).
Grooper ace architect deliver 019.png
  1. In the Properties of Document Export Activity property grid click on the Export Provider property and click the drop-down button. Select File System Export from the menu.
Grooper ace architect deliver 020.png
  1. Select the Export Settings property and click the ellipsis button to bring up the File System Export window.
Grooper ace architect deliver 021.png
  1. In the File System Export window, select the Base Export Folder property and enter a path (preferably a fully qualified UNC path).
  2. Click the drop-down arrow to expose the sub-properties of the File Export Settings property.
  3. Set the Content Format property to PDF.
  4. Click the drop-down arrow to expose the sub-properties of the Content Format property.
Grooper ace architect deliver 022a.png
  1. Set the Make Searchable property to True.
    • When True, the PDF pages generated from Grooper page images will be generated as searchable PDF pages.
  2. Set the Prefer Child Versions property to True.
    • When True, the output file generation process will prefer content stored on child objects over content stored on parent objects. By default, if a folder contains a PDF or image-based version, then the folder’s content will be included in the output, and any content stored on child objects will not be included. Enabling this option allows the output file to be instead constructed using the content stored on child folder or page objects. For example, in a case where PDF files are imported, and image processing has been applied to the pages, it may be desirable to output the processed page versions rather than the original version stored on the folder.
  3. Set the Update Link property to False.
    • When True, the Content Link on the document will be updated to point at the exported content. Here it is set to False because the Batch will be disposed when processing finishes, so the link would be irrelevant anyway. If you were sending this content to a CMS and needed the link for further work, updating it would be beneficial.
Grooper ace architect deliver 022b.png
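The Prefer Child Versions rule described above can be modeled as a small tree walk. The following is a simplified Python sketch. It assumes a folder may hold its own content (for example, the originally imported PDF) and that its child pages may hold processed versions. The `Node` class and `export_sources` function are hypothetical illustrations, not Grooper's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    content: Optional[str] = None  # e.g. an imported PDF or a processed page image
    children: List["Node"] = field(default_factory=list)


def export_sources(node: Node, prefer_child_versions: bool) -> List[str]:
    """Return the content that would be used to build the output file."""
    # Default behavior: a folder's own content wins and children are ignored.
    if node.content is not None and not prefer_child_versions:
        return [node.content]
    # Otherwise, try to assemble the output from the children's content.
    child_sources: List[str] = []
    for child in node.children:
        child_sources.extend(export_sources(child, prefer_child_versions))
    if child_sources:
        return child_sources
    # Fall back to the node's own content when no child has any.
    return [node.content] if node.content is not None else []


doc = Node("Closing Disclosure", "original.pdf",
           [Node("Page 1", "page1-processed.tif"), Node("Page 2", "page2-processed.tif")])
print(export_sources(doc, prefer_child_versions=False))  # ['original.pdf']
print(export_sources(doc, prefer_child_versions=True))   # ['page1-processed.tif', 'page2-processed.tif']
```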
  1. Set the Content Type property to Closing Disclosures from the drop-down tree view.
    • Giving this step awareness of the Content Model will allow the Export Mappings property to be set to Data Elements within this scope.
  2. Set the Export Mappings property to the Borrower_Name_s option.
    • This is a list of Data Elements from the Content Type selected above. The names are sanitized so they can be used programmatically. For example, the Borrower Name(s) Data Element appears as Borrower_Name_s.
  3. Set the Metadata Format property to JSON.

With this step added we can consider the Deliver phase of our Batch Process to be established.

Grooper ace architect deliver 022c.png
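A short Python sketch can illustrate the sanitized names in the Export Mappings list and the JSON metadata written beside each exported document. The sanitizing rule here is an assumption: runs of non-word characters become underscores, and the ends are trimmed. It happens to turn Borrower Name(s) into Borrower_Name_s, but Grooper's actual rule and JSON layout may differ. The names `sanitize` and `metadata_json` are hypothetical, and "Jane Doe" is sample data.

```python
import json
import re
from typing import Dict


def sanitize(name: str) -> str:
    """Assumed rule: collapse runs of non-word characters to "_" and trim the ends."""
    return re.sub(r"\W+", "_", name).strip("_")


def metadata_json(fields: Dict[str, str]) -> str:
    """Serialize extracted values keyed by sanitized Data Element names."""
    return json.dumps({sanitize(name): value for name, value in fields.items()}, indent=2)


print(sanitize("Borrower Name(s)"))  # Borrower_Name_s
print(metadata_json({"Borrower Name(s)": "Jane Doe"}))
```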
  1. Click the Add Step... button.
Grooper ace architect deliver 023.png
  1. In the Properties of Batch Process Step property grid click the drop-down arrow for the Activity Type property and select Dispose Batch from the menu.
Grooper ace architect deliver 024.png
  1. In the Properties of Dispose Batch Activity property grid set the Delete Batch property to True.
Grooper ace architect deliver 025.png
  1. Save and click the Publish button.
    • Our "Working" Batch Process is not available to production until it is published.


Batch Processes must be "published" before they are available for production use. This can be done using the Publish button on the Batch Process > General tab. Publishing places a read-only copy of the Batch Process into the Batch Processing > Processes > Published folder, making it available for production use. Once a Batch Process has been published, all new batches will use the current published version. When changes are made to a Batch Process and a new version is published, the changes will apply to new batches, but will not impact existing batches already in progress. To apply the latest published version of a Batch Process to an existing batch, pause the batch and use the Batch > Update Process command. To unpublish a previously-published Batch Process, making it unavailable for production processing, use the Unpublish button on the Batch Process > General tab.

Grooper ace architect deliver 026.png
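The publishing rules described above can be expressed as a small Python model:

- Publishing takes a read-only snapshot of the process.
- New Batches take the latest snapshot.
- Existing Batches keep their snapshot until they are paused and updated.

The `BatchProcess`, `Batch`, `create_batch`, and `update_process` names are hypothetical. The model only illustrates the described behavior and does not cover unpublishing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BatchProcess:
    working_steps: List[str]
    published_versions: List[Tuple[str, ...]] = field(default_factory=list)

    def publish(self) -> None:
        # A tuple is an immutable snapshot, standing in for the read-only copy.
        self.published_versions.append(tuple(self.working_steps))

    def latest(self) -> Tuple[str, ...]:
        if not self.published_versions:
            raise LookupError("not published, so unavailable for production")
        return self.published_versions[-1]


@dataclass
class Batch:
    steps: Tuple[str, ...]
    paused: bool = False


def create_batch(process: BatchProcess) -> Batch:
    return Batch(process.latest())  # new batches get the current published version


def update_process(batch: Batch, process: BatchProcess) -> None:
    if not batch.paused:
        raise RuntimeError("pause the batch before updating its process")
    batch.steps = process.latest()


process = BatchProcess(["Recognize", "Classify"])
process.publish()
first = create_batch(process)            # uses the first published version

process.working_steps.append("Extract")  # edit the Working copy
process.publish()
second = create_batch(process)           # uses the second published version

print(first.steps)  # ('Recognize', 'Classify')
first.paused = True
update_process(first, process)
print(first.steps)  # ('Recognize', 'Classify', 'Extract')
```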
  1. With the Batch Process published, you can see the Publishing Info properties are updated.
  2. You can also see that a copy of this process has been added to the Published branch of the node tree.
Grooper ace architect deliver 027.png
  1. Select the published Batch Process.
  2. You can see that the properties of this published Batch Process are read only.
Grooper ace architect deliver 028.png

Creating an Activity Processing Service

An Activity Processing service executes Unattended Activities (those that do not require human interaction), which perform the work associated with a step in a Batch Process.

  1. Start by opening Grooper Config and going to the Settings tab.
    • Make sure Grooper Config is run as an Administrator.
  2. Make sure that the credentials listed in this tab have read/write access to the Grooper database.
Grooper ace architect deliver 029.png
  1. Click the Services tab.
  2. Click the Install button to bring up the Install New Service window.
Grooper ace architect deliver 030.png
  1. In the Install New Service window, assuming you have more than one Grooper Repository available, make sure the Repository property drop-down is set to the desired one.
  2. Select Activity Processing and click OK to bring up the Grooper Activity Processing window.
Grooper ace architect deliver 031.png
  1. Make sure the Thread Pool property is set to the desired pool.
    • Recall the earlier description of the Thread Pool property on Activities. Each Activity is pointed at a specific Thread Pool. In turn, an Activity Processing Service knows which Activities to pick up because it is also pointed at a specific Thread Pool.
  2. Set the Number of Threads property to the desired amount.
    • This depends entirely on the number of CPU cores available on the system where the Activity Processing Service is installed. A good standard is n-1, where n is the number of CPU cores.
  3. Notice the User Name and Password are populated based on the information entered from the Settings tab.
Grooper ace architect deliver 032.png
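The n-1 guideline is simple arithmetic, shown here as a minimal Python helper. The name `recommended_threads` is hypothetical. The floor of one thread is an added assumption that covers single-core machines.

```python
import os
from typing import Optional


def recommended_threads(cpu_cores: Optional[int] = None) -> int:
    """Rule of thumb from this article: n - 1 threads, where n is the core count.

    os.cpu_count() reports logical processors, which may exceed physical cores
    on machines with hyper-threading.
    """
    n = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, n - 1)  # keep at least one thread on single-core machines


print(recommended_threads(8))  # 7
print(recommended_threads())   # depends on the machine running the script
```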
  1. There will now be a Grooper Activity Processing, Instance # in the Services list view.
    • Services are installed in a "Stopped" state, and have a red dot to indicate they are not active.
Grooper ace architect deliver 033.png
  1. With the newly installed Grooper Activity Processing service selected, click the Start button.
Grooper ace architect deliver 035.png
  1. The red dot will turn green to indicate that the service is active.
    • As Activities in a Batch Process "submit" their work, they are simply creating row entries in the dbo.ProcessingTask table in the Grooper database. This table is, in turn, being polled by Activity Processing services. As the services see entries added to this table, they know to then pick up the task and perform the work.
Grooper ace architect deliver 035.png
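The polling pattern described above can be simulated with an in-memory SQLite table standing in for dbo.ProcessingTask. This Python sketch is hypothetical: the column names, the status values, the `attended` flag, and the "Default" and "OCR" pool names are assumptions made for illustration, not Grooper's schema. The sketch shows a service claiming only unattended tasks from its own Thread Pool. This is why an Attended Activity such as Classify Review waits for a person.

```python
import sqlite3
from typing import Optional, Tuple


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical columns; the real dbo.ProcessingTask schema is not shown here.
    conn.execute(
        "CREATE TABLE ProcessingTask ("
        " id INTEGER PRIMARY KEY,"
        " activity TEXT NOT NULL,"
        " thread_pool TEXT NOT NULL,"
        " attended INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'Ready')"
    )


def claim_next_task(conn: sqlite3.Connection, thread_pool: str) -> Optional[Tuple[int, str]]:
    """One polling pass: claim the oldest ready, unattended task in this pool."""
    row = conn.execute(
        "SELECT id, activity FROM ProcessingTask"
        " WHERE status = 'Ready' AND attended = 0 AND thread_pool = ?"
        " ORDER BY id LIMIT 1",
        (thread_pool,),
    ).fetchone()
    if row is None:
        return None
    # Only claim the row if no other service changed its status in the meantime.
    cursor = conn.execute(
        "UPDATE ProcessingTask SET status = 'Processing' WHERE id = ? AND status = 'Ready'",
        (row[0],),
    )
    conn.commit()
    return row if cursor.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
create_table(conn)
conn.executemany(
    "INSERT INTO ProcessingTask (activity, thread_pool, attended) VALUES (?, ?, ?)",
    [("Recognize", "Default", 0), ("Classify Review", "Default", 1), ("Extract", "OCR", 0)],
)
print(claim_next_task(conn, "Default"))  # (1, 'Recognize')
print(claim_next_task(conn, "Default"))  # None: only an attended task is left
print(claim_next_task(conn, "OCR"))      # (3, 'Extract')
```

A real service would repeat this pass on a timer. The conditional `UPDATE` is there so that two services polling the same pool cannot both claim one task.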

Manual Batch Process Start via File System Import

To get a better sense of what is occurring during the creation of a Batch and the associated execution of its assigned Batch Process we will start one manually and observe the process.

  1. Back in Grooper Design Studio, expand the node tree to Batch Processing > Batches and select the Production folder.
  2. Click the Batch drop-down menu.
  3. Select Legacy > File System Import... from the menu to open the File System Import window.
Grooper ace architect deliver 036.png

The following configuration will probably seem familiar if you went through the steps in the Bonus! File System Import tab of the Acquire the Documents section earlier. Feel free to review that information, especially Step 5, which covers Sparse Import.

  1. In the File System Import window, set the Base Directory property to the path where the unzipped Closing Disclosure forms supplied with this article reside.
  2. Set the Include Subfolders property to False.
    • This is set to False because, in the environment I have configured, the imported files are ultimately moved to an "Archive" folder that sits IN the folder I am importing from. I would not want to re-import those "archived" files.
Grooper ace architect deliver 037a.png
  1. Set the Sparse Import property to False.
  2. With the Sparse Import setting changed to False, you can now change the File Disposition property to Move.
    • You cannot set this property to Move while Sparse Import is True. A sparse import links to the original file rather than copying its content into Grooper, so moving the file upon its arrival would break that link.
  3. Setting the File Disposition property to Move exposes the Disposition Directory property which you should now supply with a path.
Grooper ace architect deliver 037b.png
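The dependencies between these settings can be captured as a validation check. This Python sketch is hypothetical: the class, field names, messages, and example paths only illustrate the rules described in this section. It is not Grooper's own validation. The rules are:

- Move requires Sparse Import to be False.
- Move requires a Disposition Directory.
- Including subfolders would re-import an archive folder that lives inside the Base Directory.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import List, Optional


def is_inside(child: str, parent: str) -> bool:
    """True when `child` is `parent` or a folder beneath it (Windows path rules)."""
    try:
        PureWindowsPath(child).relative_to(PureWindowsPath(parent))
        return True
    except ValueError:
        return False


@dataclass
class FileSystemImportSettings:
    base_directory: str
    include_subfolders: bool
    sparse_import: bool
    file_disposition: str  # e.g. "Move"; other values are not modeled here
    disposition_directory: Optional[str] = None

    def validate(self) -> List[str]:
        problems: List[str] = []
        if self.file_disposition == "Move":
            if self.sparse_import:
                problems.append("Move needs Sparse Import = False; moving would break the link.")
            if not self.disposition_directory:
                problems.append("Move needs a Disposition Directory.")
            elif self.include_subfolders and is_inside(self.disposition_directory, self.base_directory):
                problems.append("Include Subfolders would re-import files already archived.")
        return problems


settings = FileSystemImportSettings(
    base_directory=r"C:\Import\ClosingDisclosures",
    include_subfolders=False,
    sparse_import=False,
    file_disposition="Move",
    disposition_directory=r"C:\Import\ClosingDisclosures\Archive",
)
print(settings.validate())  # []
```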
  1. Set the Starting Step property to the first step, Content Action, of the recently created and published Closing Disclosures Batch Process.
  2. Set a Batch Name Prefix if you so choose.
  3. Leave the Start Paused property set to True for now.
Grooper ace architect deliver 037c.png
  1. Select the newly created Batch. Note that the Task Status area is blank, because this Batch started paused.
  2. Press the Start button to allow this Batch to begin processing.
Grooper ace architect deliver 038.png
  1. With the Batch now actively being processed you should notice a progress bar begin to fill on the Content Action step in the Task Status area.
    • This is an Unattended Activity that has been picked up by the recently installed Activity Processing Service.
  2. Take note of the legend for the meanings of the coloring on the progress bars.
Grooper ace architect deliver 039.png
  1. As steps complete, the Batch will eventually reach the Classify Review step. This step will not begin processing on its own, and its bar will stay gray.
    • As discussed earlier, this is an Attended Activity. An Activity Processing Service will not pick it up, because it is intended to be a human-interactive step.
  2. Press the Process button to launch the Grooper Attended Client application.
    • Because it is launched for this particular activity, the Grooper Attended Client application opens in its Classify Review configuration.
Grooper ace architect deliver 040.png
  1. The point of this interface is for a human to make sure that the documents were classified properly.
    • A user merely needs to scroll through the list of documents and make sure each one is named for the desired Document Type, in this case Closing Disclosure. The user can apply a different Document Type if necessary.
  2. As you click on the documents, the Document Type Candidates list view shows the system’s confidence that each one is this Document Type. This does not mean that it SHOULD be this choice, only that the applied configuration identifies it as such. In this case, we established a Rules-Based approach and the applied extractor returned a result, so the confidence will always be 100%.
  3. When done reviewing click the Complete Task button.
Grooper ace architect deliver 041.png
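A minimal Python sketch shows why a Rules-Based match always reports 100%. The rule either matches or it does not, so there is no partial score. The regular expression below is a hypothetical stand-in for the extractor configured on the Closing Disclosure Document Type earlier in this article.

```python
import re
from typing import Optional, Tuple

# Hypothetical rule; substitute the pattern your Document Type actually uses.
CLOSING_DISCLOSURE_RULE = re.compile(r"\bClosing\s+Disclosure\b", re.IGNORECASE)


def classify(page_text: str) -> Tuple[Optional[str], float]:
    """Return (Document Type, confidence); a rule match is all-or-nothing."""
    if CLOSING_DISCLOSURE_RULE.search(page_text):
        return "Closing Disclosure", 1.0
    return None, 0.0


print(classify("CLOSING DISCLOSURE  This form is a statement of final loan terms"))  # ('Closing Disclosure', 1.0)
print(classify("Loan Estimate"))  # (None, 0.0)
```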
  1. Data Review is the next "Attended" step.
  2. Click the Process button to launch the Grooper Attended Client in its Data Review configuration.
Grooper ace architect deliver 042.png

Much like Classify Review, the point of this interface is for a human to review the extracted data and verify its accuracy. You can click or tab through fields to see them highlighted in the Document Viewer and decide whether the data needs to be changed.

  1. Click the Complete Task button once all fields are deemed accurate and not in error.
Grooper ace architect deliver 043.png

After the data is reviewed, the documents are exported along with their JSON metadata files, and then the Batch is disposed. Once Dispose Batch completes, the Batch no longer exists in the Grooper database, and if you refresh this list view it will no longer be listed. Grooper is intended to be a transient system that collects and delivers data. It is not meant to store Batches once they are complete, so it is common to have them disposed when their Batch Process comes to an end. Disposal is not the only option, however. You might, for example, archive Batches for some period before disposing of them.

Grooper ace architect deliver 044.png

To view the exported documents and data, open a File Explorer window and navigate to the directory you configured for export.

  1. Notice for each exported document there is a searchable PDF, as well as a JSON file with metadata information. These files are named after the Data Element you chose in the configuration of Document Export, in this case Borrower Name(s).
Grooper ace architect deliver 045.png
  1. Open one of the JSON files and observe the extracted information.
Grooper ace architect deliver 046.png

Automated Batch Process Start via Import Watcher Service

Finally, to better grasp a "real world" production scenario, we will activate a service that polls a directory at a regular interval. The service automates the creation of a Batch and the execution of its assigned Batch Process.

  1. Launch Grooper Config and go to the Services tab.
  2. Click the Install button to open the Install New Service window.
Grooper ace architect deliver 047.png
  1. In the Install New Service window select Import Watcher.
Grooper ace architect deliver 048.png
  1. In the Grooper Import Watcher window select the Provider Type property and select File System Import from the drop-down menu.
Grooper ace architect deliver 049.png
  1. Select the Provider Settings property and click the ellipsis button to launch the File System Import window.
Grooper ace architect deliver 050.png

The properties in this window should seem very familiar, as they are exactly the settings we used with the File System Import earlier. Set all of the properties the same way as in that setup.

  1. For this service, however, set the Start Paused property to False so Batches can begin processing without a human having to press a button.
Grooper ace architect deliver 051.png
  1. Consider the Timer Interval property carefully. It sets the frequency at which this service polls the Base Directory for documents to start a Batch from.
  2. You can also choose to use the Enable Scheduling property and have polling happen on a designated schedule, as opposed to constant polling at an interval.
Grooper ace architect deliver 052.png
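This Python sketch illustrates the Import Watcher's interval-based polling, combined with the File System Import settings used here:

- Only top-level files are imported.
- File Disposition is set to Move.
- Start Paused is set to False.

The sketch is hypothetical and uses only the standard library. The `poll_once` and `watch` functions, the batch dictionary, and the "CD_" name prefix are assumptions, not how the Grooper service is implemented. The Enable Scheduling option is not modeled.

```python
import shutil
import tempfile
import time
from pathlib import Path
from typing import Dict, List, Optional


def poll_once(base_dir: Path, archive_dir: Path, prefix: str = "CD_") -> Optional[Dict]:
    """One polling pass: batch up top-level files and move them to the archive."""
    # Include Subfolders = False: only files directly in base_dir are considered,
    # so an Archive folder inside base_dir is never re-imported.
    files = sorted(p for p in base_dir.iterdir() if p.is_file())
    if not files:
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    batch = {"name": f"{prefix}{time.strftime('%Y%m%d%H%M%S')}",
             "paused": False,  # Start Paused = False
             "documents": []}
    for path in files:
        batch["documents"].append(path.name)
        shutil.move(str(path), str(archive_dir / path.name))  # File Disposition = Move
    return batch


def watch(base_dir: Path, archive_dir: Path, interval_seconds: float, passes: int) -> List[Dict]:
    """Poll `passes` times, sleeping for the Timer Interval between passes."""
    batches = []
    for i in range(passes):
        batch = poll_once(base_dir, archive_dir)
        if batch is not None:
            batches.append(batch)
        if i < passes - 1:
            time.sleep(interval_seconds)
    return batches


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    for name in ("cd1.pdf", "cd2.pdf"):
        (inbox / name).write_bytes(b"%PDF-1.4")
    batches = watch(inbox, inbox / "Archive", interval_seconds=0.1, passes=2)
    print([b["documents"] for b in batches])  # [['cd1.pdf', 'cd2.pdf']]
```

The second pass finds only the Archive folder, so it creates no Batch. This is the behavior that setting Include Subfolders to False protects.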
  1. Select the newly installed service.
  2. When you are ready for this service to begin its interval-based or scheduled polling, click the Start button.
Grooper ace architect deliver 053.png