2023.1:Secondary Types (Property)

From Grooper Wiki
Revision as of 10:26, 27 August 2024 by Randallkinard

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

Secondary Types allow the application of multiple Content Types to a single Batch Folder.

You may download and import the file(s) below into your own Grooper environment (version 2023.1). There is a Batch with the example document(s) discussed in this tutorial, as well as a Project configured according to its instructions. Given the proprietary nature of SharePoint and Database connections, the connection objects and their configurations cannot be shared.
Please upload the Project to your Grooper environment before uploading the Batch. This will allow the documents within the Batch to maintain their classification status.

About

One of the most important questions to ask about a document traveling through Grooper is "What kind of document is this?" Grooper answers this question with the Classify activity. Specifically, Classification associates a Batch Folder (or document) with a Content Type. Once a Content Type is applied to the folder, Grooper knows which Content Model to use for the remaining operations on the Batch Folder, such as what data to extract from it.
Historically, only one Content Type could be assigned to a Batch Folder, which limited the operations that could be applied to it. Secondary Types remove this limitation.

A Simple Example

The following is a simple example of the role Secondary Types can play.

In the "Multiple Types, Single Document" Project we have two distinct Document Types with their own models, but the accompanying Batch has three documents, and one happens to be a document that could reasonably be seen as either of the two Document Types.

If you want to investigate the construction of the supplied Projects, please feel free to poke around. This article will not cover the configuration of all of their parts, as that is not necessary to understand the topic at hand, but what is relevant to Secondary Types will be covered.





How To

With a general understanding of what Secondary Types are, let's now take a look at how to configure them using a couple of examples.

EOB Form with Check

The following example uses a Batch that contains EOB packets. A packet may or may not include a check. When it does, we want to avoid two older approaches that add unnecessary complexity. The first is to Separate the check into its own document. Separation is one of the more challenging things to do in Grooper and requires an extra level of logic and review. The second is to have a single Data Model contain extraction logic for both checks and EOBs all the time. In that case the check Data Elements are blank whenever a packet has no check, which can confuse reviewers and/or require a backend system to carry extra fields that remain blank or null.

Understanding the Setup

What we want is to avoid this unnecessary complexity altogether. A packet with a check will therefore be assigned not only the "EOB" Document Type but also, when applicable, "Check" as a Secondary Type. The key to applying a second Document Type to a document, i.e. a "Secondary Type", is having multiple Batch Process Steps configured for Classification within a Batch Process. The first Classification applies the "primary" type. The second Classification, because its Reclassify Mode property is set to Secondary, applies the "Secondary Type".

Follow the instructions in the screenshots below.

  1. In the Project provided for this article, "02 With Secondary Types"...
  2. ...there is a Batch Process called "EOB Demo (Secondary Types)"...
  3. ...and there is a Batch Process Step called "Primary Classification"...
  4. ...and the Content Model Scope property is scoped to the "Providers" Content Category.



  1. There is also a second classification Batch Process Step called "Secondary Classification"...
  2. ...and the Content Model Scope property is scoped to the "Payments" Content Category.
  3. However, because the Reclassify Mode property is set to Secondary, the existing classification will not be overwritten. Instead, a "Secondary Type" will be added when Classification executes.
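
The effect of the two Classification steps can be pictured with a small conceptual sketch. This is not Grooper code or a Grooper API. The `BatchFolder` class, the `classify` function and the `as_secondary` flag are hypothetical names invented here, with `as_secondary=True` standing in for a step whose Reclassify Mode property is set to Secondary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder; not a real Grooper class."""
    name: str
    content_type: Optional[str] = None                        # the "primary" type
    secondary_types: List[str] = field(default_factory=list)  # any Secondary Types


def classify(folder: BatchFolder, matched_type: Optional[str],
             as_secondary: bool = False) -> None:
    """Record a classification result on the folder.

    as_secondary=True models a step whose Reclassify Mode is Secondary:
    the match is added alongside the existing type instead of replacing it.
    """
    if matched_type is None:
        return  # nothing in this step's scope matched the document
    if as_secondary:
        if matched_type != folder.content_type and matched_type not in folder.secondary_types:
            folder.secondary_types.append(matched_type)
    else:
        folder.content_type = matched_type


with_check = BatchFolder("EOB packet with a check")
without_check = BatchFolder("EOB packet without a check")

# "Primary Classification" step (scoped to the "Providers" Content Category).
classify(with_check, "EOB")
classify(without_check, "EOB")

# "Secondary Classification" step (scoped to the "Payments" Content Category).
classify(with_check, "Check", as_secondary=True)
classify(without_check, None, as_secondary=True)

print(with_check.content_type, with_check.secondary_types)        # EOB ['Check']
print(without_check.content_type, without_check.secondary_types)  # EOB []
```

The point of the sketch is only that the second step adds to the folder rather than replacing what the first step assigned, and that a packet without a check is left with its primary type alone.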

Viewing Properties to See Secondary Type

  1. Select a document that has been classified with both its primary and secondary types.
  2. Be sure to be on the "Batch Folder" general tab of the object.
  3. In the "General" section of properties on the selected document you can see the Content Type property is populated by the Document Type it was classified as.
  4. If you click the ellipsis button for the Secondary Types property it will pull up the "Secondary Types" dialog box.
  5. In this "Secondary Types" dialog box you will see a check next to the Document Type that has been set as its "Secondary Type".



  1. You can also click on the "Advanced" tab...
  2. ...and from there you can see the typical "Grooper.DocumentData.json" that was created when Extraction completed on this document.
  3. However, there is now also a .json file for the Secondary Type's extraction with the GUID of the Secondary Type's Document Type concatenated to the title of the file.
  4. If you click the ellipsis button for the Properties property it will open the "Properties" dialog box.
  5. In this "Properties" dialog box you can see the GUID for the "ContentTypeId" as well as the "SecondaryTypeId". These are the GUIDs of their respective Content Types.

Viewing Extraction Results for Document with Applied Secondary Type

  1. Select a Batch Process Step that has been configured for Review with a Data View. In the materials provided for this article, it will be the "EOB Review" Batch Process Step within the "EOB Demo (Secondary Types)" Batch Process of the "02 With Secondary Types" Project.
  2. Click on the "Activity Tester" tab.
  3. In the Batch Viewer select the appropriate scope. Because this is a "Review" activity, the "Batch" level is the appropriate scope.
  4. Click the "Test" button to test the activity.



  1. In the Data Viewer the data collected for the primary Document Type can be seen as usual and the name of that Document Type can be seen in the header above the data.
  2. If you double-click the header it will collapse the extracted data for that Document Type.



  1. Having collapsed the data for the primary type...
  2. You can now see a header for the Secondary Type, and underneath it the extracted data for that Document Type.

Using Content Type Filter to Limit Data Shown in Review

  1. Select a Batch Process Step that has been configured for Review with a Data View. In the materials provided for this article, it will be the "EOB Review" Batch Process Step within the "EOB Demo (Secondary Types)" Batch Process of the "02 With Secondary Types" Project.
  2. Be sure to be on the "Batch Process Step" general tab.
  3. Click the ellipsis button for the Views property to open the "Views" dialog box.
  4. In the "Views" dialog box select the "Data View" that was added to the list.
  5. Click the ellipsis button for the Content Type Filter property to open the "Content Type Filter" dialog box.
  6. From the "Content Type Filter" dialog box you can select a Content Type to limit the data that will be shown when this step is run. In this case, the "Check" Document Type is selected.



  • Now, in Review, only the data for the selected Content Type will be shown.
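
As a rough mental model, the Content Type Filter behaves like a filter over the sets of data stored for a document. The sketch below is illustrative only and does not use any Grooper API. The `data_for_review` function, the field names and the values are all invented, and treating an empty filter as "show everything" is an assumption made for the sketch.

```python
from typing import Dict, List, Optional

# Hypothetical data for one document, keyed by Content Type name.
# Field names and values are invented for illustration.
document_data: Dict[str, Dict[str, str]] = {
    "EOB": {"Payer": "Example Payer", "Claim Number": "0000"},
    "Check": {"Check Number": "0000", "Amount": "0.00"},
}


def data_for_review(data: Dict[str, Dict[str, str]],
                    content_type_filter: Optional[List[str]] = None) -> Dict[str, Dict[str, str]]:
    """Return only the data sets whose Content Type is named in the filter.

    Assumption for this sketch: no filter means every data set is shown.
    """
    if not content_type_filter:
        return dict(data)
    return {name: fields for name, fields in data.items()
            if name in content_type_filter}


print(list(data_for_review(document_data)))             # ['EOB', 'Check']
print(list(data_for_review(document_data, ["Check"])))  # ['Check']
```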

Document Variability and Data Normalization

In the screenshot below you will see three different documents that contain essentially the same type of information, but each is laid out differently. The end point for collected data ideally has no variability, so the variation must be normalized. However, to ease the work of those reviewing the collected data and lessen opportunities for mistakes, reviewers should see the data in as close to a one-to-one way as possible: the model the reviewer interacts with should look just like the document. In short, we want to separate the normalization of extracted data from the review of that data.

To accomplish this, we will take advantage of another benefit of Secondary Types and use two models. The first model will be as close to the documents as possible, while the second will be the normalized version. The Convert Data activity will apply a Secondary Type after the data is first reviewed and then, via expressions, convert the information to its "final" form.

User Review of Extracted Data

First, let's take a look at the setup for the user review. We will focus on the "Extract Review" step.

Follow the instructions in the screenshots below.

  1. Expand the contents of a Project. In the materials provided for this article, it will be the "02 With Secondary Types" Project.
  2. Select a Batch Process Step. In the materials provided for this article, it will be the "Extract Review" Batch Process Step within the "Secondary Types Process" Batch Process.
  3. Set the Activity to Review and select an appropriate setting for the Scope property. For this example Batch is the appropriate scope.
  4. Click the ellipsis button for the Views property to open the "Views" dialog box.
  5. In the "Views" dialog box add a "Data View". For this example there is a view already added, so select it from the list.
  6. Click the ellipsis button for the Content Type Filter property to open the "Content Type Filter" dialog box.
  7. In the "Content Type Filter" dialog box, for this example, we can see the "Extract Model" Content Category is selected.



  1. Let's see the results for this example by going to the "Activity Tester" tab.
  2. In the Batch Viewer be sure to select the provided "Extract vs Export Models" Batch and select the top level folder.
  3. Click the "Test" button to run the activity.



  1. In the Data Viewer we can see that the data in the model matches that of the document making it easy to review.
  2. Feel free to navigate to the other documents using the navigation buttons at the top.

Convert Data Activity and its Role with Secondary Types

Next we will cover how the Convert Data activity is used in this model. Data Rules and Expressions are used as well, so feel free to visit those articles (as well as the Expressions Cookbook) for more information.

Follow the instructions in the screenshots below.

  1. We'll now focus on the "Convert Data" activity as it will be the cornerstone to how we leverage "Secondary Types" to normalize the data.
  2. The key thing to understand about this activity is that it will take data from the model of one Content Type and move it to the model of another, which is set by configuring the Source Type and the Target Type properties.
  3. The Save As property, set to Secondary, is critical: the converted data is saved alongside the original by applying a Secondary Type. Multiple Content Types will then apply to the Batch Folder (document), and therefore multiple sets of data will be stored for it.
  4. Considering all that, what now are the Actions we will take to perform the conversion? Click the ellipsis button for the Actions property to open the "Actions" dialog box.



  1. In the "Actions" list menu you can click the "Add" button to expand a drop-down menu from which you can select an "Action" to add.
  2. If you are familiar with Data Rules the "Actions" you can add from the list should seem familiar.
  3. In the material provided you can observe the list of "Actions" that have been added to the list. Feel free to see how the expressions are written on the "Actions" in this list.
  4. For each "Action" you will need to configure the Source Field property. Click the drop-down menu for this property and select a Data Element to pull data from.
  5. Click the ellipsis button for the Pattern property to open the "Pattern" dialog box, where you can edit the expression that defines what the action does. You may need to reference the Code Expressions article or the Expressions Cookbook for information on writing expressions in Grooper.
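
Conceptually, the conversion described above reads values from the source model, transforms them, and writes them into the target model, and the result is kept next to the original data. The sketch below illustrates that idea in Python. It is not how Grooper implements Convert Data, and real Actions are configured with VB.NET expressions rather than Python functions. The `convert_data` function, the tuple layout of an "action", and every field name and value are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
# A hypothetical "action": (source field, target field, conversion function).
Action = Tuple[str, str, Callable[[str], str]]


def convert_data(source: Record, actions: List[Action]) -> Record:
    """Build a target record by applying each action to its source field."""
    target: Record = {}
    for source_field, target_field, convert in actions:
        if source_field in source:
            target[target_field] = convert(source[source_field])
    return target


# Invented actions that normalize document-specific labels and formats.
actions: List[Action] = [
    ("Inv No.", "Invoice Number", str.strip),
    ("Total Due", "Amount", lambda v: v.replace("$", "").replace(",", "")),
]

# Data as reviewed, laid out like the document (the "Extract Model" side).
extracted: Record = {"Inv No.": " 1001 ", "Total Due": "$1,250.00"}

# Saving "as Secondary" is modeled by keeping both data sets on the document.
document: Dict[str, Record] = {"Extract Model": extracted}
document["Export Model"] = convert_data(extracted, actions)

print(document["Export Model"])  # {'Invoice Number': '1001', 'Amount': '1250.00'}
```

The original record is left untouched, which mirrors the reason for saving the converted data as a Secondary Type: review works against the document-shaped data, while downstream systems read the normalized copy.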



You will typically need to actually run the "Convert Data" activity to apply its functionality, but for the purposes of the provided material we will not need to since it has already been run.

  1. If you would like to review the data, click the "Export Review" Batch Process Step within the "Secondary Types Process" Batch Process of the "02 With Secondary Types" Project. Note, this step is set with a Content Type Filter to limit the data viewed to the "Export Model" Content Category.
  2. Click the "Activity Tester" tab.
  3. In the Batch Viewer be sure to select the provided "Extract vs Export Models" Batch and select the top level folder.
  4. Click the "Test" button to test the activity.



  • In the Data Viewer, notice the returned data does not match the layout seen on the document, but the data is correct. This is the "normalization" applied by the "Convert Data" activity.



  • A final note on the "Actions List". One of the Actions you can choose from the drop-down menu in the "Actions" dialog box is "Execute Rule", which allows you to reference a Data Rule for even more advanced capabilities.


Glossary

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step is comprised of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Classification: Classification is the process of identifying and organizing documents into categorical types based on their content or layout. Classification is key for efficient document management and data extraction workflows. Grooper has different methods for classifying documents. These include methods that use machine learning and text pattern recognition. In a Grooper Batch Process, the Classify Activity will assign a Content Type to a Batch Folder.

Classify: Classify is an Activity that "classifies" Batch Folders in a Batch by assigning them a Document Type.

  • Classification is key to Grooper's document processing. It affects how data is extracted from a document (during the Extract activity) and how Behaviors are applied.
  • Classification logic is controlled by a Content Model's "Classify Method". These methods include using text patterns, previously trained document examples, and Label Sets to identify documents.

Code Expressions: Code Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Content Category: A Content Category is a container for other Content Category or Document Type nodes in a Content Model. Content Categories are often used simply as organizational buckets for Content Models with large numbers of Document Types. However, Content Categories are also necessary to create branches in a Content Model's classification taxonomy, allowing for more complex Data Element inheritance and Behavior inheritance.

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Type Filter: The Content Type Filter property restricts Activities to specific Content Categories and/or Document Types.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Data Element: Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Model: Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) define data extraction logic and settings for how data is reviewed in a Data Viewer.

Data Rule: Data Rules are used to normalize or otherwise prepare data collected in a Data Model for downstream processes. Data Rules define data manipulation logic for data extracted from documents (Batch Folders) to ensure data conforms to expected formats or meets certain standards.

  • Each Data Rule executes a "Data Action", which does things like compute a field's value, parse a field into other fields, perform lookups, and more.
  • Data Actions can be conditionally executed based on a Data Rule's "Trigger" expression.
  • A hierarchy of Data Rules can be created to execute multiple Data Actions and perform complex data transformation tasks.
  • Data Rules can be applied by:
    • The Apply Rules activity (must be done after data is collected by the Extract activity)
    • The Extract activity (will run after the Data Model extraction)
    • The Convert Data activity when converting a document to another Document Type
    • They can be applied manually in a Data Viewer with the "Run Rule" command.

Document Type: Document Type nodes represent a distinct type of document, such as an invoice or a contract. Document Types are created as child nodes of a Content Model or a Content Category. They serve three primary purposes:

  1. They are used to classify documents. Documents are considered "classified" when the Batch Folder is assigned a Content Type (most typically, a Document Type).
  2. The Document Type's Data Model defines the Data Elements extracted by the Extract activity (including any Data Elements inherited from parent Content Types).
  3. The Document Type defines all "Behaviors" that apply (whether from the Document Type's Behavior settings or those inherited from a parent Content Type).

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Expressions Cookbook: The "Expressions Cookbook" is a reference list for commonly used Code Expressions in Grooper.

Expressions: Expressions (not to be confused with regular expressions) are snippets of VB.NET code that expand Grooper's core functionality.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, and profile objects are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Review: Review is an Activity that allows user-attended review of Grooper's results. This allows human operators to validate processed Batch Page and Batch Folder content using specialized user interfaces called "Viewers". Different kinds of Viewers assist users in reviewing Grooper's image processing, document classification, and data extraction, and in operating document scanners.

Scope: The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Secondary Types: Secondary Types allow the application of multiple Content Types to a single Batch Folder.

Separate: Separate is an Activity that sorts Batch Pages into individual Batch Folders. This distinguishes "loose pages" from the documents formed by those pages. Once loose pages are separated into Batch Folder documents, they can be further processed by Classify, Extract, Export and other Activities that need to run on the folder (i.e. document) level.

Separation: Separation is the process of taking an unorganized Batch of loose Batch Pages and organizing them into documents represented by Batch Folders in Grooper. This is done so Grooper can later assign a Document Type to each document folder in a process known as "classification".

SharePoint: SharePoint is a connection option for cloud CMIS Connections. It connects Grooper to Microsoft SharePoint, providing access to content stored in "document libraries" and "picture libraries" for import and export operations.