2022:Project (Node Type)

WIP This article is a work-in-progress. It was written using a beta version of 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022.

This tag will be removed upon draft completion.

A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.

About

After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.

Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.

  1. Here, we have a Project named "Invoices"
  2. This Project houses the Content Model configured for document classification and data extraction.
  3. It also holds the Batch Process used to process Batches.
  4. As well as other Grooper objects required for this use case.
    • "NTFS Connection" is a CMIS Connection utilized for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration which is executed when the "Invoices Process" Batch Process's Export activity is applied.
    • "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
    • "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.

  1. For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
  2. In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.

What's With That Processes Folder?

If you're new to Grooper (or version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?"

As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

  • Projects hold working Batch Processes.
  • The Processes folder holds published Batch Processes.

When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. Otherwise, partially or improperly processed documents could come through Grooper. So, while you are working on a Batch Process, it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
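
The working/published distinction can be pictured with a small Python sketch. This is only a conceptual model, not Grooper's API: the class names `WorkingProcess` and `PublishedProcess`, the `publish` method, and the `processes_folder` dictionary are all hypothetical stand-ins for the behavior described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PublishedProcess:
    """Hypothetical read-only snapshot kept in the Processes folder."""
    name: str
    steps: tuple  # a tuple, so the snapshot cannot be edited in place


@dataclass
class WorkingProcess:
    """Hypothetical editable Batch Process kept inside a Project."""
    name: str
    steps: list = field(default_factory=list)

    def publish(self, processes_folder: dict) -> PublishedProcess:
        # Copy the current steps into an immutable snapshot and
        # store it under the process name in the Processes folder.
        snapshot = PublishedProcess(self.name, tuple(self.steps))
        processes_folder[self.name] = snapshot
        return snapshot


# Stand-in for the "Processes" folder node (an assumption for illustration).
processes_folder = {}

working = WorkingProcess("Invoices Process", ["Scan", "Image Processing"])
published = working.publish(processes_folder)

# Later edits touch only the working copy, never the published one.
working.steps.append("Export")

print(working.steps)
print(published.steps)
```

In this model, the published snapshot stays as it was at publish time until the working copy is published again, which mirrors why production Batches only see published instructions.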

Adding a New Project

Add a Project

Projects are added to the Projects folder node in the node tree.

  1. To add a new Project, first right-click the Projects folder.
  2. Select "Add", then "Project..."
  3. This will bring up a window to name your new Project.
    • In our scenario, we're starting a new Project to process human resources documents. So, we named it "Human Resources".
  4. After giving it a name, press the "OK" button to create the Project.

  1. This will add the Project to the Projects folder in the node tree.

Add Resources to the Project

The following Grooper objects can be added to a Project:

  • Batch Processes
  • Content Models

Extractor objects

  • Value Readers
  • Data Types
  • Field Classes

Profile objects

  • OCR Profiles
  • IP Profiles
  • Separation Profiles
  • Scanner Profiles

Data integration objects

  • CMIS Connections
  • Data Connections

Other objects

  • Lexicons
  • Control Sheets
  • Object Libraries

So, how do you add them to a Project? Much like you would add an item to a node tree folder in Grooper.

  1. Right-click the Project.
  2. Select "Add" then whichever object you want to add to the Project.
    • You can't do much without a Content Model in Grooper. So, we've selected "Content Model..."
  3. This will bring up a window to name the object.
  4. Press "OK" to add the object to the Project.

  1. Once added to the Project, you can select and configure the object as needed.

What About Batches?

One thing you cannot add to a Project is a Batch. This includes Test Batches as well as Production Batches.

  1. Batches are housed in the "Batches" node of the node tree.
  2. Test Batches can be added by expanding the Batches node and right-clicking the "Test" node.

Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.

  1. Here, we've added a Test Batch named "Sample Batch"
  2. A Value Reader, like this one named "Example" we have selected here, is just one of many objects with a Batch Selector panel.
  3. Using the Batch Selector's dropdown, you can select any Batch in the "Test" folder node.
  4. For example, our Batch named "Sample Batch".


Referencing Objects in Other Projects

Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions. (And, if you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)

Aside from organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain the integrity of the references that connect multiple objects in a repository.

Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).

For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.

  1. The reference to "Image Cleanup - OCR" set using the OCR Profile's IP Profile property is allowed.
  2. Both objects are contained in the same Project.

However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.

But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one Project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed.

Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.

  1. In the "Human Resources" Project we created earlier, we've added an OCR Profile.
  2. However, (at least initially) objects can only reference other objects in the "Human Resources" Project (which isn't much, as this is basically a new Project).
    • So there is no "Image Cleanup - OCR" IP Profile to point to.
  3. The IP Profile lies out of scope, in a different Project.
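
The scoping rule described above can be sketched in Python. This is a conceptual model only, not how Grooper implements references: the `Project` class, the `referenced_projects` set (standing in for "being explicitly allowed"), the `can_reference` function, and the OCR Profile name "HR OCR" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a Project as a named set of objects."""
    name: str
    objects: set = field(default_factory=set)
    # Names of other Projects this one is explicitly allowed to use.
    referenced_projects: set = field(default_factory=set)


def can_reference(source: Project, target_object: str, all_projects: dict) -> bool:
    """Return True if target_object is in scope for the source Project."""
    # Objects in the same Project are always in scope.
    if target_object in source.objects:
        return True
    # Otherwise, the object must live in an explicitly referenced Project.
    return any(
        target_object in all_projects[name].objects
        for name in source.referenced_projects
    )


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
hr = Project("Human Resources", {"HR OCR"})  # "HR OCR" is a made-up name
projects = {p.name: p for p in (invoices, hr)}

# Same Project: allowed.
print(can_reference(invoices, "Image Cleanup - OCR", projects))

# Different Project, no explicit reference: out of scope.
print(can_reference(hr, "Image Cleanup - OCR", projects))

# After explicitly referencing the "Invoices" Project: in scope.
hr.referenced_projects.add("Invoices")
print(can_reference(hr, "Image Cleanup - OCR", projects))
```

The model only looks one level deep. Whether access carries through a chain of referenced Projects is not stated here, so the sketch does not assume it.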

So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:

  1. Directly copy the object from one Project to another.
  2. Reference the external Project to allow access to its resources.
  3. Create a shared resources Project that both Projects reference.

Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.

Option 1: Copying Objects from One Project to Another

This option is generally acceptable only in the most basic circumstances. As we will see, there are some significant drawbacks to this approach. However, for simpler Grooper environments and simple Grooper objects, copying the desired object from one Project to another can work out just fine.

Whether this option works for you depends on the reference complexity of the object you're copying.

Option 2: Referencing a Project

Option 3: Creating and Referencing a Shared Resources Project

The Essentials Project

Projects and Upgrading to 2022