2022:Project (Node Type)

From Grooper Wiki
WIP This article is a work-in-progress. It was written using a beta version of 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022.

This tag will be removed upon draft completion.

A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.

About

After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.

Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.

  1. Here, we have a Project named "Invoices"
  2. This Project houses the Content Model configured for document classification and data extraction.
  3. It also holds the Batch Process used to process Batches.
  4. As well as other Grooper objects required for this use case.
    • "NTFS Connection" is a CMIS Connection utilized for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration which is executed when the "Invoices Process" Batch Process's Export activity is applied.
    • "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
    • "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.

  1. For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
  2. In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.

What's With That Processes Folder?

If you're new to Grooper (or to version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?"

As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

  • Projects hold working Batch Processes.
  • The Processes folder holds published Batch Processes.

When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. Otherwise, incoming documents could be partially or improperly processed. So, while you are still working on a Batch Process, it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
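The working/published distinction can be pictured with a small Python sketch. This is only a conceptual model, not Grooper's implementation or API: the `BatchProcess` class, the `publish` function, the dictionary standing in for the Processes folder, and the "Review" step are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class BatchProcess:
    """Hypothetical stand-in for a working (editable) Batch Process."""
    name: str
    steps: list = field(default_factory=list)


def publish(working: BatchProcess, processes_folder: dict) -> None:
    """Store a read-only snapshot of the working process under its name."""
    snapshot = MappingProxyType({
        "name": working.name,
        "steps": tuple(working.steps),  # tuple: the step list cannot be edited
    })
    processes_folder[working.name] = snapshot


# A plain dict stands in for the "Processes" folder (an assumption).
processes_folder = {}
working = BatchProcess("Invoices Process", ["Scan", "Image Processing", "Export"])
publish(working, processes_folder)

# Later edits to the working copy do not touch the published snapshot.
working.steps.append("Review")
published = processes_folder["Invoices Process"]
```

In this sketch, production code would read only from `processes_folder`, so edits to `working` stay invisible until `publish` is called again.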

Adding a New Project

Add a Project

Projects are added to the Projects folder node in the node tree.

  1. To add a new Project, first right-click the Projects folder.
  2. Select "Add", then "Project..."
  3. This will bring up a window to name your new Project.
    • In our scenario, we're starting a new Project to process human resources documents. So, we named it "Human Resources".
  4. After giving it a name, press the "OK" button to create the Project.

  1. This will add the Project to the Projects folder in the node tree.

Add Resources to the Project

The following Grooper objects can be added to a Project:

  • Batch Processes
  • Content Models

Extractor objects

  • Value Readers
  • Data Types
  • Field Classes

Profile objects

  • OCR Profiles
  • IP Profiles
  • Separation Profiles
  • Scanner Profiles

Data integration objects

  • CMIS Connections
  • Data Connections

Other objects

  • Lexicons
  • Control Sheets
  • Object Libraries

So, how do you add them to a Project? In much the same way you would add an item to any node tree folder in Grooper.

  1. Right-click the Project.
  2. Select "Add" then whichever object you want to add to the Project.
    • You can't do much without a Content Model in Grooper. So, we've selected "Content Model..."
  3. This will bring up a window to name the object.
  4. Press "OK" to add the object to the Project.

  1. Once added to the Project, you can select and configure the object as needed.

What About Batches?

One thing you cannot add to a Project is a Batch. This applies to both Test and Production Batches.

  1. Batches are housed in the "Batches" node of the node tree.
  2. Test Batches can be added by expanding the Batches node and right-clicking the "Test" node.

Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.

  1. Here, we've added a Test Batch named "Sample Batch"
  2. A Value Reader, like this one named "Example" we have selected here, is just one of many objects with a Batch Selector panel.
  3. Using the Batch Selector's dropdown, you can select any Batch in the "Test" folder node.
  4. For example, our Batch named "Sample Batch".

Referencing Objects in Other Projects

Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know that Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than what previous versions offered. (And, if you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)

Aside from organizational benefits, one of the big reasons for switching to a Project-based architecture was to maintain the integrity of references between objects in a repository.

Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).

For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.

  1. The reference to "Image Cleanup - OCR" set using the OCR Profile's IP Profile property is allowed.
  2. Both objects are contained in the same Project.

However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.

But, particularly if you use Grooper across a variety of use cases, you will run into situations where resources you build for one Project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed.

Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.

  1. In the "Human Resources" Project we created earlier, we've added an OCR Profile.
  2. However, (at least initially) objects can only reference other objects in the "Human Resources" Project (which isn't much, as this is basically a new Project).
    • So there is no "Image Cleanup - OCR" IP Profile to point to.
  3. The IP Profile lies out of scope, in a different Project.
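The scoping rule described above can be sketched as a toy Python model. It is not Grooper's API; the `Project` class, its `referenced_projects` list, and the `can_reference` method are hypothetical names, and objects are reduced to plain name strings for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical model of a Project and the Projects it may reference."""
    name: str
    objects: set = field(default_factory=set)
    referenced_projects: list = field(default_factory=list)

    def can_reference(self, object_name: str) -> bool:
        """True if the object is in this Project or an explicitly referenced one."""
        if object_name in self.objects:
            return True
        return any(object_name in p.objects for p in self.referenced_projects)


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
hr = Project("Human Resources", {"HR OCR"})

# Same Project: the reference is in scope.
same_project = invoices.can_reference("Image Cleanup - OCR")

# Different Project with no explicit permission: out of scope.
other_project = hr.can_reference("Image Cleanup - OCR")

# Explicitly allowing access to the other Project changes the answer.
hr.referenced_projects.append(invoices)
after_allowing = hr.can_reference("Image Cleanup - OCR")
```

The point of the sketch is that scope is decided per Project, not per object: nothing about the IP Profile itself changes when access is granted.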

So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:

  1. Directly copy the object from one Project to another.
  2. Reference the external Project to allow access to its resources.
  3. Create a shared resources Project that both Projects reference.

Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.

Option 1: Copying Objects from One Project to Another

This option is generally acceptable only in the most basic circumstances. As we will see, there are some significant drawbacks to this approach, and whether it works at all depends on the reference complexity of the object you're copying. However, for simpler Grooper environments and simple Grooper objects, copying the desired object from one Project to another can work out just fine.

Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.

  1. We can copy the IP Profile in the "Invoices" Project.
    • Either by right-clicking the object and selecting "Copy" or selecting the object and pressing Ctrl + C
  2. And we can paste it into the "Human Resources" Project.
    • Either by right-clicking the Project and selecting "Paste" or selecting the Project and pressing Ctrl + V
  3. A copy of the IP Profile is now placed in the Project.
  4. This means all objects within the Project can reference it. For example, the "HR OCR" OCR Profile can now reference it for temporary, pre-OCR image cleanup using the IP Profile property.

Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.

First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources.

Any changes made to the original object will not be reflected in the copied object (or vice versa).

  1. For example, here we've added a Shape Removal IP Step to the original IP Profile.
  2. Notice the copied IP Profile is unchanged. It just has the original two IP Steps from when it was copied to the Project.

This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).
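The independence of the two objects behaves like a deep copy in most programming languages. The following Python sketch illustrates that idea only; representing an IP Profile as a dictionary is an assumption, and the step names "Step A" and "Step B" are placeholders, since the original two IP Steps are not named here.

```python
import copy

# Hypothetical IP Profile represented as a plain dict of step names.
original = {"name": "Image Cleanup - OCR", "steps": ["Step A", "Step B"]}

# Pasting into another Project is modeled as making an independent deep copy.
pasted = copy.deepcopy(original)

# Editing the original afterwards leaves the pasted copy untouched.
original["steps"].append("Shape Removal")
```

After the edit, `original` has three steps while `pasted` still has two, so keeping them in sync means repeating the change by hand.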

Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity.

Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would be dependent on that referenced object in the first Project being present in the second Project.

Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.

  1. So, we want to copy this Data Type from the "Invoices" Project.
  2. To this Local Resources folder in our "Human Resources" Project.

If we try to do this, Grooper is going to throw an error. Why? The Data Type, as part of its configuration, references several Lexicon objects.

  1. The error lets us know there is a reference violation.
  2. It tells us which Project contains the referenced objects.

It also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object, using the following format:

referencing object's location -> referenced object's location
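The kind of check behind this error can be sketched in Python. This is a simplified illustration under stated assumptions, not Grooper's actual logic or message text: the `find_reference_violations` function, the slash-separated paths, and the Lexicon names are all hypothetical, and the rule is reduced to "any reference that points outside the target Project is a violation".

```python
def find_reference_violations(copied, target_project):
    """Return 'referencing -> referenced' lines for out-of-Project references.

    `copied` maps each copied object's path to the paths it references.
    Paths are hypothetical and begin with the owning Project's name.
    """
    violations = []
    for source_path, referenced_paths in copied.items():
        for ref in referenced_paths:
            owning_project = ref.split("/")[0]
            if owning_project != target_project:
                violations.append(f"{source_path} -> {ref}")
    return violations


# A Data Type that references two Lexicons in its original Project.
data_type = {
    "Invoices/Invoices Model/Local Resources/Address": [
        "Invoices/Lexicons/States",
        "Invoices/Lexicons/Street Suffixes",
    ],
}
violations = find_reference_violations(data_type, "Human Resources")

# An IP Profile with no references pastes cleanly.
ip_profile = {"Invoices/Image Cleanup - OCR": []}
no_violations = find_reference_violations(ip_profile, "Human Resources")
```

Here `violations` holds one line per offending reference, while the reference-free IP Profile yields an empty list, which mirrors why the earlier copy and paste succeeded.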

Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up unannounced with a bunch of friends, they're going to take issue with you. There's now a bunch of random strangers in their house they didn't expect.

That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (even more so considering the objects those referenced objects reference, and so on down the line). There's now a bunch of random objects you didn't expect cluttering up your Project.

This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.

Option 2: Referencing a Project

Option 3: Creating and Referencing a Shared Resources Project

The Essentials Project

Projects and Upgrading to 2022