2022:Project (Node Type)

WIP This article is a work-in-progress. It was written using a beta version of 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022.

This tag will be removed upon draft completion.

A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.

About

After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine-readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.

Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.

  1. Here, we have a Project named "Invoices"
  2. This Project houses the Content Model configured for document classification and data extraction.
  3. It also holds the Batch Process used to process Batches.
  4. As well as other Grooper objects required for this use case.
    • "NTFS Connection" is a CMIS Connection utilized for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration which is executed when the "Invoices Process" Batch Process's Export activity is applied.
    • "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
    • "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.

  1. For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
  2. In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.

What's With That Processes Folder?

If you're new to Grooper (or version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?"

As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

  • Projects hold working Batch Processes.
  • The Processes folder holds published Batch Processes.

When adding and configuring a new Batch Process, you will always add it to a Project first. As you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. This would cause partially or improperly processed documents to come through Grooper. So, while you are working on a Batch Process it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
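The working/published split can be pictured as taking a read-only snapshot. The following is only a conceptual sketch in Python, not Grooper's API: the class names `WorkingProcess` and `PublishedProcess` and the `publish` method are hypothetical, and the step names are borrowed from the "Invoices Process" example above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PublishedProcess:
    """Hypothetical stand-in for a read-only, published Batch Process."""
    name: str
    steps: Tuple[str, ...]


class WorkingProcess:
    """Hypothetical stand-in for an editable, working Batch Process."""

    def __init__(self, name: str, steps: List[str]):
        self.name = name
        self.steps = list(steps)

    def publish(self) -> PublishedProcess:
        # Publishing takes an immutable snapshot of the current steps.
        return PublishedProcess(self.name, tuple(self.steps))


working = WorkingProcess("Invoices Process", ["Scan", "Image Processing"])
published = working.publish()

# Further edits to the working copy do not touch the published copy.
working.steps.append("Export")
```

In this sketch, the published copy keeps the two steps it had at publish time until the working copy is published again.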

Adding a New Project

Add a Project

Projects are added to the Projects folder node in the node tree.

  1. To add a new Project, first right-click the Projects folder.
  2. Select "Add", then "Project..."
  3. This will bring up a window to name your new Project.
    • In our scenario, we're starting a new Project to process human resources documents. So, we named it "Human Resources".
  4. After giving it a name, press the "OK" button to create the Project.

  1. This will add the Project to the Projects folder in the node tree.

Add Resources to the Project

The following Grooper objects can be added to a Project:

  • Batch Processes
  • Content Models

Extractor objects

  • Value Readers
  • Data Types
  • Field Classes

Profile objects

  • OCR Profiles
  • IP Profiles
  • Separation Profiles
  • Scanner Profiles

Data integration objects

  • CMIS Connections
  • Data Connections

Other objects

  • Lexicons
  • Control Sheets
  • Object Libraries

So, how do you add them to a Project? Much like you would add an item to a node tree folder in Grooper.

  1. Right-click the Project.
  2. Select "Add" then whichever object you want to add to the Project.
    • You can't do much without a Content Model in Grooper. So, we've selected "Content Model..."
  3. This will bring up a window to name the object.
  4. Press "OK" to add the object to the Project.

  1. Once added to the Project, you can select and configure the object as needed.

What About Batches?

One thing you cannot add to a Project is a Batch. This includes both Test and Production Batches.

  1. Batches are housed in the "Batches" node of the node tree.
  2. Test Batches can be added by expanding the Batches node and right-clicking the "Test" node.

Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.

  1. Here, we've added a Test Batch named "Sample Batch"
  2. A Value Reader, like this one named "Example" we have selected here, is just one of many objects with a Batch Selector panel.
  3. Using the Batch Selector's dropdown, you can select any Batch in the "Test" folder node.
  4. For example, our Batch named "Sample Batch".


Referencing Objects in Other Projects

Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions. (And, if you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)

Aside from organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain the integrity of references woven throughout multiple objects in a repository.

Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).

For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.

  1. The reference to "Image Cleanup - OCR" set using the OCR Profile's IP Profile property is allowed.
  2. Both objects are contained in the same Project.
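The scoping rule described here can be sketched as a simple check. This is a conceptual model in Python, not Grooper code: the `Project` class, its fields, and the `can_reference` function are hypothetical names chosen for illustration. The "HR OCR" OCR Profile and the "Human Resources" Project appear elsewhere in this article.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """Hypothetical model: a named silo of objects plus the Projects it may see."""
    name: str
    objects: Set[str] = field(default_factory=set)
    referenced_projects: Set[str] = field(default_factory=set)


def can_reference(source: Project, owner: Project, object_name: str) -> bool:
    """An object is in scope if it lives in the same Project,
    or in a Project the source has been explicitly allowed to reference."""
    if object_name not in owner.objects:
        return False
    return owner.name == source.name or owner.name in source.referenced_projects


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
human_resources = Project("Human Resources", {"HR OCR"})

# Same Project: allowed.
same_project = can_reference(invoices, invoices, "Image Cleanup - OCR")

# Different Project with no explicit permission: out of scope.
cross_project = can_reference(human_resources, invoices, "Image Cleanup - OCR")

# Once access is explicitly granted, the same lookup succeeds.
human_resources.referenced_projects.add("Invoices")
after_allowing = can_reference(human_resources, invoices, "Image Cleanup - OCR")
```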

However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.

But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed.

Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.

  1. In our "Human Resources" project we created earlier, we've added an OCR Profile.
  2. However, (at least initially) objects only have referenceable access to other objects in the "Human Resources" Project (which isn't much as this is basically a new Project).
    • So there is no "Image Cleanup - OCR" IP Profile to point to.
  3. The IP Profile lies out of scope, in a different Project.

So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:

  1. Directly copy the object from one Project to another.
  2. Reference the external Project to allow access to its resources.
  3. Create a shared resources Project that both Projects reference.

Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.

Option 1: Copying Objects from One Project to Another

This option is generally acceptable for only the most basic circumstances. As we will see, there are some significant drawbacks to this approach. However, for simpler Grooper environments and simple Grooper objects, simply copying the desired object from one Project to another can work out just fine.

Furthermore, sometimes this option is going to work for you, sometimes not, depending on the reference complexity of the object you're copying.

FYI While the following guidance deals specifically with "copying and pasting", the same follows for "cutting and pasting" or "moving" objects from one Project to another.


Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.

  1. We can copy the IP Profile in the "Invoices" Project.
    • Either by right-clicking the object and selecting "Copy" or selecting the object and pressing Ctrl + C
  2. And we can paste it into the "Human Resources" Project.
    • Either by right-clicking the Project and selecting "Paste" or selecting the Project and pressing Ctrl + V
  3. A copy of the IP Profile is now placed in the Project.
  4. This means all objects within the Project can reference it. For example, the "HR OCR" OCR Profile can now reference it for temporary, pre-OCR image cleanup using the IP Profile property.

Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.

First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources.

Any changes made to the original object will not be reflected in the copied object (or vice versa).

  1. For example, here we've added a Shape Removal IP Step to the original IP Profile.
  2. Notice the copied IP Profile is unchanged. It just has the original two IP Steps from when it was copied to the Project.

This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).
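The independence of the two objects behaves like a deep copy in most programming languages. The sketch below is an analogy in Python, not a description of how Grooper stores objects. The dictionary layout and the step names "Step A" and "Step B" are placeholders, since the two original IP Steps are not named here.

```python
import copy

# Placeholder representation of the IP Profile in the "Invoices" Project.
original = {"name": "Image Cleanup - OCR", "steps": ["Step A", "Step B"]}

# Pasting into another Project produces a fully separate object.
pasted = copy.deepcopy(original)

# A later edit to the original (adding a Shape Removal step) ...
original["steps"].append("Shape Removal")

# ... leaves the pasted copy with only the steps it had when it was copied.
print(original["steps"])
print(pasted["steps"])
```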

Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity.

Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would be dependent on that referenced object in the first Project being present in the second Project.

Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.

  1. So, we want to copy this Data Type from the "Invoices" Project.
  2. To this Local Resources folder in our "Human Resources" Project.

If we try to do this, Grooper is going to throw an error. Why? The Data Type, as part of its configuration, references several Lexicon objects.

  1. The error lets us know there is a reference violation.
  2. It tells us which Project contains the referenced objects.

It also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object, using the following format:

referencing object's location -> referenced object's location
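To make the check concrete, here is a minimal sketch in Python of how such a violation report could be assembled. It is an illustration under stated assumptions, not Grooper's implementation: the `Node` class, the `find_violations` function, and the node tree paths are all hypothetical, and the sketch ignores references that point inside the copied subtree itself.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical node: its location, owning Project, references, and children."""
    path: str
    project: str
    references: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def find_violations(node: Node, target_project: str, allowed: Set[str]) -> List[str]:
    """Walk the copied object and its children, reporting every reference that
    would point outside the target Project and its allowed Projects."""
    violations = []
    stack = [node]
    while stack:
        current = stack.pop()
        for ref in current.references:
            if ref.project != target_project and ref.project not in allowed:
                violations.append(f"{current.path} -> {ref.path}")
        stack.extend(current.children)
    return violations


# Hypothetical paths: an address Data Type that references one Lexicon.
lexicon = Node("Lexicons/US States", "Invoices")
address = Node("Invoices Model/Local Resources/VAL - Address", "Invoices",
               references=[lexicon])

blocked = find_violations(address, "Human Resources", allowed=set())
permitted = find_violations(address, "Human Resources", allowed={"Invoices"})
```

With no allowed Projects, `blocked` holds one entry in the "referencing -> referenced" format. Once "Invoices" is allowed, `permitted` is empty.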

Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up with a bunch of friends unannounced, your host is going to take issue with you. There's now a bunch of random strangers in the house who weren't expected.

That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (Even more so considering any objects the referenced objects reference, and the objects the referenced objects' referenced objects reference and so on down the line). There's now a bunch of random objects you didn't expect cluttering up your Project.

This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.

One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time-consuming process, as it would involve the following steps:

  1. Copy and paste all the referenced objects from the first Project to the second.
  2. Unassign all the references in the object to be copied from the first Project.
  3. Paste the object from the first Project to the second.
  4. Reassign all the references in the copied object to all the referenced objects pasted in step 1.

Another option is to use Project references. This gives one Project referenceable access to all resources within another.

Option 2: Referencing a Project

Resources can be shared between two (or more) Projects by referencing the full Project. This gives explicit access to all objects within a Project, just as if they were created locally.

Let's go back to our problem copying an address extractor that references multiple Lexicons from one Project to another.

  1. We want a copy of this Data Type from the "Invoices" Project...
  2. ...in this Local Resources folder in the "Human Resources" Project.

As we saw previously, Grooper will not allow us to do this (yet).

All we need to do to make this happen is tell Grooper it's OK for the "Human Resources" Project to share assets with the "Invoices" Project. We do this by referencing the whole Project.

  1. To allow access to another Project's resources, first select the Project requesting access in the node tree.
    • The "Human Resources" Project wants access to the address extractor in the "Invoices" Project. So we've selected "Human Resources".
  2. Select the Referenced Projects property and expand its dropdown menu.
  3. Choose the Project whose resources you want to access by checking the box next to its name.
    • In this case, we've selected the "Invoices" Project.
    • FYI: You can reference multiple Projects by checking the box next to each Project's name.

  1. You'll see the referenced Project listed in the property grid.
  2. Be sure to save when finished.

Now we can copy and paste all day long.

  1. We no longer get that error message if we copy the address extractor from the "Invoices" Project and paste it somewhere in the "Human Resources" Project.
  2. Because the Project is shared, it has a path to navigate to the Lexicons referenced by the extractor.

You may also make direct references to any object in a referenced Project.

For example, because we've referenced the "Invoices" Project, we could have simply referenced the address extractor without copying and pasting it.

  1. Here, we've added a Value Reader named "Address Ex 2" to illustrate this example.
  2. We've set its Extractor Type to Reference to demonstrate the reference.
    • FYI: The Reference Extractor Type simply returns the results of a referenced extractor.
  3. Using its Extractor property to select a reference, you can see we now have access to the "Invoices" Project.
  4. This means we can reference any and all objects contained within, including this address extractor.

This is an effective way of sharing resources between multiple Projects without duplicating your efforts by creating multiple copies of shared resources that you have to manage independently in each Project.

The only downside to this approach lies in how many different Projects utilize a set of shared resources. If it boils down to a limited number of resources, or resources shared between very similar Projects (in terms of their use case), this approach can work out just fine. But when you get into more and more resources shared between more and more Projects the crisscrossed references between them can be difficult to navigate when you're trying to track down a single object used across a variety of Projects.

In those cases, you may want to do a little extra work and create an entirely separate Project just devoted to housing resources shared between multiple Projects. We will discuss creating and utilizing a "Shared Resources" Project in the next section.

Please read the following before continuing. It contains best practice advice to avoid potential system corruption when dealing with Project referencing.

Just as you can make references to other Projects, you can remove those references as well. However, to prevent future corruption down the line, you should always ensure no object in your Project references objects in the other Project before removing its reference.

The easiest way to do this is with the "Analyze References" button at the top of the Project's UI screen.

  1. Select the Project whose references you want to analyze.
  2. Press the "Analyze References" button.
  3. This will bring up a list of outbound references.
    • These are references that objects in the selected Project make to objects in the external Projects listed in the Referenced Projects property.
  4. Any referenced object will be listed.
    • In our case, we made references to various Lexicons as well as the "VAL - Address" Data Type in the "Invoices" Project.

These outbound references indicate there are resources in this Project that are dependent on resources in the "Invoices" Project to function.

CAUTION!!!!

While it is technically possible to remove the reference to a Project without resolving these references, YOU SHOULD NOT DO SO. It is best practice to either:

  1. Keep the reference to the Project intact.
  2. Or, manually unassign the references to each object.

Please ensure there are no outbound references to the Project before removing the reference.
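The best practice above amounts to "check for outbound references, then remove". The Python sketch below models that rule. It is not Grooper's API: the function names and the tuple layout are hypothetical, and the sketch is deliberately stricter than Grooper, which will technically let you remove the Project reference anyway.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record: (source Project, source object, target Project, target object)
Reference = Tuple[str, str, str, str]


def outbound_references(references: List[Reference], project: str,
                        external: str) -> List[Tuple[str, str]]:
    """List (source object, target object) pairs pointing from project to external."""
    return [(src_obj, dst_obj)
            for src_proj, src_obj, dst_proj, dst_obj in references
            if src_proj == project and dst_proj == external]


def remove_project_reference(referenced_projects: Dict[str, Set[str]],
                             references: List[Reference],
                             project: str, external: str) -> None:
    """Refuse to drop a Project reference while outbound references remain."""
    outstanding = outbound_references(references, project, external)
    if outstanding:
        raise ValueError(
            f"{len(outstanding)} outbound reference(s) to '{external}' "
            "must be unassigned first")
    referenced_projects[project].discard(external)


referenced_projects = {"Human Resources": {"Invoices"}}
references = [("Human Resources", "Address Ex 2", "Invoices", "VAL - Address")]

try:
    remove_project_reference(referenced_projects, references,
                             "Human Resources", "Invoices")
    blocked = False
except ValueError:
    blocked = True  # the outstanding reference stops the removal

# After manually unassigning the reference, removal goes through.
references.clear()
remove_project_reference(referenced_projects, references,
                         "Human Resources", "Invoices")
```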

Option 3: Creating and Referencing a Shared Resources Project

The last option is to use an entirely separate Project which is solely devoted to housing objects used and referenced by multiple Projects. This option is most appropriate for larger environments, processing different kinds of documents from different use cases. Given a big enough body of documents, despite the fact they may come from different industries or use cases, you will find commonly used resources that are generalizable across a variety of documents. This can include generic or semi-generic extractors, Lexicons, even IP Profiles and OCR Profiles.

In these cases, it often makes sense to create a "bucket" of resources from which all Projects can draw. The idea is to create shared resources in a single Project referenced by multiple others. Or, in our case, we're going to move these assets to a "Shared Resources" Project.

For instance, there are some fairly generic extractors in the "Invoices" Project we may want accessible to the "Human Resources" Project and future Projects as well.

  1. First we're going to move this generic text segment extractor.
    • This one is going to be the easier of the two.
  2. We'll also end up moving this generic decimal extractor, but that will take some extra work.
    • The downside to this approach is there is typically some work up front you'll need to engage in to organize your resources in order to get the benefit down the road.

We are going to move these extractors to a new Project, which we will name "Shared Resources".

  1. Here, we've added the new Project.
  2. Since we want to move objects from the "Invoices" Project, we've also made a reference to that Project, using the Referenced Projects property.

For the first extractor, this job is very easy.

  1. We can simply cut this "VAL - Generic Segment" Value Reader from the "Invoices" Project, and we'll paste it into the "Shared Resources" Project.
    • Or, simply move it by dragging and dropping it.

  1. The Value Reader moves to the "Shared Resources" Project with no issue.
    • Why? Nothing else in the "Invoices" Project referenced it!
  2. We won't be so lucky with the "VAL - Generic Decimal" Value Reader.
  3. If we attempt to move this object, we will get a series of reference violation errors. There are several objects in the "Invoices" Project using (i.e. referencing) this extractor.

Here's where we get into the extra work on the front end.

What we can do first is copy the Value Reader. It makes no references to other objects. The issue here is that other objects are referencing it.

  1. So, we can copy it.
  2. And we can paste that copy into the "Shared Resources" Project.

Now, if you truly want to use this as a "shared" or "global" resource, you can reassign all the references to the "VAL - Generic Decimal" extractor within the "Invoices" Project.

Ultimately, we will need the "Invoices" Project to reference the "Shared Resources" Project to reassign the references.

  1. First, to avoid a circular reference, we will need to unassign the "Shared Resources" Project's reference to the "Invoices" Project.
  2. Before removing a Project reference, it is always best practice to analyze any outbound references to the external Project, using the "Analyze References" button.
  3. This will bring up the following diagnostic.
    • No outbound references are detected (meaning there is no object in the "Shared Resources" Project referencing out to objects in the "Invoices" Project). This is what we want to see. If there were outbound references, we would want to resolve them before removing the reference to the external Project.
  4. Press "OK" to continue.

You should always use the "Analyze References" button before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound or inbound references outstanding. However, doing so is not best practice as it can cause corruption of your system down the road.

  1. With no outbound references to the "Invoices" Project detected, we can remove the Project reference without issue.
  2. Be sure to Save the project when finished.

  1. Next, we need to get rid of the local extractor in the "Invoices" Project and replace it with the copy we placed in the "Shared Resources" Project.
  2. In order to access the extractor in the "Shared Resources" Project, the "Invoices" Project must reference the "Shared Resources" Project.
    • Here, we have selected the "Invoices" Project.
  3. Using the Referenced Projects property, we have selected the "Shared Resources" Project.
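The reason for removing the "Shared Resources" Project's reference first can be shown with a small graph check. This is a conceptual sketch in Python. Whether and how Grooper itself detects circular Project references is not covered here, and the `creates_cycle` function is a hypothetical helper.

```python
from typing import Dict, Set


def creates_cycle(graph: Dict[str, Set[str]], source: str, target: str) -> bool:
    """Return True if adding the reference source -> target would close a loop,
    which happens when target can already reach source."""
    stack = [target]
    seen = set()
    while stack:
        node = stack.pop()
        if node == source:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


# While "Shared Resources" still references "Invoices",
# the reverse reference would be circular.
graph = {"Shared Resources": {"Invoices"}, "Invoices": set()}
circular_before = creates_cycle(graph, "Invoices", "Shared Resources")

# After unassigning that reference, the new one is safe to add.
graph["Shared Resources"].discard("Invoices")
circular_after = creates_cycle(graph, "Invoices", "Shared Resources")
graph["Invoices"].add("Shared Resources")
```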

  1. Now, we can go about the business of reassigning any reference to our local extractor to the one in our "Shared Resources" Project.

The quickest way to find every object that references a selected object in the node tree is to use the "References" tab.

  1. To access this tab, (after selecting the object whose references you want to verify) select the "Advanced" tab.
  2. Then, select the "References" tab.
  3. This will list every object that references the object.
    • In our case, there's one Data Type extractor ("VE - Invoice Total") and three Data Column objects ("Quantity", "Price", and "Extended Price") referencing the selected extractor ("VAL - Generic Decimal").

What we could do from here is track down each of these objects, find where in their property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" Project. That is a perfectly acceptable, although somewhat time-consuming, way to reassign references. Luckily, we have a shortcut available to us.

The "Reassign References..." button will allow us to change the reference for each object in the list from the selected object to a different one.

This is exactly what we want to do. We want to change the reference set on these Data Columns and Data Type from the "VAL - Generic Decimal" extractor in the "Invoices" Project to the copy we made in the "Shared Resources" Project.

  1. Press the "Reassign References..." button.
  2. This will bring up a window to select a new object for the reference.
  3. Check it out. Here's our referenced "Shared Resources" Project.
  4. Selecting the "VAL - Generic Decimal" Value Reader, we will reassign the reference to this extractor in our "Shared Resources" Project.
  5. Press "OK" to finish reassigning the references.

  1. Because the extractor is no longer referenced by any other object, the "Referenced By" list is now empty. All the objects that were listed here, are now referencing the extractor we chose in our "Shared Resources" Project.
    • In other words, we reassigned the references.
    • We've effectively replaced the local Project's decimal extractor with one in an external Project, accessible to any other Project that references it.
  2. Since no other object references the local decimal extractor, and we've replaced its references with something else, it is now safe to delete it.
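The effect of reassigning references in bulk can be sketched as follows. This is a conceptual Python model, not Grooper's implementation: the dictionary layout, the "extractor" property name, and both function names are hypothetical. The object names come from the "Referenced By" list described above.

```python
from typing import Dict, List, Tuple

# A reference target is modeled as a (Project, object name) pair.
Target = Tuple[str, str]

OLD: Target = ("Invoices", "VAL - Generic Decimal")
NEW: Target = ("Shared Resources", "VAL - Generic Decimal")

# Hypothetical property layout for the four referencing objects.
objects: List[Dict[str, object]] = [
    {"name": "VE - Invoice Total", "extractor": OLD},
    {"name": "Quantity", "extractor": OLD},
    {"name": "Price", "extractor": OLD},
    {"name": "Extended Price", "extractor": OLD},
]


def referenced_by(objects: List[Dict[str, object]], target: Target) -> List[str]:
    """Names of all objects with any property pointing at target."""
    return [obj["name"] for obj in objects if target in obj.values()]


def reassign_references(objects: List[Dict[str, object]],
                        old: Target, new: Target) -> int:
    """Point every property that references old at new; return how many changed."""
    changed = 0
    for obj in objects:
        for prop, value in list(obj.items()):
            if value == old:
                obj[prop] = new
                changed += 1
    return changed


before = referenced_by(objects, OLD)
moved = reassign_references(objects, OLD, NEW)
after = referenced_by(objects, OLD)  # empty, so the local extractor can be deleted
```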

As we've demonstrated, it's a little extra work if you decide you want to move resources from one Project to a shared resources Project. However, the benefit to organizing assets like this is that any Project referencing our "Shared Resources" Project now has access to its assets.

  1. For example, we could tell our "Human Resources" Project to reference the "Shared Resources" Project.
  2. Now, both the "Human Resources" Project and the "Invoices" Project have access to its resources.
    • Furthermore, any changes we make to the object in the "Shared Resources" Project will be reflected by any object in any Project that touches it. This can prevent duplication of efforts when updating an object's properties.
    • If any other Projects or any future Projects can make use of these resources, all you have to do is give each one a reference to the "Shared Resources" Project. It acts as one big community bucket of resources other Projects can draw from.

The Essentials Project

Projects and Upgrading to 2022