2023:Project (Object)

From Grooper Wiki

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Project node objects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, profile objects, and more are organized and managed. It allows for the encapsulation and modularization of these resources for easier management and reusability.

You may download the ZIP below and upload it into your own Grooper environment (version 2023). This ZIP contains several Projects which we use as examples throughout this article.

About

A Project is a container for Grooper nodes in the node tree. It allows users to organize Grooper resources (such as Batch Processes, Content Models and other Grooper objects) in some logical manner. Access to nodes inside a Project can be shared with other Projects (discussed later) or can be restricted, only allowing nodes within the Project to reference (or use) them.

You can use Projects to organize nodes:

  • By use case - Use Projects to distinguish between different "kinds of work". Projects help keep processing assets organized and separate from each other. For example, the Content Model, Batch Process, OCR Profile and all other Grooper objects used to implement an invoice document processing solution could be placed in a single Project called "Invoice Processing".
  • By resource type - Use Projects to hold commonly used resources. Commonly, CMIS Connections and Data Connections are used by multiple Grooper resources. A Project holding these kinds of "connection resources" makes them easier to find.
  • For modular use - Use Projects to hold a set of resources useful for one logical end. You may develop a series of resources that are useful in your industry (but not a specific use case). These resources can be shared across multiple Projects without needing to duplicate them by putting them in a Project.

Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.

  1. Here, we have a Project named "Invoices"
  2. This Project houses the Content Model configured for document classification and data extraction.
  3. It also holds the Batch Process used to process Batches.
  4. As well as other Grooper objects required for this use case.
    • "NTFS Connection" is a CMIS Connection utilized for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration which is executed when the "Invoices Process" Batch Process's Export activity is applied.
    • "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
    • "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

How you organize objects in your Project is largely up to you. However, in service of this task, be aware you can add any number of folder levels to your Project.

  1. For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
  2. In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.

Adding a New Project

Projects are added to the Projects folder node in the node tree.

  1. To add a new Project first right-click the Projects folder (or one of its subfolders).
  2. Select "Add", then "Project..."

  1. This will bring up a window to name your new Project.
    • In our scenario, we're starting a new Project to process human resources documents. So, we named it "Human Resources".
  2. After giving it a name, press the "Execute" button to create the Project.

  1. This will add the Project to the Projects folder in the node tree.
    • Note that, since we already had a Human Resources Project created, we've created a Dummy Project titled Human Resources (1).

Add Resources to the Project

The following Grooper objects can be added to a Project:

  • Batch Processes
  • Content Models

Extractor objects

  • Value Readers
  • Data Types
  • Field Classes

Profile objects

  • OCR Profiles
  • IP Profiles
  • Separation Profiles
  • Scanner Profiles

Data integration objects

  • CMIS Connections
  • Data Connections

Other objects

  • Lexicons
  • Control Sheets
  • Object Libraries

So, how do you add them to a Project? Much like you would add an item to a node tree folder in Grooper.

  1. Right click the Project.
  2. Select "Add" then whichever object you want to add to the Project.
    • You can't do much without a Content Model in Grooper. So, we've selected "Content Model..."

  1. This will bring up a window to name the object.
  2. Press "OK" to add the object to the Project.

  1. Once added to the Project, you can select and configure the object as needed.

What About Batches?

One thing you cannot add to a Project is a Batch. This includes Test as well as Production Batches.

  1. Batches are housed in the "Batches" node of the node tree.
  2. Test Batches can be added by expanding the Batches node and right clicking the "Test" node.

Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.

  1. Here, we've added a Test Batch named "Sample Batch"
  2. A Value Reader, like this one named "Example" we have selected here, is just one of many objects with a Batch Selector panel.
  3. To view the Selector dropdown, click the "Tester" tab.
  4. Then, click the icon in the upper-right corner of the Test Batch Section.

This will bring up the Batch Selector window. To select your desired Batch, expand the Test folder and select a Batch.

  1. For example, our Batch named "Sample Batch".


Referencing Objects in Other Projects

Projects were introduced in Grooper version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions. (If you are upgrading to version 2023, please review the "Upgrade guidance" section of this article.)

Aside from organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain reference integrity woven throughout multiple objects in a repository.

Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).

For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.

  1. The reference to "Image Cleanup - OCR" set using the OCR Profile's IP Profile property is allowed.
  2. Both objects are contained in the same Project.
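
The scoping rule can be sketched in a few lines of Python. This is only a conceptual model, not Grooper's API: the `Project` class, its `referenced_projects` field, and the `can_reference` function are hypothetical names introduced for illustration. The sketch also assumes references are not transitive, which this article does not state either way.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a Grooper Project (not a real Grooper class)."""
    name: str
    # Names of other Projects this Project has been explicitly allowed to use.
    referenced_projects: set[str] = field(default_factory=set)


def can_reference(source: Project, target: Project) -> bool:
    """Return True if an object in `source` may reference an object in `target`."""
    # Objects in the same Project can always reference each other.
    if source.name == target.name:
        return True
    # Otherwise the source Project must explicitly reference the target Project.
    return target.name in source.referenced_projects


invoices = Project("Invoices")
human_resources = Project("Human Resources")

print(can_reference(invoices, invoices))         # True: same Project
print(can_reference(human_resources, invoices))  # False: no reference granted yet
```

In this model, adding "Invoices" to `human_resources.referenced_projects` is what flips the second result to `True`, while the reverse direction stays closed.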

Generally speaking, maintaining reference integrity is ideal. The more narrowly you can define an object's allowable scope of reference, the better. This makes it easier to track down references, limits the number of object dependencies, making your system easier to manage, and limits possible system corruption down the line if a mess of "reference spaghetti" gets tangled up in one way or another.

However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases—they use different Content Models, they use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.

But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed.

Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.

  1. In our "Human Resources" project we created earlier, we've added an OCR Profile.
  2. However, (at least initially) objects only have referenceable access to other objects in the "Human Resources" Project (which isn't much as this is basically a new Project).
    • So there is no "Image Cleanup - OCR" IP Profile to point to.
  3. The IP Profile lies out of scope, in a different Project.

Using Resources in Other Projects

So, if we want to use an object from an external Project, what can we do? There are three options:

  1. Directly copy the object from one Project to another.
  2. Reference the external Project to allow access to its resources.
  3. Create a shared resources Project that both Projects reference.

Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.

Option 1: Copying Objects from One Project to Another

For simpler Grooper environments and simple Grooper objects, simply copying the desired object from one Project to another can work out just fine. This option is often the best for the most basic of circumstances.

However, there can be significant drawbacks to this approach. Furthermore, sometimes this option is going to work for you, sometimes not, depending on the reference complexity of the object you're copying.

FYI

While the following guidance deals specifically with "copying and pasting", the same follows for "cutting and pasting" or "moving" objects from one Project to another.


Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.

  1. We can copy the IP Profile in the "OCR Resources" Folder found within the "Invoices" Project.
    • Either by right-clicking the object and selecting "Copy" or selecting the object and pressing Ctrl + C
  2. And we can paste it into the "Human Resources" Project.
    • Either by right-clicking the Project and selecting "Paste" or selecting the Project and pressing Ctrl + V
  3. A copy of the IP Profile is now placed in the Project.
  4. This means all objects within the Project can reference it. For example, the "HR OCR" OCR Profile can now reference it for temporary, pre-OCR image cleanup using the IP Profile property.

Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.

First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources.

Any changes made to the original object will not be reflected in the copied object (or vice versa).

  1. For example, here we've added a Shape Removal IP Step to the original IP Profile.
  2. Notice the copied IP Profile is unchanged. It just has the original two IP Steps from when it was copied to the Project.

This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).

Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity.

Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would be dependent on that referenced object in the first Project being present in the second Project.

Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.

  1. So, we want to copy this Data Type from the "Invoices" Project.
  2. To this Local Resources folder in our "Human Resources" Project.

If we try to do this, Grooper is going to throw an error. Why? The Data Type, as part of its configuration, references several Lexicon objects.

  1. The error lets us know there is a reference violation.
  2. It tells us in which Project the referenced objects are contained.

It also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object, using the following format:

referencing object's location -> referenced object's location

Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up with a bunch of friends unannounced, they're going to take issue with you. There's now a bunch of random strangers in their house they didn't expect.

That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal. This is even more so considering any objects the referenced objects reference, the objects those objects reference, and so on down the line. There's now a bunch of random objects you didn't expect cluttering up your Project.

This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.

One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time consuming process, as it would involve the following steps:

  1. Copy and paste all the referenced objects from the first Project to the second.
  2. Unassign all the references in the object to be copied from the first Project
  3. Paste the object from the first Project to the second.
  4. Reassign all the references in the copied object to all the referenced objects pasted in step 1.

Depending on how these objects are organized, you could also copy and paste multiple objects at a time. Since we're working in the Web Client, you'll notice that the Contents tab is gone. That's because you can select multiple things at once, and copy-paste from there. See below:

  1. Ctrl + left-click the desired items and select Copy
  2. Paste the selected items into the Human Resources Local Resources Folder in the Content Model.

Since we were able to copy the extractor and a folder containing all the Lexicons it references and paste them all at the same time, Grooper allowed the move without any issue.

Keep in mind, however, if you copy a folder, you're going to get everything in that folder.
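
The reference check described in this section, including the `referencing -> referenced` message format, can be modeled roughly as follows. This is a sketch under stated assumptions rather than Grooper's actual logic: node locations are written as slash-separated paths whose first segment is the Project name, the function names are hypothetical, and the Lexicon names are invented for the example.

```python
def project_of(path):
    """Treat the first path segment as the Project name (an assumption of this sketch)."""
    return path.split("/", 1)[0]


def reference_violations(copied, references, destination):
    """List references that would leave `destination` if `copied` were pasted there.

    copied      -- paths of the nodes being copied together
    references  -- maps a node path to the paths of the nodes it references
    destination -- name of the Project receiving the paste
    """
    copied_set = set(copied)
    violations = []
    for node in copied:
        for target in references.get(node, []):
            # A reference is acceptable when its target already lives in the
            # destination Project or travels along in the same copy operation.
            if project_of(target) == destination or target in copied_set:
                continue
            violations.append(f"{node} -> {target}")
    return violations


# Hypothetical node paths; the Lexicon names are invented for the example.
ADDRESS = "Invoices/Local Resources/Address"
LEXICONS = [
    "Invoices/Local Resources/Lexicons/States",
    "Invoices/Local Resources/Lexicons/Street Suffixes",
]
references = {ADDRESS: LEXICONS}

# Copying the extractor alone leaves two references pointing back at "Invoices".
for violation in reference_violations([ADDRESS], references, "Human Resources"):
    print(violation)

# Copying the extractor together with its Lexicons leaves nothing unresolved.
print(reference_violations([ADDRESS] + LEXICONS, references, "Human Resources"))  # []
```

The second call mirrors the multi-select copy above: because the Lexicons are part of the same operation, no reference is left pointing outside the destination.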

Option 2: Referencing a Project

Resources can be shared between two (or more) Projects by referencing the full Project. This gives explicit access to all objects within a Project, just as if they were created locally.

Let's go back to our problem copying an address extractor that references multiple Lexicons from one Project to another.

  1. We want a copy of this Data Type from the "Invoices" Project...
  2. ...in this Local Resources folder in the "Human Resources" Project.

As we saw previously, Grooper will not allow us to do this (yet).

All we need to do in order to make this happen, is effectively tell Grooper it's ok for the "Human Resources" Project to share assets with the "Invoices" Project. We do this by referencing the whole Project.

  1. To allow access to another Project's resources, first select the Project requesting access in the node tree.
    • The "Human Resources" Project wants access to the address extractor in the "Invoices" Project. So we've selected "Human Resources".
  2. Select the Referenced Projects property and click the ellipsis button to expand its dropdown menu.
  3. Choose the Project whose resources you want to access by checking the box next to its name.
    • In this case, we've selected the "Invoices" Project.
    • FYI: You can reference multiple Projects by checking more than one box.
  4. Click OK.

  1. You'll see the referenced Project listed in the property grid.
  2. Be sure and save when finished.

Now we can copy and paste all day long.

  1. We no longer get that error message if we copy the address extractor from the "Invoices" Project and paste it somewhere in the "Human Resources" Project.
  2. Because the Project is shared, it has a path to navigate to the Lexicons referenced by the extractor.

You may also make direct references to any object in a referenced Project.

For example, because we've referenced the "Invoices" Project we could have simply referenced the address extractor without copying and pasting it.

  1. Here, we've added a Value Reader named "Address Ex 2" to illustrate this example.
  2. We've set its Extractor Type to Reference to demonstrate the reference.
    • FYI: The Reference Extractor Type simply returns the results of a referenced extractor.
  3. Using its Extractor property to select a reference, you can see we now have access to the "Invoices" Project.
  4. This means we can reference any and all objects contained within, including this address extractor.
  5. Once again, be sure to save your changes.

This is an effective way of sharing resources between multiple Projects without duplicating your efforts by creating multiple copies of shared resources that you have to manage independently in each Project.

The only downside to this approach lies in how many different Projects utilize a set of shared resources. If it boils down to a limited number of resources, or resources shared between very similar Projects (in terms of their use case), this approach can work out just fine. But when you get into more and more resources shared between more and more Projects the crisscrossed references between them can be difficult to navigate when you're trying to track down a single object used across a variety of Projects.

In those cases, you may want to do a little extra work and create an entirely separate Project just devoted to housing resources shared between multiple Projects. We will discuss creating and utilizing a "Shared Resources" Project in the next tab.

Please read the following before continuing. It contains best practice advice to avoid potential system corruption when dealing with Project referencing.

Just as you can make references to other Projects, you can remove those references as well. However, to prevent future corruption down the line, you should always ensure no object in your Project references objects in the other Project before removing its reference.

The easiest way to do this is with the "Usage" tab.

  1. Select the Project whose references you want to analyze.
  2. Select the "Usage" tab.
  3. This will bring up a list of outbound references.
    • These are references objects in the selected Project make out to external Projects listed in the Referenced Projects.

These outbound references indicate there are resources in this Project that are dependent on resources in the "Invoices" Project to function.

CAUTION!!!!

While it is technically possible to remove the reference to a Project without resolving these references, YOU SHOULD NOT DO SO. It is best practice to either:

  1. Keep the reference to the Project intact.
  2. Or, manually unassign the references to each object.

Please ensure there are no outbound references to the Project before removing the reference.
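
As a way to picture this best practice, the sketch below refuses to drop a Project reference while outbound references remain. It is deliberately stricter than Grooper, which (as noted above) technically allows the removal. Everything here is hypothetical: the error type, the function name, and the slash-separated paths whose first segment is taken to be the Project name.

```python
class OutstandingReferencesError(Exception):
    """Raised when a Project reference is still in use (hypothetical error type)."""


def remove_project_reference(referenced_projects, target, outbound):
    """Drop `target` from `referenced_projects` only if nothing still depends on it.

    referenced_projects -- set of Project names the current Project references
    outbound            -- (from_path, to_path) pairs, as a Usage-style listing might show
    """
    # The Project name is taken to be the first path segment (sketch assumption).
    blocking = [(src, dst) for src, dst in outbound if dst.split("/", 1)[0] == target]
    if blocking:
        raise OutstandingReferencesError(
            f"{len(blocking)} outbound reference(s) to {target!r} must be unassigned first"
        )
    referenced_projects.discard(target)


referenced_projects = {"Invoices"}
outbound = [("Human Resources/Address Ex 2", "Invoices/Address")]  # hypothetical paths

try:
    remove_project_reference(referenced_projects, "Invoices", outbound)
except OutstandingReferencesError as error:
    print(error)

# Once the reference has been unassigned, removal goes through.
remove_project_reference(referenced_projects, "Invoices", [])
print(referenced_projects)  # set()
```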

Option 3: Creating and Referencing a Shared Resources Project

The last option is to use an entirely separate Project which is solely devoted to housing objects used and referenced by multiple Projects. This option is most appropriate for larger environments, processing different kinds of documents from different use cases. Given a big enough body of documents, despite the fact they may come from different industries or use cases, you will find commonly used resources that are generalizable across a variety of documents. This can include generic or semi-generic extractors, Lexicons, even IP Profiles and OCR Profiles.

In these cases, it often makes sense to create a "bucket" of resources from which all Projects can draw. The idea is to create shared resources in a single Project referenced by multiple others. Or, in our case, we're going to move these assets to a "Shared Resources" Project.

FYI

CMIS Connections and Data Connections are another common example of shared resources.

It is often the case that multiple Projects will reuse these connection objects to integrate Grooper with external storage platforms (such as content management systems and databases). Therefore, it would make sense to create something like a "Connections" Project containing these CMIS Connections and Data Connections. Instead of re-creating each connection object for each Project, all Projects can simply reference the "Connections" Project to gain access to the CMIS Connections and/or Data Connections required for import/export operations.

For instance, there are some fairly generic extractors in the "Invoices" Project we may want accessible to the "Human Resources" Project and future Projects as well.

  1. First we're going to move this generic text segment extractor.
    • This one is going to be the easier of the two.
  2. We'll also end up moving this address extractor, but that will take some extra work.
    • The downside to this approach is there is typically some work up front you'll need to engage in to organize your resources in order to get the benefit down the road.

We are going to move these extractors to a new Project, which we will name "Shared Resources".

  1. Here, we've added the new Project.
  2. Since we want to move objects from the "Invoices" Project, we've also made a reference to that Project, using the Referenced Projects property.

For the first extractor, this job is very easy.

  1. We can simply cut this "VAL - Generic Segment" Value Reader from the "Invoices" Project, and we'll paste it into the "Shared Resources" Project.
    • Or, simply move it by dragging and dropping it.

  1. The Value Reader moves to the "Shared Resources" Project with no issue.
    • Why? Nothing else in the "Invoices" Project referenced it!
  2. We won't be so lucky with the "VAL - Generic Decimal" Value Reader.
  3. If we attempt to move this object, we will get a series of reference violation errors. There are several objects in the "Invoices" Project using (i.e. referencing) this extractor.

Here's where we get into the extra work on the front end.

What we can do first, is copy the Value Reader. It makes no references to other objects. The issue here is that other objects are referencing it.

  1. So, we can copy it.
  2. And we can paste that copy into the "Shared Resources" Project.

Now, if you truly want to use this as a "shared" or "global" resource, you can reassign all the references to the "VAL - Generic Decimal" extractor within the "Invoices" Project.

Ultimately, we will need the "Invoices" Project to reference the "Shared Resources" Project to reassign the references.

  1. First, to avoid a circular reference, we will need to unassign the "Shared Resources" Project's reference to the "Invoices" Project.
  2. Before removing a Project reference, it is always best practice to analyze any outbound references to the external Project, using the "Usage" tab.
  3. No outbound references are detected (meaning there is no object in the "Shared Resources" Project referencing out to objects in the "Invoices" Project). This is what we want to see. If there were outbound references, we would want to resolve them before removing the reference to the external Project.
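
To see why the order of operations matters, Project references can be pictured as a directed graph. The sketch below checks whether a new reference would close a loop. It is a generic depth-first search, not a description of how Grooper itself handles circular references (this article does not say), and `would_create_cycle` is a hypothetical name.

```python
def would_create_cycle(referenced, source, target):
    """Return True if letting `source` reference `target` would close a loop.

    referenced -- maps a Project name to the set of Project names it references
    """
    stack = [target]
    seen = set()
    while stack:
        current = stack.pop()
        if current == source:
            # `target` can already reach `source`, so the new edge closes a cycle.
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(referenced.get(current, ()))
    return False


# "Shared Resources" temporarily references "Invoices", as in the walkthrough.
referenced = {"Shared Resources": {"Invoices"}, "Invoices": set()}
print(would_create_cycle(referenced, "Invoices", "Shared Resources"))  # True

# After the temporary reference is removed, the reverse reference is safe to add.
referenced["Shared Resources"].discard("Invoices")
print(would_create_cycle(referenced, "Invoices", "Shared Resources"))  # False
```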

You should always check the "Usage" tab before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound references outstanding. However, doing so is not best practice as it can cause corruption of your system down the road.

  1. With no outbound references to the "Invoices" Project detected, we can remove the Project reference without issue.
  2. Be sure to Save the project when finished.

  1. Next, we need to get rid of the local extractor in the "Invoices" Project and replace it with the copy we placed in the "Shared Resources" Project.
  2. In order to access the extractor in the "Shared Resources" Project, the "Invoices" Project must reference the "Shared Resources" Project.
    • Here, we have selected the "Invoices" Project.
  3. Click the ellipsis button.
  4. Using the Referenced Projects property, we have selected the "Shared Resources" Project.
  5. Select OK.

  1. Now, we can go about the business of reassigning any reference to our local extractor to the one in our "Shared Resources" Project.

The quickest way to figure out every object that references a selected object in the node tree, is to use the "References" area on the Property Grid.

  1. To access this, (after selecting the object whose references you want to verify) select the "Advanced" tab.
  2. The References area will be in the bottom-right corner and will list every object that references the selected object.
    • In our case, there's one Data Type extractor ("VE - Invoice Total") and three Data Column objects ("Quantity" "Price" and "Extended Price") referencing the selected extractor ("VAL - Generic Decimal")

What we could do from here is track down each of these objects, find where in their property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" Project. That is a perfectly acceptable, although somewhat time consuming way to reassign references. Luckily, we have a shortcut available to us.

The "Reassign References..." button will allow us to change the reference for each object in the list from the selected object, to a different one.

This is exactly what we want to do. We want to change the reference set on these Data Columns and Data Type from the "VAL - Generic Decimal" extractor in the "Invoices" Project to the copy we made in the "Shared Resources" Project.

  1. Press the "Reassign References..." button.
  2. This will bring up a window to select a new object for the reference.
  3. Check it out. Here's our referenced "Shared Resources" Project.
    • Selecting the "VAL - Generic Decimal" Value Reader, we will reassign the reference to this extractor in our "Shared Resources" Project.
  4. Press "OK" to finish reassigning the references.

  1. Because the extractor is no longer referenced by any other object, the "Referenced By" list is now empty. All the objects that were listed here, are now referencing the extractor we chose in our "Shared Resources" Project.
    • In other words, we reassigned the references.
    • We've effectively replaced the local Project's decimal extractor with one in an external Project, accessible to any other Project that references it.
  2. Since no other object references the local decimal extractor, and we've replaced its references with something else, it is now safe to delete it.
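
Conceptually, reassigning references is a bulk find-and-replace over reference targets. The sketch below models that idea with a plain dictionary. The function names and the slash-separated paths are hypothetical and do not reflect how Grooper stores references internally.

```python
def reassign_references(references, old_target, new_target):
    """Repoint every reference at `old_target` to `new_target`.

    references -- maps a referencing node's name to the path it references
    Returns the names of the nodes that were changed.
    """
    changed = []
    for node, target in references.items():
        if target == old_target:
            references[node] = new_target  # value update only; keys are untouched
            changed.append(node)
    return changed


def referenced_by(references, target):
    """Names of all nodes that currently reference `target`."""
    return [node for node, current in references.items() if current == target]


LOCAL = "Invoices/VAL - Generic Decimal"            # hypothetical path
SHARED = "Shared Resources/VAL - Generic Decimal"   # hypothetical path

references = {
    "VE - Invoice Total": LOCAL,
    "Quantity": LOCAL,
    "Price": LOCAL,
    "Extended Price": LOCAL,
}

print(reassign_references(references, LOCAL, SHARED))
# Nothing references the local extractor any more, so deleting it is safe.
print(referenced_by(references, LOCAL))  # []
```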

As we've demonstrated, it's a little extra work if you decide you want to move resources from one Project to a shared resources Project. However, the benefit to organizing assets like this is any Project referencing our "Shared Resources" Project now has access to its assets.

  1. For example, we could tell our "Human Resources" Project to reference the "Shared Resources" Project.
  2. Now, both the "Human Resources" Project and the "Invoices" Project have access to its resources.
    • Furthermore, any changes we make to the object in the "Shared Resources" Project will be reflected by any object in any Project that touches it. This can prevent duplication of efforts when updating an object's properties.
    • If any other Projects or any future Projects can make use of these resources, all you have to do is assign it a reference to the "Shared Resources" Project. It acts as one big community bucket of resources other Projects can draw from.


The Essentials Project

Every newly created Grooper Repository in version 2023 comes with a Project named "Essentials". This Project contains several resources you may find useful when designing your document processing assets. Just like any other Project, you can access these resources by making a reference to the "Essentials" Project. The objects contained within can be examples of different types of objects you create, resources you can copy into your own Projects and build on top of, or simply resources you directly reference in your Projects.

In this project you will find various:

  1. Data Type and Value Reader Extractors
  2. Lexicons
  3. Profile objects (OCR Profiles, IP Profiles etc.)

The Usage Tab

In this section, we will outline the "Usage" tab in more detail. We discuss the "Usage" tab throughout this article. But what is it? It is a way for Grooper users to track down node references when one Project references objects in another Project.

  1. To access the "Usage" tab, click a Project.
  2. Select the "Usage" tab.
  3. Notice that for the "Human Resources" Project, we have Outbound References.
    • Outbound References are references to nodes in other Projects that the current Project is using.
      • In our case, the "Human Resources" Project is referencing nodes in the "Global Resources" Project. Hence, it has Outbound References to it.
    • Any attempt to remove or change these references will be met with an error.

If you view your Projects like a building, our "Human Resources" Project is like an upper floor, while the "Global Resources" Project is the lower floor, and the references from the former to the latter connect and support the two floors. If you try to remove the support structure, your building collapses. If you want the upper floor to somehow stand on its own, you will need to make it its own separate building, something we will demonstrate further in this article.

Outbound References

References bind projects to one another. If you want to better view which links in your Project are connected to each other, you'll have to dig into the References.

For these Outbound References, note the "To" and "Fr" preceding the reference path. "Fr" marks the object using the reference; "To" marks the object being referenced.

  1. The Data Type, "Words" found along the path, Global Resources -> Downloads -> Essentials -> Data Types -> Features -> Words, is being referenced by the Content Model, Human Resources (1).
  2. The Image Processing Process Step is referencing the IP Profile Image Cleanup found in the Shared Resources Folder of the Global Resources Project.

Inbound References

Inbound References are the opposite of Outbound References. For inbound, the "Usage" tab displays all the other Projects that reference the selected Project. Note the lack of a Project name in the "To" section.

  1. The Content Model for Human Resources is using the Data Type Words as a reference.
  2. Similarly, the Image Processing Batch Step is dipping into the Global Resources project and linking itself up to the Image Cleanup Permanent IP Profile.

Essentially, Outbound References refer to references that the selected Project has reached OUT to other projects to grab, while Inbound References have other projects reaching INTO them to reference their bits and pieces.
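
The distinction can be made concrete with a small sketch that sorts cross-Project references into the two groups for a selected Project. The `usage` function and the simplified slash-separated paths are hypothetical. Skipping references that stay inside one Project is an assumption of this sketch, not a statement about what the "Usage" tab displays.

```python
def project_of(path):
    """Treat the first path segment as the Project name (an assumption of this sketch)."""
    return path.split("/", 1)[0]


def usage(selected, references):
    """Split cross-Project references into (outbound, inbound) for `selected`.

    references -- (from_path, to_path) pairs: the "Fr" node uses the "To" node
    """
    outbound, inbound = [], []
    for source, target in references:
        src_project, dst_project = project_of(source), project_of(target)
        if src_project == dst_project:
            continue  # references inside one Project are not cross-Project usage
        if src_project == selected:
            outbound.append((source, target))
        elif dst_project == selected:
            inbound.append((source, target))
    return outbound, inbound


# Simplified, hypothetical paths based on the example above.
references = [
    ("Human Resources/Human Resources (1)", "Global Resources/Words"),
    ("Human Resources/Image Processing", "Global Resources/Image Cleanup"),
]

outbound, inbound = usage("Human Resources", references)
print(len(outbound), len(inbound))  # 2 0

outbound, inbound = usage("Global Resources", references)
print(len(outbound), len(inbound))  # 0 2
```

The same two references appear as outbound when "Human Resources" is selected and as inbound when "Global Resources" is selected.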

Upgrade guidance

You may download and import the files below into your own Grooper environment (version 2023).


Projects are a new way of organizing Grooper resources in version 2023. In previous versions, Grooper resources were organized primarily in one of three folders in the node tree:

  • The Content Models folder
  • The Global Resources folder
  • The Processes folder of the Batch Processing folder

Users would have to go back and forth between these locations in order to configure what they needed to process documents through Grooper. This often resulted in a time consuming and cumbersome process, sifting through the node tree's hierarchy to get to the objects you needed.

Projects simplify this issue by allowing you to place all associated resources for a given use case (or "project") in a single node location.


You can see the difference in the image to the right. All the required Grooper assets for one single document processing project are highlighted.

Before the introduction of Projects in 2022, these objects were interspersed throughout various locations in the node tree.

In version 2023, everything can be neatly placed in one, single location, making finding what you're looking for much simpler.


FYI

If you have certain Grooper resources that can be used by multiple Projects (such as extractors, profile objects, or CMIS Connections), you can grant multiple Projects access to them through Project references.

For more information, visit the "Referencing Objects in Other Projects" section of this article.

What happens when you upgrade?

Obviously, this architecture is much different than how your assets are currently organized in Grooper. So, what's going to happen when you upgrade?

  1. Upon upgrading to version 2023, most Grooper objects in your repository will simply be placed into a new Project named "Project 1".
  2. All Content Models will be organized into a folder named "Content Models" within the Project.
  3. All working Batch Processes will be placed in a folder named "Processes" within the Project.
  4. Any published Batch Processes will be placed in the "Processes" folder at the first level of the node tree.

Anything in the "Global Resources" folder will also be placed in "Project 1":

  1. If these objects were organized into a subfolder in the "Global Resources" folder, a folder of the same name will be created.
    • For example, in this Grooper Repository, there was a folder named "HR Docs Resources", containing a handful of Grooper objects. Upon upgrading, a folder of the same name, containing the same objects, was placed in "Project 1"
  2. Any unfoldered objects in the "Global Resources" folder will be placed at the first level of the "Project 1" Project.
  3. Last, all Production and Test Batches will be placed in the "Batches" folder at the first level of the node tree.
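
The placement rules above can be summarized as a small lookup. This Python sketch is purely illustrative: the upgraded_location function and its kind labels are hypothetical, and the path strings (including the "Projects" folder prefix) are assumptions about how the node tree might be written out, not output from Grooper.

```python
def upgraded_location(kind, subfolder=None):
    """Return the post-upgrade folder for an object, per the rules listed above.

    `kind` is a hypothetical label for what the object was before the upgrade.
    """
    project = "Projects/Project 1"
    if kind == "content_model":
        return f"{project}/Content Models"
    if kind == "working_batch_process":
        return f"{project}/Processes"
    if kind == "published_batch_process":
        return "Processes"            # first level of the node tree
    if kind == "global_resource":
        # Subfolders keep their name; unfoldered objects land at the Project root.
        return f"{project}/{subfolder}" if subfolder else project
    if kind == "batch":
        return "Batches"              # first level of the node tree
    raise ValueError(f"unknown kind: {kind}")

print(upgraded_location("global_resource", "HR Docs Resources"))
# Projects/Project 1/HR Docs Resources
```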

Deciding What to Do Next

It's important to point out your Grooper environment will work just fine with everything organized into the single "Project 1" Project. You can leave everything as is in "Project 1" upon upgrading to version 2023 and continue processing Batches of documents as if nothing happened.

Going forward you have two options:

  1. Do nothing. Leave all Grooper resources organized into "Project 1"
  2. Migrate resources into their own Projects.

You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own Projects, but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new Projects will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another.

So, should you migrate away from "Project 1" at all? Here are some things to keep in mind, when making this decision.

  1. It's all or nothing.
    • Again, we stress the importance of committing to the move. You should commit to migrating everything to new Projects (with the exception of a handful of shared resources), rather than just a few. The benefits of the Project architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help-desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system.
  2. You don't have to move things from "Project 1" at all.
    • If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously.
  3. Do you have time to do it?
    • This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository is, with many Content Models, Batch Processes, profiles and other objects, the longer it's going to take.
  4. Do you have a lot of "shared resources"?
    • If you frequently have individual Data Types, Lexicons, profiles, CMIS Connections or other objects used across many different Content Models and Batch Processes, this will take the highest amount of time and effort to migrate. Ensuring these shared resources are accessible to each Project created is the most time consuming part of any migration out of "Project 1".
  5. Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?
    • If so, Projects are for you. The new architecture provides multiple advantages to this kind of workflow. You should seriously consider devoting the time to migrate resources into their own Projects, if you maintain multiple environments to publish Grooper objects from development to production repositories.
  6. Do you use third-party data entry companies to review work in Grooper?
    • If so, Projects are for you. You'll benefit from being able to push complete and tidy project packages to an environment dedicated to that company.
  7. Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?
    • If so, Projects are really for you. Aside from object organization, the other big reason for creating the Project architecture was to maintain object reference integrity. Projects will greatly assist you in preventing reference corruption in your Grooper environments.

Project Migration Plan

Ok, you've decided Projects are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward?

We've narrowed the process down to seven general steps:

  1. Clean up your repository. Delete items that are no longer in use and will not be used in the future.
  2. Use the "Create Project" feature for each Batch Process.
  3. As each Project is created, rename any objects as needed if your prior naming conventions no longer make sense.
  4. For each Project, use the "Usage" tab to decide what to do about "shared resources" used by multiple Projects.
  5. Remove Project references if the "Outbound References" list is empty.
  6. Reorganize any shared resource objects that remain in "Project 1"
  7. Rename "Project 1" to something like "Global Resources" or "Shared Resources"

1. Clean House

If you're going to take the time to reorganize your resources into Projects, now is a good time to take a look at the Grooper objects in your repository and get rid of anything not in use littering up your environment.

This is entirely optional, but now is as good a time as any to clean house.

  1. For example, we have a dummy IP Profile named "temp" and Value Reader named "test" in our newly upgraded "Project 1" Project.
    • I have no idea where these objects came from or what their original purpose was. They aren't being used by anything else in this environment. It's best to just delete them to get them out of the way.

2. Create Project

Now we can start in earnest and create some Projects. You could do this manually. The steps would be as follows.

  1. Add a Project to the "Projects" folder.
  2. Using the Project's Referenced Projects property, reference "Project 1".
  3. Move a Batch Process to that Project.
  4. Move the Content Model associated with that Batch Process to the Project.
  5. Move any other Grooper objects referenced by the Batch Process or Content Model's objects to the Project.
    • Or leave any "shared resources" in "Project 1", maintaining access to them through the Project reference (we'll discuss this further in Step 4: Analyze References).

There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature.

The "Create Project" feature is accessed by selecting a Batch Process. If you think about it, a Batch Process should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the Batch Process as part of its execution, such as a Content Model referenced for a Classify step or an OCR Profile referenced for a Recognize step.

The "Create Project" utility will create a new Project, named the same as the Batch Process's name, look for any objects referenced as part of its execution, and move them to the new Project.

Important! "Create Project" will only move objects not referenced by anything else. If another Batch Process uses the same OCR Profile, for example, that OCR Profile will remain in "Project 1". We will discuss this further in "Step 4: Analyze References".

We will start by creating a new Project using a fairly simple document redaction Batch Process. This is an entirely "self-contained" Batch Process. No other Batch Process utilizes its resources.

  1. This is the Batch Process we will create the Project from.
  2. It references this Content Model, including resources in its Local Resources folder.
  3. Specifically, the "URLA" Document Type is referenced as the Extract step's Default Content Type. Since this Document Type is referenced, its parent Content Model (and all its children, including Local Resources folder and Data Model) will be moved to the new Project.
  4. This OCR Profile is also referenced (by the Recognize step). So, it will move to the new Project as well.

To create the Project, perform the following steps.

  1. Select the Batch Process you wish to use to create the new Project.
  2. Right-click the Batch Process and select "Create Project."

There are two configurable options when creating the new Project.

  1. The Remove Emptied Folders property will delete a folder from "Project 1" if it is empty after objects are moved to the new Project.
    • Generally speaking, you'll want this property set to True. It cleans up empty folders in "Project 1". Why would you want to keep a bunch of empty folders around? I don't know. If you have a reason to keep these empty folders, you can keep this property False.
  2. Press "Execute" to create the Project.

When the utility finishes running, a new Project will be created. All objects associated with the Batch Process are moved from "Project 1" to the new Project, as long as that move is allowed. (Again, we'll talk more about moves that aren't allowed during Step 4.)

  1. In our case, this Project named "URLA Redaction" was created.
    • The new Project will always be named after the Batch Process.
  2. A total of three objects were moved from "Project 1" and placed in the new Project.
    • This was a relatively simple Batch Process. More complicated Batch Processes will more likely than not have more objects referenced, and therefore, more objects moved. But, you can pretty much guarantee you'll at least end up with the Batch Process and a Content Model in the new Project. With few exceptions, you're always going to need a Batch Process and a Content Model to do work in Grooper. In general, each Project will have one Batch Process and one Content Model.

FYI

Grooper renamed our Content Model to "Content Model". Why? In "Project 1", the Batch Process and Content Model were both named "URLA Redaction". Object names in the same branch of the node tree must be unique. Grooper will rename any object sharing the name of the source Batch Process after their object type. Therefore the Content Model named "URLA Redaction" was renamed "Content Model".
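
The renaming rule described here can be sketched in a few lines of Python. This is an illustration of the rule as stated, not Grooper's implementation; the resolve_name_clashes function is a hypothetical name, and the OCR Profile entry is included only to show an object that keeps its name.

```python
def resolve_name_clashes(batch_process_name, objects):
    """Rename objects that share the Batch Process's name after their object type.

    `objects` is a list of (name, object_type) pairs moved into the new Project.
    """
    return [
        (object_type if name == batch_process_name else name, object_type)
        for name, object_type in objects
    ]

moved = [("URLA Redaction", "Content Model"), ("URLA OCR", "OCR Profile")]
print(resolve_name_clashes("URLA Redaction", moved))
# [('Content Model', 'Content Model'), ('URLA OCR', 'OCR Profile')]
```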

3. Rename Resources

With the switch to the Project architecture, you may find your naming convention no longer makes sense or could be adjusted. Much like the "Clean House" step, this step is not strictly necessary. But, if you're going through the effort to reorganize your repository into a new structure, you might as well make sure how you're naming things makes sense in that new structure.

If you're coming from an environment with a lot of Batch Processes and a lot of Content Models, you've probably named your resources according to their intended use case. So you might have "Use Case X Content Model", "Use Case X OCR Profile", "Use Case X IP Profile" and so on. You may find this naming superfluous once all these assets are moved over to a Project. So, it might make more sense to you to just rename these objects after their generic object type or function in the Batch Process workflow.

  1. For example, we've renamed our OCR Profile from "URLA OCR" to simply "OCR"
  2. Name your resources whatever makes sense to you in your environment. "Batch Process" may be too generic of a name if you're executing multiple different Batch Processes. We went ahead and stuck with "URLA Redaction" for our Batch Process here.

4. Analyze References

So far, this process has been fairly simple. With the use of the "Create Project" feature, all resources associated with a Batch Process are moved to a new Project. Our previous example was so simple because all the resources were fairly self-contained, or "local" to the Batch Process.

That is not always the case. Particularly with larger environments, you will find you reuse resources across a variety of Batch Processes. For example, the "Full Text - Accurate" OCR Profile, in our "Essentials" downloads, is many Grooper users' "go-to" OCR Profile if they don't have the time or the will to create their own. This OCR Profile would be a shared or "global" resource, touched by multiple different Batch Processes.

When you have shared resources, they will not be moved to a newly created Project when using the "Create Project" feature. They can't be. Other Batch Processes need to use those resources too. Instead, all resources that are not shared are moved to the newly created Project, and a Project reference is made to "Project 1". The Project reference allows the new Project access to the resources it needs in "Project 1".

  1. We will create Projects for these two Batch Processes next.
  2. However, they share a number of resources in their Batch Processes: two IP Profiles and a CMIS Connection.
  3. For example, they both use the "Image Cleanup - Permanent" IP Profile in executing the Image Processing steps of their Batch Processes.
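
The behavior described above (move what only this Batch Process uses, leave shared objects in "Project 1" and reference them) can be sketched as a small graph computation. This is a simplified Python model, not Grooper's actual "Create Project" logic: the reachable and create_project functions are hypothetical, and the object names in REFS are illustrative stand-ins.

```python
def reachable(start, refs):
    """All nodes reachable from `start` by following references (including start)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(refs.get(node, ()))
    return seen

def create_project(batch_process, refs):
    """Split a Batch Process's dependencies into (moved, shared) sets."""
    needed = reachable(batch_process, refs)
    moved, shared = set(), set()
    for node in needed:
        referrers = {src for src, targets in refs.items() if node in targets}
        # A node moves only if nothing outside this Batch Process's closure uses it.
        if referrers <= needed:
            moved.add(node)
        else:
            shared.add(node)
    return moved, shared

# Hypothetical reference graph: each key references the objects in its set.
REFS = {
    "URLA Redaction": {"URLA Content Model", "URLA OCR"},
    "Invoices Process": {"Invoices Content Model", "Image Cleanup - Permanent"},
    "Human Resources": {"HR Content Model", "Image Cleanup - Permanent"},
}

moved, shared = create_project("Invoices Process", REFS)
print(sorted(moved))   # ['Invoices Content Model', 'Invoices Process']
print(sorted(shared))  # ['Image Cleanup - Permanent']
```

In this model, a non-empty shared set corresponds to the new Project keeping a Project reference to "Project 1".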

Next, we will select "Create Project" to create a new Project from each Batch Process, starting with the "Invoices Process" Batch Process.

  1. Select the Batch Process you wish to use to create the Project.
    • In our case, we're starting with the Batch Process named "Invoices Process".
  2. Right-click and select "Create Project."

  1. Configure the Project creation properties as desired.
  2. Press the "Execute" button to create the Project.

  1. A new Project is created and several resources have been moved.
  2. However, select the Project created.
  3. Notice the Referenced Projects property shows a Project reference to "Project 1".

This indicates there is something in "Project 1" the Batch Process utilizes that can't be moved because another Batch Process (or one of its associated objects) utilizes it in one way or another.

Essentially, both Batch Processes need to reference one or more objects. So those objects stay put in "Project 1" and are accessible through the Project reference. By referencing "Project 1", the Project we just created has access to all its resources, including whatever it is it needs to function.

So, just what resources are out in "Project 1" that our new Project needs? Good question. You can quickly answer this with the "Usage" tab.

  1. Select the "Usage" tab to view all objects in "Project 1" referenced by the selected Project.
  2. This will bring up a list of outbound references.
    • These are references that objects in the selected Project make to objects in the external Projects listed in the Referenced Projects property.
  3. Any referenced object will be listed.
    • In our case, we made references to three objects in the "Shared Resources" folder of "Project 1"
      • Two IP Profiles: "Image Cleanup - OCR" and "Image Cleanup - Permanent"
      • One CMIS Connection's ("NTFS - Local Hard Drive") CMIS Repository ("Import Export")

Now that we know what is shared, we have two options:

  1. Copy these resources from "Project 1" so that we have local copies of these resources.
  2. Keep these shared resources where they are so that every Project that needs them can reference the same object.

Your choice will largely depend on how big your environment is, how many times the resources in "Project 1" are referenced by different potential Projects, and if you prefer to have local copies of these resources that can be edited independently or if you want these resources to be truly shared across different Projects (meaning changing one single object will impact how multiple Projects implement it).

Option 1: Copy the Resources

The process will be most time consuming if you want to copy these objects over to the Project. However, doing so does have benefits. Once you copy these shared resources from "Project 1" and reassign references to the copies, you will no longer need the reference to "Project 1". At that point the Project is independent and self-contained. If you need to share this Project with another Grooper environment (say, promoting it from a "test" repository to a "production" repository), it will have no dependencies on another Project that would need to be shared alongside it.

Aside from the time it takes to do this, the only drawback is the resources are then completely local to the Project. The originals in "Project 1" and the copies in the Project are separate objects. This means they must be edited independently if you want to make changes to them.

Copying these resources over is not necessarily the hard part. It's even easier in this case because they're already all in the same folder.

  1. We can simply copy the whole folder.
  2. Or we can select the folder, and go to the "Contents" tab.
  3. Then, multi-select all the objects we want to copy.
    • Ctrl + click to do this.
    • In this case, all of them.
  4. Paste them into the Project you want.

FYI

Depending on the reference complexity of the objects you are copying, this process will be more complicated. It will require you to track down the referenced objects and bring them over either a) before you copy and paste the object referencing them (which will require you to reset the reference after everything is copied over) or b) at the same time as the object referencing them.

For more information on copying and pasting objects from one Project to another, please refer to the "Referencing Objects in Other Projects" section of this article.

  1. Now we have local copies in our Project of these referenced resources from "Project 1".

Now the time consuming part of this process. All the references must be reassigned from the source objects in "Project 1" to their local copies in the Project.

  1. For example, this OCR Profile uses one of the IP Profiles we just copied.
  2. We would need to select the property where the reference is assigned.
    • The IP Profile property in this case.
  3. Then, we'd need to change the reference from the source object in "Project 1"
  4. ...to the local copy in our Project.

The more resources you copy over, and the more complex the references and sub-references you're faced with, the more time consuming this process will be.
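
Conceptually, the reassignment is a lookup from each source object to its local copy. In Grooper this is done by hand in the property grid; the Python sketch below only illustrates the bookkeeping involved. The repoint function, the property dictionary and the path strings are all hypothetical.

```python
def repoint(properties, copies):
    """Return properties with any reference to a copied source replaced by its local copy.

    `properties` maps a property name to the path of the object it references.
    `copies` maps a source path in "Project 1" to the path of its local copy.
    """
    return {prop: copies.get(target, target) for prop, target in properties.items()}

# Hypothetical OCR Profile that references one of the copied IP Profiles.
ocr_profile = {"IP Profile": "Project 1/Shared Resources/Image Cleanup - OCR"}
copies = {
    "Project 1/Shared Resources/Image Cleanup - OCR":
        "Invoices Process/Shared Resources/Image Cleanup - OCR",
}

print(repoint(ocr_profile, copies))
# {'IP Profile': 'Invoices Process/Shared Resources/Image Cleanup - OCR'}
```

Any property whose target was not copied is left pointing at its original object, which is why every outbound reference has to be checked before the Project can stand alone.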

Option 2: Keep Project 1 Shared

The other option is to just keep these resources shared. Your Batch Process will function as it did before. The only difference is your Project will be dependent on the referenced Project ("Project 1") to function.

  1. Here, we selected "Create Project" to create a new Project from the last Batch Process in the Grooper Repository.
  2. We've selected the "Usage" tab to see what's referenced in "Project 1"
  3. It's utilizing one IP Profile and one CMIS Connection we saw in the previous example.
  4. It's also utilizing some of the resources in our Grooper Essentials package.

If you're fine with keeping these as shared resources in a referenced Project, you're done here. There's no need to go through the time consuming, copy and paste and reference reassignment dance we did earlier. The only potential drawback here is you've made this Project dependent on "Project 1". You will need to evaluate for yourself whether this is a drawback, a benefit, or doesn't really matter one way or another.

Option 3: Option 1 + Option 2

There's also no reason why you can't copy some items over to your new Project and keep others referenced through Project references.

For example, in our first example, we could have copied over the two IP Profiles but kept the CMIS Connection as a shared resource. In fact, that would have made more sense. If we have to make changes to a CMIS Connection (like entering new access permissions), we would only want to do that once by manipulating one single object, rather than reproducing our efforts by editing multiple copies of the same object in multiple Projects.

5. Remove Project References

Now, we're at a point where we've used the "Create Project" feature for every Batch Process in "Project 1". What's next?

In our situation, we've created some Projects that need to retain a reference to "Project 1" and some that do not. For those that do not, we should go ahead and remove the reference to Project 1. To do this, the "Usage" tab will once again be useful. For each Project we will want to use the "Usage" tab to check and see if there are any outbound references. If there are, we'll keep the reference intact. If there are not, we will remove the reference.

We've decided the "Human Resources" Project will retain its reference to "Project 1." If you want, you can verify the references with the "Usage" tab.

  1. Select the "Human Resources" Project
  2. Select the "Usage" tab.
  3. Take note of the outbound references.

  1. Next, we'll analyze the "Invoices Process" Project's references.
  2. Select the "Usage" tab.
  3. There are not any outbound references.

Now, we know it's safe to remove the reference to "Project 1".

  1. Using the Referenced Projects property, remove the reference to "Project 1"
  2. Press "Save" when finished.

You should always use the "Usage" tab before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound references outstanding. However, doing so is not best practice as it can cause corruption of your system down the road.

  1. Continue selecting your Projects to analyze their references.
  2. Once all Projects with no outstanding references to "Project 1" have had their Project reference to "Project 1" removed, you are done.
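
The check performed with the "Usage" tab amounts to a simple rule: remove the Project reference only when the outbound reference list is empty. A minimal Python sketch of that rule follows. The safe_to_remove_reference function and the REFERENCES list are hypothetical, and the data simply mirrors the outcome of this example.

```python
def safe_to_remove_reference(project, referenced_project, references):
    """True when no node in `project` still points at a node in `referenced_project`.

    `references` is a list of (from_project, to_project) pairs.
    """
    return not any(
        fr == project and to == referenced_project
        for fr, to in references
    )

# Hypothetical state: only "Human Resources" still uses objects in "Project 1".
REFERENCES = [("Human Resources", "Project 1")]

for name in ("Human Resources", "Invoices Process"):
    verdict = "remove" if safe_to_remove_reference(name, "Project 1", REFERENCES) else "keep"
    print(f"{name}: {verdict} the reference to Project 1")
```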

6. Reorganize Shared Resources

After you're done creating Projects from the Batch Processes in "Project 1", you'll want to clean up the remaining resources in "Project 1".

This will include:

  1. Organizing any leftover assets into manually created Projects, if applicable.
  2. Organizing any remaining shared resources into folders and deleting empty folders, as you see fit.

Manually Creating Projects

There are some situations where you may need to manually create a Project for certain assets remaining in "Project 1". Most commonly, this can happen if you have resources you've created for testing purposes that are not tied to a Batch Process and you want to keep them around.

  1. For example, we have two Content Models leftover.
  2. We also have this IP Profile.

It seems like these are partially architected resources from whenever this Grooper user went through Grooper's ACE - Architect training. We may want to keep these around so this user can continue their training.

The only issue is there is no Batch Process utilizing these resources, so we can't take advantage of our "Create Project" utility.

So, we'll need to manually create a Project.

  1. Right click the Projects folder in the node tree.
  2. Select "Add", then "Project..."
  3. Name the new Project.
  4. Press "Ok" when finished.

From here, you can cut and paste or move resources from "Project 1" to the new Project.

  1. For example, we moved this IP Profile from "Project 1" to the "ACE Training" Project we created.

Next, we're going to look at a potential issue you may encounter when moving resources out of "Project 1" to another Project.

  1. I want to move either of these two Content Models from "Project 1" to the "ACE Training" Project.
  2. If I try to move one, I get the following error message.
    • This is telling me there is a reference violation. The Content Model is dependent on some external reference, and can't be moved without resolving the reference.

Why did this happen? Long story short, someone did something they should have never done, but was technically allowed in previous versions of Grooper.

The problem is someone made a reference within a Content Model to something in another Content Model.

  1. This Data Field is causing the problem.
  2. Its Value Extractor is set to Reference a Data Type.
  3. The problem is the Data Type it's referencing is in a different Content Model.

This is a big "no-no" that was technically possible in Grooper, but never considered best practice. This may have happened by accident when copying a Content Model. This may have been a "quick fix" that some Grooper designer did, intending to go back and resolve but never got around to it. Who knows. The main issue is these types of reference violations can cause problems down the road, potentially causing corruption in your Grooper environment.

Part of the reason Projects were created was to avoid this type of corruption due to improperly referenced objects. Referencing Projects via the Referenced Projects property makes external resource references much more intentional, avoiding accidental reference violations (as much as possible).

To get these objects moved to the new Project, we need to resolve the reference violation in one way or another.

  1. We could clear the offending reference, or reset it to something that doesn't violate a reference across multiple Content Models.
  2. Or, since this Content Model is really just a copy of the other one, we could just delete it.
    • This is what I will elect to do.

  1. With the reference violation resolved, the Content Model can be moved with no issue.

Clean Up Remaining Folders

After you've moved everything out of "Project 1" into a new Project (whether manually or through the "Create Project" feature), the only resources left should be shared resources, objects intended to be used by any current or future Project.

All that's left is to organize the objects and folders remaining.

  1. You may have empty folders you can simply delete.

FYI: This "Content Models" folder will be "Read Only". This is a carryover from this folder being a system node in the previous version of Grooper's architecture.

  1. To delete it, you will need to select the folder, then go to the "Advanced" tab.
  2. Then, under Attributes, change Read Only to False.
    • Then navigate off the object. You will be prompted to "Save". After saving, you will be able to delete it.

You may also want to create some new folders for stray objects or move some folders to the root of the Project.

  1. For example, many users find it helpful to move the Grooper "Essentials" folder to the root of the Project, then delete the "Downloads" folder.

7. Rename Project 1

Once all Project-specific resources are out of "Project 1" and you've organized any shared resources remaining, rename "Project 1" to something that reflects its true utility, something like "Shared Resources" or "Global Resources".

  1. We named ours "Global Resources".

That's it! The migration from "Project 1" to Grooper's new Project-based architecture is complete!

What's with that "Processes" folder?

If you're new to Grooper (or version 2023) you may be asking yourself, "What's with that "Processes" folder in the node tree?"

As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

  • Projects hold working Batch Processes.
  • The Processes folder holds published Batch Processes.

When adding and configuring a new Batch Process, you will always add it to a Project first. As you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. This would cause partially or improperly processed documents to come through Grooper. So, while you are working on a Batch Process it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
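
The working/published split behaves like taking a read-only snapshot. The Python sketch below models that idea only. It is not how Grooper stores Batch Processes: the publish function and the published dictionary (standing in for the Processes folder) are hypothetical, and the step names are illustrative.

```python
from types import MappingProxyType

published = {}  # stands in for the "Processes" folder

def publish(name, working):
    """Store a read-only snapshot of the working Batch Process."""
    # Copy the steps into an immutable tuple and wrap the dict in a read-only view.
    snapshot = MappingProxyType({"steps": tuple(working["steps"])})
    published[name] = snapshot
    return snapshot

working = {"steps": ["Recognize", "Classify", "Extract"]}
live = publish("URLA Redaction", working)

working["steps"].append("Export")   # keep editing the working copy
print(live["steps"])                # ('Recognize', 'Classify', 'Extract')
```

Edits made to the working copy after publishing do not reach the snapshot. In the same way, production Batches keep using the last published Batch Process until it is published again.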