Project - 2022

From Grooper Wiki
Revision as of 14:28, 1 March 2023 by Dgreenwood

A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.

About

After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. Processing documents requires a variety of Grooper assets. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine-readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.

Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.

  1. Here, we have a Project named "Invoices".
  2. This Project houses the Content Model configured for document classification and data extraction.
  3. It also holds the Batch Process used to process Batches.
  4. As well as other Grooper objects required for this use case.
    • "NTFS Connection" is a CMIS Connection utilized for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration which is executed when the "Invoices Process" Batch Process's Export activity is applied.
    • "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
    • "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

2022-project-about-01.png

How you organize objects in your Project is largely up to you. To help keep things organized, you can add any number of folder levels to your Project.

  1. For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
  2. In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.

2022-project-about-02.png

What's With That Processes Folder?

If you're new to Grooper (or to version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?"

As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

  • Projects hold working Batch Processes.
  • The Processes folder holds published Batch Processes.

2022-project-about-03.png

When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" (usable in a production-level environment as documents are coming into Grooper), because that could send partially or improperly processed documents through Grooper. So, while you are still working on a Batch Process, it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
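The split between working and published Batch Processes can be pictured as a snapshot-on-publish pattern. The Python sketch below is a hypothetical model, not Grooper's API: the class and function names are made up, a plain dictionary stands in for the Processes folder, and a frozen dataclass stands in for the read-only published copy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical model of working vs. published Batch Processes. BatchProcess,
# PublishedProcess, publish() and production_lookup() are illustrative names,
# not part of Grooper's API.


@dataclass
class BatchProcess:
    """A working Batch Process: lives in a Project and stays editable."""
    name: str
    steps: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class PublishedProcess:
    """A read-only copy: frozen=True makes attribute assignment raise an error."""
    name: str
    steps: Tuple[str, ...]


# Stand-in for the top-level "Processes" folder.
processes_folder: Dict[str, PublishedProcess] = {}


def publish(working: BatchProcess) -> PublishedProcess:
    # Snapshot the steps into an immutable tuple so later edits to the
    # working copy cannot leak into production. (This sketch simply replaces
    # any earlier snapshot with the same name.)
    snapshot = PublishedProcess(working.name, tuple(working.steps))
    processes_folder[working.name] = snapshot
    return snapshot


def production_lookup(name: str) -> PublishedProcess:
    # Production Batches only see the Processes folder; an unpublished
    # process raises KeyError here.
    return processes_folder[name]


working = BatchProcess("Invoices Process", ["Scan", "Image Processing", "Export"])
publish(working)
working.steps.append("Classify")  # keep editing the working copy
print(production_lookup("Invoices Process").steps)
# ('Scan', 'Image Processing', 'Export')
```

Editing the working copy after publishing leaves the published snapshot untouched, which is the property the Processes folder exists to guarantee.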

2022-project-about-04.png

Adding a New Project

Add a Project

Projects are added to the Projects folder node in the node tree.

  1. To add a new Project, first right-click the Projects folder.
  2. Select "Add", then "Project..."
  3. This will bring up a window to name your new Project.
    • In our scenario, we're starting a new Project to process human resources documents. So, we named it "Human Resources".
  4. After giving it a name, press the "OK" button to create the Project.

2022-project-adding-a-new-project-01.png

  1. This will add the Project to the Projects folder in the node tree.

2022-project-adding-a-new-project-02.png

Add Resources to the Project

The following Grooper objects can be added to a Project:

  • Batch Processes
  • Content Models
  • Extractor objects
    • Value Readers
    • Data Types
    • Field Classes
  • Profile objects
    • OCR Profiles
    • IP Profiles
    • Separation Profiles
    • Scanner Profiles
  • Data integration objects
    • CMIS Connections
    • Data Connections
  • Other objects
    • Lexicons
    • Control Sheets
    • Object Libraries

So, how do you add them to a Project? The same way you would add an item to any folder in the node tree.

  1. Right-click the Project.
  2. Select "Add" then whichever object you want to add to the Project.
    • You can't do much without a Content Model in Grooper. So, we've selected "Content Model..."
  3. This will bring up a window to name the object.
  4. Press "OK" to add the object to the Project.

2022-project-adding-a-new-project-03.png

  1. Once added to the Project, you can select and configure the object as needed.

2022-project-adding-a-new-project-04.png

What About Batches?

One thing you cannot add to a Project is a Batch. This includes both Test and Production Batches.

  1. Batches are housed in the "Batches" node of the node tree.
  2. Test Batches can be added by expanding the Batches node and right-clicking the "Test" node.

2022-project-adding-a-new-project-05.png

Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.

  1. Here, we've added a Test Batch named "Sample Batch".
  2. A Value Reader, like this one named "Example" we have selected here, is just one of many objects with a Batch Selector panel.
  3. Using the Batch Selector's dropdown, you can select any Batch in the "Test" folder node.
  4. For example, our Batch named "Sample Batch".

2022-project-adding-a-new-project-06.png


Referencing Objects in Other Projects

Projects are new in version 2022. If you're new to Grooper, this won't mean much to you. Just know that Projects are a much better way of organizing and accessing Grooper assets in the node tree than what previous versions offered. (If you are upgrading to version 2022, please review the Projects and Upgrading to 2022 section of this article.)

Aside from the organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain the integrity of references between objects throughout a repository.

Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).

For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.

  1. The reference to "Image Cleanup - OCR" set using the OCR Profile's IP Profile property is allowed.
  2. Both objects are contained in the same Project.

Maintaining reference integrity is ideal. The more narrowly you can define an object's allowable scope of reference, the better. A narrow scope makes references easier to track down, limits the number of object dependencies (making your system easier to manage), and limits possible system corruption down the line if a mess of "reference spaghetti" gets tangled up in one way or another.
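The scoping rule can be summarized as a single check. The following Python sketch is a hypothetical model (Project, Node, and can_reference are illustrative names, not part of Grooper): an object may reference another object in its own Project, or one in a Project its own Project explicitly references. It assumes a Project reference is the only way to widen the scope, and it does not follow chains of Project references, since this article doesn't say whether they carry over.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical model of the Project "silo" rule. Project, Node and
# can_reference() are illustrative names, not Grooper's API.


@dataclass
class Project:
    name: str
    # Names checked in this Project's Referenced Projects property.
    referenced_projects: Set[str] = field(default_factory=set)


@dataclass
class Node:
    name: str
    project: Project


def can_reference(source: Node, target: Node) -> bool:
    """Return True if `source` may point at `target`."""
    if source.project is target.project:
        return True  # same Project: always allowed
    # Different Project: allowed only if explicitly referenced. This sketch
    # does not follow chains of Project references.
    return target.project.name in source.project.referenced_projects


invoices = Project("Invoices")
hr = Project("Human Resources")
cleanup = Node("Image Cleanup - OCR", invoices)

print(can_reference(Node("Invoices OCR", invoices), cleanup))  # True
print(can_reference(Node("HR OCR", hr), cleanup))              # False
hr.referenced_projects.add("Invoices")
print(can_reference(Node("HR OCR", hr), cleanup))              # True
```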

2022-project-references-01.png

However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.

But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one Project can be used in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed.

Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.

  1. In our "Human Resources" project we created earlier, we've added an OCR Profile.
  2. However, (at least initially) objects only have referenceable access to other objects in the "Human Resources" Project (which isn't much, as this is basically a new Project).
    • So there is no "Image Cleanup - OCR" IP Profile to point to.
  3. The IP Profile lies out of scope, in a different Project.

2022-project-references-02.png

Using Resources in Other Projects

So, if we want to use an object from an external Project, what can we do? There are three options:

  1. Directly copy the object from one Project to another.
  2. Reference the external Project to allow access to its resources.
  3. Create a shared resources Project that both Projects reference.

Each approach has strengths and weaknesses, depending on the situation. Next, we will detail each option and discuss its drawbacks.

Option 1: Copying Objects from One Project to Another

For simpler Grooper environments and simple Grooper objects, simply copying the desired object from one Project to another can work out just fine. This option is often the best for the most basic of circumstances.

However, this approach can have significant drawbacks. Furthermore, whether it works at all depends on the reference complexity of the object you're copying.

FYI

While the following guidance deals specifically with "copying and pasting", the same applies to "cutting and pasting" or "moving" objects from one Project to another.

Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.

  1. We can copy the IP Profile in the "Invoices" Project.
    • Either by right-clicking the object and selecting "Copy" or selecting the object and pressing Ctrl + C
  2. And we can paste it into the "Human Resources" Project.
    • Either by right-clicking the Project and selecting "Paste" or selecting the Project and pressing Ctrl + V
  3. A copy of the IP Profile is now placed in the Project.
  4. This means all objects within the Project can reference it. For example, the "HR OCR" OCR Profile can now reference it for temporary, pre-OCR image cleanup using the IP Profile property.

Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.

2022-project-references-03.png

First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources.

Any changes made to the original object will not be reflected in the copied object (or vice versa).

  1. For example, here we've added a Shape Removal IP Step to the original IP Profile.
  2. Notice the copied IP Profile is unchanged. It just has the original two IP Steps from when it was copied to the Project.

This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).
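In programming terms, copy/paste behaves like a deep copy, while a reference behaves like a second name for the same object. This Python sketch uses a made-up IPProfile class and placeholder step names ("Step 1", "Step 2") to illustrate the difference; it models the behavior described above and is not Grooper code.

```python
import copy
from dataclasses import dataclass, field
from typing import List

# Illustrative only: IPProfile is a stand-in class, and "Step 1"/"Step 2" are
# placeholders for the two original IP Steps.


@dataclass
class IPProfile:
    name: str
    steps: List[str] = field(default_factory=list)


original = IPProfile("Image Cleanup - OCR", ["Step 1", "Step 2"])

pasted = copy.deepcopy(original)  # copy/paste: a separate, independent object
referenced = original             # a reference: two names, one object

original.steps.append("Shape Removal")

print(pasted.steps)      # ['Step 1', 'Step 2']
print(referenced.steps)  # ['Step 1', 'Step 2', 'Shape Removal']
```

The copy keeps its own state, so every later change has to be repeated by hand; a reference always sees the current object.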

2022-project-references-04.png

Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity.

Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would be dependent on that referenced object in the first Project being present in the second Project.

Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.

  1. So, we want to copy this Data Type from the "Invoices" Project.
  2. To this Local Resources folder in our "Human Resources" Project.

2022-project-references-05.png

If we try to do this, Grooper is going to throw an error. Why? The Data Type, as part of its configuration, references several Lexicon objects.

  1. The error lets us know there is a reference violation.
  2. It tells us which Project contains the referenced objects.

It also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object, using the following format:

referencing object's location -> referenced object's location
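The check can be approximated as follows: every reference made by a pasted object (or one of its children) must point to something pasted along with it, something already in the destination Project, or something in a Project the destination references. This Python sketch is a hypothetical model of that rule, not Grooper's implementation. The class and function names, the "Street Suffixes" Lexicon, and the message wording are illustrative, though each line follows the "referencing -> referenced" format described above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set

# Hypothetical model of the paste check. Project, Node, walk() and
# paste_violations() are illustrative names, not Grooper's API, and the real
# error message wording differs.


@dataclass
class Project:
    name: str
    referenced_projects: Set[str] = field(default_factory=set)


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    name: str
    project: Project
    references: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        """Full node-tree location, starting at the Project node."""
        parts = []
        current = self
        while current is not None:
            parts.append(current.name)
            current = current.parent
        return "/".join(reversed(parts))


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and all of its descendants (children get copied too)."""
    yield node
    for child in node.children:
        yield from walk(child)


def paste_violations(selection: List[Node], dest: Project) -> List[str]:
    """One 'referencing -> referenced' line per reference the paste would break."""
    pasted = {n for root in selection for n in walk(root)}
    problems = []
    for node in pasted:
        for target in node.references:
            resolvable = (
                target in pasted                       # travels with the paste
                or target.project is dest              # already in destination
                or target.project.name in dest.referenced_projects
            )
            if not resolvable:
                problems.append(f"{node.path()} -> {target.path()}")
    return sorted(problems)


invoices = Project("Invoices")
hr = Project("Human Resources")
inv_root = Node("Invoices", invoices)
local = inv_root.add(Node("Local Resources", invoices))
lexicons = local.add(Node("Lexicons", invoices))
suffixes = lexicons.add(Node("Street Suffixes", invoices))  # hypothetical Lexicon
address = local.add(Node("VAL - Address", invoices, references=[suffixes]))

print(paste_violations([address], hr))
# ['Invoices/Local Resources/VAL - Address -> Invoices/Local Resources/Lexicons/Street Suffixes']
print(paste_violations([address, lexicons], hr))  # []  (Lexicons pasted along with it)
hr.referenced_projects.add("Invoices")
print(paste_violations([address], hr))            # []  (Project reference, Option 2)
```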

2022-project-references-06.png

Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up unannounced with a bunch of other people, they're going to take issue with it. There are now a bunch of strangers in their house they didn't expect.

That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (even more so considering the objects those referenced objects reference, and so on down the line). There are now a bunch of random objects you didn't expect cluttering up your Project.
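Finding everything an object drags along with it is a transitive-closure problem: follow each reference, then the references of those objects, and so on, skipping anything already visited. The Python sketch below runs a breadth-first search over a hypothetical reference graph; apart from "VAL - Address", the object names are invented for illustration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical reference graph: each object maps to the objects it references.
# "Object A" through "Object D" are made-up names for illustration.
references: Dict[str, List[str]] = {
    "VAL - Address": ["Object A", "Object B"],
    "Object A": ["Object C"],
    "Object B": ["Object C"],
    "Object C": ["Object D"],
    "Object D": [],
}


def all_dependencies(start: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Everything `start` references, directly or down a chain of references."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already counted (also keeps cycles from looping forever)
        seen.add(name)
        queue.extend(graph.get(name, []))
    return seen


print(sorted(all_dependencies("VAL - Address", references)))
# ['Object A', 'Object B', 'Object C', 'Object D']
```

Only two objects are referenced directly, but four would have to come along.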

This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.

One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time-consuming process, as it would involve the following steps:

  1. Copy and paste all the referenced objects from the first Project to the second.
  2. Unassign all the references in the object you want to copy from the first Project.
  3. Paste the object from the first Project to the second.
  4. Reassign all the references in the copied object to all the referenced objects pasted in step 1.

Depending on how these objects are organized, you could also copy and paste multiple objects at a time.

  1. The only reason we're having an issue here is that we need multiple objects to come into the "Human Resources" Project at the same time.
    • The "VAL - Address" Data Type and (some of) the Lexicons in the "Lexicons" folder.
  2. Since the "VAL - Address" Data Type and "Lexicons" folder are siblings in the Local Resources folder, we can use the "Contents" tab to copy and paste both at the same time.
  3. Navigate to the "Contents" tab.
  4. Select all the objects you want to copy (using Ctrl + Left Click or Shift + Left Click).
    • In our case, the "Lexicons" folder and "VAL - Address" Data Type.
  5. Paste them into the desired location in the Project.

Since we were able to copy the extractor and a folder containing all the Lexicons it references, and paste them all at the same time, Grooper allowed the paste without any issue.

2022-project-references-24.png

Keep in mind, however, if you copy a folder, you're going to get everything in that folder.

  1. In our case, we copied over a couple additional Lexicons we may or may not want.

2022-project-references-25.png

Another option is to use Project references. A Project reference gives one Project referenceable access to all the resources in another Project.

Option 2: Referencing a Project

Resources can be shared between two (or more) Projects by referencing the full Project. This gives explicit access to all objects within a Project, just as if they were created locally.

Let's go back to our problem copying an address extractor that references multiple Lexicons from one Project to another.

  1. We want a copy of this Data Type from the "Invoices" Project...
  2. ...in this Local Resources folder in the "Human Resources" Project.

As we saw previously, Grooper will not allow us to do this (yet).

2022-project-references-07.png

All we need to do to make this happen is tell Grooper it's OK for the "Human Resources" Project to share assets with the "Invoices" Project. We do this by referencing the whole Project.

  1. To allow access to another Project's resources, first select the Project requesting access in the node tree.
    • The "Human Resources" Project wants access to the address extractor in the "Invoices" Project. So we've selected "Human Resources".
  2. Select the Referenced Projects property and expand its dropdown menu.
  3. Choose the Project whose resources you want to access by checking the box next to its name.
    • In this case, we've selected the "Invoices" Project.
    • FYI: You can reference multiple Projects by checking the boxes next to their names.

2022-project-references-08.png

  1. You'll see the referenced Project listed in the property grid.
  2. Be sure to save when finished.

2022-project-references-09.png

Now we can copy and paste all day long.

  1. We no longer get that error message if we copy the address extractor from the "Invoices" Project and paste it somewhere in the "Human Resources" Project.
  2. Because the Project is shared, it has a path to navigate to the Lexicons referenced by the extractor.

2022-project-references-10.png

You may also make direct references to any object in a referenced Project.

For example, because we've referenced the "Invoices" Project we could have simply referenced the address extractor without copying and pasting it.

  1. Here, we've added a Value Reader named "Address Ex 2" to illustrate this example.
  2. We've set its Extractor Type to Reference to demonstrate the reference.
    • FYI: The Reference Extractor Type simply returns the results of a referenced extractor.
  3. Using its Extractor property to select a reference, you can see we now have access to the "Invoices" Project.
  4. This means we can reference any and all objects contained within, including this address extractor.

2022-project-references-11.png

This is an effective way of sharing resources between multiple Projects without duplicating your efforts by creating multiple copies of shared resources that you have to manage independently in each Project.

The only downside to this approach depends on how many different Projects use a set of shared resources. If it boils down to a limited number of resources, or resources shared between very similar Projects (in terms of their use case), this approach can work out just fine. But as more and more resources are shared between more and more Projects, the crisscrossed references between them can be difficult to navigate when you're trying to track down a single object used across a variety of Projects.

In those cases, you may want to do a little extra work and create an entirely separate Project devoted to housing resources shared between multiple Projects. We will discuss creating and using a "Shared Resources" Project in Option 3.

Please read the following before continuing. It contains best practice advice to avoid potential system corruption when dealing with Project referencing.

Just as you can make references to other Projects, you can remove those references as well. However, to prevent future corruption down the line, you should always ensure no object in your Project references objects in the other Project before removing its reference.

The easiest way to do this is with the "Analyze References" button at the top of the Project's UI screen.

  1. Select the Project whose references you want to analyze.
  2. Press the "Analyze References" button.
  3. This will bring up a list of outbound references.
    • These are references that objects in the selected Project make to objects in the external Projects listed in its Referenced Projects property.
  4. Any referenced object will be listed.
    • In our case, we made references to various Lexicons as well as the "VAL - Address" Data Type in the "Invoices" Project.

These outbound references indicate there are resources in this Project that are dependent on resources in the "Invoices" Project to function.

CAUTION!!!!

While it is technically possible to remove the reference to a Project without resolving these references, YOU SHOULD NOT DO SO. It is best practice to either:

  1. Keep the reference to the Project intact.
  2. Or, manually unassign the references to each object.

Please ensure there are no outbound references to the Project before removing the reference.
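Conceptually, Analyze References lists the references that leave a Project, and a Project reference is only safe to remove when none of those point into the Project being removed. The Python sketch below models that check with hypothetical helpers (outbound_references and safe_to_remove are not Grooper functions). Objects are identified by (Project, name) pairs, and the example data echoes the "Address Ex 2" scenario above.

```python
from typing import Dict, List, Tuple

# Hypothetical model of "Analyze References". An object is identified by a
# (project, name) pair; outbound_references() and safe_to_remove() are
# illustrative helpers, not Grooper's API.
Ref = Tuple[str, str]

# References made by objects in "Human Resources" (example data).
refs: Dict[Ref, List[Ref]] = {
    ("Human Resources", "Address Ex 2"): [("Invoices", "VAL - Address")],
    ("Human Resources", "HR OCR"): [("Human Resources", "Image Cleanup - OCR")],
}


def outbound_references(project: str, refs: Dict[Ref, List[Ref]]) -> List[Tuple[Ref, Ref]]:
    """Every (source, target) pair where a source in `project` points outside it."""
    return [
        (src, dst)
        for src, targets in refs.items()
        if src[0] == project
        for dst in targets
        if dst[0] != project
    ]


def safe_to_remove(project: str, other: str, refs: Dict[Ref, List[Ref]]) -> bool:
    """Only drop a Project reference when nothing still points into `other`."""
    return all(dst[0] != other for _, dst in outbound_references(project, refs))


print(outbound_references("Human Resources", refs))
# [(('Human Resources', 'Address Ex 2'), ('Invoices', 'VAL - Address'))]
print(safe_to_remove("Human Resources", "Invoices", refs))  # False

refs[("Human Resources", "Address Ex 2")] = []  # unassign the reference first
print(safe_to_remove("Human Resources", "Invoices", refs))  # True
```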

2022-project-references-12.png

Option 3: Creating and Referencing a Shared Resources Project

The last option is to use an entirely separate Project which is solely devoted to housing objects used and referenced by multiple Projects. This option is most appropriate for larger environments, processing different kinds of documents from different use cases. Given a big enough body of documents, despite the fact they may come from different industries or use cases, you will find commonly used resources that are generalizable across a variety of documents. This can include generic or semi-generic extractors, Lexicons, even IP Profiles and OCR Profiles.

In these cases, it often makes sense to create a "bucket" of resources that all Projects can draw from. The idea is to place shared resources in a single Project referenced by multiple others. In our case, we're going to move these assets to a "Shared Resources" Project.

FYI

Other common examples of shared resources are CMIS Connections and Data Connections.

It is often the case that multiple Projects will reuse these connection objects to integrate Grooper with external storage platforms (such as content management systems and databases). Therefore, it would make sense to create something like a "Connections" Project containing these CMIS Connections and Data Connections. Instead of re-creating each connection object for each Project, all Projects can simply reference the "Connections" Project to gain access to the CMIS Connections and/or Data Connections required for import/export operations.

For instance, there are some fairly generic extractors in the "Invoices" Project we may want accessible to the "Human Resources" Project and future Projects as well.

  1. First we're going to move this generic text segment extractor.
    • This one is going to be the easier of the two.
  2. We'll also end up moving this address extractor, but that will take some extra work.
    • The downside to this approach is there is typically some work up front you'll need to engage in to organize your resources in order to get the benefit down the road.

2022-project-references-13.png

We are going to move these extractors to a new Project, which we will name "Shared Resources".

  1. Here, we've added the new Project.
  2. Since we want to move objects from the "Invoices" Project, we've also made a reference to that Project, using the Referenced Projects property.

For the first extractor, this job is very easy.

  1. We can simply cut this "VAL - Generic Segment" Value Reader from the "Invoices" Project, and we'll paste it into the "Shared Resources" Project.
    • Or, simply move it by dragging and dropping it.

2022-project-references-14.png

  1. The Value Reader moves to the "Shared Resources" Project with no issue.
    • Why? Nothing else in the "Invoices" Project referenced it!
  2. We won't be so lucky with the "VAL - Generic Decimal" Value Reader.
  3. If we attempt to move this object, we will get a series of reference violation errors. There are several objects in the "Invoices" Project using (i.e. referencing) this extractor.

2022-project-references-15.png

Here's where we get into the extra work on the front end.

What we can do first is copy the Value Reader. It makes no references to other objects; the issue here is that other objects reference it.

  1. So, we can copy it.
  2. And we can paste that copy into the "Shared Resources" Project.

2022-project-references-16.png

Now, if you truly want to use this as a "shared" or "global" resource, you can reassign all the references to the "VAL - Generic Decimal" extractor within the "Invoices" Project.

Ultimately, we will need the "Invoices" Project to reference the "Shared Resources" Project to reassign the references.

  1. First, to avoid a circular reference, we will need to unassign the "Shared Resources" Project's reference to the "Invoices" Project.
  2. Before removing a Project reference, it is always best practice to analyze any outbound references to the external Project, using the "Analyze References" button.
  3. This will bring up the following diagnostic.
    • No outbound references are detected (meaning there is no object in the "Shared Resources" Project referencing out to objects in the "Invoices" Project). This is what we want to see. If there were outbound references, we would want to resolve them before removing the reference to the external Project.
  4. Press "OK" to continue.

You should always use the "Analyze References" button before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound references outstanding. However, doing so is not best practice, as it can corrupt your system down the road.
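Step 1 matters because Project references form a directed graph, and adding a reference closes a loop whenever the new target can already reach the source. This Python sketch detects that case with a depth-first search. It is a hypothetical model (reaches and would_create_cycle are illustrative names), and it only detects the cycle; the article does not say exactly how Grooper responds to one.

```python
from typing import Dict, Set

# Hypothetical model: each Project maps to the Projects checked in its
# Referenced Projects property. reaches() and would_create_cycle() are
# illustrative helpers, not Grooper's API.


def reaches(graph: Dict[str, Set[str]], start: str, goal: str) -> bool:
    """Depth-first search: can `start` reach `goal` by following references?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, set()))
    return False


def would_create_cycle(graph: Dict[str, Set[str]], src: str, dst: str) -> bool:
    """Adding src -> dst closes a loop if dst can already reach src."""
    return reaches(graph, dst, src)


project_refs = {
    "Shared Resources": {"Invoices"},  # set up earlier to move objects over
    "Invoices": set(),
}

print(would_create_cycle(project_refs, "Invoices", "Shared Resources"))  # True
project_refs["Shared Resources"].discard("Invoices")  # step 1 above
print(would_create_cycle(project_refs, "Invoices", "Shared Resources"))  # False
```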

2022-project-references-17.png

  1. With no outbound references to the "Invoices" Project detected, we can remove the Project reference without issue.
  2. Be sure to save the Project when finished.

2022-project-references-18.png

  1. Next, we need to get rid of the local extractor in the "Invoices" Project and replace it with the copy we placed in the "Shared Resources" Project.
  2. In order to access the extractor in the "Shared Resources" Project, the "Invoices" Project must reference the "Shared Resources" Project.
    • Here, we have selected the "Invoices" Project.
  3. Using the Referenced Projects property, we have selected the "Shared Resources" Project.

2022-project-references-19.png

  1. Now, we can go about the business of reassigning any reference to our local extractor to the one in our "Shared Resources" Project.

The quickest way to find every object that references a selected object in the node tree is to use the "References" tab.

  1. To access this tab, (after selecting the object whose references you want to verify) select the "Advanced" tab.
  2. Then, select the "References" tab.
  3. This will list every object that references the object.
    • In our case, there's one Data Type extractor ("VE - Invoice Total") and three Data Column objects ("Quantity", "Price", and "Extended Price") referencing the selected extractor ("VAL - Generic Decimal").

What we could do from here is track down each of these objects, find where in their property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" Project. That is a perfectly acceptable, although somewhat time-consuming, way to reassign references. Luckily, we have a shortcut available to us.

2022-project-references-20.png

The "Reassign References..." button allows us to change the reference for each object in the list from the selected object to a different one.

This is exactly what we want to do. We want to change the reference set on these Data Columns and Data Type from the "VAL - Generic Decimal" extractor in the "Invoices" Project to the copy we made in the "Shared Resources" Project.

  1. Press the "Reassign References..." button.
  2. This will bring up a window to select a new object for the reference.
  3. Check it out. Here's our referenced "Shared Resources" Project.
  4. Selecting the "VAL - Generic Decimal" Value Reader, we will reassign the reference to this extractor in our "Shared Resources" Project.
  5. Press "OK" to finish reassigning the references.

2022-project-references-21.png

  1. Because the extractor is no longer referenced by any other object, the "Referenced By" list is now empty. All the objects that were listed here are now referencing the extractor we chose in our "Shared Resources" Project.
    • In other words, we reassigned the references.
    • We've effectively replaced the local Project's decimal extractor with one in an external Project, accessible to any other Project that references it.
  2. Since no other object references the local decimal extractor, and we've replaced its references with something else, it is now safe to delete it.
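The References tab and the "Reassign References..." button amount to a reverse lookup followed by a bulk repoint. This Python sketch models both with hypothetical names (Obj, referenced_by, reassign_references), and the "extractor" key is a placeholder for whichever property actually holds the reference. An empty referenced-by list at the end is the signal that the local copy can be deleted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical model of the References tab and "Reassign References...".
# Obj, referenced_by() and reassign_references() are illustrative names, and
# the "extractor" property key is a placeholder, not a real property name.


@dataclass(eq=False)  # compare objects by identity
class Obj:
    name: str
    project: str
    refs: Dict[str, "Obj"] = field(default_factory=dict)


def referenced_by(target: Obj, everything: List[Obj]) -> List[Obj]:
    """Like the References tab: every object with a property pointing at target."""
    return [o for o in everything if any(v is target for v in o.refs.values())]


def reassign_references(old: Obj, new: Obj, everything: List[Obj]) -> int:
    """Like Reassign References...: repoint each reference from old to new."""
    count = 0
    for o in referenced_by(old, everything):
        for prop, value in list(o.refs.items()):
            if value is old:
                o.refs[prop] = new
                count += 1
    return count


local = Obj("VAL - Generic Decimal", "Invoices")
shared = Obj("VAL - Generic Decimal", "Shared Resources")
users = [
    Obj(name, "Invoices", {"extractor": local})
    for name in ["VE - Invoice Total", "Quantity", "Price", "Extended Price"]
]
everything = [local, shared, *users]

print([o.name for o in referenced_by(local, everything)])
# ['VE - Invoice Total', 'Quantity', 'Price', 'Extended Price']
print(reassign_references(local, shared, everything))  # 4
print(referenced_by(local, everything))  # []  (safe to delete the local copy)
```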

2022-project-references-22.png

As we've demonstrated, it's a little extra work if you decide you want to move resources from one Project to a shared resources Project. However, the benefit of organizing assets like this is that any Project that references our "Shared Resources" Project has access to its assets.

  1. For example, we could tell our "Human Resources" Project to reference the "Shared Resources" Project.
  2. Now, both the "Human Resources" Project and the "Invoices" Project have access to its resources.
    • Furthermore, any changes we make to the object in the "Shared Resources" Project will be reflected by any object in any Project that touches it. This can prevent duplication of efforts when updating an object's properties.
    • If any other Projects or any future Projects can make use of these resources, all you have to do is give them a reference to the "Shared Resources" Project. It acts as one big community bucket of resources other Projects can draw from.

2022-project-references-23.png


The Essentials Project

Every newly created Grooper Repository in version 2022 comes with a Project named "Essentials". This Project contains several resources you may find useful when designing your document processing assets. Just like any other Project, you can access these resources by making a reference to the "Essentials" Project. The objects it contains can serve as examples of the different types of objects you can create, as resources you can copy into your own Projects and build on, or as resources you reference directly from your Projects.

2022-project-essentials-01.png

In this Project you will find various:

  1. Data Type and Value Reader Extractors
  2. Lexicons
  3. Profile objects (OCR Profiles, IP Profiles etc.)

2022-project-essentials-02.png

Projects and Upgrading to 2022

Projects are a new way of organizing Grooper resources in version 2022. In previous versions, Grooper resources were organized primarily in one of three folders in the node tree:

  • The Content Models folder
  • The Global Resources folder
  • The Processes folder of the Batch Processing folder

Users would have to go back and forth between these locations to configure what they needed to process documents through Grooper. This was often a time-consuming and cumbersome process of sifting through the node tree's hierarchy to get to the objects you needed.

Projects simplify this issue by allowing you to place all associated resources for a given use case (or "project") in a single node location.


You can see the difference in the image to the right. All the Grooper assets required for a single document processing project are highlighted.

Before the introduction of Projects in 2022, these objects were scattered across various locations in the node tree.

In version 2022, everything can be placed neatly in a single location, making it much simpler to find what you're looking for.


FYI

If you have certain Grooper resources that can be used by multiple Projects (such as extractors, profile objects, or CMIS Connections), you can grant multiple Projects access to them through Project references.

For more information, visit the "Referencing Objects in Other Projects" section of this article.

2022-project-upgrade-about-01.png

What Happens When You Upgrade?

Obviously, this architecture is very different from how your assets were organized in earlier versions of Grooper. So, what happens when you upgrade?

  1. Upon upgrading to version 2022, most Grooper objects in your repository will simply be placed into a new Project named "Project 1".
  2. All Content Models will be organized into a folder named "Content Models".
  3. All working Batch Processes will be placed in a folder named "Processes" within "Project 1".
  4. Any published Batch Processes will be placed in the "Processes" folder at the first level of the node tree.

Anything in the "Global Resources" folder will be placed throughout "Project 1".

  1. If these objects were organized into a subfolder in the "Global Resources" folder, a folder of the same name will be created.
    • For example, in this Grooper Repository, there was a folder named "HR Docs Resources", containing a handful of Grooper objects. Upon upgrading, a folder of the same name, containing the same objects, was placed in "Project 1".
  2. Any unfoldered objects in the "Global Resources" folder will be placed at the first level of the "Project 1" Project.
  3. Last, all Production and Test Batches will be placed in the "Batches" folder at the first level of the node tree.
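The placement rules above can be summarized as a small lookup. The following Python sketch is a hypothetical model for illustration only: Grooper performs the upgrade itself, and `LegacyObject`, `upgraded_location`, the example "HR OCR" profile, and the assumption that "Project 1" sits under the root "Projects" folder are ours, not part of Grooper.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the 2022 upgrade placement rules; NOT a Grooper API.
# Assumption: "Project 1" lives under the root-level "Projects" folder.
PROJECT_1 = "Projects/Project 1"


@dataclass
class LegacyObject:
    name: str
    kind: str                               # e.g. "Content Model", "Batch Process", "Batch", "IP Profile"
    published: bool = False                 # only meaningful for Batch Processes
    global_subfolder: Optional[str] = None  # subfolder of the old "Global Resources" folder, if any


def upgraded_location(obj: LegacyObject) -> str:
    """Return the node-tree folder an object is placed in after upgrading."""
    if obj.kind == "Content Model":
        return f"{PROJECT_1}/Content Models"
    if obj.kind == "Batch Process":
        # Published processes go to the root "Processes" folder;
        # working processes go to the "Processes" folder inside "Project 1".
        return "Processes" if obj.published else f"{PROJECT_1}/Processes"
    if obj.kind == "Batch":
        # Production and Test Batches go to the root "Batches" folder.
        return "Batches"
    # Everything else is assumed to come from the old "Global Resources" folder:
    # foldered objects keep their subfolder, unfoldered ones land at the Project root.
    if obj.global_subfolder:
        return f"{PROJECT_1}/{obj.global_subfolder}"
    return PROJECT_1


print(upgraded_location(LegacyObject("HR OCR", "OCR Profile", global_subfolder="HR Docs Resources")))
# prints: Projects/Project 1/HR Docs Resources
```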

2022-project-upgrade-project1-01.png

Deciding What to Do Next

It's important to point out that your Grooper environment will work just fine with everything organized into the single "Project 1" Project. You can leave everything as is in "Project 1" upon upgrading to version 2022 and continue processing Batches of documents as if nothing happened.

Going forward you have two options:

  1. Do nothing. Leave all Grooper resources organized into "Project 1".
  2. Migrate resources into their own Projects.

You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own Projects, but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new Projects will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another.

So, should you migrate away from "Project 1" at all? Here are some things to keep in mind when making this decision.

  1. It's all or nothing.
    • Again, we stress the importance of committing to the move. You should commit to migrating everything to new Projects (with the exception of a handful of shared resources), rather than just a few. The benefits of the Project architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help-desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system.
  2. You don't have to move things from "Project 1" at all.
    • If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously.
  3. Do you have time to do it?
    • This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository is, with many Content Models, Batch Processes, profiles and other objects, the longer it's going to take.
  4. Do you have a lot of "shared resources"?
    • If you frequently use individual Data Types, Lexicons, profiles, CMIS Connections or other objects across many different Content Models and Batch Processes, migrating will take the most time and effort. Ensuring these shared resources are accessible to each Project created is the most time-consuming part of any migration out of "Project 1".
  5. Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?
    • If so, Projects are for you. The new architecture provides multiple advantages to this kind of workflow. You should seriously consider devoting the time to migrate resources into their own Projects, if you maintain multiple environments to publish Grooper objects from development to production repositories.
  6. Do you use third-party data entry companies to review work in Grooper?
    • If so, Projects are for you. You'll benefit from being able to push complete and tidy project packages to an environment dedicated to that company.
  7. Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?
    • If so, Projects are really for you. Aside from object organization, the other big reason for creating the Project architecture was to maintain object reference integrity. Projects will greatly assist you in preventing reference corruption in your Grooper environments.

Project Migration Plan

Ok, you've decided Projects are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward?

We've narrowed the process down to seven general steps:

  1. Clean up your repository. Delete items that are no longer in use and will not be used in the future.
  2. Use the "Create Project" feature for each Batch Process.
  3. As each Project is created, rename any objects as needed if your prior naming conventions no longer make sense.
  4. For each Project, use the "Analyze References" feature to decide what to do about "shared resources" used by multiple Projects.
  5. Remove Project references if the "Outbound References" list is empty.
  6. Reorganize any shared resource objects that remain in "Project 1".
  7. Rename "Project 1" to something like "Global Resources" or "Shared Resources".

1. Clean House

If you're going to take the time to reorganize your resources into Projects, now is a good time to take a look at the Grooper objects in your repository and get rid of anything unused that is cluttering your environment.

This is entirely optional, but now is as good a time as any to clean house.

  1. For example, we have a dummy IP Profile named "temp" and Value Reader named "test" in our newly upgraded "Project 1" Project.
    • I have no idea where these objects came from or what their original purpose was. They aren't being used by anything else in this environment. It's best to just delete them to get them out of the way.

2022-project-upgrade-steps-01.png

2. Create Project

Now we can start in earnest and create some Projects. You could do this manually. The steps would be as follows.

  1. Add a Project to the "Projects" folder.
  2. Using the Project's Referenced Projects property, reference "Project 1".
  3. Move a Batch Process to that Project.
  4. Move the Content Model associated with that Batch Process to the Project.
  5. Move any other Grooper objects referenced by the Batch Process, the Content Model, or their child objects to the Project.
    • Or keep any "shared resources" put in "Project 1", maintaining access to them through the Project reference (We'll discuss this further in Step 4: Analyze References).

There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature.

The "Create Project" feature is accessed by selecting a Batch Process. If you think about it, a Batch Process should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the Batch Process as part of its execution, such as a Content Model referenced for a Classify step or an OCR Profile referenced for a Recognize step.

The "Create Project" utility will create a new Project named after the Batch Process, look for any objects referenced as part of its execution, and move them to the new Project.

Important! "Create Project" will only move objects not referenced by anything else. If another Batch Process uses the same OCR Profile, for example, that OCR Profile will remain in "Project 1". We will discuss this further in "Step 4: Analyze References".
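The rule in the note above can be modeled as a small graph problem. In this hypothetical Python sketch (not Grooper's implementation or API; all names, including "HR Process" and "HR Model", are illustrative), every object maps to the objects it references. "Create Project" is modeled as: start from everything the Batch Process reaches, then keep dropping anything still needed by an object that is not moving.

```python
from collections import deque

# Hypothetical model; NOT Grooper's implementation or API.
# graph maps each object's name to the set of object names it references.


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """The start object plus everything it references, directly or indirectly."""
    seen = {start}
    queue = deque([start])
    while queue:
        for ref in graph.get(queue.popleft(), set()):
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def objects_to_move(graph: dict[str, set[str]], batch_process: str) -> set[str]:
    """Objects that can leave "Project 1" along with the Batch Process.

    Start from everything the Batch Process uses, then repeatedly drop any
    object that is referenced by something that is NOT moving. Dropping one
    object can strand its own references, so loop until nothing changes.
    """
    movable = reachable(graph, batch_process)
    changed = True
    while changed:
        changed = False
        for obj in list(movable):
            if obj == batch_process:
                continue
            referrers = {src for src, refs in graph.items() if obj in refs}
            if referrers - movable:  # something outside the moving set needs it
                movable.discard(obj)
                changed = True
    return movable


graph = {
    "Invoices Process": {"Invoices Model", "Image Cleanup - Permanent"},
    "HR Process": {"HR Model", "Image Cleanup - Permanent"},
}
print(sorted(objects_to_move(graph, "Invoices Process")))
# prints: ['Invoices Model', 'Invoices Process']
```

The repeated pass matters: if a shared IP Profile stays in "Project 1", anything referenced only by that profile has to stay too, or "Project 1" would end up pointing into the new Project.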

We will start by creating a new Project using a fairly simple document redaction Batch Process. This is an entirely "self-contained" Batch Process. No other Batch Process utilizes its resources.

  1. This is the Batch Process we will create the Project from.
  2. It references this Content Model, including resources in its Local Resources folder.
  3. Specifically, the "URLA" Document Type is referenced as the Extract step's Default Content Type. Since this Document Type is referenced, its parent Content Model (and all its children, including Local Resources folder and Data Model) will be moved to the new Project.
  4. This OCR Profile is also referenced (by the Recognize step). So, it will move to the new Project as well.

2022-project-upgrade-steps-02.png

To create the Project, perform the following steps.

  1. Select the Batch Process you wish to use to create the new Project.
  2. Press the "Create Project" button in the Batch Process's toolbar.

There are two configurable options when creating the new Project.

  1. The Remove Emptied Folders property will delete a folder from "Project 1" if it is empty after objects are moved to the new Project.
    • Generally speaking, you'll want this property set to True. It cleans up empty folders in "Project 1". Why would you want to keep a bunch of empty folders around? I don't know. If you have a reason to keep these empty folders, you can keep this property False.
  2. The Organize Into Folders property will create folders in the new Project for each type of object moved.
    • If this property is set to True, an "OCR Profiles" folder is created for any OCR Profiles moved to the new Project, and a "Content Models" folder is created for any Content Models moved.
    • Most people will elect to keep this property False, as you probably want to establish your own organizational structure to your Project. However, this option is present if you find it helpful when initially moving objects to the Project to group like objects into like folders.
  3. Press "Execute" to create the Project.

2022-project-upgrade-steps-03.png

When the utility finishes running, a new Project will be created. All objects associated with the Batch Process are moved from "Project 1" to the new Project, as long as the move is allowed. (We'll discuss moves that aren't allowed in Step 4.)

  1. In our case, this Project named "URLA Redaction" was created.
    • The new Project will always be named after the Batch Process.
  2. A total of three objects were moved from "Project 1" and placed in the new Project.
    • This was a relatively simple Batch Process. More complicated Batch Processes will more likely than not have more objects referenced, and therefore, more objects moved. But, you can pretty much guarantee you'll at least end up with the Batch Process and a Content Model in the new Project. With few exceptions, you're always going to need a Batch Process and a Content Model to do work in Grooper. In general, each Project will have one Batch Process and one Content Model.
FYI Grooper renamed our Content Model to "Content Model". Why? In "Project 1", the Batch Process and Content Model were both named "URLA Redaction". Object names in the same branch of the node tree must be unique, so Grooper renames any moved object that shares the source Batch Process's name after its object type. Therefore, the Content Model named "URLA Redaction" was renamed "Content Model".
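The renaming rule in the note above is simple enough to state as code. This is a hypothetical Python model, not a Grooper API. It assumes the moved objects are given as (name, object type) pairs; the article does not say what Grooper does if two moved objects of the same type share the Batch Process's name, so the sketch does not handle that case.

```python
# Hypothetical model of the renaming rule; NOT a Grooper API.


def rename_conflicts(process_name: str, moved: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Rename moved objects that share the source Batch Process's name.

    `moved` holds (name, object type) pairs. The Batch Process keeps its name;
    any other object with the same name is renamed after its object type.
    """
    renamed = []
    for name, kind in moved:
        if name == process_name and kind != "Batch Process":
            name = kind
        renamed.append((name, kind))
    return renamed


print(rename_conflicts("URLA Redaction", [
    ("URLA Redaction", "Batch Process"),
    ("URLA Redaction", "Content Model"),
    ("URLA OCR", "OCR Profile"),
]))
# prints: [('URLA Redaction', 'Batch Process'), ('Content Model', 'Content Model'), ('URLA OCR', 'OCR Profile')]
```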

2022-project-upgrade-steps-04.png

3. Rename Resources

With the switch to the Project architecture, you may find your naming convention no longer makes sense or could be adjusted. Much like the "Clean House" step, this step is not strictly necessary. But, if you're going through the effort to reorganize your repository into a new structure, you might as well make sure the way you name things makes sense in that new structure.

If you're coming from an environment with a lot of Batch Processes and a lot of Content Models, you've probably named your resources according to their intended use case. So you might have "Use Case X Content Model", "Use Case X OCR Profile", "Use Case X IP Profile" and so on. You may find this naming superfluous once all these assets are moved into a Project. So, it might make more sense to rename these objects after their generic object type or their function in the Batch Process workflow.

  1. For example, we've renamed our OCR Profile from "URLA OCR" to simply "OCR".
  2. Name your resources whatever makes sense to you in your environment. "Batch Process" may be too generic of a name if you're executing multiple different Batch Processes. We went ahead and stuck with "URLA Redaction" for our Batch Process here.

2022-project-upgrade-steps-05.png

4. Analyze References

So far, this process has been fairly simple. With the press of the "Create Project" button, all resources associated with a Batch Process are moved to a new Project. Our previous example was simple because all the resources were self-contained, or "local" to the Batch Process.

That is not always the case. Particularly in larger environments, you will find you reuse resources across a variety of Batch Processes. For example, the "Full Text - Accurate" OCR Profile, in our "Essentials" downloads, is many Grooper users' "go-to" OCR Profile if they don't have the time or the will to create their own. This OCR Profile would be a shared or "global" resource, used by multiple different Batch Processes.

When you have shared resources, they will not be moved to a newly created Project when using the "Create Project" feature. They can't be. Other Batch Processes need to use those resources too. Instead, all resources that are not shared are moved to the newly created Project, and a Project reference is made to "Project 1". The Project reference gives the new Project access to the resources it needs in "Project 1".

  1. We will create Projects for these two Batch Processes next.
  2. However, they share several resources in their Batch Processes: two IP Profiles and a CMIS Connection.
  3. For example, they both use the "Image Cleanup - Permanent" IP Profile in executing the Image Processing steps of their Batch Processes.

2022-project-upgrade-steps-06.png

Next, we will press the "Create Project" button to create a new Project from each Batch Process, starting with the "Invoices Process" Batch Process.

  1. Select the Batch Process you wish to use to create the Project.
    • In our case, we're starting with the Batch Process named "Invoices Process".
  2. Press the "Create Project" button.
  3. Configure the Project creation properties as desired.
  4. Press the "Execute" button to create the Project.

2022-project-upgrade-steps-07.png

  1. A new Project is created and several resources have been moved.
  2. Select the newly created Project.
  3. Notice the Referenced Projects property shows a Project reference to "Project 1".

This indicates there is something in "Project 1" the Batch Process uses that can't be moved, because another Batch Process (or one of its associated objects) also uses it.

Essentially, both Batch Processes reference one or more of the same objects, so those objects stay put in "Project 1" and are accessible through the Project reference. By referencing "Project 1", the Project we just created has access to all of its resources, including the objects it needs to function.

So, just what resources are out in "Project 1" that our new Project needs? Good question. You can quickly answer this with the "Analyze References" feature.

2022-project-upgrade-steps-08.png

  1. Press the "Analyze References" button to view all objects in "Project 1" referenced by the selected Project.
  2. This will bring up a list of outbound references.
    • These are references that objects in the selected Project make to objects in the external Projects listed in its Referenced Projects property.
  3. Any referenced object will be listed.
    • In our case, we made references to three objects in the "Shared Resources" folder of "Project 1"
      • Two IP Profiles: "Image Cleanup - OCR" and "Image Cleanup - Permanent"
      • One CMIS Repository ("Import Export") of the "NTFS - Local Hard Drive" CMIS Connection
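Conceptually, "Analyze References" lists every reference that crosses the selected Project's boundary. The following Python sketch models that idea under stated assumptions: it is not Grooper's API, and `outbound_references`, the graph and ownership dictionaries, and the "OCR" profile name are hypothetical.

```python
# Hypothetical model of what "Analyze References" reports; NOT a Grooper API.
# graph: object -> objects it references; project_of: object -> owning Project.


def outbound_references(graph: dict[str, set[str]],
                        project_of: dict[str, str],
                        project: str) -> list[tuple[str, str]]:
    """(referencing object, referenced object) pairs that leave `project`."""
    return sorted(
        (src, dst)
        for src, refs in graph.items()
        if project_of.get(src) == project
        for dst in refs
        if project_of.get(dst) != project
    )


graph = {
    "Invoices Process": {"OCR", "Image Cleanup - Permanent"},
    "OCR": {"Image Cleanup - OCR"},
}
project_of = {
    "Invoices Process": "Invoices Process",
    "OCR": "Invoices Process",
    "Image Cleanup - Permanent": "Project 1",
    "Image Cleanup - OCR": "Project 1",
}
for src, dst in outbound_references(graph, project_of, "Invoices Process"):
    print(f"{src} -> {dst} ({project_of[dst]})")
```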

2022-project-upgrade-steps-09.png

Now that we know what is shared, we have two options:

  1. Copy these resources from "Project 1" so that we have local copies of these resources.
  2. Keep these shared resources put so that every Project that needs them can reference the same object.

Your choice will largely depend on how big your environment is, how many times the resources in "Project 1" are referenced by different potential Projects, and if you prefer to have local copies of these resources that can be edited independently or if you want these resources to be truly shared across different Projects (meaning changing one single object will impact how multiple Projects implement it).

Option 1: Copy the Resources

Copying the resources is the most time-consuming option. However, it does have benefits. Once you copy these shared resources from "Project 1", you no longer need the reference to "Project 1". At that point, the Project is independent and self-contained. If you need to share this Project with another Grooper environment (say, promoting it from a "test" repository to a "production" repository), it will have no dependencies on another Project that must be shared alongside it.

Aside from the time it takes, the only drawback is that the resources are then completely local to the Project. The copies in the Project are separate objects from the originals in "Project 1", so a change to one is not reflected in the other. If you want to make changes, you must edit each version independently.

Copying these resources over is not necessarily the hard part. It's even easier in this case because they're already all in the same folder.

  1. We can simply copy the whole folder.
  2. Or we can select the folder, and go to the "Contents" tab.
  3. Then, multi-select all the objects we want to copy.
    • In this case, all of them.
  4. Paste them into the Project you want.
FYI Depending on the reference complexity of the objects you are copying, this process may be more complicated. You will need to track down the referenced objects and bring them over either a) before you copy and paste the object referencing them (which will require you to reset the reference after everything is copied over) or b) at the same time, by copying and pasting everything together.

For more information on copying and pasting objects from one Project to another, please refer to the "Referencing Objects in Other Projects" section of this article.

2022-project-upgrade-steps-10.png

  1. Now we have local copies in our Project of these referenced resources from "Project 1".

2022-project-upgrade-steps-11.png

Now comes the time-consuming part of this process: all references must be reassigned from the source objects in "Project 1" to their local copies in the Project.

  1. For example, this OCR Profile uses one of the IP Profiles we just copied.
  2. We would need to select the property where the reference is assigned.
    • The IP Profile property in this case.
  3. Then, we'd need to change the reference from the source object in "Project 1"
  4. ...to the local copy in our Project.

The more resources you copy over, and the more complex the references and sub-references you're faced with, the more time this process will take.
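The reassignment step amounts to rewriting each reference through an "original to copy" mapping. This hypothetical Python sketch (not a Grooper API) identifies objects by Project-qualified names, an assumption made so an original and its copy can share a name, since object names only have to be unique within the same branch of the node tree. All object names are illustrative.

```python
# Hypothetical model of reassigning references to local copies; NOT a Grooper API.
# Names are qualified with their Project so an original and its copy can share a name.


def reassign_references(graph: dict[str, set[str]],
                        project_objects: set[str],
                        copies: dict[str, str]) -> dict[str, set[str]]:
    """Return a new graph where the Project's objects reference local copies.

    `copies` maps each original in "Project 1" to its copy in the Project.
    `project_objects` should include the copies themselves, because a copied
    object may still point at another original that was also copied.
    """
    updated = {obj: set(refs) for obj, refs in graph.items()}
    for obj in project_objects:
        updated[obj] = {copies.get(ref, ref) for ref in graph.get(obj, set())}
    return updated


graph = {
    "Invoices Process/OCR": {"Project 1/Image Cleanup - OCR"},
    "Project 1/Image Cleanup - OCR": set(),
    "Invoices Process/Image Cleanup - OCR": set(),
}
copies = {"Project 1/Image Cleanup - OCR": "Invoices Process/Image Cleanup - OCR"}
new = reassign_references(graph, {"Invoices Process/OCR", "Invoices Process/Image Cleanup - OCR"}, copies)
print(new["Invoices Process/OCR"])
# prints: {'Invoices Process/Image Cleanup - OCR'}
```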

2022-project-upgrade-steps-12.png

Option 2: Keep Project 1 Shared

The other option is to just keep these resources shared. Your Batch Process will function as it did before. The only difference is your Project will be dependent on the referenced Project ("Project 1") to function.

  1. Here, we used the "Create Project" button to create a new Project from the last Batch Process in the Grooper Repository.
  2. We've used the "Analyze References" button to see what's referenced in "Project 1"
  3. It's utilizing one IP Profile and one CMIS Connection we saw in the previous example.
  4. It's also utilizing some of the resources in our Grooper Essentials package.

If you're fine with keeping these as shared resources in a referenced Project, you're done here. There's no need to go through the time-consuming copy, paste, and reference-reassignment dance we did earlier. The only potential drawback is that you've made this Project dependent on "Project 1". You will need to evaluate for yourself whether this is a drawback, a benefit, or doesn't really matter one way or another.

2022-project-upgrade-steps-13.png

Option 3: Option 1 + Option 2

There's also no reason why you can't copy some items over to your new Project and keep others referenced through Project references.

For example, in our first example, we could have copied over the two IP Profiles but kept the CMIS Connection as a shared resource. In fact, that would have made more sense. If we have to make changes to a CMIS Connection (like entering new access permissions), we would only want to do that once by manipulating one single object, rather than reproducing our efforts by editing multiple copies of the same object in multiple Projects.

5. Remove Project References

Now, we're at a point where we've used the "Create Project" feature for every Batch Process in "Project 1". What's next?

In our situation, we've created some Projects that need to retain a reference to "Project 1" and some that do not. For those that do not, we should remove the reference to "Project 1". The "Analyze References" feature will once again be useful here. For each Project, use the "Analyze References" button to check for outbound references. If there are any, keep the reference intact. If there are none, remove the reference.

  1. We'll start with the "Human Resources" Project
  2. Press the "Analyze References" button.
  3. There are outbound references.

So, we do nothing.

  1. The reference to "Project 1" must stay.

2022-project-upgrade-steps-14.png

  1. Next, we'll analyze the "Invoices Process" Project's references.
  2. Press the "Analyze References" button.
  3. There are no outbound references.

Now, we know it's safe to remove the reference to "Project 1".

2022-project-upgrade-steps-15.png

  1. Using the Referenced Projects property, remove the reference to "Project 1".
  2. Press "Save" when finished.
You should always use the "Analyze References" button before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound references outstanding. However, doing so is not best practice, as it can cause corruption of your system down the road.
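Because Grooper won't stop you, the check is up to you. The following hypothetical Python guard (not a Grooper API; the function name and the shape of the `outbound` list are assumptions) models the recommended practice: refuse to drop a Project reference while anything still points into that Project.

```python
# Hypothetical guard; NOT a Grooper API. Grooper itself lets you remove the
# reference anyway, so this models the recommended check, not enforcement.


def remove_project_reference(referenced_projects: list[str],
                             target: str,
                             outbound: list[tuple[str, str, str]]) -> list[str]:
    """Return Referenced Projects without `target`, refusing if it is still in use.

    `outbound` holds (referencing object, referenced object, referenced Project)
    triples, i.e. the kind of list "Analyze References" shows.
    """
    blocking = [(src, dst) for src, dst, proj in outbound if proj == target]
    if blocking:
        raise ValueError(f"{target!r} is still referenced: {blocking}")
    return [p for p in referenced_projects if p != target]


print(remove_project_reference(["Project 1"], "Project 1", []))
# prints: []
```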

2022-project-upgrade-steps-16.png

  1. Continue selecting your Projects to analyze their references.
  2. Once all Projects with no outstanding references to "Project 1" have had their Project reference to "Project 1" removed, you are done.

2022-project-upgrade-steps-17.png

6. Reorganize Shared Resources

After you're done creating Projects from the Batch Processes in "Project 1", you'll want to clean up the remaining resources in "Project 1".

This will include:

  1. Organizing any leftover assets into manually created Projects, if applicable.
  2. Organizing any remaining shared resources into folders and deleting empty folders, as you see fit.

Manually Creating Projects

There are some situations where you may need to manually create a Project for certain assets remaining in "Project 1". Most commonly, this can happen if you have resources you've created for testing purposes that are not tied to a Batch Process and you want to keep them around.

  1. For example, we have two Content Models leftover.
  2. We also have this IP Profile.

It seems like these are partially architected resources from when this Grooper user went through Grooper's ACE - Architect training. We may want to keep these around so this user can continue their training.

The only issue is there is no Batch Process utilizing these resources, so we can't take advantage of our "Create Project" utility.

2022-project-upgrade-steps-18.png

So, we'll need to manually create a Project:

  1. Right click the Projects folder in the node tree.
  2. Select "Add", then "Project..."
  3. Name the new Project.
  4. Press "Ok" when finished.

2022-project-upgrade-steps-19.png

From here, you can cut and paste or move resources from "Project 1" to the new Project.

  1. For example, we moved this IP Profile from "Project 1" to the "ACE Training" Project we created.

Next, we're going to look at a potential issue you may encounter when moving resources out of "Project 1" to another Project.

2022-project-upgrade-steps-20.png

  1. I want to move either of these two Content Models from "Project 1" to the "ACE Training" Project.
  2. If I try to move one, I get the following error message.
    • This is telling me there is a reference violation. The Content Model is dependent on some external reference, and can't be moved without resolving the reference.

Why did this happen? Long story short, someone did something they should have never done, but was technically allowed in previous versions of Grooper.

2022-project-upgrade-steps-25.png

The problem is someone made a reference within a Content Model to something in another Content Model.

  1. This Data Field is causing the problem.
  2. Its Value Extractor is set to Reference a Data Type.
  3. The problem is the Data Type it's referencing is in a different Content Model.

This is a big "no-no" that was technically possible in Grooper, but never considered best practice. This may have happened by accident when copying a Content Model. This may have been a "quick fix" that some Grooper designer did, intending to go back and resolve but never got around to it. Who knows. The main issue is these types of reference violations can cause problems down the road, potentially causing corruption in your Grooper environment.

Part of the reason Projects were created was to avoid this type of corruption due to improperly referenced objects. Referencing Projects via the Referenced Projects property makes external resource references much more intentional, avoiding accidental reference violations (as much as possible).
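The violation above can be described as a rule over references: an object inside one Content Model must not reference an object inside a different Content Model. This hypothetical Python check is not a Grooper API, and the object and Content Model names in it are illustrative. Objects outside any Content Model (such as a Project-level OCR Profile) are deliberately left out, since the article only describes cross-Content-Model references as the problem.

```python
# Hypothetical check for cross-Content-Model references; NOT a Grooper API.
# content_model_of maps an object to the Content Model containing it; objects
# outside any Content Model (e.g. a Project-level profile) are simply absent.


def cross_model_references(graph: dict[str, set[str]],
                           content_model_of: dict[str, str]) -> list[tuple[str, str]]:
    """(source, target) pairs where an object in one Content Model references
    an object inside a different Content Model."""
    violations = []
    for src in sorted(graph):
        src_model = content_model_of.get(src)
        if src_model is None:
            continue  # references from outside any Content Model are not checked here
        for dst in sorted(graph[src]):
            dst_model = content_model_of.get(dst)
            if dst_model is not None and dst_model != src_model:
                violations.append((src, dst))
    return violations


graph = {"Training Model (Copy)/Invoice Number": {"Training Model/Invoice Number Type"}}
content_model_of = {
    "Training Model (Copy)/Invoice Number": "Training Model (Copy)",
    "Training Model/Invoice Number Type": "Training Model",
}
print(cross_model_references(graph, content_model_of))
```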

2022-project-upgrade-steps-21.png

To move these objects to the new Project, we need to resolve the reference violation one way or another.

  1. We could clear the offending reference, or reset it to something that doesn't violate a reference across multiple Content Models.
  2. Or, since this Content Model is really just a copy of the other one, we could just delete it.
    • This is what I will elect to do.

2022-project-upgrade-steps-22.png

  1. With the reference violation resolved, the Content Model can be moved with no issue.

2022-project-upgrade-steps-23.png

Clean Up Remaining Folders

After you've moved everything out of "Project 1" into a new Project (whether manually or through the "Create Project" feature), the only resources left should be shared resources, objects intended to be used by any current or future Project.

All that's left is to organize the objects and folders remaining.

  1. You may have empty folders you can simply delete.

FYI: This "Content Models" folder will be "Read Only". This is a carry over from this folder being a system node in the previous version of Grooper's architecture.

  1. To delete it, you will need to select the folder, then go to the "Advanced" tab.
  2. Then, under Attributes, change Read Only to False.
    • Then navigate off the object. You will be prompted to "Save". After saving, you will be able to delete it.

You may also want to create some new folders for stray objects or move some folders to the root of the Project.

  1. For example, many users find it helpful to move the Grooper "Essentials" folder to the root of the Project, then delete the "Downloads" folder.

2022-project-upgrade-steps-24.png

7. Rename Project 1

Once all resources are out of "Project 1" and you've organized any shared resources remaining, rename "Project 1" to something that reflects its true utility, something like "Shared Resources" or "Global Resources".

  1. We named ours "Global Resources".

That's it! The migration from "Project 1" to Grooper's new Project-based architecture is complete!

2022-project-upgrade-steps-26.png