Project Migration Plan

From Grooper Wiki

This article was migrated from an older version and has not been updated for the current version of Grooper.


This article is about the current version of Grooper.

Note that some content may still need to be updated.


WHO IS THIS ARTICLE FOR?

This article is intended for users who are:

  1. Upgrading to version 2023 or newer
  2. Coming from version 2021 or older
  3. Unfamiliar with Projects

Project node objects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects such as Content Models, Batch Processes, profile objects, and more are organized and managed. It allows for the encapsulation and modularization of these resources for easier management and reusability.

Projects are a new way of organizing Grooper resources first implemented in Grooper version 2022. Before Projects were introduced, node objects were spread throughout various folder locations in the node tree.

Projects allow users to organize resources used for a similar purpose (or "project") in a single container.

The screenshot to the right shows the difference.

  • Before Projects, similar resources were scattered throughout the node tree.
  • With Projects, similar resources can be grouped together under a single node.

FYI

If you have certain Grooper resources that can be used by multiple Projects (such as extractors, profile objects, or CMIS Connections), you can grant multiple Projects access to them through Project references.

For more information, visit the Project article.

What happens when you upgrade?

Upon upgrading to a version with Projects, the organization of your Grooper Repository's node tree will change.

What happens to my Batches?

Not much.

The "Batch Processing" node has been removed. The "Batches" node now sits just below the Grooper Root node.

  • Production Batches will stay in the "Production" branch of the "Batches" node.
  • Test Batches will stay in the "Test" branch of the "Batches" node.

What happens to my published Batch Processes?

Published Batch Processes are moving.

There is no longer a "Processing" folder. Any published Batch Processes will be placed in the "Processes" folder at the first level of the node tree.

What happens to my working Batch Processes, Content Models and everything else?

Find them in the "Projects" node.

Most Grooper objects in your repository will be placed into a new Project named "Project 1".

  • All Content Models will be organized into a folder named "Content Models".
  • All working Batch Processes will be placed in a folder named "Processes".
  • Anything in the "Global Resources" folder will be placed throughout "Project 1".

Deciding What to Do Next

It's important to point out that your Grooper environment will work just fine with everything organized into the single "Project 1" Project. You can leave everything as is in "Project 1" after upgrading to version 2023 and continue processing Batches of documents as if nothing happened.

Going forward you have two options:

  1. Do nothing. Leave all Grooper resources organized into "Project 1".
  2. Migrate resources into their own Projects.

You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own Projects, but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new Projects will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another.

So, should you migrate away from "Project 1" at all? Here are some things to keep in mind when making this decision.

  1. It's all or nothing.
    • Again, we stress the importance of committing to the move. You should commit to migrating everything to new Projects (with the exception of a handful of shared resources), rather than just a few. The benefits of the Project architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help-desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system.
  2. You don't have to move things from "Project 1" at all.
    • If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously.
  3. Do you have time to do it?
    • This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository is, with many Content Models, Batch Processes, profiles and other objects, the longer it's going to take.
  4. Do you have a lot of "shared resources"?
    • If you frequently have individual Data Types, Lexicons, profiles (ex: OCR Profiles), CMIS Connections or other objects used across many different Content Models and Batch Processes, this will take the most time and effort to migrate. Ensuring these shared resources are accessible to each Project created is the most time-consuming part of any migration out of "Project 1".
  5. Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?
    • If so, Projects are for you. The new architecture provides multiple advantages to this kind of workflow. You should seriously consider devoting the time to migrate resources into their own Projects, if you maintain multiple environments to publish Grooper objects from development to production repositories.
  6. Do you use third-party data entry companies to review work in Grooper?
    • If so, Projects are for you. You'll benefit from being able to push complete and tidy project packages to an environment dedicated to that company.
  7. Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?
    • If so, Projects are really for you. Aside from object organization, the other big reason for creating the Project architecture was to maintain object reference integrity. Projects will greatly assist you in preventing reference corruption in your Grooper environments.

Project Migration Plan

Ok, you've decided Projects are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward?

We've narrowed the process down to seven general steps:

  1. Clean up your repository. Delete items that are no longer in use and will not be used in the future.
  2. Use the "Create Project" command for each Batch Process.
  3. As each Project is created, rename any objects as needed if your prior naming conventions no longer make sense.
  4. For each Project, use the "Usage" tab to decide what to do about "shared resources" used by multiple Projects.
  5. Remove Project references if the "Outbound References" list is empty.
  6. Reorganize any shared resource objects that remain in "Project 1".
  7. Rename "Project 1" to something like "Global Resources" or "Shared Resources".

You may download and import the files below into your own Grooper environment (version 2024).

1. Clean House

If you're going to take the time to reorganize your resources into Projects, now is a good time to take a look at the Grooper objects in your repository and get rid of anything not in use littering up your environment.

This is entirely optional, but now is as good a time as any to clean house.

  1. For example, we have a dummy IP Profile named "temp" and a Value Reader named "test" in our newly upgraded "Project 1" Project.
    • I have no idea where these objects came from or what their original purpose was. They aren't being used by anything else in this environment. It's best to just delete them to get them out of the way.

2. Create Project

Now we can start in earnest and create some Projects. You could do this manually. The steps would be as follows.

  1. Add a Project to the "Projects" folder.
  2. Using the Project's Referenced Projects property, reference "Project 1".
  3. Move a Batch Process to that Project.
  4. Move the Content Model associated with that Batch Process to the Project.
  5. Move any other Grooper objects referenced by the Batch Process or Content Model's objects to the Project.
    • Or keep any shared resources in "Project 1", maintaining access to them through the Project reference (we'll discuss this further in Step 4: Analyze References).

There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature.

The "Create Project" feature is accessed by selecting a Batch Process. If you think about it, a Batch Process should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the Batch Process as part of its execution, such as a Content Model referenced for a Classify step or an OCR Profile referenced for a Recognize step.

The "Create Project" utility will create a new Project named after the Batch Process, look for any objects referenced as part of its execution, and move them to the new Project.

Important! "Create Project" will only move objects not referenced by anything else. If another Batch Process uses the same OCR Profile, for example, that OCR Profile will remain in "Project 1". We will discuss this further in "Step 4: Analyze References".
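The "move only what nothing else uses" rule can be illustrated with a small, hypothetical model. This is not Grooper's API (the utility is run from the node tree); it simply treats the repository as a map from each object name to the objects it references, with a Content Model standing in as a single node. One assumption worth flagging: if a shared object (such as an OCR Profile) itself references another object, that dependency is kept alongside it in "Project 1" so the shared object is never left pointing into the new Project.

```python
# Hypothetical model, NOT Grooper's API: object name -> names it references.
Graph = dict[str, set[str]]


def closure(graph: Graph, start: str) -> set[str]:
    """Return `start` plus every object reachable from it through references."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def plan_create_project(graph: Graph, batch_process: str) -> tuple[set[str], set[str]]:
    """Split a Batch Process's dependencies into (moves, stays_in_project_1)."""
    deps = closure(graph, batch_process)
    # Objects in `deps` that something outside `deps` also references are shared.
    shared_roots = {
        target
        for source, targets in graph.items()
        if source not in deps
        for target in targets
        if target in deps
    }
    # Assumption: a shared object keeps its own dependencies next to it.
    stays: set[str] = set()
    for obj in shared_roots:
        stays |= closure(graph, obj)
    return deps - stays, stays


# Illustrative object names only.
repo: Graph = {
    "Invoices Process": {"Invoices Model", "Invoice OCR", "Image Cleanup - Permanent"},
    "Human Resources": {"HR Model", "Image Cleanup - Permanent", "Full Text OCR"},
    "Invoice OCR": {"Image Cleanup - OCR"},
    "Full Text OCR": {"Image Cleanup - OCR"},
}
moves, stays = plan_create_project(repo, "Invoices Process")
```

Here the Batch Process, its Content Model, and its OCR Profile would move, while both IP Profiles stay in "Project 1" because the other Batch Process also depends on them.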

We will start by creating a new Project using a fairly simple document redaction Batch Process. This is an entirely "self-contained" Batch Process. No other Batch Process utilizes its resources.

  1. This is the Batch Process we will create the Project from.
  2. It references this Content Model, including resources in its Local Resources folder.
  3. Specifically, the "URLA" Document Type is referenced as the Extract step's Default Content Type. Since this Document Type is referenced, its parent Content Model (and all its children, including Local Resources folder and Data Model) will be moved to the new Project.
  4. This OCR Profile is also referenced (by the Recognize step). So, it will move to the new Project as well.

To create the Project, perform the following steps.

  1. Select the Batch Process you wish to use to create the new Project.
  2. Right-click the Batch Process and select "Create Project."

There are two configurable options when creating the new Project.

  1. The Remove Emptied Folders property will delete a folder from "Project 1" if it is empty after objects are moved to the new Project.
    • Generally speaking, you'll want this property set to True. It cleans up empty folders in "Project 1". Why would you want to keep a bunch of empty folders around? I don't know. If you have a reason to keep these empty folders, you can set this property to False.
  2. Press "Execute" to create the Project.

When the utility finishes running, a new Project will be created. All objects associated with the Batch Process are moved from "Project 1" to the new Project, as long as the move is allowed (we'll talk more about moves that aren't allowed in Step 4).

  1. In our case, this Project named "URLA Redaction" was created.
    • The new Project will always be named after the Batch Process.
  2. A total of three objects were moved from "Project 1" and placed in the new Project.
    • This was a relatively simple Batch Process. More complicated Batch Processes will more likely than not have more objects referenced, and therefore, more objects moved. But, you can pretty much guarantee you'll at least end up with the Batch Process and a Content Model in the new Project. With few exceptions, you're always going to need a Batch Process and a Content Model to do work in Grooper. In general, each Project will have one Batch Process and one Content Model.
FYI

Grooper renamed our Content Model to "Content Model". Why? In "Project 1", the Batch Process and Content Model were both named "URLA Redaction". Object names in the same branch of the node tree must be unique. Grooper will rename any object sharing the name of the source Batch Process after their object type. Therefore the Content Model named "URLA Redaction" was renamed "Content Model".
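The renaming rule in the note above can be sketched as follows. This is a hypothetical illustration, not a Grooper function: `rename_collisions` and the (name, object type) pairs are assumptions used only to show that an object named the same as the source Batch Process takes its object type as its new name, while everything else keeps its name.

```python
def rename_collisions(batch_process_name: str, moved: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hypothetical helper: `moved` holds (name, object_type) pairs moved with the Batch Process.

    Any object sharing the Batch Process's name is renamed after its object type.
    """
    return [
        (obj_type if name == batch_process_name else name, obj_type)
        for name, obj_type in moved
    ]


renamed = rename_collisions(
    "URLA Redaction",
    [("URLA Redaction", "Content Model"), ("URLA OCR", "OCR Profile")],
)
```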

3. Rename Resources

With the switch to the Project architecture, you may find your naming convention no longer makes sense or could be adjusted. Much like the "Clean House" step, this step is not strictly necessary. But if you're going through the effort to reorganize your repository into a new structure, you might as well make sure how you're naming things makes sense in that new structure.

If you're coming from an environment with a lot of Batch Processes and a lot of Content Models, you've probably named your resources according to their intended use case. So you might have "Use Case X Content Model", "Use Case X OCR Profile", "Use Case X IP Profile" and so on. You may find this naming superfluous once all these assets are moved over to a Project. So, it might make more sense to you to just rename these objects after their generic object type or function in the Batch Process workflow.

  1. For example, we've renamed our OCR Profile from "URLA OCR" to simply "OCR".
  2. Name your resources whatever makes sense to you in your environment. "Batch Process" may be too generic of a name if you're executing multiple different Batch Processes. We went ahead and stuck with "URLA Redaction" for our Batch Process here.

4. Analyze References

So far, this process has been fairly simple. Using the "Create Project" feature, all resources associated with a Batch Process are moved to a new Project. Our previous example was simple because all the resources were self-contained, or "local" to the Batch Process.

That is not always the case. Particularly in larger environments, you will find you reuse resources across a variety of Batch Processes. For example, the "Full Text - Accurate" OCR Profile in our "Essentials" downloads is many Grooper users' go-to OCR Profile if they don't have the time or the will to create their own. This OCR Profile would be a shared or "global" resource, touched by multiple different Batch Processes.

When you have shared resources, they will not be moved to a newly created Project when using the "Create Project" feature. They can't be. Other Batch Processes need to use those resources too. Instead, all resources that are not shared are moved to the newly created Project, and a Project reference is made to "Project 1". The Project reference gives the new Project access to the resources it needs in "Project 1".

  1. We will create Projects for these two Batch Processes next.
  2. However, they share a number of resources in their Batch Processes: two IP Profiles and a CMIS Connection.
  3. For example, they both use the "Image Cleanup - Permanent" IP Profile in executing the Image Processing steps of their Batch Processes.

Next, we will select "Create Project" to create a new Project from each Batch Process, starting with the "Invoices Process" Project.

  1. Select the Batch Process you wish to use to create the Project.
    • In our case, we're starting with the Batch Process named "Invoices Process".
  2. Right-click and select "Create Project."

  1. Configure the Project creation properties as desired.
  2. Press the "Execute" button to create the Project.

  1. A new Project is created and several resources have been moved.
  2. Select the newly created Project.
  3. Notice the Referenced Projects property shows a Project reference to "Project 1".

This indicates there is something in "Project 1" the Batch Process utilizes that can't be moved because another Batch Process (or its associated objects) utilizes it in one way or another.

Essentially, both Batch Processes need to reference one or more of the same objects. So those objects stay put in "Project 1" and are accessible through the Project reference. By referencing "Project 1", the Project we just created has access to all of its resources, including whatever it needs to function.

So, just what resources are out in "Project 1" that our new Project needs? Good question. You can quickly answer this with the "Usage" tab.

  1. Select the "Usage" tab to view all objects in "Project 1" referenced by the selected Project.
  2. This will bring up a list of outbound references.
    • These are references that objects in the selected Project make to objects in the external Projects listed in its Referenced Projects property.
  3. Any referenced object will be listed.
    • In our case, we made references to three objects in the "Shared Resources" folder of "Project 1":
      • Two IP Profiles: "Image Cleanup - OCR" and "Image Cleanup - Permanent"
      • One CMIS Repository ("Import Export") belonging to the CMIS Connection "NTFS - Local Hard Drive"
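Conceptually, the outbound references list is every reference whose source object lives in the selected Project and whose target lives in some other Project. The sketch below models that idea with plain dictionaries. It is a hypothetical illustration, not how the "Usage" tab is implemented, and it assumes object names are unique across the repository to keep the example short.

```python
def outbound_references(
    location: dict[str, str], graph: dict[str, set[str]], project: str
) -> dict[str, set[str]]:
    """Group references leaving `project` by the external Project that owns the target.

    location: object name -> Project it lives in (hypothetical model)
    graph:    object name -> object names it references
    """
    result: dict[str, set[str]] = {}
    for source, targets in graph.items():
        if location.get(source) != project:
            continue  # only references made by objects in the selected Project
        for target in targets:
            owner = location.get(target)
            if owner is not None and owner != project:
                result.setdefault(owner, set()).add(target)
    return result


# Illustrative data mirroring the example above.
location = {
    "Batch Process": "Invoices Process",
    "Invoice OCR": "Invoices Process",
    "Image Cleanup - OCR": "Project 1",
    "Image Cleanup - Permanent": "Project 1",
    "Import Export": "Project 1",
}
graph = {
    "Batch Process": {"Image Cleanup - Permanent", "Import Export", "Invoice OCR"},
    "Invoice OCR": {"Image Cleanup - OCR"},
}
usage = outbound_references(location, graph, "Invoices Process")
```

The reference from the Batch Process to "Invoice OCR" is not listed because both objects are in the same Project.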

Now that we know what is shared, we have two options:

  1. Copy these resources from "Project 1" so that we have local copies of these resources.
  2. Keep these shared resources put so that every Project that needs them can reference the same object.

Your choice will largely depend on how big your environment is, how many times the resources in "Project 1" are referenced by different potential Projects, and if you prefer to have local copies of these resources that can be edited independently or if you want these resources to be truly shared across different Projects (meaning changing one single object will impact how multiple Projects implement it).

Option 1: Copy the Resources

This option is the most time-consuming, but it does have benefits. Once you copy these shared resources from "Project 1", you will no longer need the reference to "Project 1". At that point, the Project is independent and self-contained. If you need to share this Project with another Grooper environment (say, promoting it from a "test" repository to a "production" repository), it will have no dependencies on another Project that need to be shared alongside it.

Aside from the time it takes to do this, the only drawback is that the resources are then completely local to the Project. The copies in the Project are separate objects from the originals in "Project 1", so they must be edited independently if you want to make changes to them.

Copying these resources over is not necessarily the hard part. It's even easier in this case because they're already all in the same folder.

  1. We can simply copy the whole folder.
  2. Or we can select the folder, and go to the "Contents" tab.
  3. Then, multi-select all the objects we want to copy.
    • Ctrl + click to do this.
    • In this case, all of them.
  4. Paste them into the Project you want.

FYI

Depending on the reference complexity of the objects you are copying, this process will be more complicated. It will require you to track down the referenced objects, and bring them over either a) before you copy and paste the object referencing it (which will require you to reset the reference after everything is copied over) or b) copy and paste everything at the same time.

For more information on copying and pasting objects from one Project to another, please refer to the "Referencing Objects in Other Projects" section of this article.

  1. Now we have local copies in our Project of these referenced resources from "Project 1".

Now comes the time-consuming part of this process: all the references must be reassigned from the source objects in "Project 1" to their local copies in the Project.

  1. For example, this OCR Profile uses one of the IP Profiles we just copied.
  2. We would need to select the property where the reference is assigned.
    • The IP Profile property in this case.
  3. Then, we'd need to change the reference from the source object in "Project 1"
  4. ...to the local copy in our Project.

The more resources you copy over, and the more complex the references and sub-references you're faced with, the more time this process will take.
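The copy-then-reassign work can be pictured as two passes over a reference map: paste copies that still point at whatever their originals referenced, then repoint every reference held by an object in the Project from an original to its local copy. This is a hypothetical model, not a Grooper feature; objects are keyed as (Project, name) pairs so a copy can keep its original's name, and `copy_and_rewire` is an illustrative name only.

```python
# Hypothetical model, NOT Grooper's API: objects are (project, name) pairs.
Obj = tuple[str, str]


def copy_and_rewire(graph: dict[Obj, set[Obj]], originals: set[Obj], project: str) -> dict[Obj, set[Obj]]:
    """Copy `originals` into `project`, then repoint the project's references to the copies."""
    local = {orig: (project, orig[1]) for orig in originals}
    new_graph = {src: set(targets) for src, targets in graph.items()}
    # Pass 1 (paste): each copy still references whatever its original referenced.
    for orig, dup in local.items():
        new_graph[dup] = set(graph.get(orig, ()))
    # Pass 2 (reassign): objects in the project swap originals for local copies.
    for src in list(new_graph):
        if src[0] == project:
            new_graph[src] = {local.get(t, t) for t in new_graph[src]}
    return new_graph


p1 = "Project 1"
graph = {
    ("Invoices Process", "OCR"): {(p1, "Image Cleanup - OCR")},
    ("Invoices Process", "Batch Process"): {(p1, "Image Cleanup - Permanent"), ("Invoices Process", "OCR")},
}
rewired = copy_and_rewire(graph, {(p1, "Image Cleanup - OCR"), (p1, "Image Cleanup - Permanent")}, "Invoices Process")
```

Once the second pass leaves no references pointing into "Project 1", the Project reference is no longer needed, which is the independence described under Option 1.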

Option 2: Keep Project 1 Shared

The other option is to just keep these resources shared. Your Batch Process will function as it did before. The only difference is your Project will be dependent on the referenced Project ("Project 1") to function.

  1. Here, we selected "Create Project" to create a new Project from the last Batch Process in the Grooper Repository.
  2. We've selected the "Usage" tab to see what's referenced in "Project 1"
  3. It's utilizing one IP Profile and one CMIS Connection we saw in the previous example.
  4. It's also utilizing some of the resources in our Grooper Essentials package.

If you're fine with keeping these as shared resources in a referenced Project, you're done here. There's no need to go through the time-consuming copy-and-paste and reference reassignment dance we did earlier. The only potential drawback is that you've made this Project dependent on "Project 1". You will need to evaluate for yourself whether this is a drawback, a benefit, or doesn't really matter one way or another.

Option 3: Option 1 + Option 2

There's also no reason why you can't copy some items over to your new Project and keep others referenced through Project references.

For example, in our first example, we could have copied over the two IP Profiles but kept the CMIS Connection as a shared resource. In fact, that would have made more sense. If we have to make changes to a CMIS Connection (like entering new access permissions), we would only want to do that once by manipulating one single object, rather than reproducing our efforts by editing multiple copies of the same object in multiple Projects.

5. Remove Project References

Now, we're at a point where we've used the "Create Project" feature for every Batch Process in "Project 1". What's next?

In our situation, we've created some Projects that need to retain a reference to "Project 1" and some that do not. For those that do not, we should go ahead and remove the reference to Project 1. To do this, the "Usage" tab will once again be useful. For each Project we will want to use the "Usage" tab to check and see if there are any outbound references. If there are, we'll keep the reference intact. If there are not, we will remove the reference.
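The decision rule in the paragraph above (keep a Project reference only while the "Usage" tab shows outbound references into it) can be expressed as a simple set intersection. This is a hypothetical sketch: the two dictionaries stand in for each Project's Referenced Projects property and for what you read off its "Usage" tab, and Grooper itself does not expose such a function.

```python
def prune_project_references(
    referenced: dict[str, set[str]], outbound: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Keep a Project reference only if outbound references into that Project exist.

    referenced: project -> Projects listed in its Referenced Projects property
    outbound:   project -> Projects its objects actually reference (per the Usage tab)
    """
    return {project: refs & outbound.get(project, set()) for project, refs in referenced.items()}


kept = prune_project_references(
    {"Human Resources": {"Project 1"}, "Invoices Process": {"Project 1"}},
    {"Human Resources": {"Project 1"}},
)
```

In this illustration, "Human Resources" keeps its reference to "Project 1" and "Invoices Process" ends up with none, matching the walkthrough below.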

We've decided the "Human Resources" Project will retain its reference to "Project 1." If you want, you can verify the references with the "Usage" tab.

  1. Select the "Human Resources" Project
  2. Select the "Usage" tab.
  3. Take note of the outbound references.

  1. Next, we'll analyze the "Invoices Process" Project's references.
  2. Select the "Usage" tab.
  3. There are not any outbound references.

Now, we know it's safe to remove the reference to "Project 1".

  1. Using the Referenced Projects property, remove the reference to "Project 1"
  2. Press "Save" when finished.

You should always use the "Usage" tab before removing a reference to a Project.

Grooper will technically allow you to remove a reference to a Project even with outbound references outstanding. However, doing so is not best practice, as it can cause corruption of your system down the road.

  1. Continue selecting your Projects to analyze their references.
  2. Once all Projects with no outstanding references to "Project 1" have had their Project reference to "Project 1" removed, you are done.

6. Reorganize Shared Resources

After you're done creating Projects from the Batch Processes in "Project 1", you'll want to clean up the remaining resources in "Project 1".

This will include:

  1. Organizing any leftover assets into manually created Projects, if applicable.
  2. Organizing any remaining shared resources into folders and deleting empty folders, as you see fit.

Manually Creating Projects

There are some situations where you may need to manually create a Project for certain assets remaining in "Project 1". Most commonly, this can happen if you have resources you've created for testing purposes that are not tied to a Batch Process and you want to keep them around.

  1. For example, we have two Content Models leftover.
  2. We also have this IP Profile.

It seems like these are partially architected resources from whenever this Grooper user went through Grooper's ACE - Architect training. We may want to keep these around so this user can continue their training.

The only issue is there is no Batch Process utilizing these resources, so we can't take advantage of our "Create Project" utility.

So, we'll need to manually create a Project.

  1. Right click the Projects folder in the node tree.
  2. Select "Add", then "Project..."
  3. Name the new Project.
  4. Press "Ok" when finished.

From here, you can cut and paste or move resources from "Project 1" to the new Project.

  1. For example, we moved this IP Profile from "Project 1" to the "ACE Training" Project we created.

Next, we're going to look at a potential issue you may encounter when moving resources out of "Project 1" to another Project.

  1. I want to move either of these two Content Models from "Project 1" to the "ACE Training" Project.
  2. If I try to move one, I get the following error message.
    • This is telling me there is a reference violation. The Content Model is dependent on some external reference, and can't be moved without resolving the reference.

Why did this happen? Long story short, someone did something they should have never done, but was technically allowed in previous versions of Grooper.


The problem is someone made a reference within a Content Model to something in another Content Model.

  1. This Data Field is causing the problem.
  2. Its Value Extractor is set to Reference a Data Type.
  3. The problem is the Data Type it's referencing is in a different Content Model.

This is a big "no-no" that was technically possible in Grooper, but never considered best practice. This may have happened by accident when copying a Content Model. This may have been a "quick fix" that some Grooper designer did, intending to go back and resolve but never got around to it. Who knows. The main issue is these types of reference violations can cause problems down the road, potentially causing corruption in your Grooper environment.

Part of the reason Projects were created was to avoid this type of corruption due to improperly referenced objects. Referencing Projects via the Referenced Projects property makes external resource references much more intentional, avoiding accidental reference violations (as much as possible).
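The kind of reference violation described above (an object inside one Content Model referencing an object inside a different Content Model) can be detected mechanically once you know which Content Model contains each object. The sketch below is a hypothetical model for illustration only; the object and model names are made up, and in Grooper you would discover these violations through the error message shown when attempting the move.

```python
def find_reference_violations(
    parent_model: dict[str, str], references: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return (source, target) references that cross from one Content Model into another.

    parent_model: object name -> Content Model containing it (absent if not in a model)
    references:   (source, target) pairs, e.g. a Data Field -> the Data Type it references
    """
    violations = []
    for source, target in references:
        src_model = parent_model.get(source)
        tgt_model = parent_model.get(target)
        if src_model is not None and tgt_model is not None and src_model != tgt_model:
            violations.append((source, target))
    return violations


# Hypothetical names: a copied model whose Data Field still points at the original's Data Type.
parents = {
    "Borrower Name (copy)": "Training Model - Copy",
    "Name Extractor": "Training Model",
    "Borrower Name": "Training Model",
}
refs = [
    ("Borrower Name", "Name Extractor"),
    ("Borrower Name (copy)", "Name Extractor"),
    ("Borrower Name (copy)", "Image Cleanup - OCR"),
]
found = find_reference_violations(parents, refs)
```

Only the copy's reference is flagged; references within the same Content Model, or to objects outside any Content Model, pass.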

To move these objects to the new Project, we need to resolve the reference violation one way or another.

  1. We could clear the offending reference, or reset it to something that doesn't violate a reference across multiple Content Models.
  2. Or, since this Content Model is really just a copy of the other one, we could just delete it.
    • This is what I will elect to do.

  1. With the reference violation resolved, the Content Model can be moved with no issue.

Clean Up Remaining Folders

After you've moved everything out of "Project 1" into a new Project (whether manually or through the "Create Project" feature), the only resources left should be shared resources, objects intended to be used by any current or future Project.

All that's left is to organize the objects and folders remaining.

  1. You may have empty folders you can simply delete.

FYI: This "Content Models" folder will be "Read Only". This is a carry over from this folder being a system node in the previous version of Grooper's architecture.

  1. To delete it, you will need to select the folder, then go to the "Advanced" tab.
  2. Then, under Attributes, change Read Only to False.
    • Then navigate off the object. You will be prompted to "Save". After saving, you will be able to delete it.

You may also want to create some new folders for stray objects or move some folders to the root of the Project.

  1. For example, many users find it helpful to move the Grooper "Essentials" folder to the root of the Project, then delete the "Downloads" folder.

7. Rename Project 1

Once all resources are out of "Project 1" and you've organized any shared resources remaining, rename "Project 1" to something that reflects its true utility, something like "Shared Resources" or "Global Resources".

  1. We named ours "Global Resources".

That's it! The migration from "Project 1" to Grooper's new Project-based architecture is complete!