2022:Project (Node Type)
| WIP | This article is a work-in-progress. It was written using a beta version of 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022.
This tag will be removed upon draft completion. |
A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.
About
After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.
Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.
How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.
What's With That Processes Folder?
If you're new to Grooper (or to version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?" As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?
When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" (usable in a production-level environment as documents come into Grooper), because that would cause partially or improperly processed documents to come through Grooper. So, while you are working on a Batch Process, it is a working Batch Process. Once that Batch Process is finished and ready to be implemented in a production-level environment, it is published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
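If it helps to see the idea in code, the working/published split can be modeled roughly as follows. This is a conceptual Python sketch under assumed names (`WorkingBatchProcess`, `PublishedBatchProcess`, `publish`, and `processes_folder` are hypothetical, not part of Grooper): publishing stores an immutable snapshot, and production lookups only search the published copies.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingBatchProcess:
    """Editable Batch Process living in a Project (hypothetical model)."""
    name: str
    steps: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class PublishedBatchProcess:
    """Read-only copy stored in the Processes folder (hypothetical model)."""
    name: str
    steps: tuple[str, ...]


# Stand-in for the Processes folder: only published copies live here.
processes_folder: dict[str, PublishedBatchProcess] = {}


def publish(working: WorkingBatchProcess) -> PublishedBatchProcess:
    # Snapshot the current steps into an immutable tuple so later edits
    # to the working copy do not affect what production Batches run.
    published = PublishedBatchProcess(working.name, tuple(working.steps))
    processes_folder[working.name] = published
    return published


def process_for_production(name: str) -> PublishedBatchProcess:
    # Production Batches can only see published processes.
    if name not in processes_folder:
        raise LookupError(f"{name!r} has not been published")
    return processes_folder[name]


invoices = WorkingBatchProcess("Invoices", ["Recognize", "Classify"])
publish(invoices)
invoices.steps.append("Export")  # still a work in progress
print(process_for_production("Invoices").steps)  # prints ('Recognize', 'Classify')
```

In this model, later edits to the working Batch Process have no effect on production until it is published again.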
Adding a New Project
Add a Project
Projects are added to the Projects folder node in the node tree.
Add Resources to the Project
The following Grooper objects can be added to a Project:
- Extractor objects
- Profile objects
- Data integration objects
- Other objects
So, how do you add them to a Project? The same way you would add an item to any folder in the node tree.
What About Batches?
Batches are one thing you cannot add to a Project. This includes both Test and Production Batches.
Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.
Referencing Objects in Other Projects
Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know that Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than what previous versions offered. (If you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)
Aside from the organizational benefits, one of the big reasons for switching to a Project-based architecture was to maintain the integrity of references between objects throughout a repository.
Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).
For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.
Generally speaking, maintaining reference integrity is ideal. The more narrowly you can define an object's allowable scope of reference, the better. This makes references easier to track down, limits the number of object dependencies (making your system easier to manage), and reduces the risk of system corruption down the line from a tangled mess of "reference spaghetti".
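As a rough illustration of this rule (a hypothetical Python model, not how Grooper implements it), an object may reference a target only if the target lives in the same Project or in a Project that has been explicitly referenced. The `Project` class and the "HR OCR" object name are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a Grooper Project."""
    name: str
    objects: set[str] = field(default_factory=set)
    # Projects this Project has been explicitly allowed to reference.
    referenced_projects: set[str] = field(default_factory=set)


def can_reference(source: Project, target_project: Project, target_object: str) -> bool:
    """Return True if an object in `source` may reference `target_object`."""
    if target_object not in target_project.objects:
        return False  # the target must actually exist
    # Allowed within the same Project, or in an explicitly referenced one.
    return (
        target_project.name == source.name
        or target_project.name in source.referenced_projects
    )


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
hr = Project("Human Resources", {"HR OCR"})

print(can_reference(invoices, invoices, "Image Cleanup - OCR"))  # prints True
print(can_reference(hr, invoices, "Image Cleanup - OCR"))        # prints False
hr.referenced_projects.add("Invoices")
print(can_reference(hr, invoices, "Image Cleanup - OCR"))        # prints True
```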
However, imagine you're working in a different Project, such as our "Human Resources" Project. It makes perfect sense to keep these two use cases in two separate Projects. They use different Content Models and different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another, and no reason to clutter up the Project for human resources documents with assets that only pertain to invoices.
But, particularly if you use Grooper across a variety of use cases, you will run into situations where resources you build for one Project could be used in another. In these cases, it is beneficial to share resources so that you don't have to rebuild something you've already developed. Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.
So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:
- Directly copy the object from one Project to another.
- Reference the external Project to allow access to its resources.
- Create a shared resources Project that both Projects reference.
Each approach has its own strengths and weaknesses, depending on the situation. Next, we will detail each option and discuss its drawbacks.
Option 1: Copying Objects from One Project to Another
This option is generally acceptable only in the most basic circumstances. As we will see, there are some significant drawbacks to this approach. However, for simpler Grooper environments and simple Grooper objects, copying the desired object from one Project to another can work out just fine.
Whether this option works for you at all depends on the reference complexity of the object you're copying.
| FYI | While the following guidance deals specifically with "copying and pasting", the same follows for "cutting and pasting" or "moving" objects from one Project to another. |
Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. In this case, nothing prevents us from doing so.
Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste; this isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.
First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources. Any changes made to the original object will not be reflected in the copied object (or vice versa).
This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).
Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is an intentional part of the Project object's design, done to preserve reference integrity. Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would depend on that referenced object from the first Project being present in the second Project.
Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.
If we try to do this, Grooper will throw an error. Why? The address extractor is a Data Type that, as part of its configuration, references several Lexicon objects in the "Invoices" Project.
The error message also gives the full node tree location, within the Project, of both the object doing the referencing (either the object you copied or one of its children) and the referenced object.
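One way to picture the check behind this error is the hypothetical Python sketch below (not Grooper's API). It assumes objects are identified by their node tree path within a Project and that each object records the (Project, path) pairs it references. A paste is blocked by any reference from a copied object, or one of its children, that points outside both the set of objects being pasted and the destination's allowed scope. The object paths in the example are made up.

```python
Ref = tuple[str, str]  # (project name, node path within that project)


def find_blocking_references(
    source_project: str,
    references: dict[str, set[Ref]],
    copied_roots: list[str],
    dest_project: str,
    dest_referenced_projects: set[str],
) -> list[tuple[str, Ref]]:
    """Return (referencing path, referenced object) pairs that would block the paste."""

    def in_copied_set(path: str) -> bool:
        # An object is being pasted if it is a copied root or a child of one.
        return any(path == root or path.startswith(root + "/") for root in copied_roots)

    allowed_projects = {dest_project} | dest_referenced_projects
    blocking = []
    for path, targets in sorted(references.items()):
        if not in_copied_set(path):
            continue  # only the copied objects (and their children) matter
        for project, target in sorted(targets):
            # References to objects pasted in the same operation travel along.
            travels_along = project == source_project and in_copied_set(target)
            if not travels_along and project not in allowed_projects:
                blocking.append((path, (project, target)))
    return blocking


refs = {
    "Extractors/Address": {("Invoices", "Lexicons/US States")},
    "Lexicons/US States": set(),
}
# Copying the extractor alone is blocked...
print(find_blocking_references("Invoices", refs, ["Extractors/Address"], "Human Resources", set()))
# prints [('Extractors/Address', ('Invoices', 'Lexicons/US States'))]
# ...but copying it together with the Lexicons folder is not.
print(find_blocking_references("Invoices", refs, ["Extractors/Address", "Lexicons"], "Human Resources", set()))
# prints []
```

In this model, the block can be cleared either by pasting the referenced Lexicons along with the extractor or by having the destination Project reference the source Project (passing `{"Invoices"}` as `dest_referenced_projects`).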
Think of Projects like a friend's house. If your friend invites you over, he or she isn't surprised when you show up. But if you show up with a bunch of friends unannounced, they're going to take issue with you. There's now a bunch of random strangers in their house they didn't expect.
That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (even more so considering the objects those referenced objects reference, the objects they in turn reference, and so on down the line). Now there are a bunch of random objects you didn't expect cluttering up your Project.
This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.
One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time-consuming process, as it involves the following steps:
- Copy and paste all the referenced objects from the first Project to the second.
- Unassign all the references in the object to be copied from the first Project.
- Paste the object from the first Project to the second.
- Reassign all the references in the copied object to the referenced objects pasted in step 1.
Depending on how these objects are organized, you could also copy and paste multiple objects at a time.
Because we copied the extractor and a folder containing all the Lexicons it references, and pasted them all at the same time, Grooper allowed the move without any issue.
Keep in mind, however, if you copy a folder, you're going to get everything in that folder.
Another option is to use Project references. Referencing a Project gives one Project access to reference all the resources within another.
Option 2: Referencing a Project
Resources can be shared between two (or more) Projects by referencing the full Project. This gives explicit access to all objects within a Project, just as if they were created locally.
Let's go back to our problem copying an address extractor that references multiple Lexicons from one Project to another.
As we saw previously, Grooper will not allow us to do this (yet).
All we need to do to make this happen is tell Grooper it's OK for the "Human Resources" Project to use assets from the "Invoices" Project. We do this by having the "Human Resources" Project reference the entire "Invoices" Project.
Now we can copy and paste all day long.
You may also make direct references to any object in a referenced Project. For example, because we've referenced the "Invoices" Project we could have simply referenced the address extractor without copying and pasting it.
This is an effective way of sharing resources between multiple Projects without duplicating your efforts by creating multiple copies of shared resources that you have to manage independently in each Project.
The only downside to this approach depends on how many different Projects use a set of shared resources. If it comes down to a limited number of resources, or resources shared between very similar Projects (in terms of their use case), this approach can work out just fine. But as more and more resources are shared between more and more Projects, the crisscrossed references between them can make it difficult to track down a single object used across a variety of Projects.
In those cases, you may want to do a little extra work and create an entirely separate Project devoted to housing resources shared between multiple Projects. We will discuss creating and using a "Shared Resources" Project in the next section.
| ⚠ | Please read the following before continuing. It contains best practice advice to avoid potential system corruption when dealing with Project referencing. |
Just as you can make references to other Projects, you can remove those references as well. However, to prevent future corruption down the line, you should always ensure no object in your Project references objects in the other Project before removing its reference.
The easiest way to do this is with the "Analyze References" button at the top of the Project's UI.
Any outbound references listed here indicate there are resources in this Project that depend on resources in the referenced Project (the "Invoices" Project, in our example) to function. CAUTION! While it is technically possible to remove the reference to a Project without resolving these references, YOU SHOULD NOT DO SO. Always resolve these outbound references first, and ensure there are no outbound references to the Project before removing the reference.
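The check recommended above can be sketched in Python as follows. This is a hypothetical model (the function names and data layout are assumptions; in Grooper you would use the "Analyze References" button): collect every reference from this Project's objects into the Project you want to stop referencing, and refuse to remove the Project reference while that list is non-empty. The "HR OCR" object is invented for the example.

```python
Ref = tuple[str, str]  # (project name, object name)


def outbound_references(
    objects: dict[str, set[Ref]], other_project: str
) -> list[tuple[str, Ref]]:
    """List (local object, referenced object) pairs that point into `other_project`."""
    return sorted(
        (name, ref)
        for name, refs in objects.items()
        for ref in refs
        if ref[0] == other_project
    )


def remove_project_reference(
    referenced_projects: set[str], objects: dict[str, set[Ref]], other_project: str
) -> None:
    # Refuse to remove the reference while anything still depends on it.
    dangling = outbound_references(objects, other_project)
    if dangling:
        raise ValueError(f"Resolve these references first: {dangling}")
    referenced_projects.discard(other_project)


hr_referenced = {"Invoices"}
hr_objects = {"HR OCR": {("Invoices", "Image Cleanup - OCR")}}
try:
    remove_project_reference(hr_referenced, hr_objects, "Invoices")
except ValueError as err:
    print(err)  # blocked: "HR OCR" still depends on the "Invoices" Project

# Resolve the reference (here, by pointing at a local copy), then remove it.
hr_objects["HR OCR"] = {("Human Resources", "Image Cleanup - OCR")}
remove_project_reference(hr_referenced, hr_objects, "Invoices")
print(hr_referenced)  # prints set()
```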
Option 3: Creating a Shared Resources Project
The last option is to use an entirely separate Project devoted solely to housing objects used and referenced by multiple Projects. This option is most appropriate for larger environments that process different kinds of documents from different use cases. Given a big enough body of documents, even ones from different industries or use cases, you will find commonly used resources that are generalizable across a variety of documents. These can include generic or semi-generic extractors, Lexicons, and even IP Profiles and OCR Profiles.
In these cases, it often makes sense to create a "bucket" of resources that all Projects can draw from: shared resources kept in a single Project referenced by multiple others. In our case, we're going to move these assets to a "Shared Resources" Project.
| FYI | Other common examples of shared resources are CMIS Connections and Data Connections.
Multiple Projects will often reuse these connection objects to integrate Grooper with external storage platforms (such as content management systems and databases). Therefore, it makes sense to create something like a "Connections" Project containing these CMIS Connections and Data Connections. Instead of re-creating each connection object for each Project, all Projects can simply reference the "Connections" Project to gain access to the CMIS Connections and/or Data Connections required for import/export operations. |
For instance, there are some fairly generic extractors in the "Invoices" Project we may want accessible to the "Human Resources" Project and future Projects as well.
We are going to move these extractors to a new Project, which we will name "Shared Resources".
For the first extractor, this job is very easy.
The second extractor is where the extra work up front comes in. First, we can copy the Value Reader. It makes no references to other objects; the issue here is that other objects reference it.
Now, if you truly want to use this as a "shared" or "global" resource, you can reassign every reference to the "VAL - Generic Decimal" extractor within the "Invoices" Project so that it points to the copy in the "Shared Resources" Project. For these reassigned references to be allowed, the "Invoices" Project will need to reference the "Shared Resources" Project.
The quickest way to find every object that references a selected object in the node tree is to use the "References" tab.
From here, we could track down each of these objects, find where in its property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" Project. That is a perfectly acceptable, although somewhat time-consuming, way to reassign references. Luckily, we have a shortcut available to us.
The "Reassign References..." button will allow us to change the reference for each object in the list from the selected object, to a different one. This is exactly what we want to do. We want to change the reference set on these Data Columns and Data Type from the "VAL - Generic Decimal" extractor in the "Invoices" Project to the copy we made in the "Shared Resources" Project.
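Conceptually, "Reassign References..." acts like a find-and-replace over references. The hypothetical Python sketch below (not Grooper's implementation) swaps every reference to the old extractor for a reference to the copy in the "Shared Resources" Project and reports which objects changed. The Data Column names are invented for the example; as noted above, the "Invoices" Project must reference the "Shared Resources" Project for the new references to be allowed.

```python
Ref = tuple[str, str]  # (project name, object name)


def reassign_references(
    objects: dict[Ref, set[Ref]], old: Ref, new: Ref
) -> list[Ref]:
    """Point every reference to `old` at `new`; return the objects that changed."""
    changed = []
    for obj, refs in objects.items():
        if old in refs:
            refs.discard(old)  # drop the reference to the original extractor
            refs.add(new)      # and point at the shared copy instead
            changed.append(obj)
    return changed


old = ("Invoices", "VAL - Generic Decimal")
new = ("Shared Resources", "VAL - Generic Decimal")
objects = {
    ("Invoices", "Invoice Total"): {old},      # hypothetical Data Column
    ("Invoices", "Line Item Amount"): {old},   # hypothetical Data Column
    ("Invoices", "Invoice Number"): set(),     # does not use the extractor
}
print(reassign_references(objects, old, new))
# prints [('Invoices', 'Invoice Total'), ('Invoices', 'Line Item Amount')]
```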
As we've demonstrated, it's a little extra work to move resources from one Project to a shared resources Project. However, the benefit of organizing assets this way is that any Project referencing our "Shared Resources" Project now has access to its assets.
The Essentials Project
Every newly created Grooper Repository in version 2022 comes with a Project named "Essentials". This Project contains several resources you may find useful when designing your document processing assets. Just like any other Project, you can access these resources by referencing the "Essentials" Project. The objects it contains can serve as examples of the different types of objects you can create, as resources you can copy into your own Projects and build on, or simply as resources you reference directly in your Projects.
In this project you will find various:
Projects and Upgrading to 2022
Projects are a new way of organizing Grooper resources in version 2022. In previous versions, Grooper resources were organized primarily in one of three folders in the node tree:
Users would have to go back and forth between these locations in order to configure what they needed to process documents through Grooper. This often resulted in a time consuming and cumbersome process, sifting through the node tree's hierarchy to get to the objects you needed. Projects simplify this issue by allowing you to place all associated resources for a given use case (or "project") in a single node location.
Before the introduction of Projects in 2022, these objects were interspersed throughout various locations in the node tree. In version 2022, everything can be neatly placed in one, single location, making finding what you're looking for much simpler.
Obviously, this architecture is much different than how your assets are currently organized in Grooper. So, what's going to happen when you upgrade?
Anything in the "Global Resources" folder will be placed throughout "Project 1"
Deciding What to Do Next
It's important to point out your Grooper environment will work just fine with everything organized into the single "Project 1" Project. You can leave everything as is in "Project 1" upon upgrading to version 2022 and continue processing Batches of documents as if nothing happened.
Going forward you have two options:
- Do nothing. Leave all Grooper resources organized into "Project 1"
- Migrate resources into their own Projects.
You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own Projects, but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new Projects will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another.
So, should you migrate away from "Project 1" at all? Here are some things to keep in mind when making this decision.
- It's all or nothing.
  - Again, we stress the importance of committing to the move. You should commit to migrating everything to new Projects (with the exception of a handful of shared resources), rather than just a few. The benefits of the Project architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system.
- You don't have to move things from "Project 1" at all.
  - If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously.
- Do you have time to do it?
  - This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository is, with many Content Models, Batch Processes, profiles, and other objects, the longer it's going to take.
- Do you have a lot of "shared resources"?
  - If you frequently have individual Data Types, Lexicons, profiles, CMIS Connections, or other objects used across many different Content Models and Batch Processes, these will take the most time and effort to migrate. Ensuring these shared resources are accessible to each Project created is the most time-consuming part of any migration out of "Project 1".
- Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?
  - If so, Projects are for you. The new architecture provides multiple advantages to this kind of workflow. If you maintain multiple environments to publish Grooper objects from development to production repositories, you should seriously consider devoting the time to migrate resources into their own Projects.
- Do you use third-party data entry companies to review work in Grooper?
  - If so, Projects are for you. You'll benefit from being able to push complete and tidy Project packages to an environment dedicated to that company.
- Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?
  - If so, Projects are really for you. Aside from object organization, the other big reason for creating the Project architecture was to maintain object reference integrity. Projects will greatly assist you in preventing reference corruption in your Grooper environments.
Project Migration Plan
Ok, you've decided Projects are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward?
We've narrowed the process down to seven general steps:
1. Clean up your repository. Delete items that are no longer in use and will not be used in the future.
   - Is this strictly necessary? No. But now's a good time to clean house as you begin to organize your Grooper Repository into multiple Projects.
2. Use the "Create Project" feature for each Batch Process.
3. As each Project is created, rename any objects as needed if your prior naming conventions no longer make sense.
4. For each Project, use the "Analyze References" feature to decide what to do about "shared resources" used by multiple Projects.
5. Remove Project references if the "Outbound References" list is empty.
6. Reorganize any shared resource objects that remain in "Project 1".
7. Rename "Project 1" to something like "Global Resources" or "Shared Resources".
1. Clean House
If you're going to take the time to reorganize your resources into Projects, it's worth looking over the Grooper objects in your repository and getting rid of anything not in use that is cluttering up your environment. This step is entirely optional, but now is as good a time as any to clean house.
2. Create Project
Now we can start in earnest and create some Projects. You could do this manually. The steps would be as follows.
- Add a Project to the "Projects" folder.
- Using the Project's Referenced Projects property, reference "Project 1".
  - For more information on referencing Projects, please review the "Referencing Objects in Other Projects" section of this article.
- Move a Batch Process to that Project.
- Move the Content Model associated with that Batch Process to the Project.
- Move any other Grooper objects referenced by the Batch Process or the Content Model's objects to the Project.
  - Or keep any "shared resources" in "Project 1", maintaining access to them through the Project reference (we'll discuss this further in Step 4: Analyze References).
There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature.
The "Create Project" feature is accessed by selecting a Batch Process. If you think about it, a Batch Process should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the Batch Process as part of its execution, such as a Content Model referenced for a Classify step or an OCR Profile referenced for a Recognize step.
The "Create Project" utility will create a new Project with the same name as the Batch Process, look for any objects referenced as part of the Batch Process's execution, and move them to the new Project.
Important! "Create Project" will only move objects not referenced by anything else. If another Batch Process uses the same OCR Profile, for example, that OCR Profile will remain in "Project 1". We will discuss this further in "Step 4: Analyze References".
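The selection rule can be approximated with the hypothetical Python sketch below (an illustration of the behavior described here, not Grooper's code). It first gathers everything the Batch Process references, directly or indirectly, then keeps in "Project 1" any object that is also referenced by something not being moved, repeating until nothing changes. The object names are invented for the example.

```python
def objects_to_move(references: dict[str, set[str]], batch_process: str) -> set[str]:
    """Return the objects a hypothetical "Create Project" would move.

    `references` maps every object in "Project 1" to the objects it references.
    """
    # 1. Collect everything the Batch Process uses, directly or indirectly.
    moving, stack = set(), [batch_process]
    while stack:
        name = stack.pop()
        if name not in moving:
            moving.add(name)
            stack.extend(references.get(name, set()))

    # 2. Keep an object in "Project 1" if anything not being moved references it.
    #    Repeat until stable, since keeping one object can pin what it references.
    changed = True
    while changed:
        changed = False
        for name in sorted(moving - {batch_process}):
            referrers = {src for src, refs in references.items() if name in refs}
            if referrers - moving:
                moving.discard(name)
                changed = True
    return moving


refs = {
    "Invoices BP": {"Invoices Model", "Shared OCR"},
    "Claims BP": {"Claims Model", "Shared OCR"},
    "Shared OCR": {"Cleanup IP"},
    "Invoices Model": set(),
    "Claims Model": set(),
    "Cleanup IP": set(),
}
print(sorted(objects_to_move(refs, "Invoices BP")))
# prints ['Invoices BP', 'Invoices Model']
```

In this model, "Cleanup IP" stays behind along with "Shared OCR", because the only object referencing it is not being moved.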
We will start by creating a new Project using a fairly simple document redaction Batch Process. This is an entirely "self-contained" Batch Process. No other Batch Process utilizes its resources.
To create the Project, perform the following steps.
There are two configurable options when creating the new Project.
When the utility finishes running, a new Project will be created. All objects associated with the Batch Process are moved from "Project 1" to the new Project (as long as the move is allowed; again, we'll talk more about moves that aren't allowed in Step 4).
3. Rename Resources
With the switch to the Project architecture, you may find your naming conventions no longer make sense or could be improved. Much like the "Clean House" step, this step is not strictly necessary. But if you're going through the effort of reorganizing your repository into a new structure, you might as well make sure the way you name things makes sense in that structure.