2022:Project (Node Type)
WIP: This article is a work-in-progress. It was written using a beta version of Grooper 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022. This tag will be removed upon draft completion.
A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.
About
After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine-readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.
Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.
How you organize objects in your Project is largely up to you. To help with this, be aware you can add any number of folder levels to your Project.
What's With That Processes Folder?
If you're new to Grooper (or version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?" As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?
When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. Otherwise, partially or improperly processed documents could come through Grooper. So, while you are still editing a Batch Process, it is considered a "working" Batch Process. Once that Batch Process is finished and ready to be implemented in a production-level environment, it is published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
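To make the working/published distinction concrete, here is a minimal Python sketch. It is a conceptual model only, not Grooper's API: the `Repository` class, its methods, and the step names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PublishedProcess:
    """Hypothetical read-only snapshot, like a copy in the Processes folder."""
    name: str
    steps: Tuple[str, ...]


class Repository:
    """Toy model: working processes are editable, published ones are frozen."""

    def __init__(self) -> None:
        self.working: Dict[str, List[str]] = {}            # drafts inside a Project
        self.published: Dict[str, PublishedProcess] = {}   # the "Processes" folder

    def save_working(self, name: str, steps: List[str]) -> None:
        self.working[name] = list(steps)

    def publish(self, name: str) -> PublishedProcess:
        # Publishing takes a snapshot; later edits to the draft do not affect it.
        snapshot = PublishedProcess(name, tuple(self.working[name]))
        self.published[name] = snapshot
        return snapshot

    def process_for_production(self, name: str) -> PublishedProcess:
        # Production Batches only ever see published snapshots.
        return self.published[name]


repo = Repository()
repo.save_working("Invoices", ["Recognize", "Classify"])  # illustrative step names
repo.publish("Invoices")
repo.working["Invoices"].append("Extract")  # keep editing the working copy
live = repo.process_for_production("Invoices")
print(live.steps)  # ('Recognize', 'Classify')
```

In this model, the extra "Extract" step only reaches production after `publish` is called again, which mirrors the idea that production Batches never see half-edited instructions.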
Adding a New Project
Add a Project
Projects are added to the Projects folder node in the node tree.
Add Resources to the Project
The following types of Grooper objects can be added to a Project:
- Extractor objects
- Profile objects
- Data integration objects
- Other objects
So, how do you add them to a Project? In much the same way you would add an item to a folder in the Grooper node tree.
What About Batches?
One thing you cannot add to a Project is a Batch. This includes both Test and Production Batches.
Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.
Referencing Objects in Other Projects
Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than what was available in previous versions. (And, if you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)
Aside from organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain the integrity of references between objects in a repository.
Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).
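The siloing rule can be modeled in a few lines of Python. This is only a sketch of the concept, not how Grooper implements it: the `Project` class, the `referenced_projects` field, and the `can_reference` function are hypothetical names, as is the "HR OCR" object. The "Invoices" and "Human Resources" Project names mirror the examples used in this article.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """Hypothetical model of a Project as a named silo of objects."""
    name: str
    objects: Set[str] = field(default_factory=set)
    # Names of other Projects this one has been explicitly allowed to reference.
    referenced_projects: Set[str] = field(default_factory=set)


def can_reference(source: Project, target: Project, object_name: str) -> bool:
    """Return True if an object in `source` may reference `object_name` in `target`."""
    if object_name not in target.objects:
        return False
    if source is target:
        return True  # references within the same Project are always allowed
    return target.name in source.referenced_projects


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
hr = Project("Human Resources", {"HR OCR"})

same_project = can_reference(invoices, invoices, "Image Cleanup - OCR")  # True
blocked = can_reference(hr, invoices, "Image Cleanup - OCR")             # False

hr.referenced_projects.add("Invoices")  # explicitly allow the reference
allowed = can_reference(hr, invoices, "Image Cleanup - OCR")             # True
```

The point of the default-deny design is that a cross-Project dependency only exists when someone has deliberately declared it.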
For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.
However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to keep invoice processing and human resources processing separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.
But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one Project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed. Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.
So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:
- Directly copy the object from one Project to another.
- Reference the external Project to allow access to its resources.
- Create a shared resources Project that both Projects reference.
Each approach has its strengths and weaknesses, depending on the situation. Next, we will detail each option and discuss its associated drawbacks.
Option 1: Copying Objects from One Project to Another
This option is generally acceptable only in the most basic circumstances. For simpler Grooper environments and simple Grooper objects, copying the desired object from one Project to another can work out just fine. However, as we will see, there are some significant drawbacks to this approach, and whether it works at all depends on the reference complexity of the object you're copying.
Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.
Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks.
First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources. Any changes made to the original object will not be reflected in the copied object (or vice versa).
This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes).
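The same behavior is easy to see with an ordinary deep copy in Python. This is an analogy rather than Grooper code: the dictionary layout and the command names are hypothetical stand-ins for an IP Profile's settings.

```python
import copy

# Hypothetical stand-in for the IP Profile in the "Invoices" Project.
original = {"name": "Image Cleanup - OCR", "commands": ["Deskew", "Despeckle"]}

# "Pasting" into another Project produces a fully independent object.
pasted = copy.deepcopy(original)

# A later change to the original is not reflected in the pasted copy.
original["commands"].append("Line Removal")

print(original["commands"])  # ['Deskew', 'Despeckle', 'Line Removal']
print(pasted["commands"])    # ['Deskew', 'Despeckle']
```

To keep the two in sync, the same change would have to be applied to `pasted` by hand, which is exactly the maintenance cost described above.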
Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity. Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would depend on that referenced object also being present in the second Project.
Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.
If we try to do this, Grooper is going to throw an error. Why? The address extractor is a Data Type that, as part of its configuration, references several Lexicon objects in the "Invoices" Project.
The error message also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object.
Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up unannounced with a bunch of friends, they're going to take issue with you. There are now a bunch of random strangers in their house they didn't expect.
That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (even more so considering any objects the referenced objects reference, the objects those objects reference, and so on down the line). There are now a bunch of random objects you didn't expect cluttering up your Project.
This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.
One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time-consuming process, as it would involve the following steps:
1. Copy and paste all the referenced objects from the first Project to the second.
2. Unassign all the references in the object to be copied from the first Project.
3. Copy and paste the object from the first Project to the second.
4. Reassign all the references in the copied object to the referenced objects pasted in step 1.
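The steps above can be sketched in Python to show why the order matters. This is a toy model under stated assumptions, not Grooper's implementation: `Node`, `Project`, `UnresolvedReferenceError`, and `copy_with_references` are hypothetical names, the Lexicon names are made up, and the sketch assumes the referenced objects have no references of their own. It also unassigns references on a detached copy rather than on the original, to leave the first Project untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class UnresolvedReferenceError(Exception):
    """Raised when a pasted object references something missing from the target."""


@dataclass
class Node:
    name: str
    references: List[str] = field(default_factory=list)


class Project:
    def __init__(self, name: str) -> None:
        self.name = name
        self.nodes: Dict[str, Node] = {}

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def paste(self, node: Node) -> None:
        # Refuse the paste if any reference cannot be resolved in this Project.
        missing = [ref for ref in node.references if ref not in self.nodes]
        if missing:
            raise UnresolvedReferenceError(f"{node.name} references {missing}")
        self.nodes[node.name] = Node(node.name, list(node.references))


def copy_with_references(node: Node, source: Project, target: Project) -> Node:
    # Step 1: copy the referenced objects into the target Project first.
    for ref in node.references:
        target.paste(source.nodes[ref])
    # Step 2: unassign the references (here, on a detached copy).
    saved = list(node.references)
    detached = Node(node.name, [])
    # Step 3: paste the now reference-free object.
    target.paste(detached)
    # Step 4: reassign the references to the objects pasted in step 1.
    target.nodes[node.name].references = saved
    return target.nodes[node.name]


invoices = Project("Invoices")
invoices.add(Node("Lexicon A"))
invoices.add(Node("Lexicon B"))
address = Node("Address Extractor", ["Lexicon A", "Lexicon B"])
invoices.add(address)
hr = Project("Human Resources")

try:
    hr.paste(address)  # a direct paste fails: the Lexicons are not in this Project
    direct_paste_failed = False
except UnresolvedReferenceError:
    direct_paste_failed = True

copied = copy_with_references(address, invoices, hr)
```

Even in this small model, each referenced object becomes one more independent copy to maintain, which is why this route gets expensive as the number of references grows.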
Another option is to use Project references. This gives one Project referenceable access to all the resources within another Project.















