2022:Project (Node Type)
WIP: This article is a work-in-progress. It was written using a beta version of 2022 and is subject to change and/or expansion as it is updated to the release version of 2022. This tag will be removed upon draft completion.
A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.
About
After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.
Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.
How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.
What's With That Processes Folder?
If you're new to Grooper (or version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?" As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?
When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. Otherwise, partially or improperly processed documents could come through Grooper. So, while you are working on a Batch Process, it is a working Batch Process. Once that Batch Process is finished and ready to be implemented in a production-level environment, it is published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
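The working-versus-published distinction can be pictured with a small conceptual model. This is only a sketch in Python, not Grooper code or a Grooper API: the class names, the `publish` method, the `processes_folder` dictionary, and the step names are all hypothetical and exist only to illustrate that publishing takes a read-only snapshot that later edits to the working copy do not affect.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PublishedProcess:
    """Hypothetical stand-in for the read-only copy in the Processes folder."""
    name: str
    steps: Tuple[str, ...]


@dataclass
class WorkingProcess:
    """Hypothetical stand-in for the editable Batch Process inside a Project."""
    name: str
    steps: List[str] = field(default_factory=list)

    def publish(self, processes_folder: Dict[str, PublishedProcess]) -> PublishedProcess:
        # Copy the current steps into an immutable snapshot and store it
        # where production Batches would look for it.
        snapshot = PublishedProcess(self.name, tuple(self.steps))
        processes_folder[self.name] = snapshot
        return snapshot


# Illustrative usage with placeholder step names.
processes_folder: Dict[str, PublishedProcess] = {}
working = WorkingProcess("Invoice Processing", ["step A", "step B"])
published = working.publish(processes_folder)

# Further edits touch only the working copy, not the published snapshot.
working.steps.append("step C")
print(published.steps)  # ('step A', 'step B')
```

In this model, production work would read only from `processes_folder`, so an unfinished edit such as "step C" never reaches it until the process is published again.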
Adding a New Project
Add a Project
Projects are added to the Projects folder node in the node tree.
Add Resources to the Project
The following Grooper objects can be added to a Project:
Extractor objects
Profile objects
Data integration objects
Other objects
So, how do you add them to a Project? You add them much like you would add an item to any folder in the Grooper node tree.
What About Batches?
One thing you cannot add to a Project is a Batch. This includes Test Batches as well as Production Batches.
Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.
Referencing Objects in Other Projects
Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than what previous versions offered. (If you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)
Aside from organizational benefits, one of the big reasons for switching to a Projects-based architecture was to maintain the integrity of references between the many objects in a repository.
Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).
For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.
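The "silo" rule can be sketched as a simple check. This is a conceptual model in Python, not Grooper's implementation: the `Project` class, its `referenced_projects` set, and the `can_reference` method are hypothetical names introduced only for illustration. The "Human Resources" Project stands in for any second Project in the same repository.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """Hypothetical model of a Project and the objects it contains."""
    name: str
    objects: Set[str] = field(default_factory=set)
    # Names of other Projects this Project has been explicitly allowed to use.
    referenced_projects: Set[str] = field(default_factory=set)

    def can_reference(self, target: str, owner: "Project") -> bool:
        # The target must actually live in the owning Project.
        if target not in owner.objects:
            return False
        # Same Project: always allowed. Other Project: only if explicitly referenced.
        return owner is self or owner.name in self.referenced_projects


invoices = Project("Invoices", {"Invoices OCR", "Image Cleanup - OCR"})
human_resources = Project("Human Resources")

# Within the same Project, the OCR Profile can use the IP Profile.
print(invoices.can_reference("Image Cleanup - OCR", invoices))         # True
# From a different Project, the reference is blocked by default.
print(human_resources.can_reference("Image Cleanup - OCR", invoices))  # False
```

In this sketch, adding `"Invoices"` to `human_resources.referenced_projects` would make the second check return `True`, which corresponds to explicitly allowing one Project to use another's resources.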
However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to keep invoices and human resources separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices.
But, particularly if you use Grooper across a variety of use cases, you will run into situations where resources you build for one Project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed. Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.
So, if we want to use an object from an external Project, what can we do? Depending on the situation, there are basically three options:
- Directly copy the object from one Project to another.
- Reference the external Project to allow access to its resources.
- Create a shared resources Project that both Projects reference.
Each approach has its strengths and weaknesses. Next, we will detail each option and discuss its associated drawbacks.
Option 1: Copying Objects from One Project to Another
This option is generally acceptable only in the most basic circumstances. As we will see, there are some significant drawbacks to this approach. However, for simpler Grooper environments and simple Grooper objects, copying the desired object from one Project to another can work out just fine.
Whether this option works for you depends on the reference complexity of the object you're copying.











