2022:Project (Node Type): Difference between revisions

Revision as of 13:12, 16 February 2022

WIP

This article is a work-in-progress. It was written using a beta version of 2022. This article is subject to change and/or expansion as it is updated to the release version of 2022.

This tag will be removed upon draft completion.

A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.

About

After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.

Imagine you're processing vendor invoices. Nearly everything you need to process these documents can be organized into a Project.

Here, we have a Project named "Invoices".
This Project houses the Content Model configured for document classification and data extraction.
It also holds the Batch Process used to process Batches.
It also holds the other Grooper objects required for this use case:
- "NTFS Connection" is a CMIS Connection used for exporting content. It is referenced by the "Invoices Model" Content Model's Export Behavior configuration, which is executed when the "Invoices Process" Batch Process's Export activity is applied.
- "Permanent IP" is an IP Profile referenced by the Image Processing step of the "Invoices Process" Batch Process.
- "Scan Profile" is a Scanner Profile referenced by the Scan step of the "Invoices Process" Batch Process.

How you organize objects in your Project is largely up to you. To help with this, you can add any number of folder levels to your Project.

For example, we've added an "OCR Resources" folder, which contains an OCR Profile and an IP Profile it references.
In the "Separation Resources" folder, there is a Separation Profile and an extractor referenced in the profile's configuration.
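
The folder layout described above can be pictured as a simple tree. The following is only an illustrative sketch in Python, not Grooper's API: the `Node` class and the `paths` helper are hypothetical, and the names marked as placeholders stand in for objects the article does not name.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One entry in a hypothetical node tree: a Project, a folder, or an object."""
    name: str
    kind: str
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        # Attach a child and return it so nested folders can be built fluently.
        self.children.append(child)
        return child


def paths(node: Node, prefix: str = "") -> list[str]:
    """Return the slash-separated path of every node, depth first."""
    here = f"{prefix}/{node.name}" if prefix else node.name
    found = [here]
    for child in node.children:
        found.extend(paths(child, here))
    return found


project = Node("Invoices", "Project")
project.add(Node("Invoices Model", "Content Model"))
project.add(Node("Invoices Process", "Batch Process"))

# Folders can be nested to any depth; two top-level folders are shown here.
ocr = project.add(Node("OCR Resources", "Folder"))
ocr.add(Node("Example OCR", "OCR Profile"))  # placeholder name
ocr.add(Node("Example IP", "IP Profile"))    # placeholder name

sep = project.add(Node("Separation Resources", "Folder"))
sep.add(Node("Example Separation", "Separation Profile"))  # placeholder name
sep.add(Node("Example Extractor", "Extractor"))            # placeholder name

for p in paths(project):
    print(p)
```

Keeping a profile in the same folder as the objects it references, as in the two folders above, is one way to make those dependencies easy to spot in the tree.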

What's with that Processes folder?

If you're new to Grooper (or version 2022), you may be asking yourself, "What's with that 'Processes' folder in the node tree?" As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?

- Projects hold working Batch Processes.
- The Processes folder holds published Batch Processes.

When adding and configuring a new Batch Process, you will always add it to a Project first. While you are editing it, you do not want it to be "live" (usable in a production-level environment) as documents are coming into Grooper. Otherwise, partially or improperly processed documents could come through Grooper. So, while you are working on a Batch Process, it is a working Batch Process.

Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones.
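
The working/published split can be modeled in a few lines. This is a conceptual sketch in Python under stated assumptions, not how Grooper implements publishing: `Repository`, `WorkingProcess`, `PublishedProcess`, and their methods are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingProcess:
    """An editable Batch Process that lives inside a Project (hypothetical model)."""
    name: str
    steps: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class PublishedProcess:
    """A read-only snapshot stored in the Processes folder (hypothetical model)."""
    name: str
    steps: tuple[str, ...]


class Repository:
    def __init__(self) -> None:
        # project name -> {process name: working Batch Process}
        self.projects: dict[str, dict[str, WorkingProcess]] = {}
        # Stand-in for the Processes folder: published copies only.
        self.processes: dict[str, PublishedProcess] = {}

    def add_working(self, project: str, process: WorkingProcess) -> WorkingProcess:
        self.projects.setdefault(project, {})[process.name] = process
        return process

    def publish(self, project: str, name: str) -> PublishedProcess:
        # Copy the current steps into an immutable snapshot.
        working = self.projects[project][name]
        snapshot = PublishedProcess(working.name, tuple(working.steps))
        self.processes[name] = snapshot
        return snapshot

    def process_for_production(self, name: str) -> PublishedProcess:
        # Production Batches look only in the published set;
        # an unpublished process raises KeyError.
        return self.processes[name]


repo = Repository()
working = repo.add_working(
    "Invoices", WorkingProcess("Invoices Process", ["Scan", "Image Processing"])
)
published = repo.publish("Invoices", "Invoices Process")

# Further edits touch only the working copy, not the published snapshot.
working.steps.append("Export")
print(published.steps)
```

In this model, later edits to the working copy reach production only after `publish` is called again, which mirrors the reason the article gives for keeping working and published Batch Processes separate.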

Adding a New Project

Referencing Objects in Other Projects

Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions. (If you are upgrading to version 2022, please review the "Projects and Upgrading to 2022" section of this article.)

Aside from the organizational benefits, one of the main reasons for switching to a Project-based architecture was to maintain the integrity of the references that run between multiple objects in a repository.

@@ Line 14: / Line 14: @@
 After installing and setting up a Grooper Repository, creating a new '''Project''' is most likely the first thing you will do when starting work in Grooper Design Studio.  A variety of different Grooper assets are required to process documents.  A '''Content Model''' is required to classify documents and extract their data according to that classification.  An '''OCR Profile''' is required to perform optical character recognition to get machine readable text from scanned pages.  A '''Batch Process''' is required to define the step-by-step instructions to process documents from start to finish.  A '''Project''' allows you to house these various resources related to a processing use case in one location.
-Imagine you're processing vendor invoices.
+{|cellpadding=10 cellspacing=5
+|valign=top style="width:40%"|
+Imagine you're processing vendor invoices.  Pretty much anything and everything you need to process these documents can be organized into a '''Project'''.
-=== Referencing Projects ===
+# Here, we have a '''Project''' named "Invoices"
+# This '''Project''' houses the '''Content Model''' configured for document classification and data extraction.
+# It also holds the '''Batch Process''' used to process '''Batches'''.
+# As well as other Grooper objects required for this use case.
+#* "NTFS Connection" is a '''CMIS Connection''' utilized for exporting content.  It is referenced by the "Invoices Model" '''Content Model's''' '''''Export Behavior''''' configuration which is executed when the "Invoices Process" '''Batch Process's''' '''Export''' activity is applied.
+#* "Permanent IP" is an '''IP Profile''' referenced by the '''Image Processing''' step of the "Invoices Process" '''Batch Process'''.
+#* "Scan Profile" is a '''Scanner Profile''' referenced by the '''Scan''' step of the "Invoices Process" '''Batch Process'''.
+|valign=top|
+[[File:2022-project-about-01.png]]
+|-
+|valign=top|
+How you organize objects in your '''Project''' is largely up to you.  However, in service of this task, be aware you can add any number of folder levels to your '''Project'''.
+# For example, we've added an "OCR Resources" folder, which contains an '''OCR Profile''' and an '''IP Profile''' it references.
+# In the "Separation Resources" folder, there is a '''Separation Profile''' and an extractor referenced in the profile's configuration.
+|valign=top|
+[[File:2022-project-about-02.png]]
+|}
+=== What's with that Processes folder? ===
+{|cellpadding=10 cellspacing=5
+|valign=top style="width:40%"|
+If you're new to Grooper (or version '''2022''') you may be asking yourself, "What's with that "'''Processes'''" folder in the node tree?"  As mentioned before, one of the things a '''Project''' can (and should) house is a '''Batch Process'''.  If a '''Project''' can hold a '''Batch Process''' what does the '''Processes''' folder hold?
+#* '''Projects''' hold ''working'' '''Batch Processes'''.
+#* The '''Processes''' folder holds ''published'' '''Batch Processes'''.
+|valign=top|
+[[File:2022-project-about-03.png]]
+|-
+|valign=top|
+When adding and configuring a new '''Batch Process''', you will always add it to a '''Project''' first.  As you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper.  This would cause partially or improperly processed documents to come through Grooper.  So, while you are working on a '''Batch Process''' it is a ''working'' '''Batch Process'''.
+Once that '''Batch Process''' is finished and ready to be implemented in a production-level environment, it is then ''published'' (using the "Publish" button in the '''Batch Process''' object's UI).  This creates a read-only copy of the working '''Batch Process''' in the '''Processes''' folder.  Production-level '''Batches''' only have access to '''Batch Processes''' in the '''Processes''' folder, ensuring they are processed using only published processing instructions, ''not'' working ones.
+|
+[[File:2022-project-about-03.png]]
+|}
+== Adding a New Project ==
+== Referencing Objects in Other Projects ==
+'''Projects''' are new to version '''2022'''.  If you're new to Grooper, this won't mean much to you.  Just know '''Projects''' are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions.  (And, if you are upgrading to version '''2022''', please review the [[#Projects and Upgrading to 2022]] section of this article)
+Aside from organizational benefits, one of the big reasons for switching to a '''Projects''' based architecture was to maintain reference integrity woven throughout multiple objects in a repository.
+=== The Essentials Project ===
+== Projects and Upgrading to 2022 ==