2022:Project (Node Type): Difference between revisions
Dgreenwood (talk | contribs) |
Dgreenwood (talk | contribs) |
||
| (28 intermediate revisions by 2 users not shown) | |||
| Line 1: | Line 1: | ||
<blockquote style="font-size:125%"> | <blockquote style="font-size:125%"> | ||
A '''Project''' is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as '''Content Models''', '''Batch Processes''', '''OCR Profiles''', '''Lexicons''', and more, needed to process documents through Grooper. | A '''Project''' is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as '''Content Models''', '''Batch Processes''', '''OCR Profiles''', '''Lexicons''', and more, needed to process documents through Grooper. | ||
| Line 173: | Line 166: | ||
# The reference to "Image Cleanup - OCR" set using the '''OCR Profile's''' '''''IP Profile''''' property is allowed. | # The reference to "Image Cleanup - OCR" set using the '''OCR Profile's''' '''''IP Profile''''' property is allowed. | ||
# Both objects are contained in the same '''Project'''. | # Both objects are contained in the same '''Project'''. | ||
Generally speaking, maintaining reference integrity is ideal. The more narrowly you can define an object's allowable scope of reference, the better. A narrow scope makes references easier to track down, limits the number of object dependencies (which makes your system easier to manage), and reduces the risk of system corruption down the line if a mess of "reference spaghetti" gets tangled up. | |||
| | | | ||
[[File:2022-project-references-01.png]] | [[File:2022-project-references-01.png]] | ||
| Line 191: | Line 186: | ||
|} | |} | ||
So, if we want to use an object from an external '''Project''', what can we do? | === Using Resources in Other Projects === | ||
So, if we want to use an object from an external '''Project''', what can we do? There are three options: | |||
# Directly copy the object from one '''Project''' to another. | # Directly copy the object from one '''Project''' to another. | ||
| Line 197: | Line 194: | ||
# Create a shared resources '''Project''' that both '''Projects''' reference. | # Create a shared resources '''Project''' that both '''Projects''' reference. | ||
Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks. | Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.</span> | ||
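The scoping rule behind all three options can be sketched in a few lines of Python. This is only a conceptual model, not Grooper's API: the <code>Project</code> class, the <code>referenced_projects</code> attribute, and <code>reference_allowed</code> are hypothetical names chosen for illustration, and the sketch assumes a reference is allowed only within one '''Project''' or into a directly referenced '''Project'''.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a Grooper Project (not a Grooper API)."""
    name: str
    # Names of the Projects this Project references (assumed: direct references only).
    referenced_projects: set = field(default_factory=set)


def reference_allowed(source: Project, target: Project) -> bool:
    """An object in `source` may point at an object in `target` only if both
    live in the same Project or `source` references `target`."""
    return source.name == target.name or target.name in source.referenced_projects


invoices = Project("Invoices")
human_resources = Project("Human Resources")
shared = Project("Shared Resources")

# Same Project: the reference is allowed.
print(reference_allowed(invoices, invoices))         # True
# Different Project with no Project reference: blocked.
print(reference_allowed(human_resources, invoices))  # False
# Options 2 and 3: add a Project reference first, then the reference is allowed.
human_resources.referenced_projects.add(shared.name)
print(reference_allowed(human_resources, shared))    # True
```

Option 1 sidesteps the rule by duplicating the object, while Options 2 and 3 satisfy it by adding a '''Project''' reference.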
<tabs style="margin:20px"> | |||
<tab name="Option 1: Copying Objects from One Project to Another" style="margin:20px"> | |||
=== Option 1: Copying Objects from One Project to Another === | === Option 1: Copying Objects from One Project to Another === | ||
For simpler Grooper environments and simple Grooper objects, simply copying the desired object from one '''Project''' to another can work out just fine. This option is often the best for the most basic of circumstances. | |||
Furthermore, sometimes this option is going to work for you, sometimes not, depending on the reference complexity of the object you're copying. | However, there can be significant drawbacks to this approach. Furthermore, sometimes this option is going to work for you, sometimes not, depending on the reference complexity of the object you're copying. | ||
{|cellpadding="10" cellspacing="5" | {|cellpadding="10" cellspacing="5" style="margin:12px" | ||
|-style="background-color:#36b0a7; color:white | |- | ||
|style="font-size:125%; background-color:#36b0a7; color:white; width:28px; text-align:center"|'''FYI''' | |||
|style="border: 4px solid #36b0a7"| | |||
While the following guidance deals specifically with "copying and pasting", the same follows for "cutting and pasting" or "moving" objects from one '''Project''' to another. | |||
|} | |} | ||
{|cellpadding=10 cellspacing=5 | {|cellpadding=10 cellspacing=5 | ||
| Line 275: | Line 275: | ||
# Paste the object from the first '''Project''' to the second. | # Paste the object from the first '''Project''' to the second. | ||
# Reassign all the references in the copied object to all the referenced objects pasted in step 1. | # Reassign all the references in the copied object to all the referenced objects pasted in step 1. | ||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Depending on how these objects are organized, you could also copy and paste multiple objects at a time. | |||
# The only reason we're having an issue here is that we need multiple objects coming into the "Human Resources" '''Project''' at a time. | |||
#* The "VAL - Address" '''Data Type''' and (some of) the '''Lexicons''' in the "Lexicons" folder. | |||
# Since the "VAL - Address" '''Data Type''' and "Lexicons" folder are siblings in the '''Local Resources''' folder, we can use the "Contents" tab to copy and paste both at the same time. | |||
# Navigate to the "Contents" tab. | |||
# Select all the objects you want to copy (using <code>Ctrl</code> + <code>Left Click</code> or <code>Shift</code> + <code>Left Click</code>). | |||
#* In our case, the "Lexicons" folder and "VAL - Address" '''Data Type'''. | |||
# Paste them into the desired location in the '''Project'''. | |||
Since we were able to copy the extractor and a folder containing all the '''Lexicons''' it references, and paste them all at the same time, Grooper allowed the paste without any issue. | |||
| | |||
[[File:2022-project-references-24.png]] | |||
|- | |||
|valign=top| | |||
Keep in mind, however, if you copy a folder, you're going to get everything in that folder. | |||
# In our case, we copied over a couple additional '''Lexicons''' we may or may not want. | |||
| | |||
[[File:2022-project-references-25.png]] | |||
|} | |||
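Why pasting everything at once succeeds can be illustrated with a small Python check. This is a sketch under assumptions, not Grooper's actual validation logic: <code>missing_dependencies</code> is a hypothetical helper, and the '''Lexicon''' names "US States" and "Street Suffixes" are invented for the example.

```python
def missing_dependencies(selection, references, already_in_target=frozenset()):
    """Return referenced objects that a copy/paste would leave behind.

    selection          -- names of the objects being copied together
    references         -- dict: object name -> set of object names it references
    already_in_target  -- names that already exist in the destination Project
    """
    available = set(selection) | set(already_in_target)
    missing = set()
    for name in selection:
        # Anything referenced but neither copied nor already present is a problem.
        missing |= references.get(name, set()) - available
    return missing


# Hypothetical reference map: the extractor points at two Lexicons.
references = {"VAL - Address": {"US States", "Street Suffixes"}}

# Copying the extractor alone leaves its Lexicons behind.
print(sorted(missing_dependencies(["VAL - Address"], references)))
# ['Street Suffixes', 'US States']

# Copying the extractor together with its Lexicons is self-contained.
print(sorted(missing_dependencies(
    ["VAL - Address", "US States", "Street Suffixes"], references)))
# []
```

An empty result corresponds to the case above where the extractor and the "Lexicons" folder were selected together in the "Contents" tab.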
Another option is to use '''Project''' references. This gives one '''Project''' referenceable access to all resources within another '''Project'''. | Another option is to use '''Project''' references. This gives one '''Project''' referenceable access to all resources within another '''Project'''. | ||
</tab> | |||
<tab name="Option 2: Referencing a Project" style="margin:20px"> | |||
=== Option 2: Referencing a Project === | === Option 2: Referencing a Project === | ||
| Line 369: | Line 395: | ||
[[File:2022-project-references-12.png]] | [[File:2022-project-references-12.png]] | ||
|} | |} | ||
</tab> | |||
<tab name="Option 3: Creating and Referencing a Shared Resources Project" style="margin:20px"> | |||
=== Option 3: Creating and Referencing a Shared Resources Project === | === Option 3: Creating and Referencing a Shared Resources Project === | ||
| Line 375: | Line 402: | ||
In these cases, it often makes sense to create a "bucket" of resources from which all '''Projects''' can draw. The idea is to create shared resources in a single '''Project''' referenced by multiple others. Or, in our case, we're going to move these assets to a "Shared Resources" '''Project'''. | In these cases, it often makes sense to create a "bucket" of resources from which all '''Projects''' can draw. The idea is to create shared resources in a single '''Project''' referenced by multiple others. Or, in our case, we're going to move these assets to a "Shared Resources" '''Project'''. | ||
{|cellpadding="10" cellspacing="5" style="margin:12px" | |||
|- | |||
|style="font-size:125%; background-color:#36b0a7; color:white; width:28px; text-align:center"|'''FYI''' | |||
|style="border: 4px solid #36b0a7"| | |||
Other common examples of shared resources are '''CMIS Connections''' and '''Data Connections'''. | |||
It is often the case that multiple '''Projects''' will reuse these connection objects to integrate Grooper with external storage platforms (such as content management systems and databases). Therefore, it would make sense to create something like a "Connections" '''Project''' containing these '''CMIS Connections''' and '''Data Connections'''. Instead of re-creating each connection object for each '''Project''', all '''Projects''' can simply reference the "Connections" '''Project''' to gain access to the '''CMIS Connections''' and/or '''Data Connections''' required for import/export operations. | |||
|} | |||
{|cellpadding=10 cellspacing=5 | {|cellpadding=10 cellspacing=5 | ||
| Line 428: | Line 464: | ||
#* No outbound references are detected (meaning there is no object in the "Shared Resources" '''Project''' referencing ''out'' to objects in the "Invoices" '''Project'''). This is what we want to see. If there were outbound references, we would want to resolve them before removing the reference to the external '''Project'''. | #* No outbound references are detected (meaning there is no object in the "Shared Resources" '''Project''' referencing ''out'' to objects in the "Invoices" '''Project'''). This is what we want to see. If there were outbound references, we would want to resolve them before removing the reference to the external '''Project'''. | ||
# Press "OK" to continue. | # Press "OK" to continue. | ||
{|cellpadding="10" cellspacing="5" style="margin:12px" | |||
|- | |||
|style="font-size:250%; background-color:#f89420; color:white; width:28px; text-align:center"|⚠ | |||
|style="border: 4px solid #f89420"| | |||
You should '''''always''''' use the "Analyze Reference" button before removing a reference to a '''Project'''. | |||
Grooper will technically allow you to remove a reference to a '''Project''' even with outbound references outstanding. However, doing so is '''''not''''' best practice, as it can corrupt your system down the road. | |||
|} | |||
| | |||
[[File:2022-project-references-17.png]] | |||
|- | |||
|valign=top| | |||
# With no references detected from the "Invoices" '''Project''', we can remove the '''Project''' reference without issue. | |||
# Be sure to Save the project when finished. | |||
| | |||
[[File:2022-project-references-18.png]] | |||
|- | |||
|valign=top| | |||
# Next, we need to get rid of the local extractor in the "Invoices" '''Project''' and replace it with the copy we placed in the "Shared Resources" '''Project'''. | |||
# In order to access the extractor in the "Shared Resources" '''Project''', the "Invoices" '''Project''' must reference the "Shared Resources" '''Project'''. | |||
#* Here, we have selected the "Invoices" '''Project'''. | |||
# Using the '''''Referenced Projects''''' property, we have selected the "Shared Resources" '''Project'''. | |||
| | |||
[[File:2022-project-references-19.png]] | |||
|- | |||
|valign=top| | |||
# Now, we can go about the business of reassigning any reference to our local extractor to the one in our "Shared Resources" '''Project'''. | |||
The quickest way to find every object that references a selected object in the node tree is to use the "References" tab. | |||
#<li value=2> To access this tab, first select the object whose references you want to verify, then select the "Advanced" tab. | |||
# Then, select the "References" tab. | |||
# This will list every object that references the object. | |||
#* In our case, there's one '''Data Type''' extractor ("VE - Invoice Total") and three '''Data Column''' objects ("Quantity", "Price", and "Extended Price") referencing the selected extractor ("VAL - Generic Decimal"). | |||
What we could do from here is track down each of these objects, find where in their property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" '''Project'''. That is a perfectly acceptable, although somewhat time-consuming, way to reassign references. Luckily, we have a shortcut available to us. | |||
| | |||
[[File:2022-project-references-20.png]] | |||
|- | |||
|valign=top| | |||
The "Reassign References..." button will allow us to change the reference for each object in the list from the selected object to a different one. | |||
This is exactly what we want to do. We want to change the reference set on these '''Data Columns''' and '''Data Type''' from the "VAL - Generic Decimal" extractor in the "Invoices" '''Project''' to the copy we made in the "Shared Resources" '''Project'''. | |||
# Press the "Reassign References..." button. | |||
# This will bring up a window to select a new object for the reference. | |||
# Check it out. Here's our referenced "Shared Resources" '''Project'''. | |||
# Selecting the "VAL - Generic Decimal" '''Value Reader''', we will reassign the reference to this extractor in our "Shared Resources" '''Project'''. | |||
# Press "OK" to finish reassigning the references. | |||
| | |||
[[File:2022-project-references-21.png]] | |||
|- | |||
|valign=top| | |||
# Because the extractor is no longer referenced by any other object, the "Referenced By" list is now empty. All the objects that were listed here are now referencing the extractor we chose in our "Shared Resources" '''Project'''. | |||
#* In other words, we reassigned the references. | |||
#* We've effectively replaced the local '''Project's''' decimal extractor with one in an external '''Project''', accessible to any other '''Project''' that references it. | |||
# Since no other object references the local decimal extractor, and we've replaced its references with something else, it is now safe to delete it. | |||
| | |||
[[File:2022-project-references-22.png]] | |||
|- | |||
|valign=top| | |||
As we've demonstrated, it's a little extra work if you decide you want to move resources from one '''Project''' to a shared resources '''Project'''. However, the benefit to organizing assets like this is that any '''Project''' referencing our "Shared Resources" '''Project''' now has access to its assets. | |||
# For example, we could tell our "Human Resources" '''Project''' to reference the "Shared Resources" '''Project'''. | |||
# Now, both the "Human Resources" '''Project''' and the "Invoices" '''Project''' have access to its resources. | |||
#* Furthermore, any changes we make to the object in the "Shared Resources" '''Project''' will be reflected by any object in any '''Project''' that touches it. This can prevent duplication of efforts when updating an object's properties. | |||
#* If any other '''Projects''' or any future '''Projects''' can make use of these resources, all you have to do is assign them a reference to the "Shared Resources" '''Project'''. It acts as one big community bucket of resources other '''Projects''' can draw from. | |||
| | |||
[[File:2022-project-references-23.png]] | |||
|} | |||
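The "References" tab and the "Reassign References..." button can be modeled conceptually in Python. This is a sketch, not Grooper's API: <code>referenced_by</code> and <code>reassign_references</code> are hypothetical helpers, and the path-like strings are only an illustrative way to tell the two copies of the extractor apart.

```python
def referenced_by(target, references):
    """List every object that references `target` (what the "References" tab shows)."""
    return sorted(name for name, refs in references.items() if target in refs)


def reassign_references(old, new, references):
    """Point every object that references `old` at `new` instead."""
    for refs in references.values():
        if old in refs:
            refs.discard(old)
            refs.add(new)


local = "Invoices/VAL - Generic Decimal"              # the local extractor
shared = "Shared Resources/VAL - Generic Decimal"     # its copy in the shared Project

# Object name -> set of objects it references (illustrative data from the walkthrough).
references = {
    "VE - Invoice Total": {local},
    "Quantity": {local},
    "Price": {local},
    "Extended Price": {local},
}

print(referenced_by(local, references))
# ['Extended Price', 'Price', 'Quantity', 'VE - Invoice Total']

reassign_references(local, shared, references)

# Nothing references the local extractor any more, so it is safe to delete.
print(referenced_by(local, references))   # []
print(referenced_by(shared, references))
# ['Extended Price', 'Price', 'Quantity', 'VE - Invoice Total']
```

The empty list after reassignment mirrors the empty "Referenced By" list in the walkthrough, which is the signal that the local extractor can be deleted.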
</tab> | |||
:[[#Using Resources in Other Projects|Click here to return to the top of the tab]] | |||
</tabs> | |||
=== The Essentials Project === | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Every newly created Grooper Repository in version '''2022''' will come with a '''Project''' named "Essentials". This '''Project''' contains several resources you may find useful when designing your document processing assets. Just like any other '''Project''', you can access these resources by making a reference to the "Essentials" '''Project'''. The objects contained within can be examples of different types of objects you create, resources you can copy into your own '''Projects''' and build on top of, or simply resources you directly reference in your '''Projects'''. | |||
| | |||
[[File:2022-project-essentials-01.png]] | |||
|- | |||
|valign=top| | |||
In this project you will find various: | |||
# '''Data Type''' and '''Value Reader''' Extractors | |||
# '''Lexicons''' | |||
# Profile objects ('''OCR Profiles''', '''IP Profiles''' etc.) | |||
| | |||
[[File:2022-project-essentials-02.png]] | |||
|} | |||
== Projects and Upgrading to 2022 == | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
'''Projects''' are a new way of organizing Grooper resources in version '''2022'''. In previous versions, Grooper resources were organized primarily in one of three folders in the node tree: | |||
* The '''Content Models''' folder | |||
* The '''Global Resources''' folder | |||
* The '''Processes''' folder of the '''Batch Processing''' folder | |||
Users would have to go back and forth between these locations in order to configure what they needed to process documents through Grooper. This often resulted in a time consuming and cumbersome process, sifting through the node tree's hierarchy to get to the objects you needed. | |||
'''Projects''' simplify this issue by allowing you to place all associated resources for a given use case (or "project") in a single node location. | |||
You can see the difference in the image to the right. All the required Grooper assets for one single document processing project are highlighted. | |||
Before the introduction of '''Projects''' in '''2022''', these objects were interspersed throughout various locations in the node tree. | |||
In version '''2022''', everything can be neatly placed in one, single location, making finding what you're looking for much simpler. | |||
{|cellpadding="12" cellspacing="4" style="margin:12px" | |||
|- | |||
|style="font-size:125%; background-color:#36b0a7; color:white; width:28px; text-align:center"|'''FYI''' | |||
|style="border: 4px solid #36b0a7"| | |||
If you have certain Grooper resources that can be used by multiple '''Projects''' (such as extractors, profile objects, or '''CMIS Connections'''), you can grant multiple '''Projects''' access to them through '''Project''' references. | |||
For more information, visit the [[#Referencing Objects in Other Projects]] section of this article. | |||
|} | |||
|valign=top| | |||
[[File:2022-project-upgrade-about-01.png]] | |||
|- | |||
|valign=top| | |||
=== What Happens When You Upgrade? === | |||
Obviously, this architecture is much different than how your assets are currently organized in Grooper. So, what's going to happen when you upgrade? | |||
# Upon upgrading to version '''2022''', most Grooper objects in your repository will simply be placed into a new '''Project''' named "Project 1". | |||
# All '''Content Models''' will be organized into a folder named "Content Models". | |||
# All working '''Batch Processes''' will be placed in a folder named "Processes" within the '''Project'''. | |||
# Any published '''Batch Processes''' will be placed in the "Processes" folder at the first level of the node tree. | |||
Anything in the "Global Resources" folder will be placed throughout "Project 1". | |||
#<li value=5> If these objects were organized into a subfolder in the "Global Resources" folder, a folder of the same name will be created. | |||
#* For example, in this Grooper Repository, there was a folder named "HR Docs Resources", containing a handful of Grooper objects. Upon upgrading, a folder of the same name, containing the same objects, was placed in "Project 1". | |||
# Any unfoldered objects in the "Global Resources" folder will be placed at the first level of the "Project 1" '''Project'''. | |||
# Last, all '''Production''' and '''Test Batches''' will be placed in the "Batches" folder at the first level of the node tree. | |||
| | |||
[[File:2022-project-upgrade-project1-01.png]] | |||
|} | |||
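The relocation rules above can be summarized as a small lookup function. This is a Python sketch for illustration only: <code>upgraded_location</code> and its <code>kind</code> labels are hypothetical, and the returned strings are simplified paths, not identifiers Grooper itself uses.

```python
def upgraded_location(kind, old_folder=None, published=False):
    """Return an illustrative post-upgrade location for an object.

    kind        -- "content_model", "batch_process", "global_resource", or "batch"
    old_folder  -- subfolder name under "Global Resources", if the object had one
    published   -- True for a published Batch Process
    """
    if kind == "content_model":
        return "Project 1/Content Models"
    if kind == "batch_process":
        # Published processes live at the first level of the node tree.
        return "Processes" if published else "Project 1/Processes"
    if kind == "global_resource":
        # Foldered resources keep their folder name; unfoldered ones land at the top.
        return f"Project 1/{old_folder}" if old_folder else "Project 1"
    if kind == "batch":
        return "Batches"
    raise ValueError(f"unknown kind: {kind}")


print(upgraded_location("content_model"))                         # Project 1/Content Models
print(upgraded_location("batch_process", published=True))         # Processes
print(upgraded_location("global_resource", "HR Docs Resources"))  # Project 1/HR Docs Resources
print(upgraded_location("global_resource"))                       # Project 1
```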
=== Deciding What to Do Next === | |||
It's important to point out your Grooper environment will work just fine with everything organized into the single "Project 1" '''Project'''. You can leave everything as is in "Project 1" upon upgrading to version '''2022''' and continue processing '''Batches''' of documents as if nothing happened. | |||
Going forward you have two options: | |||
# Do nothing. Leave all Grooper resources organized into "Project 1". | |||
# Migrate resources into their own '''Projects'''. | |||
You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own '''Projects''', but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new '''Projects''' will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another. | |||
So, should you migrate away from "Project 1" at all? Here are some things to keep in mind when making this decision. | |||
# '''It's all or nothing.''' | |||
#* Again, we stress the importance of committing to the move. You should commit to migrating everything to new '''Projects''' (with the exception of a handful of shared resources), rather than migrating just a few objects. The benefits of the '''Project''' architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system. | |||
# '''You don't have to move things from "Project 1" at all.''' | |||
#* If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously. | |||
# '''Do you have time to do it?''' | |||
#* This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository (the more '''Content Models''', '''Batch Processes''', profiles, and other objects it holds), the longer it's going to take. | |||
# '''Do you have a lot of "shared resources"?''' | |||
#* If you frequently have individual '''Data Types''', '''Lexicons''', profiles, '''CMIS Connections''', or other objects used across many different '''Content Models''' and '''Batch Processes''', this will take the most time and effort to migrate. Ensuring these shared resources are accessible to each '''Project''' created is the most time-consuming part of any migration out of "Project 1". | |||
# '''Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?''' | |||
#* If so, '''Projects''' are for you. The new architecture provides multiple advantages to this kind of workflow. You should seriously consider devoting the time to migrate resources into their own '''Projects''', if you maintain multiple environments to publish Grooper objects from development to production repositories. | |||
# '''Do you use third-party data entry companies to review work in Grooper?''' | |||
#* If so, '''Projects''' are for you. You'll benefit from being able to push complete and tidy project packages to an environment dedicated to that company. | |||
# '''Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?''' | |||
#* If so, '''Projects''' are ''really'' for you. Aside from object organization, the other big reason for creating the '''Project''' architecture was to maintain object reference integrity. '''Projects''' will greatly assist you in preventing reference corruption in your Grooper environments. | |||
=== Project Migration Plan === | |||
Ok, you've decided '''Projects''' are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward? | |||
We've narrowed the process down to seven general steps: | |||
# Clean up your repository. Delete items that are no longer in use and will not be used in the future. | |||
# Use the "Create Project" feature for each '''Batch Process'''. | |||
# As each '''Project''' is created, rename any objects as needed if your prior naming conventions no longer make sense. | |||
# For each '''Project''', use the "Analyze References" feature to decide what to do about "shared resources" used by multiple '''Projects'''. | |||
# Remove '''Project''' references if the "Outbound References" list is empty. | |||
# Reorganize any shared resource objects that remain in "Project 1". | |||
# Rename "Project 1" to something like "Global Resources" or "Shared Resources". | |||
<tabs style="margin:20px"> | |||
<tab name="1. Clean House" style="margin:20px"> | |||
=== 1. Clean House === | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
If you're going to take the time to reorganize your resources into '''Projects''', now is a good time to take a look at the Grooper objects in your repository and get rid of anything not in use littering up your environment. | |||
This is entirely optional, but now is as good a time as any to clean house. | |||
# For example, we have a dummy '''IP Profile''' named "temp" and '''Value Reader''' named "test" in our newly upgraded "Project 1" '''Project'''. | |||
#* I have no idea where these objects came from or what their original purpose was. They aren't being used by anything else in this environment. It's best to just delete them to get them out of the way. | |||
| | |||
[[File:2022-project-upgrade-steps-01.png]] | |||
|} | |||
</tab> | |||
<tab name="2. Create Project" style="margin:20px"> | |||
=== 2. Create Project === | |||
Now we can start in earnest and create some '''Projects'''. You could do this manually. The steps would be as follows. | |||
# Add a '''Project''' to the "Projects" folder. | |||
# Using the '''Project's''' '''''Referenced Projects''''' property, reference "Project 1". | |||
#* For more information on referencing '''Projects''', please review the [[#Referencing Objects in Other Projects]] section of this article. | |||
# Move a '''Batch Process''' to that '''Project'''. | |||
# Move the '''Content Model''' associated with that '''Batch Process''' to the '''Project'''. | |||
# Move any other Grooper objects referenced by the '''Batch Process''' or '''Content Model's''' objects to the '''Project'''. | |||
#* Or leave any "shared resources" in "Project 1", maintaining access to them through the '''Project''' reference (we'll discuss this further in Step 4: Analyze References). | |||
There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature. | |||
The "Create Project" feature is accessed by selecting a '''Batch Process'''. If you think about it, a '''Batch Process''' should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the '''Batch Process''' as part of its execution, such as a '''Content Model''' referenced for a '''Classify''' step or an '''OCR Profile''' referenced for a '''Recognize''' step. | |||
The "Create Project" utility will create a new '''Project''', named the same as the '''Batch Process's''' name, look for any objects referenced as part of its execution, and move them to the new '''Project'''. | |||
Important! "Create Project" will only move objects ''not'' referenced by anything else. If another '''Batch Process''' uses the same '''OCR Profile''', for example, that '''OCR Profile''' will remain in "Project 1". We will discuss this further in "Step 4: Analyze References". | |||
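The behavior described above can be sketched as a planning function in Python. This is a conceptual model under stated assumptions, not the utility's real implementation: <code>reachable</code> and <code>plan_create_project</code> are hypothetical names, "Process A", "Process B", and their '''Content Models''' are invented examples, and the sketch assumes that anything a shared object depends on stays in "Project 1" along with it.

```python
def reachable(starts, references):
    """Return every object reachable from `starts` by following references."""
    seen, stack = set(), list(starts)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(references.get(name, ()))
    return seen


def plan_create_project(batch_process, references):
    """Decide which objects move to the new Project and which stay in "Project 1"."""
    needed = reachable([batch_process], references)
    # Shared: referenced by something outside this Batch Process's own set of objects.
    shared = {
        obj for obj in needed
        if any(obj in refs for user, refs in references.items() if user not in needed)
    }
    # Assumption: whatever a shared object depends on has to stay behind with it.
    stay = reachable(shared, references)
    return {
        "project_name": batch_process,  # the new Project is named after the Batch Process
        "moved": needed - stay,
        "stays_in_project_1": stay,
        "needs_reference_to_project_1": bool(stay),
    }


# Object name -> set of objects it references (hypothetical data).
references = {
    "Process A": {"Content Model A", "Full Text - Accurate"},
    "Process B": {"Content Model B", "Full Text - Accurate"},
}

plan = plan_create_project("Process A", references)
print(sorted(plan["moved"]))               # ['Content Model A', 'Process A']
print(sorted(plan["stays_in_project_1"]))  # ['Full Text - Accurate']
print(plan["needs_reference_to_project_1"])  # True
```

Here the '''OCR Profile''' stays put because "Process B" also uses it, so the new '''Project''' would need a reference to "Project 1" to keep working.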
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
We will start by creating a new '''Project''' using a fairly simple document redaction '''Batch Process'''. This is an entirely "self-contained" '''Batch Process'''. No other '''Batch Process''' utilizes its resources. | |||
# This is the '''Batch Process''' we will create the '''Project''' from. | |||
# It references this '''Content Model''', including resources in its '''Local Resources''' folder. | |||
# Specifically, the "URLA" '''Document Type''' is referenced as the '''Extract''' step's '''''Default Content Type'''''. Since this '''Document Type''' is referenced, its parent '''Content Model''' (and all its children, including '''Local Resources''' folder and '''Data Model''') will be moved to the new '''Project'''. | |||
# This '''OCR Profile''' is also referenced (by the '''Recognize''' step). So, it will move to the new '''Project''' as well. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-02.png]] | |||
|- | |||
|valign=top| | |||
To create the '''Project''', perform the following steps. | |||
# Select the '''Batch Process''' you wish to use to create the new '''Project'''. | |||
# Press the "Create Project" button in the '''Batch Process's''' toolbar. | |||
There are two configurable options when creating the new '''Project'''. | |||
#<li value=3> The '''''Remove Emptied Folders''''' property will delete a folder from "Project 1" if it is empty after objects are moved to the new '''Project'''. | |||
#* Generally speaking, you'll want this property set to ''True''. It cleans up empty folders in "Project 1". Why would you want to keep a bunch of empty folders around? I don't know. If you have a reason to keep these empty folders, you can keep this property ''False''. | |||
# The '''''Organize Into Folders''''' property will create folders in the new '''Project''' for each type of object moved. | |||
#* If this property is set to ''True'', an "OCR Profiles" folder would be created for any '''OCR Profile''' moved to the new '''Project'''. A "Content Models" folder would be created for any '''Content Models''' moved. | |||
#* Most people will elect to keep this property ''False'', as you probably want to establish your own organizational structure to your '''Project'''. However, this option is present if you find it helpful when initially moving objects to the '''Project''' to group like objects into like folders. | |||
# Press "Execute" to create the '''Project'''. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-03.png]] | |||
|- | |||
|valign=top| | |||
When the utility finishes running, a new '''Project''' will be created. All objects associated with the '''Batch Process''' are moved from "Project 1" to the new '''Project''', as long as that move is allowed. (Again, we'll talk more about moves that aren't allowed during Step 4.) | |||
# In our case, this '''Project''' named "URLA Redaction" was created. | |||
#* The new '''Project''' will always be named after the '''Batch Process'''. | |||
# A total of three objects were moved from "Project 1" and placed in the new '''Project'''. | |||
#* This was a relatively simple '''Batch Process'''. More complicated '''Batch Processes''' will more likely than not have more objects referenced, and therefore, more objects moved. But, you can pretty much guarantee you'll at least end up with the '''Batch Process''' and a '''Content Model''' in the new '''Project'''. With few exceptions, you're always going to need a '''Batch Process''' and a '''Content Model''' to do work in Grooper. In general, each '''Project''' will have one '''Batch Process''' and one '''Content Model'''. | |||
{|cellpadding="10" cellspacing="5" | |||
|-style="background-color:#36b0a7; color:white" | |||
|style="font-size:14pt"|'''FYI'''||Grooper renamed our '''Content Model''' to "Content Model". Why? In "Project 1", the '''Batch Process''' and '''Content Model''' were both named "URLA Redaction". Object names in the same branch of the node tree must be unique. Grooper will rename any object sharing the name of the source '''Batch Process''' after their object type. Therefore the '''Content Model''' named "URLA Redaction" was renamed "Content Model". | |||
|} | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-04.png]] | |||
|} | |||
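The renaming behavior noted in the FYI above can be expressed as a one-rule function. This Python sketch is illustrative only: <code>name_in_new_project</code> is a hypothetical helper, and it models just the single rule described here (an object sharing the source '''Batch Process's''' name is renamed after its object type).

```python
def name_in_new_project(object_name, object_type, batch_process_name):
    """Return the name an object ends up with in the new Project.

    Objects (other than the Batch Process itself) that share the source
    Batch Process's name are renamed after their object type.
    """
    if object_type != "Batch Process" and object_name == batch_process_name:
        return object_type
    return object_name


print(name_in_new_project("URLA Redaction", "Content Model", "URLA Redaction"))
# Content Model
print(name_in_new_project("URLA OCR", "OCR Profile", "URLA Redaction"))
# URLA OCR
print(name_in_new_project("URLA Redaction", "Batch Process", "URLA Redaction"))
# URLA Redaction
```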
</tab> | |||
<tab name="3. Rename Resources" style="margin:20px"> | |||
=== 3. Rename Resources === | |||
With the switch to the '''Project''' architecture, you may find your naming convention no longer makes sense or could be adjusted. Much like the "Clean House" step, this step is not strictly necessary. But, if you're going through the effort to reorganize your repository into a new structure, you might as well make sure how you're naming things makes sense in that new structure. | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
If you're coming from an environment with a lot of '''Batch Processes''' and a lot of '''Content Models''', you've probably named your resources according to their intended use case. So you might have "Use Case X Content Model", "Use Case X OCR Profile", "Use Case X IP Profile" and so on. You may find this naming superfluous once all these assets are moved over to a '''Project'''. So, it might make more sense to you to just rename these objects after their generic object type or function in the '''Batch Process''' workflow. | |||
# For example, we've renamed our '''OCR Profile''' from "URLA OCR" to simply "OCR". | |||
# Name your resources whatever makes sense to you in your environment. "Batch Process" may be too generic of a name if you're executing multiple different '''Batch Processes'''. We went ahead and stuck with "URLA Redaction" for our '''Batch Process''' here. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-05.png]] | |||
|} | |||
</tab> | |||
<tab name="4. Analyze References" style="margin:20px"> | |||
=== 4. Analyze References === | |||
So far, this process has been fairly simple. With the press of the "Create Project" button, all resources associated with a '''Batch Process''' are moved to a new '''Project'''. Our previous example was so simple because all the resources were fairly self-contained, or "local" to the '''Batch Process'''. | |||
That is not always the case. Particularly with larger environments, you will find you reuse resources across a variety of '''Batch Processes'''. For example, the "Full Text - Accurate" '''OCR Profile''', in our "Essentials" downloads, is many Grooper users' "go-to" '''OCR Profile''' when they don't have the time or the will to create their own. This '''OCR Profile''' would be a shared or "global" resource, touched by multiple different '''Batch Processes'''. | |||
When you have shared resources, they will ''not'' be moved to a newly created '''Project''' when using the "Create Project" feature. They can't be. Other '''Batch Processes''' need to use those resources too. Instead, all resources that ''are not'' shared are moved to the newly created '''Project''', and a '''Project''' reference is made to "Project 1". The '''Project''' reference allows the new '''Project''' access to the resources it needs in "Project 1". | |||
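The split between moved and shared resources can be pictured as simple set arithmetic. The sketch below is a conceptual Python model only, not how Grooper is implemented: the function `plan_create_project`, the "Human Resources Process" name, and the "Invoices Model" and "Invoices OCR" resource names are all hypothetical, chosen for illustration.

```python
def plan_create_project(target, uses):
    """Model which resources move when a Project is created from `target`.

    `uses` maps each Batch Process name to the set of resource names it
    relies on (hypothetical data structure, not a Grooper API).
    """
    used_elsewhere = set()
    for process, resources in uses.items():
        if process != target:
            used_elsewhere |= resources
    moved = uses[target] - used_elsewhere    # local to this Batch Process
    shared = uses[target] & used_elsewhere   # must stay in "Project 1"
    return {
        "moved": moved,
        "stays_in_project_1": shared,
        "needs_project_reference": bool(shared),
    }


uses = {
    "Invoices Process": {
        "Invoices Model", "Invoices OCR",
        "Image Cleanup - OCR", "Image Cleanup - Permanent",
        "NTFS - Local Hard Drive",
    },
    "Human Resources Process": {
        "Image Cleanup - OCR", "Image Cleanup - Permanent",
        "NTFS - Local Hard Drive",
    },
}
plan = plan_create_project("Invoices Process", uses)
```

Here only the two resources unique to "Invoices Process" would move; the two '''IP Profiles''' and the '''CMIS Connection''' stay behind, which is why a '''Project''' reference to "Project 1" is needed.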
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
# We will create '''Projects''' for these two '''Batch Processes''' next. | |||
# However, their '''Batch Processes''' share a number of resources: two '''IP Profiles''' and a '''CMIS Connection'''. | |||
# For example, they both use the "Image Cleanup - Permanent" '''IP Profile''' in executing the '''Image Processing''' steps of their '''Batch Processes'''. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-06.png]] | |||
|- | |||
|valign=top| | |||
Next, we will press the "Create Project" button to create a new '''Project''' from each '''Batch Process''', starting with the "Invoices Process" '''Batch Process'''. | |||
# Select the '''Batch Process''' you wish to use to create the '''Project'''. | |||
#* In our case, we're starting with the '''Batch Process''' named "Invoices Process". | |||
# Press the "Create Project" button. | |||
# Configure the '''Project''' creation properties as desired. | |||
# Press the "Execute" button to create the '''Project'''. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-07.png]] | |||
|- | |||
|valign=top| | |||
# A new '''Project''' is created and several resources have been moved. | |||
# However, select the '''Project''' created. | |||
# Notice the '''''Referenced Projects''''' property shows a '''Project''' reference to "Project 1". | |||
This indicates there is something in "Project 1" the '''Batch Process''' utilizes that ''can't'' be moved because another '''Batch Process''' (or its associated objects) utilizes it in one way or another. | |||
Essentially, ''both'' '''Batch Processes''' need to reference one or more objects. So those objects stay put in "Project 1" and are accessible through the '''Project''' reference. By referencing "Project 1", the '''Project''' we just created has access to all its resources, including whatever it is it needs to function. | |||
So, just what resources are out in "Project 1" that our new '''Project''' needs? Good question. You can quickly answer this with the "Analyze References" feature. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-08.png]] | |||
|- | |||
|valign=top| | |||
# Press the "Analyze References" button to view all objects in "Project 1" referenced by the selected '''Project'''. | |||
# This will bring up a list of outbound references. | |||
#* These are references that objects in the selected '''Project''' make ''out'' to external '''Projects''' listed in the '''''Referenced Projects''''' property. | |||
# Any referenced object will be listed. | |||
#* In our case, we made references to three objects in the "Shared Resources" folder of "Project 1" | |||
#** Two '''IP Profiles''': "Image Cleanup - OCR" and "Image Cleanup - Permanent" | |||
#** One '''CMIS Connection's''' ("NTFS - Local Hard Drive") '''CMIS Repository''' ("Import Export") | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-09.png]] | |||
|} | |||
Now that we know what is shared, we have two options: | |||
# Copy these resources from "Project 1" so that we have local copies of these resources. | |||
# Leave these shared resources in place so that every '''Project''' that needs them can reference the same object. | |||
Your choice will largely depend on how big your environment is, how many times the resources in "Project 1" are referenced by different potential '''Projects''', and if you prefer to have local copies of these resources that can be edited independently or if you want these resources to be truly shared across different '''Projects''' (meaning changing one single object will impact how multiple '''Projects''' implement it). | |||
=== Option 1: Copy the Resources === | |||
This option is the most time consuming. However, it does have benefits. Once you copy these shared resources from "Project 1" (and reassign the references to the local copies), you will no longer need the reference to "Project 1". At that point the '''Project''' is independent and self-contained. If you need to share this '''Project''' with another Grooper environment (say promoting it from a "test" repository to a "development" repository), it will have no dependencies on another '''Project''' that would need to be shared alongside it. | |||
Aside from the time it takes to do this, the only drawback is the resources are then completely local to the '''Project'''. The originals in "Project 1" and the copies in the '''Project''' are separate objects. This means they must be edited independently if you want to make changes to them. | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Copying these resources over is not necessarily the hard part. It's even easier in this case because they're already all in the same folder. | |||
# We can simply copy the whole folder. | |||
# Or we can select the folder, and go to the "Contents" tab. | |||
# Then, multi-select all the objects we want to copy. | |||
#* In this case, all of them. | |||
# Paste them into the '''Project''' you want. | |||
{|cellpadding="10" cellspacing="5" | |||
|-style="background-color:#36b0a7; color:white" | |||
|style="font-size:14pt"|'''FYI'''||Depending on the reference complexity of the objects you are copying, this process can be more complicated. You will need to track down the referenced objects and either a) bring them over before you copy and paste the object referencing them (which will require you to reset the reference after everything is copied over) or b) copy and paste everything at the same time. | |||
For more information on copying and pasting objects from one '''Project''' to another, please refer to the [[#Referencing Objects in Other Projects]] section of this article. | |||
|} | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-10.png]] | |||
|- | |||
|valign=top| | |||
# Now we have local copies in our '''Project''' of these referenced resources from "Project 1". | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-11.png]] | |||
|- | |||
|valign=top| | |||
Now for the time-consuming part of this process. All the references must be reassigned from the source objects in "Project 1" to their local copies in the '''Project'''. | |||
# For example, this '''OCR Profile''' uses one of the '''IP Profiles''' we just copied. | |||
# We would need to select the property where the reference is assigned. | |||
#* The '''''IP Profile''''' property in this case. | |||
# Then, we'd need to change the reference from the source object in "Project 1" | |||
# ...to the local copy in our '''Project'''. | |||
The more resources you copy over, and the more complex references and sub-references you're faced with, the more time consuming this process will be. | |||
| | |||
[[File:2022-project-upgrade-steps-12.png]] | |||
|} | |||
=== Option 2: Keep Project 1 Shared === | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
The other option is to just keep these resources shared. Your '''Batch Process''' will function as it did before. The only difference is your '''Project''' will be dependent on the referenced '''Project''' ("Project 1") to function. | |||
# Here, we used the "Create Project" button to create a new '''Project''' from the last '''Batch Process''' in the Grooper Repository. | |||
# We've used the "Analyze References" button to see what's referenced in "Project 1" | |||
# It's utilizing one '''IP Profile''' and one '''CMIS Connection''' we saw in the previous example. | |||
# It's also utilizing some of the resources in our Grooper Essentials package. | |||
If you're fine with keeping these as shared resources in a referenced '''Project''', you're done here. There's no need to go through the time-consuming copy-and-paste and reference reassignment dance we did earlier. The only potential drawback here is you've made this '''Project''' dependent on "Project 1". You will need to evaluate for yourself whether this is a drawback, a benefit, or doesn't really matter one way or another. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-13.png]] | |||
|} | |||
=== Option 3: Option 1 + Option 2 === | |||
There's also no reason why you can't copy some items over to your new '''Project''' and keep others referenced through '''Project''' references. | |||
For example, in our first example, we could have copied over the two '''IP Profiles''' but kept the '''CMIS Connection''' as a shared resource. In fact, that would have made more sense. If we have to make changes to a '''CMIS Connection''' (like entering new access permissions), we would only want to do that once by manipulating one single object, rather than reproducing our efforts by editing multiple copies of the same object in multiple '''Projects'''. | |||
</tab> | |||
<tab name="5. Remove Project References" style="margin:20px"> | |||
=== 5. Remove Project References === | |||
Now, we're at a point where we've used the "Create Project" feature for every '''Batch Process''' in "Project 1". What's next? | |||
In our situation, we've created some '''Projects''' that need to retain a reference to "Project 1" and some that do not. For those that do not, we should go ahead and remove the reference to "Project 1". To do this, the "Analyze References" feature will once again be useful. For each '''Project''', we will use the "Analyze References" button to check whether there are any outbound references. If there are, we'll keep the reference intact. If there are not, we will remove the reference. | |||
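The keep-or-remove decision can be summarized as a small check. The following Python sketch is a conceptual model under stated assumptions, not a Grooper API: the `location` and `references` dictionaries, the function names, and the object names "HR OCR" and "Image Cleanup - OCR (copy)" are hypothetical stand-ins for what the "Analyze References" button reports.

```python
def outbound_references(project, location, references):
    """List (source, target) pairs where the source object lives in
    `project` and the target object lives in some other Project."""
    return sorted(
        (source, target)
        for source, targets in references.items()
        if location[source] == project
        for target in targets
        if location[target] != project
    )


def decide_reference_cleanup(projects, location, references):
    """Keep the Project reference if any outbound reference exists."""
    return {
        project: "keep" if outbound_references(project, location, references) else "remove"
        for project in projects
    }


# Which Project each object lives in (illustrative data).
location = {
    "HR OCR": "Human Resources",
    "Invoices OCR": "Invoices Process",
    "Image Cleanup - OCR": "Project 1",
    "Image Cleanup - OCR (copy)": "Invoices Process",
}
# Which objects each object references (illustrative data).
references = {
    "HR OCR": ["Image Cleanup - OCR"],
    "Invoices OCR": ["Image Cleanup - OCR (copy)"],
}
decisions = decide_reference_cleanup(
    ["Human Resources", "Invoices Process"], location, references
)
```

In this model, "Human Resources" still reaches into "Project 1", so its reference stays, while "Invoices Process" only uses local copies and its reference can be removed.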
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
# We'll start with the "Human Resources" '''Project''' | |||
# Press the "Analyze References" button. | |||
# There are outbound references. | |||
So, we do nothing. | |||
#<li value=4> The reference to "Project 1" must stay. | |||
| | |||
[[File:2022-project-upgrade-steps-14.png]] | |||
|- | |||
|valign=top| | |||
# Next, we'll analyze the "Invoices Process" '''Project's''' references. | |||
# Press the "Analyze References" button. | |||
# There are not any outbound references. | |||
Now, we know it's safe to remove the reference to "Project 1". | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-15.png]] | |||
|- | |||
|valign=top| | |||
# Using the '''''Referenced Projects''''' property, remove the reference to "Project 1" | |||
# Press "Save" when finished. | |||
{|cellpadding="10" cellspacing="5" | |||
|style="font-size:22pt"|⚠||You should '''''always''''' use the "Analyze References" button before removing a reference to a '''Project'''. | |||
Grooper will technically allow you to remove a reference to a '''Project''' even with outbound references outstanding. However, doing so is '''''not''''' best practice as it can cause corruption of your system down the road. | |||
|} | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-16.png]] | |||
|- | |||
|valign=top| | |||
# Continue selecting your '''Projects''' to analyze their references. | |||
# Once all '''Projects''' with no outstanding references to "Project 1" have had their '''Project''' reference to "Project 1" removed, you are done. | |||
| | |||
[[File:2022-project-upgrade-steps-17.png]] | |||
|} | |||
</tab> | |||
<tab name="6. Reorganize Shared Resources" style="margin:20px"> | |||
=== 6. Reorganize Shared Resources === | |||
After you're done creating '''Projects''' from the '''Batch Processes''' in "Project 1", you'll want to clean up the remaining resources in "Project 1". | |||
This will include: | |||
# Organizing any leftover assets into manually created '''Projects''', if applicable. | |||
# Organizing any remaining shared resources into folders and deleting empty folders, as you see fit. | |||
=== Manually Creating Projects === | |||
There are some situations where you may need to manually create a '''Project''' for certain assets remaining in "Project 1". Most commonly, this can happen if you have resources you've created for testing purposes that are not tied to a '''Batch Process''' and you want to keep them around. | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
# For example, we have two '''Content Models''' left over. | |||
# We also have this '''IP Profile'''. | |||
It seems like these are partially architected resources from whenever this Grooper user went through Grooper's ACE - Architect training. We may want to keep these around so this user can continue their training. | |||
The only issue is there is no '''Batch Process''' utilizing these resources, so we can't take advantage of our "Create Project" utility. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-18.png]] | |||
|- | |||
|valign=top| | |||
So, we'll need to manually create a '''Project'''. | |||
# Right click the '''Projects''' folder in the node tree. | |||
# Select "Add", then "Project..." | |||
# Name the new '''Project'''. | |||
# Press "Ok" when finished. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-19.png]] | |||
|- | |||
|valign=top| | |||
From here, you can cut and paste or move resources from "Project 1" to the new '''Project'''. | |||
# For example, we moved this '''IP Profile''' from "Project 1" to the "ACE Training" '''Project''' we created. | |||
Next, we're going to look at a potential issue you may encounter when moving resources out of "Project 1" to another '''Project'''. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-20.png]] | |||
|- | |||
|valign=top| | |||
# I want to move either of these two '''Content Models''' from "Project 1" to the "ACE Training" '''Project'''. | |||
# If I try to move one, I get the following error message. | |||
#* This is telling me there is a reference violation. The '''Content Model''' is dependent on some external reference, and can't be moved without resolving the reference. | |||
Why did this happen? Long story short, someone did something they never should have done, but which was technically allowed in previous versions of Grooper. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-25.png]] | |||
|- | |||
|valign=top| | |||
The problem is someone made a reference within a '''Content Model''' to something in ''another'' '''Content Model'''. | |||
# This '''Data Field''' is causing the problem. | |||
# Its '''''Value Extractor''''' is set to ''Reference'' a '''Data Type'''. | |||
# The problem is the '''Data Type''' it's referencing is in a ''different'' '''Content Model'''. | |||
This is a big "no-no" that was technically possible in Grooper, but never considered best practice. This may have happened by accident when copying a '''Content Model'''. This may have been a "quick fix" that some Grooper designer made, intending to go back and resolve it but never getting around to it. Who knows. The main issue is these types of reference violations can cause problems down the road, potentially causing corruption in your Grooper environment. | |||
Part of the reason '''Projects''' were created was to avoid this type of corruption due to improperly referenced objects. Referencing '''Projects''' via the '''''Referenced Projects''''' property makes external resource references much more intentional, avoiding accidental reference violations (as much as possible). | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-21.png]] | |||
|- | |||
|valign=top| | |||
To get these objects moved to the new '''Project''', we need to resolve the reference violation in one way or another. | |||
# We could clear the offending reference, or reset it to something that doesn't reach into another '''Content Model'''. | |||
# Or, since this '''Content Model''' is really just a copy of the other one, we could just delete it. | |||
#* This is what I will elect to do. | |||
| | |||
[[File:2022-project-upgrade-steps-22.png]] | |||
|- | |||
|valign=top| | |||
# With the reference violation resolved, the '''Content Model''' can be moved with no issue. | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-23.png]] | |||
|} | |||
=== Clean Up Remaining Folders === | |||
After you've moved everything out of "Project 1" into a new '''Project''' (whether manually or through the "Create Project" feature), the only resources left should be shared resources, objects intended to be used by any current or future '''Project'''. | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
All that's left is to organize the objects and folders remaining. | |||
# You may have empty folders you can simply delete. | |||
FYI: This "Content Models" folder will be "Read Only". This is a carryover from this folder being a system node in the previous version of Grooper's architecture. | |||
#<li value=2> To delete it, you will need to select the folder, then go to the "Advanced" tab. | |||
# Then, under '''''Attributes''''', change '''''Read Only''''' to ''False''. | |||
#* Then navigate off the object. You will be prompted to "Save". After saving, you will be able to delete it. | |||
You may also want to create some new folders for stray objects or move some folders to the root of the '''Project'''. | |||
#<li value=4> For example, many users find it helpful to move the Grooper "Essentials" folder to the root of the '''Project''', then delete the "Downloads" folder. | |||
| | |||
[[File:2022-project-upgrade-steps-24.png]] | |||
|} | |||
</tab> | |||
<tab name="7. Rename Project 1" style="margin:20px"> | |||
=== 7. Rename Project 1 === | |||
{|cellpadding=10 cellspacing=5 | |||
|valign=top style="width:40%"| | |||
Once all '''Project'''-specific resources are out of "Project 1" and you've organized the shared resources that remain, rename "Project 1" to something that reflects its true utility, something like "Shared Resources" or "Global Resources". | |||
# We named ours "Global Resources". | |||
That's it! The migration from "Project 1" to Grooper's new '''Project'''-based architecture is complete! | |||
|valign=top| | |||
[[File:2022-project-upgrade-steps-26.png]] | |||
|} | |||
</tab> | |||
</tabs> | |||
[[Category:Articles]] | |||
[[Category:Version 2022]] | |||
Revision as of 13:28, 1 March 2023
A Project is the primary container in which document processing components are created, configured, and organized. It is a library of resources, such as Content Models, Batch Processes, OCR Profiles, Lexicons, and more, needed to process documents through Grooper.
About
After installing and setting up a Grooper Repository, creating a new Project is most likely the first thing you will do when starting work in Grooper Design Studio. A variety of different Grooper assets are required to process documents. A Content Model is required to classify documents and extract their data according to that classification. An OCR Profile is required to perform optical character recognition to get machine readable text from scanned pages. A Batch Process is required to define the step-by-step instructions to process documents from start to finish. A Project allows you to house these various resources related to a processing use case in one location.
|
Imagine you're processing vendor invoices. Pretty much anything and everything you need to process these documents can be organized into a Project.
|
|
|
How you organize objects in your Project is largely up to you. However, in service of this task, be aware you can add any number of folder levels to your Project.
|
What's With That Processes Folder?
|
If you're new to Grooper (or version 2022) you may be asking yourself, "What's with that 'Processes' folder in the node tree?" As mentioned before, one of the things a Project can (and should) house is a Batch Process. If a Project can hold a Batch Process, what does the Processes folder hold?
|
|
|
When adding and configuring a new Batch Process, you will always add it to a Project first. As you are editing it, you do not want it to be "live" or usable in a production-level environment as documents are coming into Grooper. This would cause partially or improperly processed documents to come through Grooper. So, while you are working on a Batch Process it is a working Batch Process. Once that Batch Process is finished and ready to be implemented in a production-level environment, it is then published (using the "Publish" button in the Batch Process object's UI). This creates a read-only copy of the working Batch Process in the Processes folder. Production-level Batches only have access to Batch Processes in the Processes folder, ensuring they are processed using only published processing instructions, not working ones. |
Adding a New Project
Add a Project
|
Projects are added to the Projects folder node in the node tree.
|
|
|
Add Resources to the Project
The following Grooper objects can be added to a Project
|
|
Extractor objects
|
Profile objects
|
Data integration objects
|
Other objects
|
|
So, how do you add them to a Project? Much like you would add an item to a node tree folder in Grooper.
|
|
|
What About Batches?
|
One thing you cannot add to a Project is a Batch. This includes Test as well as Production Batches.
|
|
|
Test Batches can be accessed by any Grooper object with a Batch Selector in its UI.
|
Referencing Objects in Other Projects
Projects are new to version 2022. If you're new to Grooper, this won't mean much to you. Just know Projects are a much better way of organizing and accessing Grooper assets in a node tree structure than in previous versions. (And, if you are upgrading to version 2022, please review the #Projects and Upgrading to 2022 section of this article.)
Aside from organizational benefits, one of the big reasons for switching to a Project-based architecture was to maintain the integrity of the references woven between multiple objects in a repository.
Generally speaking, Projects are intended to "silo" the resources contained within. Objects within the Project can freely reference other objects within the same Project but cannot reference objects in other Projects (without being explicitly allowed to do so).
|
For example, in our "Invoices" Project, the "Invoices OCR" OCR Profile references the "Image Cleanup - OCR" IP Profile to perform temporary image processing prior to running OCR.
Generally speaking, maintaining reference integrity is ideal. The more narrowly you can define an object's allowable scope of reference, the better. This makes it easier to track down references, limits the number of object dependencies, making your system easier to manage, and limits possible system corruption down the line if a mess of "reference spaghetti" gets tangled up in one way or another. |
|
|
However, imagine you're working in a different Project. Take our "Human Resources" Project. It makes perfect sense to have these two things separated into two Projects. They're two different use cases. They use different Content Models. They use different Batch Processes. There's good reason to keep the "invoice-y" things in one spot and the "human resource-y" things in another. There's no reason to clutter up our Project related to human resources documents with assets that only pertain to invoices. But, particularly for Grooper users who use Grooper across a variety of use cases, you will run into situations where resources you build for one project can be utilized in another. In these cases, it would be beneficial to share resources so that you don't have to rebuild something you've already developed. Let's say the "Image Cleanup - OCR" IP Profile would also work really well for our human resources (HR) documents. We've already done the work to get that IP Profile working well, and we don't want to duplicate our efforts by recreating it.
|
Using Resources in Other Projects
So, if we want to use an object from an external Project, what can we do? There are three options:
- Directly copy the object from one Project to another.
- Reference the external Project to allow access to its resources.
- Create a shared resources Project that both Projects reference.
Depending on the situation, there will be strengths and weaknesses to each approach. Next, we will detail each option and discuss some of these associated drawbacks.
Option 1: Copying Objects from One Project to Another
For simpler Grooper environments and simple Grooper objects, simply copying the desired object from one Project to another can work out just fine. This option is often the best for the most basic of circumstances.
However, there can be significant drawbacks to this approach. Furthermore, sometimes this option is going to work for you, sometimes not, depending on the reference complexity of the object you're copying.
| FYI |
While the following guidance deals specifically with "copying and pasting", the same follows for "cutting and pasting" or "moving" objects from one Project to another. |
|
Let's go back to our previous example. Long story short, we want to use an IP Profile from the "Invoices" Project in the "Human Resources" Project. There's nothing preventing us from doing this, in this case.
Copying and pasting is a quick and easy solution for getting simple objects from one Project to another. We all know how to copy and paste. This isn't a groundbreaking concept. However, as with many simple things, it's not without its drawbacks. |
|
|
First, be aware these are now two separate objects. One lives in one Project. The other lives in another Project. They are distinct resources. Any changes made to the original object will not be reflected in the copied object (or vice versa).
This is one of the drawbacks to this approach. If you want to make changes to one object, you'll need to make the same changes to the other (assuming you want both objects to reflect the changes). |
|
|
Furthermore, there are situations where Grooper will not let you copy objects from one Project and paste them into another. This is a very intentional part of the Project object's design, done to preserve reference integrity. Grooper allowed us to copy and paste the IP Profile because it did not reference any other object in its original Project. If it did, its functionality would be dependent on that referenced object in the first Project being present in the second Project. Let's look at another example. In our "Invoices" Project's Content Model, we've built some extractor assets, including an address extractor. Let's say we want to bring that extractor into our "Human Resources" Project's Content Model.
|
|
|
If we try to do this, Grooper is going to throw an error. Why? The Data Type, as part of its configuration, references several Lexicon objects.
It also gives us the full node tree location within the Project of both the object doing the referencing (either the object you copied or one of its children) and the referenced object, using the following format:
|
|
Think of Projects like a friend's house. If your friend invites you over, they aren't surprised when you show up. But if you show up with a bunch of friends unannounced, they're going to take issue with you. There's now a bunch of random strangers in their house they didn't expect.
That's just like copying and pasting objects with references. Bringing in an object by itself is no big deal, but bringing along who knows how many objects it references is a big deal (even more so considering any objects the referenced objects reference, and the objects the referenced objects' referenced objects reference, and so on down the line). There's now a bunch of random objects you didn't expect cluttering up your Project.
This puts the onus on you, the user, to decide how you want to resolve these references. Again, there are strengths and drawbacks to each approach. It's up to you to decide what works best for your situation.
One thing you could do is copy all the needed referenced objects over to the second Project. Depending on the number of references you're dealing with, this could be a time consuming process, as it would involve the following steps:
- Copy and paste all the referenced objects from the first Project to the second.
- Unassign all the references in the object to be copied from the first Project.
- Paste the object from the first Project to the second.
- Reassign all the references in the copied object to all the referenced objects pasted in step 1.
|
Depending on how these objects are organized, you could also copy and paste multiple objects at a time.
Since we were able to copy the extractor and a folder containing all the Lexicons it references, and paste them all at the same time, Grooper allowed the move without any issue. |
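The difference between the failed paste and the successful one can be modeled as a check on whether every reference stays inside the set of objects being pasted. This Python sketch is a simplified illustration, not Grooper's actual validation logic: the function `unresolved_references` and the two Lexicon names are hypothetical, and the model ignores any Projects the destination already references.

```python
def unresolved_references(pasted, references):
    """Return (object, target) pairs where a pasted object references
    something that is not being pasted along with it."""
    pasted = set(pasted)
    return sorted(
        (obj, target)
        for obj in pasted
        for target in references.get(obj, ())
        if target not in pasted
    )


# Illustrative data: the address extractor references two Lexicons.
references = {
    "Address Extractor": ["US States Lexicon", "Street Suffix Lexicon"],
}

# Pasting the extractor alone leaves two references dangling.
alone = unresolved_references(["Address Extractor"], references)

# Pasting the extractor together with its Lexicons leaves none.
together = unresolved_references(
    ["Address Extractor", "US States Lexicon", "Street Suffix Lexicon"],
    references,
)

# An object with no references, like the IP Profile, pastes cleanly.
ip_profile = unresolved_references(["Image Cleanup - OCR"], references)
```

An empty result corresponds to a paste Grooper would accept in this model; a non-empty result corresponds to the reference error described earlier.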
|
|
Keep in mind, however, if you copy a folder, you're going to get everything in that folder.
|
Another option is to use Project references. This gives one Project referenceable access to all the resources within another.
Option 2: Referencing a Project
Resources can be shared between two (or more) Projects by referencing the full Project. This gives explicit access to all objects within a Project, just as if they were created locally.
|
Let's go back to our problem copying an address extractor that references multiple Lexicons from one Project to another.
As we saw previously, Grooper will not allow us to do this (yet). |
|
|
All we need to do to make this happen is tell Grooper it's OK for the "Human Resources" Project to share assets with the "Invoices" Project. We do this by referencing the whole Project.
Now we can copy and paste all day long.
You may also make direct references to any object in a referenced Project. For example, because we've referenced the "Invoices" Project we could have simply referenced the address extractor without copying and pasting it.
|
This is an effective way of sharing resources between multiple Projects without duplicating your efforts by creating multiple copies of shared resources that you have to manage independently in each Project.
The only downside to this approach lies in how many different Projects utilize a set of shared resources. If it boils down to a limited number of resources, or resources shared between very similar Projects (in terms of their use case), this approach can work out just fine. But when more and more resources are shared between more and more Projects, the crisscrossed references between them can be difficult to navigate when you're trying to track down a single object used across a variety of Projects.
In those cases, you may want to do a little extra work and create an entirely separate Project just devoted to housing resources shared between multiple Projects. We will discuss creating and utilizing a "Shared Resources" Project in the next tab.
⚠ Please read the following before continuing. It contains best practice advice to avoid potential system corruption when dealing with Project referencing.
Just as you can make references to other Projects, you can remove those references as well. However, to prevent corruption down the line, always ensure that no object in your Project references objects in the other Project before removing the reference to it.
|
The easiest way to do this is with the "Analyze References" button at the top of the Project's UI screen.
These outbound references indicate there are resources in this Project that are dependent on resources in the "Invoices" Project to function. CAUTION!!!! While it is technically possible to remove the reference to a Project without resolving these references, YOU SHOULD NOT DO SO. It is best practice to either:
Please ensure there are no outbound references to the Project before removing the reference. |
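The check that "Analyze References" performs for you can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not Grooper's implementation or API: `owner` maps each (hypothetical) object name to the Project that contains it, and `references` maps an object to the objects it references.

```python
def outbound_references(references, owner, source_project, target_project):
    """List (object, referenced object) pairs pointing from one Project into another."""
    return [
        (obj, target)
        for obj, targets in references.items()
        if owner[obj] == source_project
        for target in targets
        if owner[target] == target_project
    ]


def safe_to_remove_reference(references, owner, source_project, target_project):
    """A Project reference is only safe to remove when nothing depends on it."""
    return not outbound_references(references, owner, source_project, target_project)


# Hypothetical objects and the Projects that contain them.
owner = {
    "HR Data Model": "Human Resources",
    "Address Extractor": "Invoices",
    "Street Lexicon": "Invoices",
}
references = {
    "HR Data Model": ["Address Extractor"],
    "Address Extractor": ["Street Lexicon"],
}

print(outbound_references(references, owner, "Human Resources", "Invoices"))
print(safe_to_remove_reference(references, owner, "Human Resources", "Invoices"))
```

In this made-up example the "Human Resources" Project still depends on an extractor in "Invoices", so the reference should stay until that dependency is resolved.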
The last option is to use an entirely separate Project which is solely devoted to housing objects used and referenced by multiple Projects. This option is most appropriate for larger environments, processing different kinds of documents from different use cases. Given a big enough body of documents, despite the fact they may come from different industries or use cases, you will find commonly used resources that are generalizable across a variety of documents. This can include generic or semi-generic extractors, Lexicons, even IP Profiles and OCR Profiles.
In these cases, it often makes sense to create a "bucket" of resources from which all Projects can draw. The idea is to create shared resources in a single Project referenced by multiple others. Or, in our case, we're going to move these assets to a "Shared Resources" Project.
FYI: Another common example of shared resources is CMIS Connections and Data Connections. It is often the case that multiple Projects will reuse these connection objects to integrate Grooper with external storage platforms (such as content management systems and databases). Therefore, it would make sense to create something like a "Connections" Project containing these CMIS Connections and Data Connections. Instead of re-creating each connection object for each Project, all Projects can simply reference the "Connections" Project to gain access to the CMIS Connections and/or Data Connections required for import/export operations.
|
For instance, there are some fairly generic extractors in the "Invoices" Project we may want accessible to the "Human Resources" Project and future Projects as well.
We are going to move these extractors to a new Project, which we will name "Shared Resources".
For the first extractor, this job is very easy.
Here's where we get into the extra work on the front end. What we can do first is copy the Value Reader. It makes no references to other objects. The issue here is that other objects are referencing it.
Now, if you truly want to use this as a "shared" or "global" resource, you can reassign all the references to the "VAL - Generic Decimal" extractor within the "Invoices" Project. Ultimately, we will need the "Invoices" Project to reference the "Shared Resources" Project to reassign the references.
The quickest way to find every object that references a selected object in the node tree is to use the "References" tab.
What we could do from here is track down each of these objects, find where in their property grid the extractor is referenced, and reassign that reference to the version in the "Shared Resources" Project. That is a perfectly acceptable, although somewhat time-consuming, way to reassign references. Luckily, we have a shortcut available to us.
|||
|
The "Reassign References..." button will allow us to change the reference for each object in the list from the selected object, to a different one. This is exactly what we want to do. We want to change the reference set on these Data Columns and Data Type from the "VAL - Generic Decimal" extractor in the "Invoices" Project to the copy we made in the "Shared Resources" Project.
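Conceptually, reassigning references is a bulk "find and repoint" operation. The Python sketch below models that idea only. It is not how Grooper implements the button, and apart from "VAL - Generic Decimal" the object names are hypothetical. Objects are identified by a (Project, name) pair so the original and the copy can be told apart.

```python
def reassign_references(references, old_target, new_target):
    """Repoint every reference to old_target at new_target; return the objects changed."""
    changed = []
    for obj, targets in references.items():
        if old_target in targets:
            references[obj] = [new_target if t == old_target else t for t in targets]
            changed.append(obj)
    return changed


OLD = ("Invoices", "VAL - Generic Decimal")
NEW = ("Shared Resources", "VAL - Generic Decimal")

# Hypothetical Data Columns in the "Invoices" Project and what they reference.
references = {
    ("Invoices", "Line Total"): [OLD],
    ("Invoices", "Unit Price"): [OLD],
    ("Invoices", "Invoice Date"): [("Invoices", "VAL - Generic Date")],
}

changed = reassign_references(references, OLD, NEW)
print(changed)  # the two Data Columns that pointed at the old extractor
```

Once nothing points at the original any longer, the copy in "Shared Resources" is the single object every Project relies on.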
As we've demonstrated, it's a little extra work if you decide you want to move resources from one Project to a shared resources Project. However, the benefit of organizing assets like this is that any Project referencing our "Shared Resources" Project now has access to its assets.
|
The Essentials Project
|
Every newly created Grooper Repository in version 2022 will come with a Project named "Essentials". This Project contains several resources you may find useful when designing your document processing assets. Just like any other Project, you can access these resources by making a reference to the "Essentials" Project. The objects contained within can serve as examples of the different types of objects you create, as resources you can copy into your own Projects and build on top of, or simply as resources you reference directly in your Projects.
|
|
In this project you will find various:
|
Projects and Upgrading to 2022
|
Projects are a new way of organizing Grooper resources in version 2022. In previous versions, Grooper resources were organized primarily in one of three folders in the node tree:
Users would have to go back and forth between these locations in order to configure what they needed to process documents through Grooper. This often resulted in a time consuming and cumbersome process, sifting through the node tree's hierarchy to get to the objects you needed. Projects simplify this issue by allowing you to place all associated resources for a given use case (or "project") in a single node location.
Before the introduction of Projects in 2022, these objects were interspersed throughout various locations in the node tree. In version 2022, everything can be neatly placed in one, single location, making finding what you're looking for much simpler.
|
|||
What Happens When You Upgrade?
Obviously, this architecture is much different than how your assets are currently organized in Grooper. So, what's going to happen when you upgrade?
Anything in the "Global Resources" folder will be placed throughout "Project 1"
|
Deciding What to Do Next
It's important to point out your Grooper environment will work just fine with everything organized into the single "Project 1" Project. You can leave everything as is in "Project 1" upon upgrading to version 2022 and continue processing Batches of documents as if nothing happened.
Going forward you have two options:
- Do nothing. Leave all Grooper resources organized into "Project 1"
- Migrate resources into their own Projects.
You should consider this an "all or nothing" choice. There are some significant benefits to organizing resources into their own Projects, but it should not be done haphazardly. You will not see the true benefits of this new architecture if you take a "half in/half out" approach. That said, migrating resources to new Projects will take time. There are some utilities that will aid you in this task, but there will necessarily be some manual moving of objects from one node location to another.
So, should you migrate away from "Project 1" at all? Here are some things to keep in mind when making this decision.
- It's all or nothing.
  - Again, we stress the importance of committing to the move. You should commit to migrating everything to new Projects (with the exception of a handful of shared resources), rather than just a few things. The benefits of the Project architecture will not be realized until you've completed the entire process. Not following this advice increases the likelihood of a time-sensitive call to the help desk in the future. This call will likely be time-consuming as we attempt to track down the issue through a partially architected system.
- You don't have to move things from "Project 1" at all.
  - If you do not have the time or resources to migrate out of "Project 1", it's best to leave everything in "Project 1". Everything will continue to work as it did previously.
- Do you have time to do it?
  - This is probably the biggest question you need to ask yourself. The migration will take time. The larger the repository is, with many Content Models, Batch Processes, profiles and other objects, the longer it's going to take.
- Do you have a lot of "shared resources"?
  - If you frequently have individual Data Types, Lexicons, profiles, CMIS Connections or other objects used across many different Content Models and Batch Processes, this will take the most time and effort to migrate. Ensuring these shared resources are accessible to each Project created is the most time-consuming part of any migration out of "Project 1".
- Do you frequently promote objects from a "test" or "dev" Grooper Repository to a "production" Grooper Repository?
  - If so, Projects are for you. The new architecture provides multiple advantages for this kind of workflow. You should seriously consider devoting the time to migrate resources into their own Projects if you maintain multiple environments to publish Grooper objects from development to production repositories.
- Do you use third-party data entry companies to review work in Grooper?
  - If so, Projects are for you. You'll benefit from being able to push complete and tidy project packages to an environment dedicated to that company.
- Do you have multiple Grooper engineers working in the same Grooper Repository(ies)?
  - If so, Projects are really for you. Aside from object organization, the other big reason for creating the Project architecture was to maintain object reference integrity. Projects will greatly assist you in preventing reference corruption in your Grooper environments.
Project Migration Plan
Ok, you've decided Projects are for you, and you want to move resources out of "Project 1" to best take advantage of them. What are the next steps forward?
We've narrowed the process down to seven general steps:
- Clean up your repository. Delete items that are no longer in use and will not be used in the future.
- Use the "Create Project" feature for each Batch Process.
- As each Project is created, rename any objects as needed if your prior naming conventions no longer make sense.
- For each Project, use the "Analyze References" feature to decide what to do about "shared resources" used by multiple Projects.
- Remove Project references if the "Outbound References" list is empty.
- Reorganize any shared resource objects that remain in "Project 1"
- Rename "Project 1" to something like "Global Resources" or "Shared Resources"
1. Clean House
|
If you're going to take the time to reorganize your resources into Projects, now is a good time to take a look at the Grooper objects in your repository and get rid of anything not in use littering up your environment. This is entirely optional, but now is as good a time as any to clean house.
|
2. Create Project
Now we can start in earnest and create some Projects. You could do this manually. The steps would be as follows.
- Add a Project to the "Projects" folder.
- Using the Project's Referenced Projects property, reference "Project 1".
- For more information on referencing Projects, please review the "Referencing Objects in Other Projects" section of this article.
- Move a Batch Process to that Project.
- Move the Content Model associated with that Batch Process to the Project.
- Move any other Grooper objects referenced by the Batch Process or Content Model's objects to the Project.
- Or leave any "shared resources" in "Project 1", maintaining access to them through the Project reference (we'll discuss this further in Step 4: Analyze References).
There's nothing wrong with this approach, but there's a quicker way of doing things (or at least starting this process) using the "Create Project" feature.
The "Create Project" feature is accessed by selecting a Batch Process. If you think about it, a Batch Process should reference any Grooper object necessary to do work for a particular use case. All the necessary objects will be referenced in the steps of the Batch Process as part of its execution, such as a Content Model referenced for a Classify step or an OCR Profile referenced for a Recognize step.
The "Create Project" utility will create a new Project, named the same as the Batch Process's name, look for any objects referenced as part of its execution, and move them to the new Project.
Important! "Create Project" will only move objects not referenced by anything else. If another Batch Process uses the same OCR Profile, for example, that OCR Profile will remain in "Project 1". We will discuss this further in "Step 4: Analyze References".
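One way to picture the rule above is as a partition of the objects a Batch Process needs into "moves" and "stays". The Python sketch below is a toy model, not Grooper's actual algorithm. In particular, it assumes that anything referenced by an object that stays behind must also stay behind, which is our assumption rather than something the utility documents. Apart from "Full Text - Accurate", "Image Cleanup - OCR" and "Invoices Process", the object names are hypothetical.

```python
def plan_create_project(references, batch_process):
    """Split the objects a Batch Process needs into those that move and those that stay."""
    # Collect everything the Batch Process reaches, directly or indirectly.
    reachable = set()
    stack = [batch_process]
    while stack:
        current = stack.pop()
        for target in references.get(current, ()):
            if target not in reachable:
                reachable.add(target)
                stack.append(target)
    reachable.discard(batch_process)

    # Assume everything moves, then pull back anything still referenced by an
    # object that is not moving (assumption of this sketch). Repeat until stable.
    moving = reachable | {batch_process}
    changed = True
    while changed:
        changed = False
        for obj, targets in references.items():
            if obj in moving:
                continue
            for target in targets:
                if target in moving and target != batch_process:
                    moving.discard(target)
                    changed = True
    return moving, reachable - moving


references = {
    "Redaction Process": ["Redaction Model", "Full Text - Accurate"],
    "Invoices Process": ["Invoices Model", "Full Text - Accurate"],
    "Full Text - Accurate": ["Image Cleanup - OCR"],
}

moves, stays = plan_create_project(references, "Redaction Process")
print(sorted(moves))  # ['Redaction Model', 'Redaction Process']
print(sorted(stays))  # ['Full Text - Accurate', 'Image Cleanup - OCR']
```

The shared OCR Profile stays where it is because a second Batch Process also uses it, and the new Project would reach it through a Project reference instead.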
|
We will start by creating a new Project using a fairly simple document redaction Batch Process. This is an entirely "self-contained" Batch Process. No other Batch Process utilizes its resources.
To create the Project, perform the following steps.
There are two configurable options when creating the new Project.
When the utility finishes running, a new Project will be created. All objects associated with the Batch Process are moved from "Project 1" to the new Project, as long as that move is allowed (we'll talk more about moves that aren't allowed in Step 4).
|
3. Rename Resources
With the switch to the Project architecture, you may find your naming convention no longer makes sense or could be adjusted. Much like the "Clean House" step, this step is not strictly necessary. But, if you're going through the effort to reorganize your repository into a new structure, you might as well make sure how you're naming things makes sense in that new structure.
|
If you're coming from an environment with a lot of Batch Processes and a lot of Content Models, you've probably named your resources according to their intended use case. So you might have "Use Case X Content Model", "Use Case X OCR Profile", "Use Case X IP Profile" and so on. You may find this naming superfluous once all these assets are moved over to a Project. So, it might make more sense to rename these objects after their generic object type or function in the Batch Process workflow.
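If you keep an inventory of object names outside Grooper (in a spreadsheet, for example), a few lines of Python can help you plan the new names before renaming anything by hand. This is a planning aid only, under the assumption that your old names start with the use case name followed by a space. It does not rename anything in Grooper.

```python
def strip_use_case_prefix(name, use_case):
    """Drop a leading '<use case> ' from an object name, if present."""
    prefix = use_case + " "
    return name[len(prefix):] if name.startswith(prefix) else name


old_names = [
    "Use Case X Content Model",
    "Use Case X OCR Profile",
    "Use Case X IP Profile",
]

for old in old_names:
    print(old, "->", strip_use_case_prefix(old, "Use Case X"))
```

Names that do not start with the prefix are left unchanged, so the helper is safe to run over a mixed list.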
|
4. Analyze References
So far, this process has been fairly simple. With the press of the "Create Project" button, all resources associated with a Batch Process are moved to a new Project. Our previous example was so simple because all the resources were fairly self-contained, or "local" to the Batch Process.
That is not always the case. Particularly with larger environments, you will find you reuse resources across a variety of Batch Processes. For example, the "Full Text - Accurate" OCR Profile, in our "Essentials" downloads, is many Grooper users' "go-to" OCR Profile if they don't have the time or the will to create their own. This OCR Profile would be a shared or "global" resource, touched by multiple different Batch Processes.
When you have shared resources, they will not be moved to a newly created Project when using the "Create Project" feature. They can't be. Other Batch Processes need to use those resources too. Instead, all resources that are not shared are moved to the newly created Project, and a Project reference is made to "Project 1". The Project reference allows the new Project access to the resources it needs in "Project 1".
Next, we will press the "Create Project" button to create a new Project from each Batch Process, starting with the "Invoices Process" Project.
|
|
This indicates there is something in "Project 1" that the Batch Process utilizes but that can't be moved, because another Batch Process (or one of its associated objects) also utilizes it. Essentially, both Batch Processes need to reference one or more of the same objects, so those objects stay put in "Project 1" and are accessible through the Project reference. By referencing "Project 1", the Project we just created has access to all its resources, including whatever it needs to function. So, just what resources in "Project 1" does our new Project need? You can quickly answer this with the "Analyze References" feature.
|
|
Now that we know what is shared, we have two options:
- Copy these resources from "Project 1" so that we have local copies of them.
- Leave these shared resources where they are so that every Project that needs them can reference the same object.
Your choice will largely depend on how big your environment is, how many times the resources in "Project 1" are referenced by different potential Projects, and if you prefer to have local copies of these resources that can be edited independently or if you want these resources to be truly shared across different Projects (meaning changing one single object will impact how multiple Projects implement it).
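The difference between a local copy and a truly shared resource behaves much like the difference between a copy and a shared reference in a programming language. The Python sketch below is only an analogy. The dictionaries stand in for Grooper objects, and the `language` property is hypothetical.

```python
import copy

# Stand-in for a shared OCR Profile living in "Project 1".
shared_profile = {"name": "Full Text - Accurate", "language": "English"}

# Keeping the resource shared: both Projects point at the same object.
project_a = {"ocr_profile": shared_profile}
project_b = {"ocr_profile": shared_profile}

# Copying the resource: this Project gets its own independent object.
project_c = {"ocr_profile": copy.deepcopy(shared_profile)}

# Edit the shared object once.
shared_profile["language"] = "Spanish"

print(project_a["ocr_profile"]["language"])  # Spanish
print(project_b["ocr_profile"]["language"])  # Spanish
print(project_c["ocr_profile"]["language"])  # English
```

One edit reaches every Project that shares the object, while the Project holding a copy is unaffected and has to be edited separately.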
Option 1: Copy the Resources
The process will be most time-consuming if you want to copy these objects over to the Project. However, doing so does have benefits. Once you copy these shared resources from "Project 1", you will no longer need the reference to "Project 1". At that point the Project is independent and self-contained. If you need to share this Project with another Grooper environment (say promoting it from a "test" repository to a "development" repository), it will have no dependencies on another Project that would need to be shared alongside it.
Aside from the time it takes to do this, the only drawback is that the resources are then completely local to the Project. The versions in "Project 1" and the versions copied to the Project are separate objects, so they must be edited independently if you want to make changes to them.
|
Copying these resources over is not necessarily the hard part. It's even easier in this case because they're already all in the same folder.
Now for the time-consuming part of this process. All the references must be reassigned from the source objects in "Project 1" to their local copies in the Project.
The more resources you copy over, and the more complex the references and sub-references you're faced with, the more time-consuming this process will be.
|
The other option is to just keep these resources shared. Your Batch Process will function as it did before. The only difference is your Project will be dependent on the referenced Project ("Project 1") to function.
If you're fine with keeping these as shared resources in a referenced Project, you're done here. There's no need to go through the time-consuming copy, paste and reference reassignment dance we did earlier. The only potential drawback is that you've made this Project dependent on "Project 1". You will need to evaluate for yourself whether this is a drawback, a benefit, or doesn't really matter one way or another.
Option 3: Option 1 + Option 2
There's also no reason why you can't copy some items over to your new Project and keep others referenced through Project references.
For instance, in our first example, we could have copied over the two IP Profiles but kept the CMIS Connection as a shared resource. In fact, that would have made more sense. If we have to make changes to a CMIS Connection (like entering new access permissions), we would only want to do that once by editing one single object, rather than repeating our efforts by editing multiple copies of the same object in multiple Projects.
5. Remove Project References
Now, we're at a point where we've used the "Create Project" feature for every Batch Process in "Project 1". What's next?
In our situation, we've created some Projects that need to retain a reference to "Project 1" and some that do not. For those that do not, we should go ahead and remove the reference to Project 1. To do this, the "Analyze References" feature will once again be useful. For each Project we will want to use the "Analyze References" button to check and see if there are any outbound references. If there are, we'll keep the reference intact. If there are not, we will remove the reference.
So, we do nothing.
|
|||
Now, we know it's safe to remove the reference to "Project 1". |
After you're done creating Projects from the Batch Processes in "Project 1", you'll want to clean up the remaining resources in "Project 1".
This will include:
- Organizing any leftover assets into manually created Projects, if applicable.
- Organizing any remaining shared resources into folders and deleting empty folders, as you see fit.
Manually Creating Projects
There are some situations where you may need to manually create a Project for certain assets remaining in "Project 1". Most commonly, this can happen if you have resources you've created for testing purposes that are not tied to a Batch Process and you want to keep them around.
It seems like these are partially architected resources from whenever this Grooper user went through Grooper's ACE - Architect training. We may want to keep these around so this user can continue their training. The only issue is there is no Batch Process utilizing these resources, so we can't take advantage of our "Create Project" utility.
|
|
So, we'll need to manually create a Project.
From here, you can cut and paste or move resources from "Project 1" to the new Project.
Next, we're going to look at a potential issue you may encounter when moving resources out of "Project 1" to another Project. |
|
Why did this happen? Long story short, someone did something they should never have done, but that was technically allowed in previous versions of Grooper.
|
|
The problem is someone made a reference within a Content Model to something in another Content Model.
This is a big "no-no" that was technically possible in Grooper but never considered best practice. This may have happened by accident when copying a Content Model. It may have been a "quick fix" that some Grooper designer intended to go back and resolve but never got around to. Who knows. The main issue is that these types of reference violations can cause problems down the road, potentially causing corruption in your Grooper environment. Part of the reason Projects were created was to avoid this type of corruption due to improperly referenced objects. Referencing Projects via the Referenced Projects property makes external resource references much more intentional, avoiding accidental reference violations (as much as possible).
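To make the idea of a reference violation concrete, the Python sketch below flags any reference that crosses from one Content Model into another. It is a conceptual model only, not a Grooper API, and every object and Content Model name in it is hypothetical. Objects that do not belong to any Content Model (a shared Lexicon, for instance) are simply left out of `content_model_of` and are not treated as violations.

```python
def find_cross_model_references(references, content_model_of):
    """Return (object, referenced object) pairs that span two different Content Models."""
    violations = []
    for obj, targets in references.items():
        source_model = content_model_of.get(obj)
        for target in targets:
            target_model = content_model_of.get(target)
            # Only flag the pair when both ends live in Content Models and they differ.
            if (
                source_model is not None
                and target_model is not None
                and source_model != target_model
            ):
                violations.append((obj, target))
    return violations


content_model_of = {
    "Invoice Number": "Invoices Model",
    "VAL - Invoice Number": "Invoices Model",
    "Employee ID": "HR Model",
}
references = {
    "Invoice Number": ["VAL - Invoice Number"],
    "Employee ID": ["VAL - Invoice Number", "Generic Lexicon"],
}

print(find_cross_model_references(references, content_model_of))
```

Here the only pair flagged is the "Employee ID" field reaching into the other Content Model for its extractor, which is the kind of reference that has to be resolved before the objects can be moved.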
|
|
To get these objects moved to the new Project, we need to resolve the reference violation in one way or another.
Clean Up Remaining Folders
After you've moved everything out of "Project 1" into a new Project (whether manually or through the "Create Project" feature), the only resources left should be shared resources, objects intended to be used by any current or future Project.
|
All that's left is to organize the objects and folders remaining.
FYI: This "Content Models" folder will be "Read Only". This is a carryover from this folder being a system node in the previous version of Grooper's architecture.
You may also want to create some new folders for stray objects or move some folders to the root of the Project.
|
7. Rename Project 1
|
Once all resources are out of "Project 1" and you've organized any shared resources remaining, rename "Project 1" to something that reflects its true utility, something like "Shared Resources" or "Global Resources".
That's it! The migration from "Project 1" to Grooper's new Project-based architecture is complete!