Batch Archiving Guidance

You've got your Grooper Repository all set up. You've architected your Content Model. You have a Batch Process up and running. Documents are coming in and being processed accordingly. But now that you've reached the end of the Batch Process, what do you do with the Batch itself?

You have a choice to make here:

Do I just delete the Batch?
Do I archive the Batch for longer term storage?

Deleting the Batch vs Archiving the Batch

There are pros and cons to deleting Batches and archiving them for long term storage. There are also different ways of archiving Batches which we will consider in this section of the article. Before we get there, you need to decide if you want to archive them at all. To help you make that decision, consider the pros and cons of each option.

Pros and cons: Deleting Batches

Production Batches can be deleted from Grooper in one of three ways:

They can be manually deleted by a user.
They can be deleted automatically by a Dispose Batch step in a Batch Process.
They can be first moved to a folder in the "Test" branch by a Dispose Batch step in a Batch Process. Then, they can be regularly scheduled for deletion by the System Maintenance Service using its Purge Batches feature.

What are the pros and cons of deleting Batches from Grooper?

Pros for deleting Batches

Can save on storage

If your files and data have been exported out of Grooper, keeping content in the Grooper Repository may unnecessarily duplicate those files.

Sparsely importing documents into Grooper can help mitigate this. Sparse documents are not actually loaded into the Grooper file store. They are instead accessed on demand through a link between Grooper and the external content management system.

Don't have to manage content in two systems

If you've exported content out of Grooper, that external system is likely your "system of record." It is where documents and/or their data live long term. If you keep that content in both Grooper and your external system, you must track each document in both locations to ensure changes made in one system are reflected in the corresponding document in the other.

In version 2024, Grooper introduced its first ever document search and retrieval mechanism, AI Search. This makes it possible for Grooper to be your sole system of record. AI Search has robust querying and filtering capabilities, allowing you to locate documents using both their full-text content and data extracted by Grooper.

Keeps the Grooper Repository lean

A repository with fewer Batches (and fewer Grooper nodes in general) is typically more responsive than one brimming with content. This is particularly true when content is poorly organized. An excessive number of Batches in a single folder can slow Grooper down when navigating viewers that render lists of Batches (such as the Batches Page).

This is less true in newer versions of Grooper. Version 2024 started implementing efficiency changes to make it possible to keep Batches in a Grooper Repository long term without dramatically affecting performance.

Cons against deleting Batches

Deleting Batches makes it more difficult to reprocess documents

If you delete a Batch, everything done to those documents in Grooper is gone too. If you need to reprocess previously exported content (say to extract new fields from them) and they have been deleted from Grooper, you will need to start over at the beginning of the Batch Process. If the exported content is already in Grooper, and is still classified, still has OCR data, still has an extracted Data Model or whatever else the Batch Process did, you do not have to repeat that work.

You can't use AI Search

AI Search is Grooper's document search and retrieval mechanism introduced in version 2024. After documents are indexed, they can be searched for using the Search Page. Documents can be searched using their full-text and extracted data. Search queries can range from simple term searches to complex queries with filters to narrow down results. Several commands can be executed from the Search Page too. Users can start document review, submit Processing Jobs to apply Activities, start a whole new Batch from the search results and more! However, if you deleted your documents, the Search Page is not going to be much use for you. Documents must remain in the Grooper Repository to take advantage of this groundbreaking Grooper feature.

It's not really necessary in newer Grooper versions

You may have noticed each "Pro" point above had an "FYI" providing a caveat of sorts. Traditionally Grooper has been an intermediary. Its job was to take unstructured document content, make sense of it, collect the data you want, and export documents and data into a structured end destination. It simply was not designed to hold content long term. However, starting in version 2024, new strides in Grooper's efficiency and features like AI Search make it possible to keep content in Grooper long term.

If you decide to keep Batches in Grooper long term, you will need to develop an archiving strategy. There are a few different ways to archive Batches in Grooper. In the next section, we will discuss these methods and give you some best practice advice.

Ways to archive Batches

There are several ways Batches can be archived for long term storage in Grooper. You can:

Organize them on import with the Organize By Date feature
Manually move Batches from one folder to another
Use the Archive command to organize Batches into a year/month/day folder structure. This can be automated in a Batch Process with the Execute activity.
Use the Dispose Batch activity to move the Batches to the Test branch. This can be automated as a step in a Batch Process.

Please note: Archive and Dispose Batch can both move Batches into a grouped folder structure. What's the difference?

Archive can move Batches to either the Production branch or the Test branch. Dispose Batch can only move Batches to the Test branch.
Archive has an option to clear the Batch's job history, deleting the task history for the Batch. This ensures the tables logging task processing stats don't become overpopulated with historical data. Dispose Batch has no such option.
Archive can only organize Batches into a year/month/day subfolder structure. Dispose Batch has more subfoldering options in its "Group By" configuration.
Archive will organize Batches into a year/month/day subfolder structure based on the date the Batch was created. Dispose Batch will organize Batches into a year/month/day subfolder structure based on the date the Batch was archived (the day the Dispose Batch activity is executed).
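
To make the last difference concrete, here is a minimal Python sketch. It does not call Grooper. The `dated_subfolder` helper, the folder names, and the zero-padded year/month/day layout are all assumptions made for illustration. It only shows how the same Batch can land in a different folder depending on which date drives the grouping.

```python
from datetime import date

def dated_subfolder(root: str, when: date) -> str:
    """Build a year/month/day path under root (hypothetical layout, not a Grooper API)."""
    return f"{root}/{when.year:04d}/{when.month:02d}/{when.day:02d}"

# A Batch created at the end of January but archived in early February.
created_on = date(2024, 1, 31)
archived_on = date(2024, 2, 5)

# Archive-style grouping: keyed on the date the Batch was created.
archive_path = dated_subfolder("Production/Archive", created_on)

# Dispose Batch-style grouping: keyed on the date the step is executed.
dispose_path = dated_subfolder("Test/Archive", archived_on)

print(archive_path)  # Production/Archive/2024/01/31
print(dispose_path)  # Test/Archive/2024/02/05
```

If you look for Batches by the day they arrived, grouping by creation date keeps that lookup simple. Grouping by archive date instead reflects when the cleanup happened.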

Organize for archiving on import (Our preferred method)

Archiving is just a way of organizing content long term. People think of it as being done at the end of an operation, but Grooper provides you a good way to do this from the start when importing files to new Batches.

When configuring an Import Provider's Batch Creation settings, users can enable the Organize By Date option to organize Batches right when they are created. Enabling this option will create year, month, and day subfolders under the Batch Process's folder in the Production branch. As new Batches are created day after day, each one is placed in that day's folder, and new day, month, or year folders are created as needed.

After Batches are created, users have two options going forward:
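
The "create folders only if necessary" behavior can be sketched as follows. This is an illustrative model using a nested Python dictionary as a stand-in for the folder tree. The `place_batch` function and the `_batches` key are hypothetical, and nothing here reflects how Grooper implements Organize By Date internally.

```python
from datetime import date

def place_batch(tree: dict, created: date, batch_name: str) -> list[str]:
    """File batch_name under tree[year][month][day]; return the folders that had to be created."""
    created_folders = []
    node = tree
    for part in (f"{created.year:04d}", f"{created.month:02d}", f"{created.day:02d}"):
        if part not in node:
            # This level does not exist yet, so create it on demand.
            node[part] = {}
            created_folders.append(part)
        node = node[part]
    # The day folder holds the list of Batches filed on that date.
    node.setdefault("_batches", []).append(batch_name)
    return created_folders

tree = {}
first = place_batch(tree, date(2024, 12, 31), "Batch A")   # new year, month, and day folders
second = place_batch(tree, date(2024, 12, 31), "Batch B")  # same day, nothing new to create
third = place_batch(tree, date(2025, 1, 1), "Batch C")     # year rollover creates all three again

print(first, second, third)
```

The second call creates no folders because the day folder already exists, which is why no separate reorganization step is needed later.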

Do nothing! Batches can be left in this structure permanently.
Or, manually move Batches to an archive folder in the Test branch one year/month/day at a time (or manually execute the "Archive" command or "Dispose Batch" activity).

This is our preferred way of archiving Batches at Grooper for two big reasons:

It automates Batch organization at import.
- No need for an extra step to archive content.
- No need to reorganize content. It's already organized.
It avoids thread-locking complications.
- As we will discuss below, Batches can be moved from the Production branch to the Test branch and organized in a similar fashion using the Archive command or Dispose Batch activity.
- However, if multiple processing threads are executing this activity, Grooper can run into problems where one thread locks a folder while it completes its operation, and another attempts to input or output to the locked folder. This can result in timeouts and other errors.
- Import operations always run single-threaded in Grooper. This avoids the thread-locking problem entirely.
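
The contention described above can be illustrated with a generic Python threading sketch. This is not Grooper code. The lock, the worker names, and the timings are assumptions chosen only to show how one worker holding a folder lock can cause another to time out, while serial processing never hits that problem.

```python
import threading
import time

folder_lock = threading.Lock()     # stands in for a lock on one archive folder
holder_ready = threading.Event()   # signals that the first worker holds the lock
results = {}

def slow_mover():
    """First worker: locks the folder and holds it while it 'moves' a Batch."""
    with folder_lock:
        holder_ready.set()
        time.sleep(0.3)            # simulated long-running folder operation
    results["slow_mover"] = "moved"

def impatient_mover():
    """Second worker: gives up if the folder stays locked longer than its timeout."""
    holder_ready.wait()
    if folder_lock.acquire(timeout=0.05):
        try:
            results["impatient_mover"] = "moved"
        finally:
            folder_lock.release()
    else:
        results["impatient_mover"] = "timed out"

workers = [threading.Thread(target=slow_mover), threading.Thread(target=impatient_mover)]
for w in workers:
    w.start()
for w in workers:
    w.join()

def move_serially(batches):
    """Single-threaded equivalent: each move finishes before the next one starts."""
    moved = []
    for name in batches:
        with folder_lock:          # never contended, because only one caller exists
            moved.append(name)
    return moved

print(results)
print(move_serially(["Batch A", "Batch B"]))
```

With these timings the second worker reports "timed out", while the serial version moves every Batch without waiting on anyone else.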

Manually archive Batches

The simplest (but possibly least efficient) way to archive Batches is to have a human operator move finished Batches to an archive folder at regular intervals (daily, weekly, etc.). This is a manual process but may be well suited for the following scenarios:

Users scanning documents into Grooper rather than importing them.
If you're not importing documents, you won't be able to use the Organize By Date feature to automatically organize new Batches into year/month/day folders. That property only exists on Import Providers.
Small volume environments.
If you process a fairly small number of Batches every day, it may be easy enough to have someone regularly move Batches to an archive folder as part of their daily or weekly duties. For larger environments that process a large volume of content every day, this may be tedious or untenable.