Batch Archiving Guidance

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

You've got your Grooper Repository all set up. You've architected your Content Model. You have a Batch Process up and running. Documents are coming in and being processed accordingly. But, now that you've reached the end of the Batch Process, what do you do with the Batch itself?

You have a choice to make here:

Do I just delete the Batch?
Do I archive the Batch in Grooper for long term storage?

Deleting the Batch vs Archiving the Batch

There are pros and cons to both deleting Batches and archiving them for long term storage. There are also different ways of archiving Batches, which we will consider later in this article. Before we get there, you need to decide if you want to archive them at all. To help you make that decision, consider the pros and cons of each option.

Pros and cons: Deleting Batches

Production Batches can be deleted from Grooper in one of three ways:

They can be manually deleted by a user.
They can be deleted automatically by a Dispose Batch step in a Batch Process.
They can be first moved to a folder in the "Test" branch by a Dispose Batch step in a Batch Process. Then, they can be regularly scheduled for deletion by the System Maintenance Service using its Purge Batches feature.

What are the pros and cons of deleting Batches from Grooper?

Pros for deleting Batches

Can save on storage

If your files and data have been exported out of Grooper, keeping content in the Grooper Repository may unnecessarily duplicate files by storing them in both their export location and Grooper.

Sparsely importing documents into Grooper can help mitigate this in many scenarios. Sparse documents are not actually loaded into the Grooper file store. They are instead accessed on demand through a link between Grooper and the external content management system.

Don't have to manage content in two systems

If you've exported content out of Grooper, that external system is likely your "system of record." It is where documents and/or their data live long term. If you keep that content in both Grooper and your external system, you must track each document in both locations to ensure changes made in one system are reflected in the corresponding document in the other.

In version 2024, Grooper introduced its first-ever document search and retrieval mechanism, AI Search. This makes it possible for Grooper to be your sole system of record. AI Search has robust querying and filtering capabilities, allowing you to locate documents using both their full-text content and data extracted by Grooper.

Keeps the Grooper Repository lean

A repository with fewer Batches (and fewer Grooper nodes in general) is generally more responsive than one brimming with content. This is particularly true when content is poorly organized. An excessive number of Batches in a single folder can degrade Grooper's performance when navigating viewers that render lists of Batches (such as the Batches Page).

This is less true in newer versions of Grooper. Version 2024 started implementing efficiency changes to make it possible to keep Batches in a Grooper Repository long term without dramatically affecting performance.

Cons against deleting Batches

Deleting Batches makes it more difficult to reprocess documents

If you delete a Batch, everything done to those documents in Grooper is gone too. Many steps in a Batch Process are present to condition a file or group of pages to the point where they are usable documents that can be classified and extracted. For example, Recognize collects machine-readable text that is used downstream by an Extract step. If you need to reprocess previously exported content (say, to extract new fields from it) but it has been deleted from Grooper, you will need to start over at the beginning of the Batch Process. Archiving Batches in Grooper long term avoids this problem and makes it easier to iterate on your solution design.

You can't use AI Search

AI Search is Grooper's document search and retrieval mechanism introduced in version 2024. After documents are indexed, they can be searched for using the Search Page. Documents can be searched using their full-text content and extracted data. Search queries can range from simple term searches to complex queries with filters to narrow down results. Several commands can be executed from the Search Page too. Users can start document review, submit Processing Jobs to apply Activities, start a whole new Batch from the search results and more! However, if you have deleted your documents, the Search Page is not going to be of much use to you. Documents must remain in the Grooper Repository to take advantage of this groundbreaking Grooper feature.

It's not really necessary in newer Grooper versions

You may have noticed each "Pro" point above came with a caveat of sorts. Traditionally, Grooper has been an intermediary. Its job was to take unstructured document content, make sense of it, collect the data you want, and export documents and data into a structured end destination. It simply was not designed to hold content long term. However, starting in version 2024, new strides in Grooper's efficiency and features like AI Search make it possible to keep content in Grooper long term.

If you decide to keep Batches in Grooper long term, you will need to develop an archiving strategy. There are a few different ways to archive Batches in Grooper. In the next section, we will discuss these methods and give you some best practice advice.

Ways to archive Batches

There are several ways Batches can be archived for long term storage in Grooper. You can:

Organize them on import with the Organize By Date feature
Manually move Batches from one folder to another
Use the Archive command to organize Batches into a year/month/day folder structure. This can be automated in a Batch Process with the Execute activity.
Use the Dispose Batch activity to move the Batches to the Test branch. This can be automated as a step in a Batch Process.

Organize for archiving on import (Our preferred method)

Archiving is just a way of organizing content long term. People think of it as being done at the end of an operation, but Grooper provides you a good way to do this from the start when importing files to new Batches.

When configuring an Import Provider's Batch Creation settings, users can enable the Organize By Date option to organize Batches right when they are created. Enabling this option will create year, month, and day subfolders for the Batch Process's folder in the Production branch. As new Batches are created day after day, they are placed in the current day's folder, and new day, month or year folders are created if necessary.

After Batches are created, users have two options going forward:
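
The year/month/day layout described above can be pictured with a short Python sketch. This is an illustration only, not Grooper code: the function name `dated_folder`, the example root folder "Invoices", and the zero-padded folder names are all assumptions, and Grooper's actual folder naming may differ.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_folder(root: str, created: date) -> PurePosixPath:
    """Return a year/month/day subfolder path under root (hypothetical layout)."""
    # Zero-padded month and day names are an assumption made for readability.
    return (
        PurePosixPath(root)
        / f"{created.year:04d}"
        / f"{created.month:02d}"
        / f"{created.day:02d}"
    )


# Batches created on consecutive days land in different day folders;
# a new month folder is only needed when the month changes.
print(dated_folder("Invoices", date(2025, 1, 31)))  # Invoices/2025/01/31
print(dated_folder("Invoices", date(2025, 2, 1)))   # Invoices/2025/02/01
```

The point of the sketch is that the destination is fully determined by the creation date, so no later reorganization step is needed.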

Do nothing! Batches can be left in this structure permanently.
Or, manually move Batches to an archive folder in the Test branch one year/month/day at a time (or manually execute the Archive command or Dispose Batch activity).

This is our preferred way of archiving Batches at Grooper for two big reasons:

It automates Batch organization at import.
- No need for an extra step to archive content.
- No need to reorganize content. It's already organized.
It avoids thread-locking complications.
- As we will discuss below, Batches can be moved from the Production branch to the Test branch and organized in a similar fashion using the Archive command or Dispose Batch activity.
- However, if multiple processing threads are executing these operations, Grooper can run into problems where one thread locks a folder while it completes its operation, and another attempts to read from or write to the locked folder. This can result in timeouts and other errors.
- Import operations always run single-threaded in Grooper. This avoids the thread-locking problem entirely.

Manually archive Batches

The simplest (but possibly least efficient) way to archive Batches is to have a human operator move finished Batches to an archive folder at regular intervals (daily, weekly, etc.). This is a manual process but may be well suited for the following scenarios:

Users scanning documents into Grooper rather than importing them.
If you're not importing documents, you won't be able to use the Organize By Date feature to automatically organize new Batches into year/month/day folders. That property only exists on Import Providers.
Small volume environments.
If you process a fairly small number of Batches every day, it may be easy enough to have someone regularly move Batches to an archive folder as part of their daily/weekly duties. For larger environments that process a large amount of content every day, this may be tedious or untenable.

The Archive command and Dispose Batch activity

The Archive command moves Batches into a folder and organizes them into three subfolder levels by the year, month and day the Batch was created. The Dispose Batch activity can "dispose" of Batches in several ways, including moving them to a folder in the Test branch. Moving Batches to the Test branch effectively archives them. Dispose Batch's Group By property controls how the Batch is organized into subfolders. Options include grouping by a single year folder only, by year and quarter, by three folder levels reflecting the year/month/day, and more.

Archive and Dispose Batch can both move Batches into a grouped folder structure. What's the difference?

Archive can move Batches to either the Production branch or the Test branch. Dispose Batch can only move Batches to the Test branch.
Archive has an option to clear the Batch's job history, deleting the task history for the Batch. This ensures the tables logging task processing stats don't become overpopulated with historical data. Dispose Batch has no such option.
Archive can only organize Batches into a year/month/day subfolder structure. Dispose Batch has more subfoldering options in its "Group By" configuration.
Archive will organize Batches into a year/month/day subfolder structure based on the date the Batch was created. Dispose Batch will organize Batches into a year/month/day subfolder structure based on the date the Batch was archived (the day the Dispose Batch activity is executed).
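
The last difference is easy to overlook, so here is a minimal Python sketch of it. Nothing here is Grooper's API: `BatchInfo`, `archive_style_folder`, and `dispose_style_folder` are hypothetical names, and the zero-padded folder naming is assumed for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BatchInfo:
    """Hypothetical stand-in for a Batch; not a Grooper type."""
    name: str
    created: date


def ymd_path(d: date) -> str:
    """Format a date as a year/month/day folder path (naming is assumed)."""
    return f"{d.year:04d}/{d.month:02d}/{d.day:02d}"


def archive_style_folder(batch: BatchInfo) -> str:
    # Archive, as described above, keys the folder on the Batch's creation date.
    return ymd_path(batch.created)


def dispose_style_folder(batch: BatchInfo, run_date: date) -> str:
    # Dispose Batch, as described above, keys the folder on the day the step runs.
    return ymd_path(run_date)


batch = BatchInfo(name="Invoices 0001", created=date(2025, 3, 28))
run_date = date(2025, 4, 2)  # the step executes five days after creation

print(archive_style_folder(batch))            # 2025/03/28
print(dispose_style_folder(batch, run_date))  # 2025/04/02
```

If a Batch waits in review for several days before it is archived, the two approaches file it under different folders, which matters when you later look for a Batch by date.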

Either the Archive command or the Dispose Batch activity can be applied manually to Batches or executed as part of a Batch Process. The Archive and Dispose Batch methods of archiving Batches are best suited for the following scenarios.

Scanning documents into Grooper rather than importing them
If you're not importing documents, you won't be able to use the Organize By Date feature to automatically organize new Batches into year/month/day folders. That property only exists on Import Providers.
Larger volume environments
Because Archive and Dispose Batch can be made part of a Batch Process, their archival operations can be automated. This makes it easier to archive large volumes of Batches than manually moving them to an archive folder.
- BE AWARE: To avoid thread-locking complications, you should run Archive or Dispose Batch single-threaded. As threads move Batches to different folders, Grooper locks the folder while the move operation completes. Problems can occur when multiple threads attempt to write to the same folders. You should also consider whether an Import Watcher is writing to the same folders you're archiving from.
- To run a step "single-threaded", first create a dedicated Processing Queue for the step. Assign that Processing Queue to the step's Queue Name property. Then, install an Activity Processing service, set its Number of Threads to 1 and assign it the dedicated Processing Queue.
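
As a rough analogy for why a dedicated queue with one thread avoids folder contention, consider the following Python sketch. It does not model Grooper's internals: the queue, the worker, and the in-memory `archive_folder` list are all hypothetical stand-ins for a Processing Queue, an Activity Processing service with one thread, and the destination folder.

```python
import queue
import threading

archive_queue = queue.Queue()  # hypothetical dedicated queue for the archive step
archive_folder = []            # stands in for the destination folder


def archive_worker() -> None:
    """Single consumer: moves are applied one at a time, in queue order."""
    while True:
        batch_name = archive_queue.get()
        if batch_name is None:  # sentinel value: no more work
            break
        # Only this thread ever touches the folder, so no other writer
        # can find it locked in the middle of a move.
        archive_folder.append(batch_name)


# Exactly one thread services the queue, mirroring "Number of Threads = 1".
worker = threading.Thread(target=archive_worker)
worker.start()

for name in ["Batch 001", "Batch 002", "Batch 003"]:
    archive_queue.put(name)
archive_queue.put(None)
worker.join()

print(archive_folder)  # ['Batch 001', 'Batch 002', 'Batch 003']
```

Other steps can still run on their own multi-threaded queues; only the folder-moving work is serialized.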