Redact (Activity)

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Redact is an Activity that visibly obscures (or "redacts") text information on a page based on results returned from an extractor. Be aware that Redact does not alter the text data; it only alters the image.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

At times you may have documents to process that contain sensitive data such as personal contact information, Social Security Numbers (SSNs), account numbers, etc. You might want to obscure and/or remove this information on the document. Grooper can do this for you through the Redact Activity.

You can configure a Redact Batch Process Step to take extracted data (either from a Data Field or by referencing an extractor) and cover it with a colored bar that hides the information from view. You can customize the color of the bar, including matching it to the color of the document itself to make it seem like the information was completely removed from the document.

Even though the information will be redacted on the image or PDF of the document, the information will still be present in the attached text data. To remove the information from the text data, you will need to use a Correct activity in your Batch Process.
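The distinction between the image and the text data can be pictured with a small sketch. This is an illustrative model only, not Grooper code: the page is assumed to be a grid of grayscale pixel values with its text stored separately, and every name here (`redact_region`, `page_pixels`, `page_text`) and the sample account number are made up for the example.

```python
# Illustrative model only: a page as a grayscale pixel grid plus separate text data.
# None of these names come from Grooper; the account number is invented.

def redact_region(pixels, left, top, right, bottom, color=0):
    """Return a copy of the pixel grid with a rectangle filled with `color`.

    The rectangle covers columns left..right-1 and rows top..bottom-1.
    """
    redacted = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            redacted[y][x] = color
    return redacted


page_pixels = [[255] * 8 for _ in range(4)]  # a blank white 8x4 "page"
page_text = "Account # 123456789"            # text data stored alongside the image

redacted_pixels = redact_region(page_pixels, left=2, top=1, right=6, bottom=3)

print(redacted_pixels[1])  # [255, 255, 0, 0, 0, 0, 255, 255]
print(page_text)           # Account # 123456789 (unchanged by the image edit)
```

In this model the bar only changes pixel values. The text string is a separate piece of data, which is why a separate step is needed to clean it.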

  1. On the Employer Contribution Refund Form below, we have two sections where all or parts of account numbers are displayed. This is sensitive information that we would rather not be visible on the document.


  1. Below we have redacted the account numbers. This is the result we want after the document goes through the Redact Activity.


How To

In this tutorial we are going to take the Employer Contribution Refund Form from the previous section and configure a Redact Batch Process Step to redact the two account numbers on the form.

There are two ways to redact information. You can set the Batch Process Step to redact all data collected by a specific extractor, or you can select one or more Data Fields and redact their results. First, let's look at what is being collected by the extractors and Data Fields that we are going to use for our redaction.

The Extractors and Data Fields

Both the extractor object (the Data Type) and the Data Fields shown below will be used to redact the exact same information.

  1. In the screenshot below, we see that we have set up a Data Type that is collecting the sensitive data from our document.
  2. Both account numbers are being returned.


  1. Under our Data Model we have a Data Field that is also collecting one of the account numbers considered as "sensitive information".
  2. We can configure Grooper to redact the value of a Data Field, such as this account number.

You might notice that only one of the account numbers is being collected here. When using Data Fields to redact information, each Data Field can only redact a single extracted value. Since we have two account numbers we want redacted, we will have to select both the "Last 4 of SSN or Wellness Unity ID #" and the "Account #" Data Fields when configuring the Redact Step later in the tutorial.
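The difference between the two approaches can be sketched in a few lines of Python. This is a rough analogy, not how Grooper stores or runs extractors: the sample text, the regular expression, and the field values are all invented for illustration. Only the two Data Field names come from this tutorial.

```python
import re

# Hypothetical page text and values, invented for this example.
page_text = (
    "Last 4 of SSN or Wellness Unity ID #: 6789\n"
    "Account #: 123456789"
)

# Extractor-style: a single pattern returns every match on the page.
sensitive_pattern = re.compile(r"#:\s*(\d+)")  # assumed pattern, not the real Data Type
extractor_values = sensitive_pattern.findall(page_text)

# Field-style: each Data Field holds exactly one extracted value.
data_fields = {
    "Last 4 of SSN or Wellness Unity ID #": "6789",
    "Account #": "123456789",
}


def values_to_redact(fields, selected_names):
    """Collect the single value held by each selected field."""
    return [fields[name] for name in selected_names]


only_account = values_to_redact(data_fields, ["Account #"])
both_fields = values_to_redact(data_fields, list(data_fields))

print(extractor_values)  # ['6789', '123456789']
print(only_account)      # ['123456789']
print(both_fields)       # ['6789', '123456789']
```

Selecting one field covers one value, so matching the extractor's coverage means selecting every field that holds sensitive data.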


Adding the Redact Step

Before we can do anything else, we will need to add the Redact Batch Process Step to our Batch Process.

  1. Right click on your Batch Process.
  2. Hover over "Add Activity", then hover over "Document Processing". Finally, click on "Redact...".
  3. When the "Add Activity" window pops up, you can rename the Activity if you like. In this tutorial, we are going to leave it as the default: Redact.
  4. Click "EXECUTE" in the top right corner of the pop-up window to create the Batch Process Step.


  1. Now you should have a Redact Batch Process Step in your node tree.
  2. The Scope is set to Page. Because Redact generates a new image, the Scope must always be Page; Grooper cannot create an image from a folder.
  3. There are many properties that can be configured for the Redact Activity. We will go through these individually in the next section.


Configuring the Redact Activity

General Properties

  1. Click the hamburger icon to the right of the Redaction Color property to select a color.
    • If you leave it set to (none), then the bar that covers the redacted information will attempt to match the color of the document background.
  2. We are going to select Black for this tutorial.
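This article does not describe how Grooper picks a matching color when Redaction Color is left at (none). As one plausible way to think about it, the sketch below assumes the background is simply the most common pixel value on the page. The function name and the sample page are hypothetical.

```python
from collections import Counter


def estimate_background(pixels):
    """Guess the background as the most frequent pixel value (an assumption)."""
    counts = Counter(value for row in pixels for value in row)
    return counts.most_common(1)[0][0]


# A mostly off-white page (value 250) with a few dark "ink" pixels (value 0).
page = [
    [250, 250, 250, 250],
    [250, 0, 0, 250],
    [250, 250, 250, 250],
]

print(estimate_background(page))  # 250
```

A bar drawn in that estimated color would blend into the page, which is the "completely removed" look described earlier.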


  1. The Transparency property indicates how see-through the redaction bar is. We don't want to be able to see through the redaction bar, so we will leave it at 0%.
  2. The Minimum Output Format property will change depending on which Redaction Color you choose. The image generated by the redaction step might be in a different format than the one it started in.
    • For example, if you have a black and white image and want to put a colored bar on the page, the resulting image will have to be in a format that can support color.
  3. The Create Undo Image property, when set to True, will create and attach a copy of the original un-redacted image to the document in addition to the new redacted image. Unless necessary, it is recommended to keep this set to False, as the additional image will just take up unnecessary file space.
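The Transparency and Minimum Output Format properties above can be illustrated with a minimal sketch. It assumes transparency is a simple linear blend between the bar color and the page color, and that the required format depends only on the bar color. The function names and the format labels ("bitonal", "grayscale", "color") are hypothetical and are not Grooper's option names.

```python
def blend(bar_rgb, page_rgb, transparency):
    """Mix bar and page colors. 0.0 is a fully opaque bar, 1.0 is fully see-through."""
    return tuple(
        round(bar * (1 - transparency) + page * transparency)
        for bar, page in zip(bar_rgb, page_rgb)
    )


def minimum_format(bar_rgb):
    """Pick the simplest image format that can still show the bar color."""
    r, g, b = bar_rgb
    if r != g or g != b:
        return "color"       # any tint needs a color-capable format
    if r in (0, 255):
        return "bitonal"     # pure black or white fits a black-and-white image
    return "grayscale"


black, white, red = (0, 0, 0), (255, 255, 255), (255, 0, 0)

print(blend(black, white, 0.0))  # (0, 0, 0): at 0% the page cannot show through
print(blend(black, white, 1.0))  # (255, 255, 255): the bar hides nothing
print(minimum_format(black))     # bitonal
print(minimum_format(red))       # color
```

This is why a black bar at 0% transparency can stay in a black-and-white format, while a colored bar forces the output into a format that supports color.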


Redaction Extractors

FYI

When you edit your Redaction Extractor properties, you can either choose a specific extraction object, such as a Data Type or Value Reader, for the Redaction Extractors property, or you can choose one or more Data Fields for the Redaction Fields property. You do not have to configure both for the Redact Activity to run.

  1. Click on the ellipsis icon to the right of the Redaction Extractors property.
  2. When the "Redaction Extractors" window pops up, navigate to and click the check box next to the extractor you want to use for your redaction. We are going to use the "VAL - Sensitive Information" Data Type for our redaction.
  3. Click "OK" in the top right-hand corner of the pop-up window.


  1. Click the save icon at the top of the center property grid to save and apply your changes to the Redact Batch Process Step.


Redaction Fields

  1. Click the ellipsis icon to the right of the Redaction Fields property.
  2. When the "Redaction Fields" window pops up, navigate to and select the check boxes next to the Data Fields you want redacted on the document.
  3. Click "OK" at the top right of the pop-up window.


  1. Click the save icon at the top of the center property grid to save and apply your changes to the Redact Batch Process Step.

Test the Activity

Once you have finished configuring and saving changes to the Redact Batch Process Step, you can test the Step to make sure it is redacting the correct information.

  1. Click the "Activity Tester" tab.
  2. Select the Page object in the Batch Viewer.
  3. Click the "Test" icon in the top right of the Batch Viewer.
  4. Now you should see that the account numbers have been redacted on the document in the Document Viewer.
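The visual check in the last step can also be expressed as a simple rule: every pixel inside the sensitive region should equal the bar color. The sketch below is an illustrative model of that rule on a small pixel grid, not a Grooper feature. The names `region_is_covered` and `account_box` are hypothetical.

```python
def region_is_covered(pixels, box, bar_color):
    """Return True if every pixel inside `box` equals `bar_color`.

    `box` is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    return all(
        pixels[y][x] == bar_color
        for y in range(top, bottom)
        for x in range(left, right)
    )


page = [[255] * 6 for _ in range(3)]  # blank white 6x3 "page"
account_box = (1, 1, 4, 2)            # where the account number would sit

before = region_is_covered(page, account_box, bar_color=0)

# Simulate the redaction by painting the region black.
for x in range(1, 4):
    page[1][x] = 0

after = region_is_covered(page, account_box, bar_color=0)

print(before, after)  # False True
```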