Visual (Classification Method)

From Grooper Wiki

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.


The Visual Classification Method uses image analysis instead of text data to determine the Document Type assigned to a Batch Folder during classification. Instead of using text-based extractors, an "Extract Features" IP Command in an IP Profile collects image-based data from a Batch Folder's image(s). This image-based data is compared against that of previously trained document examples of each Document Type to classify the Batch Folder.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Similar to Lexical Classification, Visual Classification relies on a training-based approach, where Grooper is trained to classify specific documents based on the examples it is given. The difference is that no text is involved. Instead, Visual Classification relies on how the document looks. Specifically, it looks at pixel intensity across the document and classifies accordingly. Pixel intensity refers to how dark a pixel is.
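To make "how dark a pixel is" concrete, here is a minimal Python sketch. It assumes an 8-bit grayscale image where 0 is black and 255 is white. The function names and the threshold of 128 are hypothetical choices for illustration, not Grooper's internals.

```python
def darkness(gray_value: int) -> float:
    """Map an 8-bit grayscale value (0 = black, 255 = white) to darkness in [0, 1]."""
    if not 0 <= gray_value <= 255:
        raise ValueError("expected an 8-bit grayscale value")
    return 1.0 - gray_value / 255.0


def is_dark(gray_value: int, threshold: int = 128) -> bool:
    """Treat a pixel as 'black' when it falls below an assumed threshold."""
    return gray_value < threshold


black_pixel = darkness(0)      # fully dark
white_pixel = darkness(255)    # not dark at all
```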

How To: Set Up Visual Classification

Before You Begin

Before you set up your Classification Method, you will need two things: an IP Profile and an Extract Features IP Step. Extract Features creates an MxN series of matrices for each document page's image.


As Visual Classification is training-based, you will need Document Types as well.

  1. For an example later in this article, we'll be using four Document Types.


What Is 'Intensity'?

Intensity is a feature Grooper uses to measure how dark a cell is. When Grooper runs the Extract Features IP Command, it divides each page of a document into cells and measures how dark, or "intense", each one is. The document is then given a percentage similarity score against each Document Type, and whichever Document Type has the highest percentage similarity is assigned to the document. For the intensity feature, each cell's ratio of black to white pixels is compared with that of the training example to determine similarity.

Think of a structured form, where the lines and text change very little from one document to the next. If such a form is divided into cells, the percentage of black pixels in a given cell will be very similar from document to document.
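The idea of dividing a page into cells and measuring each cell's share of black pixels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Grooper's implementation: the page is a plain list of rows of 8-bit grayscale values, and the function name `intensity_grid`, the grid size, and the threshold of 128 are all hypothetical.

```python
def intensity_grid(page, rows, cols, threshold=128):
    """Split a grayscale page into rows x cols cells.

    Returns, for each cell, the fraction of pixels darker than `threshold`.
    `page` is a list of pixel rows, each a list of 8-bit grayscale values.
    """
    height = len(page)
    width = len(page[0])
    grid = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        row_cells = []
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            cell = [page[y][x] for y in range(top, bottom) for x in range(left, right)]
            dark = sum(1 for value in cell if value < threshold)
            row_cells.append(dark / len(cell) if cell else 0.0)
        grid.append(row_cells)
    return grid


# A tiny 4x4 "page": the top-left quadrant is solid black,
# and one stray black pixel sits in the bottom-right quadrant.
page = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 255, 0],
    [255, 255, 255, 255],
]
grid = intensity_grid(page, rows=2, cols=2)
print(grid)  # [[1.0, 0.0], [0.0, 0.25]]
```

On a structured form, the same cell tends to produce nearly the same value from one document to the next, which is what makes the grid usable as a fingerprint.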

Below is one of the documents we'll be using to illustrate how Visual Classification works. For the moment, we'll use it to briefly showcase intensity and how it works during the process of Visual Classification.

  1. Visual Classification is best put to use with highly structured documents. Since structured documents rely on their visual layout more than their text data, said text data could be considered irrelevant for classification. It doesn't really matter what name is on the document if it's going to be in the same place every time, for example.



  1. Once Extract Features has been run, the intensity is available for viewing. Once the document is trained and classification begins, Grooper will use this intensity grid as a basis, classifying any documents that look like it as the Document Type it was trained as.
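The comparison step described above, where a document receives a percentage similarity against each Document Type and the highest score wins, could look roughly like the following Python sketch. The scoring formula (100% minus the mean absolute cell difference) is an assumption chosen for simplicity, since the article does not state Grooper's actual formula. The grid values and the "Invoice" Document Type are made up for illustration.

```python
def similarity(grid_a, grid_b):
    """Percentage similarity between two equally sized intensity grids.

    Assumed metric: 100 * (1 - mean absolute difference between cells).
    """
    cells_a = [value for row in grid_a for value in row]
    cells_b = [value for row in grid_b for value in row]
    if not cells_a or len(cells_a) != len(cells_b):
        raise ValueError("grids must be non-empty and the same size")
    mean_diff = sum(abs(a - b) for a, b in zip(cells_a, cells_b)) / len(cells_a)
    return 100.0 * (1.0 - mean_diff)


def classify(grid, trained):
    """Score `grid` against every trained Document Type and pick the best."""
    scores = {name: similarity(grid, example) for name, example in trained.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical trained grids, one per Document Type.
trained = {
    "Addendum": [[0.5, 0.0], [0.0, 0.25]],
    "Invoice": [[0.0, 0.5], [0.5, 0.0]],
}
unknown = [[0.5, 0.0], [0.0, 0.0]]

best, scores = classify(unknown, trained)
print(best, scores)  # Addendum {'Addendum': 93.75, 'Invoice': 62.5}
```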

The Classification Process

Now that everything has been set up, it's time to classify a set of Documents using the Visual Classification Method.

  1. With the IP Profile set up and the Document Types created, the Visual Classification Method can be set up on the Content Model.
  2. Select the hamburger icon to the far right of the Classification Method property, and from the drop-down menu, select Visual.



  1. After selecting the Visual Classification Method, expand the property to select the IP Profile the Classification Method will reference.
  2. Select the IP Profile from the drop-down menu.



  1. With all the prep work taken care of, there are just two things left to do: train the documents, then classify. Both are done in the Classify step of a Batch Process.
  2. Ensure that the activity and scope are correct before proceeding.
  3. Expand the drop-down menu for the Content Model Scope, and select the desired Content Model.



  1. Navigate to the Classification Tester tab to train the documents.
  2. Right-click the Folder you wish to train.
  3. Select "Classification", followed by "Train As..."



  1. In the Train As window that populates, expand the drop-down menu for the Document Type you wish to select.
  2. Choose the desired Document Type. For this example, the first document we've selected is an Addendum and is being trained as such.
  3. Once the desired Document Type is selected, click "Execute".



  1. Repeat the process until all Folders have been trained as the desired Document Type.
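Conceptually, each "Train As..." action adds one more example to a Document Type. The Python sketch below models that with a small in-memory store that averages the example grids of a Document Type into a single profile. Averaging is an assumption made for illustration, because the article does not say how Grooper combines multiple trained examples. The `TrainingStore` class and its methods are hypothetical.

```python
class TrainingStore:
    """Hypothetical in-memory stand-in for trained visual examples."""

    def __init__(self):
        self._examples = {}

    def train_as(self, document_type, grid):
        """Record one example intensity grid for a Document Type."""
        self._examples.setdefault(document_type, []).append(grid)

    def profile(self, document_type):
        """Average all example grids of a Document Type, cell by cell."""
        examples = self._examples[document_type]  # KeyError if never trained
        rows, cols = len(examples[0]), len(examples[0][0])
        return [
            [sum(g[r][c] for g in examples) / len(examples) for c in range(cols)]
            for r in range(rows)
        ]


store = TrainingStore()
store.train_as("Addendum", [[1.0, 0.0]])
store.train_as("Addendum", [[0.5, 0.5]])
profile = store.profile("Addendum")
print(profile)  # [[0.75, 0.25]]
```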



  1. With the document training completed, navigate to the Activity Tester tab.
  2. We're now ready to test Visual Classification on an as-yet unclassified Batch.



  1. All the folders in this Batch have been selected.
  2. The goal is to classify them according to the Visual training Grooper has been given.



  1. And voila! Having successfully trained Grooper on how to recognize certain Document Types visually, through features alone, such as intensity, all of these previously unclassified Documents have been classified.