Multi-Column (Collation Provider)

From Grooper Wiki

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.



Multi-Column is a Collation Provider option for Data Type extractors. Multi-Column combines multiple columns on a page into a single column for extraction.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in the examples throughout this article.

About

Sometimes you might run into documents whose text is divided into columns, where the text runs down the first column and continues at the top of the next.


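Grooper cannot automatically tell when a page is divided into columns rather than containing continuous text. Instead, it reads each line straight across the page, which runs text from neighboring columns together. We need to tell Grooper to expect multiple columns by using the Multi-Column Collation Provider.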

How To

We are going to go over the basics of setting up the Multi-Column Collation Provider. Once Multi-Column is selected, the Collation property offers many more options that you can adjust to improve your results beyond what we discuss here.

Set Up the Provider

  1. The page in our Batch has two columns. The text in the first column continues in the second.
  2. Create a Data Type.
  3. Set the Local Extractor for the Data Type. In this example, we set it to a Pattern Match.
  4. Configure the Pattern Match with the regex pattern [^\r\n\t\f]+ to collect all lines of text on the page.
    • You need to turn on Tab Marking for this pattern to work. Tab Marking inserts a tab character where there is a large horizontal gap in the text, such as the space between two columns. Because the pattern excludes tabs (\t), each match stops at the column boundary instead of running across the page.
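
To see why Tab Marking matters, the sketch below runs the same pattern with Python's re module rather than Grooper. The character class has the same meaning in both. The sample page text is hypothetical and stands in for a two-column page where a tab replaces the gap between the columns.

```python
import re

# The Pattern Match regex from this example: one or more characters that are
# not a carriage return, line feed, tab, or form feed.
LINE_PATTERN = re.compile(r"[^\r\n\t\f]+")

# Hypothetical text from a two-column page after Tab Marking: a tab sits
# where the gap between the columns was.
tab_marked = (
    "Text in a two-column\tbefore moving on to\r\n"
    "layout is read down\tthe top of the\r\n"
    "the first column\tsecond column.\r\n"
)

# The same first line without Tab Marking: the gap is only spaces.
not_tab_marked = "Text in a two-column      before moving on to\r\n"

for match in LINE_PATTERN.findall(tab_marked):
    print(repr(match))  # one match per column segment

print(LINE_PATTERN.findall(not_tab_marked))  # a single match spanning both columns
```

With the tab in place, each column segment becomes its own match. The matches still come out in straight-across order, alternating between the columns. Putting them into column order is the job of the Collation provider.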


Turning on Tab Marking

  1. Click on the "Properties" tab.
  2. Open up the Preprocessing options.
  3. Click the check box to the right of Tab Marking to enable the property.
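
The following minimal Python sketch shows the effect Tab Marking has on the text. It assumes that a column gap appears as a run of three or more spaces. The function name mark_tabs and the threshold are hypothetical. This is not how Grooper implements Tab Marking; it only approximates the result on plain text.

```python
import re

# Hypothetical threshold: treat a run of at least this many spaces as a gap
# between columns. (Grooper's own Tab Marking does not work this way.)
MIN_GAP_SPACES = 3


def mark_tabs(line: str, min_gap: int = MIN_GAP_SPACES) -> str:
    """Replace each run of min_gap or more spaces with a single tab character."""
    return re.sub(" {%d,}" % min_gap, "\t", line)


line = "Text in a two-column      before moving on to"
print(repr(mark_tabs(line)))
# Single spaces between words are left alone; only the wide gap becomes a tab.
```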


Setting the Provider

  1. Set the Collation property to Multi-Column.
  2. It may look like the whole page is still being extracted straight across, but Grooper is now collecting the individual columns.
  3. Click the Inspection icon at the bottom right of the Document Viewer.
  4. Open the "Text Value" tab below the Document Viewer on the Inspection page. It shows that Grooper collects all of the text in the first column before it collects the second column.
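
A short Python sketch can model the reading order shown in the Text Value tab. It assumes the text has already been tab-marked, so each line holds one segment per column. It also treats the n-th tab-separated segment on a line as belonging to column n. The function collate_columns and the sample text are hypothetical. The sketch is a simplified model of the result, not how the Multi-Column provider actually detects columns.

```python
from __future__ import annotations


def collate_columns(tab_marked_text: str) -> list[str]:
    """Return text segments in column order: all of column 1, then column 2, and so on."""
    columns: list[list[str]] = []
    for line in tab_marked_text.splitlines():
        # Assumption: the n-th tab-separated segment on a line belongs to column n.
        for index, segment in enumerate(line.split("\t")):
            segment = segment.strip()
            if not segment:
                continue  # nothing in this column on this line
            while len(columns) <= index:
                columns.append([])
            columns[index].append(segment)
    # Read each column top to bottom, one column after another.
    return [segment for column in columns for segment in column]


# Hypothetical tab-marked text from a two-column page.
page_text = (
    "Text in a two-column\tbefore moving on to\r\n"
    "layout is read down\tthe top of the\r\n"
    "the first column\tsecond column.\r\n"
)

print(" ".join(collate_columns(page_text)))
```

This prints the first column followed by the second, so the sentence reads in order. The sketch relies only on the order of tab-separated segments within each line. As a result, it would misplace text whenever a line's tabs do not line up with the page's columns.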