2.90:Use Case: Transposing Rows and Columns with Row Match

From Grooper Wiki

So you have a set of documents. Most of them have a very straightforward table. It's easily extracted using the Row Match method. But every now and then, you get another document with a table presenting the same data, but in a wildly different table structure. So different, in fact, that the data is presented as rows instead of columns. One solution would be to create two different Data Models for two different Document Types with two different Data Tables using two different extraction techniques. However, there may be a more elegant solution, using what we call a "Pivot Row Extractor".

The Problem

This use case comes from the world of oil and gas. Oil and gas companies receive and produce production reports detailing volumes and sales of materials produced for a particular oil and gas well. They are essentially big tables of information. But there's a lot of different ways to make a table. The vast majority of the documents for this project looked something like this.

This has a very straightforward table structure. It is well suited for the Row Match method. You could fairly easily write a pattern to match the data on each row of this table.

However, sometimes, this project would encounter documents that look something like this one. It has a completely different table structure. In fact it uses a combination of field labels for the "Well" and "Field" etc values, and a table for the "Oil" "Water" and "Gas" data.

While these tables present data in different ways, they share a great deal of data points. For this explanation, we will focus on eight shared data points. These data points will become the Data Columns for our Data Table.

  • Well Name (also known as "Completion Name")
  • API Number
  • Oil Produced
  • Oil Sales
  • Gas Produced
  • Gas Sales
  • Water Produced
  • Producing Days

The problem is part of the data that is just contained on a single row on this document...

...is in multiple columns on a table in this one, as seen by the yellow and blue highlighted data.

Or, it's not even in the table at all, as seen by the green highlighted data.