2.80:Table Extraction (Concept)

From Grooper Wiki
Revision as of 10:45, 7 January 2020 by Configadmin (talk | contribs)
Data in an Excel spreadsheet is an example of tabular data.

Tables are one of the most common ways data is organized on documents. Human beings have been writing information into tables since before they started writing literature, even before paper was invented. Tables are excellent structures for presenting many items that share common characteristics in a relatively small space. However, targeting the data inside them presents its own set of challenges. A table’s structure can range from simple and straightforward to complex (and even confounding). Different organizations may also lay out the same data differently, producing different tables for what is essentially the same information.

In Grooper, tabular data can be extracted using the Row Match, Header-Value (Table Extraction Method), or Infer Grid table extraction methods.

What Is a Table?

Tables consist of rows and columns. Where a row and a column intersect is a cell containing a single piece of information. Each row has the same number of columns (although some cells may be empty in a given row). Each column holds a single type of information. For example, an "Order Date" column will always have dates in the cells below it.

This may seem obvious, but understanding how data is structured on the page informs how you will use Grooper to target it.
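The row-and-column structure described above can be illustrated with a short Python sketch. This is not Grooper code or part of any Grooper API. The column names, sample values, and helper functions are all hypothetical, chosen only to show the two properties of a table: every row has one cell per column, and every column holds one type of information.

```python
from datetime import date

# Hypothetical sample data: each row is a list of cells, one per column.
headers = ["Order ID", "Order Date", "Amount"]
rows = [
    ["1001", date(2020, 1, 3), 250.00],
    ["1002", date(2020, 1, 5), None],  # an empty cell still occupies its column
    ["1003", date(2020, 1, 7), 99.95],
]


def is_rectangular(headers, rows):
    """Return True if every row has exactly one cell per column."""
    return all(len(row) == len(headers) for row in rows)


def column(headers, rows, name):
    """Return the cells under the named column, top to bottom."""
    index = headers.index(name)
    return [row[index] for row in rows]


def column_is_uniform(cells):
    """Return True if all non-empty cells in a column share one type."""
    types = {type(cell) for cell in cells if cell is not None}
    return len(types) <= 1
```

Here, `column(headers, rows, "Order Date")` returns only dates, and the empty "Amount" cell in the second row does not change the number of columns in that row.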