2.80:Header-Value (Table Extract Method)
Header-Value is one of three methods available to Data Table elements to extract information from tables on a document set. It uses a combination of header and value extractors to determine the table’s structure and extract information from the table’s cells.
About
Where the Row Match method focuses on using a table’s rows to model table structure and extract data, Header-Value looks to the table’s column header labels and the values in those columns.
img or imgs of star wars table showing what a header is and values are
As the name implies, both “Header” extractors and “Value” extractors are required for this method to function. These extractors are configured on each of the Data Columns.
img of Data Column showing where this is done
--- loose explanation using Star Wars table of header-value extraction---
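The header-plus-value idea described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Grooper's implementation: the `COLUMNS` dictionary, the `fixed_width` helper, and the `extract_table` function are all hypothetical names, and the sketch assumes plain text with left-aligned, fixed-width columns. Each column gets one header pattern (used to find where the column sits) and one value pattern (used to pull data from the rows beneath it).

```python
import re

# Hypothetical Data Columns: each pairs a header pattern with a value pattern.
COLUMNS = {
    "quantity": {"header": r"Qty|Quantity", "value": r"\d+"},
    "description": {"header": r"Description|Item",
                    "value": r"[A-Za-z][A-Za-z ]*[A-Za-z]"},
    "price": {"header": r"Price|Amount", "value": r"\d+\.\d{2}"},
}


def fixed_width(cells, width=16):
    """Build one left-aligned, fixed-width text line (sample-data helper)."""
    return "".join(cell.ljust(width) for cell in cells).rstrip()


def extract_table(text, columns=COLUMNS):
    """Locate the header row, then read values under each header's position."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        found = {name: re.search(spec["header"], line)
                 for name, spec in columns.items()}
        if all(found.values()):
            break  # every header pattern matched: this is the header row
    else:
        return []  # no header row found, so there is nothing to anchor on

    # Order columns left to right; each spans from its header's start
    # position to the start of the next header (assumption: left-aligned).
    ordered = sorted(found, key=lambda name: found[name].start())
    starts = [found[name].start() for name in ordered]
    bounds = list(zip(starts, starts[1:] + [None]))

    rows = []
    for line in lines[index + 1:]:
        row = {}
        for name, (lo, hi) in zip(ordered, bounds):
            match = re.search(columns[name]["value"], line[lo:hi])
            row[name] = match.group(0) if match else None  # empty cell
        if any(value is not None for value in row.values()):
            rows.append(row)
    return rows


SAMPLE = "\n".join(fixed_width(cells) for cells in [
    ["Qty", "Description", "Price"],
    ["2", "Blue widget", "4.50"],
    ["10", "Red gasket", ""],  # price cell left blank
])

for row in extract_table(SAMPLE):
    print(row)
```

In this sketch a blank cell simply comes back as `None` for that column, and text with no recognizable header row returns an empty list, since there is nothing to anchor the columns on.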
Version Differences
Use Cases
The Header-Value method is the second table extraction method created in Grooper. It was made to target tables not easily extracted by Row Match. Row Match loses its efficiency once a table’s structure starts to change from document to document. Different companies are going to structure tables however they want, which is well outside your control. Think of all the different ways an invoice can be structured. While the information you want is present in all the different tables, how that data is presented may not be consistent. Even a change in column position alone can present problems for Row Match. Its approach of using a Row Extractor to pattern the table may not be able to do the job (or may require a complicated Row Extractor that accounts for multiple row formats). For these situations, the Header-Value method is often easier to configure and produces better results.
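To make the column-position problem concrete, here is a small Python sketch. It does not use Grooper or reproduce how Row Match or Header-Value work internally. It only contrasts reading a column by fixed position with finding it by header label. The vendor tables, the function names, and the two-or-more-spaces cell separator are all assumptions made for illustration.

```python
import re


def split_cells(line):
    """Split a text row into cells (assumes 2+ spaces separate cells)."""
    return re.split(r"\s{2,}", line.strip())


def prices_by_position(table, price_index=2):
    """Brittle: assumes the price is always the third column."""
    return [split_cells(line)[price_index] for line in table.splitlines()[1:]]


def prices_by_header(table, header_pattern=r"Price|Amount"):
    """Find the price column from its header label, wherever it sits."""
    lines = table.splitlines()
    headers = split_cells(lines[0])
    index = next(i for i, label in enumerate(headers)
                 if re.fullmatch(header_pattern, label))
    return [split_cells(line)[index] for line in lines[1:]]


# Two hypothetical vendors presenting the same data in different layouts.
VENDOR_A = "Qty  Description  Price\n2  Blue widget  4.50\n10  Red gasket  0.75"
VENDOR_B = "Item  Amount  Quantity\nBlue widget  4.50  2\nRed gasket  0.75  10"

print(prices_by_position(VENDOR_A))  # prices, as intended
print(prices_by_position(VENDOR_B))  # quantities: the layout changed
print(prices_by_header(VENDOR_A))    # prices
print(prices_by_header(VENDOR_B))    # prices, despite the new layout
```

The position-based reader silently returns quantities for the second vendor, while the header-based reader keeps returning prices because it follows the label and not the column order.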
Optional data columns, where values may or may not be present in a cell, can complicate things for Row Match as well. Again, a simple Row Extractor may not do the trick. While a more complicated extractor may successfully extract the table's information, the Header-Value method (or the Infer Grid method) may be simpler to set up and produce the same or even better results.
However, the Header-Value method does have its limitations. Perhaps most obviously, header labels are necessary for this method to work. In tables where header labels are not present, Header-Value will not be suitable for use.
Furthermore, the Header-Value method requires several extractors to detect a table’s structure and extract the values inside: at least two for every Data Column (one for its header and one for its values). Because of this, there are several components to configure in order to extract a table’s information. For relatively simple tables, Row Match is simpler to set up: it takes less time and uses fewer objects.
The Infer Grid method also has some advantages over Header-Value. There are some specialized use cases, such as reading OMR checkboxes in tables and reprocessing table cells using a secondary OCR profile, where Infer Grid does things the other two methods simply can’t. Infer Grid also performs well when table line information can be saved to a page’s layout data.