Key-Value List (Collation Provider)

Note: This article was migrated from an older version of Grooper (2023.1) and has not been fully updated for the current version. Some content may still need to be updated.

Key-Value List is a Collation Provider option for Data Type extractors. Key-Value List matches instances where a key and a list of one or more values appear together on the document, adhering to a specific layout pattern.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

The Key-Value List Collation Provider is designed to collect a list of items that follows a label. Similar to the Key-Value Pair Collation Provider, it is configured by giving a Data Type two child extractors: a "Key" and a "Value". When collated, the parent Data Type first locates the "Key" and then looks either vertically or horizontally (depending on how you configure its properties) for a list of text matched by the "Value" extractor. Only the list of terms from the "Value" extractor is returned as a result; the "Key" itself is not.
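
The lookup described above can be pictured with a small geometric model. The Python sketch below is not Grooper's implementation. It assumes each extractor hit is reduced to its text and top-left coordinates, and that "lining up" means sharing a left edge (vertical) or a top edge (horizontal) within a hypothetical `tolerance`. The `Hit` class, the `key_value_list` function, and the sample data are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One extractor result: its text and the top-left corner of its box."""
    text: str
    x: float  # left edge on the page
    y: float  # top edge on the page (larger values are lower on the page)


def key_value_list(key: Hit, values: list[Hit], layout: str = "vertical",
                   tolerance: float = 5.0) -> list[str]:
    """Return the value texts that line up with the key in the chosen layout.

    vertical:   values below the key whose left edge is near the key's.
    horizontal: values right of the key whose top edge is near the key's.
    `tolerance` is an assumed alignment threshold, not a Grooper property.
    """
    if layout == "vertical":
        matched = [v for v in values
                   if v.y > key.y and abs(v.x - key.x) <= tolerance]
        matched.sort(key=lambda v: v.y)   # top to bottom
    elif layout == "horizontal":
        matched = [v for v in values
                   if v.x > key.x and abs(v.y - key.y) <= tolerance]
        matched.sort(key=lambda v: v.x)   # left to right
    else:
        raise ValueError(f"unknown layout: {layout!r}")
    # Only the values are returned; the key is never part of the result.
    return [v.text for v in matched]


# Hypothetical page: a vertical list under "Colors", a horizontal list
# after "Sizes", and unrelated text that the broad Value extractor also hits.
values = [
    Hit("Red", 100, 70), Hit("Green", 100, 90), Hit("Blue", 100, 110),
    Hit("S", 160, 200), Hit("M", 200, 200), Hit("L", 240, 200),
    Hit("Invoice", 300, 50), Hit("Total", 300, 70),
]

colors = key_value_list(Hit("Colors", 100, 50), values, layout="vertical")
sizes = key_value_list(Hit("Sizes", 100, 200), values, layout="horizontal")
print(colors)
print(sizes)
```

In this model the "Value" extractor can be deliberately broad, because the key and layout decide which of its hits survive.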

The Key-Value List is also useful in Table Extraction via Row Match when not every cell in the table contains data.

How To

Configuring a Key-Value List involves creating a Data Type with two child extractors: a Key and a Value. The Key will collect the title or label of your list, while the Value will collect all text segments in your list. Then the Key-Value List Collation Provider will need to be configured on the Data Type itself.

Configuring a Key-Value List

  1. Create a Data Type with two child Value Readers: one for the Key, and one for the Value.


The "Key" Value Reader will be used to collect the label or title of the list.

  1. Set the Extractor property for the "Key" Value Reader. In this example, it is set to List Match.


On the page there are two lists. We want to collect the list on the right labeled "Cryptids" and not the list on the left labeled "Groceries". For the "Key", we need to collect the label for the list we want to return.

  1. Configure the extractor to return the label of your list. In this case we have configured our List Match to return the word "Cryptids".


Now that the "Key" has been configured, we want to set up the "Value" Value Reader to collect all items in the list.

  1. Set the Extractor property for the "Value" Value Reader. In this example, it is set to Pattern Match.


  1. Configure the extractor to return all text you want to collect near the label.
    • It is fine if the extractor collects more than just the terms you want to return. Here we are collecting all generic text on the page.


Now we need to go back to the parent object and set the Collation to collect the Key-Value List.

  1. Click back on the parent Data Type.
  2. By default, the Collation property is set to Individual.
  3. With Individual collation, each text segment extracted by the child objects is returned as its own result.


  1. Change the Collation property to Key-Value List.
  2. Choose the appropriate layout for the data you are collecting. Since the list in our example is aligned vertically, we are enabling the Vertical Layout property.
  3. Now only the desired list is returned.


Key-Value Lists and Data Tables

There are times when a Key-Value List can be useful in Table Extraction. In the first example below, we have a table that is being collected using an Ordered Array Collation Provider (See the Ordered Array wiki article for how to extract a table using a Row Match Extract Method with an Ordered Array).

  1. We have created a Data Type with multiple child extractors to collect information on a table.
  2. The Data Type is configured with an Ordered Array.
  3. For this page in the Batch, this configuration works just fine. We are collecting all rows in the table.


The first page in our Batch is extracted correctly. On the second page, however, we run into a problem: the table does not have data in every cell. Ordered Array does not work well with incomplete data. Using a Key-Value List on the child extractors for the columns with incomplete data can help us overcome this issue.

  1. You can see in the table on the page, only three rows are being returned because they are the only rows that have complete data.


  1. Going back to the Data Table object, we can see that using a Row Match Extract Method referencing the Data Type with the Ordered Array only returns the complete rows on the table.


Using Key-Value List with Array

  1. We have made a few changes to the referenced Data Type's child objects. Each column with missing data on the page now has its own child Data Type containing a "Key" and a "Value" Value Reader.
  2. Each Data Type with a child "Key" and "Value" Value Reader has its Collation property set to Key-Value List.
  3. As we can see in the Document Viewer, for the "Dry Hole" Data Type, now all dollar amounts under the "Dry Hole" label on the page are being returned.


  1. On the parent Data Type where all the results are collated...
  2. ... we still have the Collation property set to Ordered Array.
  3. An Ordered Array will not work here because it requires every child object to return a value before it returns a row. Since the table has incomplete data, not all rows are returned.


  1. If we change the Collation property to an Array, then all of our rows are returned because an Array does not require all child extractors to return a result to collect the row information.
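
The difference between the two collation behaviors described above can be sketched as a simple row filter. This Python example is a simplification under stated assumptions: it ignores page geometry and treats each candidate row as a mapping from column name to an extracted value or `None`. The function names, the "Well" and "Completed" columns, and the sample amounts are hypothetical; only "Dry Hole" comes from the example in this article.

```python
from typing import Optional

Row = dict[str, Optional[str]]


def ordered_array_rows(rows: list[Row], columns: list[str]) -> list[Row]:
    """Model of Ordered Array: keep a row only if every column has a value."""
    return [row for row in rows
            if all(row.get(col) is not None for col in columns)]


def array_rows(rows: list[Row], columns: list[str]) -> list[Row]:
    """Model of Array: keep a row if at least one column has a value."""
    return [row for row in rows
            if any(row.get(col) is not None for col in columns)]


columns = ["Well", "Dry Hole", "Completed"]

# Hypothetical table in which two rows have an empty cell.
rows = [
    {"Well": "A-1", "Dry Hole": "$1,000", "Completed": "$2,500"},
    {"Well": "A-2", "Dry Hole": None, "Completed": "$3,100"},
    {"Well": "A-3", "Dry Hole": "$900", "Completed": None},
]

complete_only = ordered_array_rows(rows, columns)
all_rows = array_rows(rows, columns)
print(len(complete_only), "row(s) with the Ordered Array model")
print(len(all_rows), "row(s) with the Array model")
```

Under this model, the stricter filter drops the two rows with empty cells, which mirrors why only complete rows came back before the Collation property was changed.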


Testing the Table Extraction

  1. Click on the Data Table object.
  2. The table is configured for a Row Match and the Row Extractor is still referencing the same Data Type (now configured with an Array instead of an Ordered Array).


  1. Testing the extraction now returns the full table, even though some cells are missing data.