2023.1:Key-Value List (Collation Provider)

From Grooper Wiki
Revision as of 09:24, 18 April 2024 by Rpatton (update // via Wikitext Extension for VSCode)

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

A Key-Value List is one of several Collation Providers in Grooper that combine or organize extracted data based on how the data is laid out on the page. A Key-Value List collects a list of values that sit in a spatial relationship (vertical or horizontal) to a label, called the "Key".

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains a Project with the resources used in the examples throughout this article. The second contains one or more Batches of sample documents.

About

The Key-Value List Collation Provider is designed to collect a list of items that follow a label of some kind. Like the Key-Value Pair Collation Provider, it is configured on a Data Type that has two child extractors: one that returns the "Key" and one that returns the "Value". When collated, the parent Data Type first locates the "Key" and then looks either vertically or horizontally from it for the list of text segments returned by the "Value" extractor. Only the items matched by the "Value" extractor are returned as results; the "Key" itself is not.

The Key-Value List is also useful in Table Extraction via Row Match when not every cell in the table contains data.

How To

Configuring a Key-Value List involves creating a Data Type with two child extractors: a Key and a Value. The Key will collect the title or label of your list, while the Value will collect all text segments in your list. Then the Key-Value List Collation Provider will need to be configured on the Data Type itself.

Configuring a Key-Value List

  1. Create a Data Type with two child Value Readers: one for the Key, and one for the Value.


The "Key" Value Reader will be used to collect the label or title of the list.

  1. Set your Extractor property for the Key Value Reader. In this example we have set it to a List Match.


On the page there are two lists. We want to collect the list on the right labeled "Cryptids" and not the list on the left labeled "Groceries". For the "Key", we need to collect the label for the list we want to return.

  1. Configure the extractor to return the label of your list. In this case we have configured our List Match to return the word "Cryptids".


Now that the "Key" has been configured, we want to set up the "Value" Value Reader to collect all items in the list.

  1. Set the Extractor property on the "Value" Value Reader. In this example we have set it to a Pattern Match.


  1. Configure the extractor to return all of the text you want to collect near your label.
    • It is fine if the extractor collects more than just the terms you want to return. Here we are collecting all generic text on the page.
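
To illustrate why an over-broad "Value" extractor is acceptable at this stage, here is a minimal Python sketch of the two child extractors working on plain text. It is not Grooper code: the page content, the `KEY_TERMS` list, the `VALUE_PATTERN` regular expression, and both function names are hypothetical stand-ins for the List Match and Pattern Match configured above.

```python
import re

# Hypothetical page: two side-by-side lists, flattened into fixed-width lines.
PAIRS = [
    ("Groceries", "Cryptids"),
    ("Milk", "Bigfoot"),
    ("Eggs", "Mothman"),
    ("Bread", "Chupacabra"),
]
PAGE_LINES = [f"{left:<13}{right}" for left, right in PAIRS]

# Assumed stand-in for the "Key" List Match: a list of literal terms.
KEY_TERMS = ["Cryptids"]

# Assumed stand-in for the "Value" Pattern Match: deliberately generic,
# it matches every run of non-whitespace characters on the page.
VALUE_PATTERN = re.compile(r"\S+")


def find_keys(lines):
    """Return (row, column, text) for each literal key term found."""
    hits = []
    for row, line in enumerate(lines):
        for term in KEY_TERMS:
            for match in re.finditer(re.escape(term), line):
                hits.append((row, match.start(), match.group()))
    return hits


def find_values(lines):
    """Return (row, column, text) for every generic text segment."""
    return [
        (row, match.start(), match.group())
        for row, line in enumerate(lines)
        for match in VALUE_PATTERN.finditer(line)
    ]


print(find_keys(PAGE_LINES))         # [(0, 13, 'Cryptids')]
print(len(find_values(PAGE_LINES)))  # 8
```

The "Value" side returns all eight words, including both labels and the grocery items. That is expected at this point: narrowing the results down to the list under the Key is the job of the collation step on the parent Data Type, not of the Value extractor.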

Now we need to go back to the parent object and set the Collation to collect the Key-Value List.

  1. Click back on the parent Data Type.
  2. The Collation property by default is set to Individual.
  3. Each text segment extracted by the child objects will be returned individually.


  1. Change the Collation to a Key-Value List.
  2. Choose the appropriate layout for the data you are collecting. Since the list in our example is aligned vertically, we are enabling the Vertical Layout property.
  3. Now only the desired list is returned.
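
The difference between the Individual and Key-Value List results can be sketched conceptually in Python. This is only an illustration under stated assumptions, not Grooper's implementation: the `Hit` class, the coordinates, both function names, and the simple "below the key and horizontally overlapping it" rule are all hypothetical, and the article does not describe Grooper's actual matching rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """An extracted text segment with a bounding box (hypothetical units)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def collate_individual(keys, values):
    """Individual: every child result is returned on its own."""
    return [hit.text for hit in keys + values]


def collate_key_value_list(keys, values, vertical=True):
    """Keep only the values that line up with a key (assumed rule)."""
    results = []
    for label in keys:
        if vertical:
            # Values that start below the key and overlap it horizontally.
            matches = [v for v in values
                       if v.top >= label.bottom
                       and v.left < label.right and v.right > label.left]
            matches.sort(key=lambda v: v.top)
        else:
            # Values that start right of the key and overlap it vertically.
            matches = [v for v in values
                       if v.left >= label.right
                       and v.top < label.bottom and v.bottom > label.top]
            matches.sort(key=lambda v: v.left)
        results.extend(v.text for v in matches)
    return results


# Hypothetical page: "Groceries" list on the left, "Cryptids" on the right.
keys = [Hit("Cryptids", 200, 0, 270, 10)]
values = [
    Hit("Groceries", 0, 0, 80, 10), Hit("Cryptids", 200, 0, 270, 10),
    Hit("Milk", 0, 20, 40, 30), Hit("Bigfoot", 200, 20, 260, 30),
    Hit("Eggs", 0, 40, 40, 50), Hit("Mothman", 200, 40, 270, 50),
    Hit("Bread", 0, 60, 50, 70), Hit("Chupacabra", 200, 60, 290, 70),
]

print(len(collate_individual(keys, values)))  # 9
print(collate_key_value_list(keys, values))   # ['Bigfoot', 'Mothman', 'Chupacabra']
```

With `vertical=False` the same data yields an empty list, because nothing sits to the right of the "Cryptids" label. This mirrors why the layout property has to match how the list is actually arranged on the page.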

Key-Value Lists and Data Tables

  1. We have created a Data Type with multiple child extractors to collect information from a table.
  2. The Data Type's Collation is set to Ordered Array.
  3. For this page in the Batch, this configuration works just fine: every row in the table is collected.


  1. However, for this table, our current configuration falls short.