Collation Provider (Property)

From Grooper Wiki

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.


The Collation property of a Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Data Type extractors in Grooper use regular expressions to match a document's text data in order to return a particular piece of information. Extractors serve a variety of purposes. They can be used to populate fields in a Data Model, to separate and classify documents, to break up a document into sections, and more. For the most part, any time part of a document's text data is needed or useful to do something, you need an extractor to find and return it.
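As a rough illustration of the idea, and not of Grooper's own implementation, the following Python sketch uses the standard `re` module to pull raw matches out of a block of text. The sample text, the date pattern, and the function name `extract_matches` are all assumptions made up for this example.

```python
import re

# Hypothetical sample text standing in for a document's text data.
SAMPLE_TEXT = "Invoice Date: 01/15/2023\nDue Date: 02/14/2023\nTotal: $1,250.00"

# Hypothetical pattern: dates written as MM/DD/YYYY.
DATE_PATTERN = r"\b\d{2}/\d{2}/\d{4}\b"


def extract_matches(pattern, text):
    """Return every match with its starting character offset, in reading order."""
    return [(m.group(0), m.start()) for m in re.finditer(pattern, text)]


raw_results = extract_matches(DATE_PATTERN, SAMPLE_TEXT)
```

Each entry in `raw_results` is one raw match. Deciding what to do with a set of raw matches like this (return them one by one, merge them, pair them with labels, and so on) is the job that the Collation property describes.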

Often, this requires something more complex than returning a single result. The relationships between multiple extraction results are often important. The fact that results are physically related to each other on the page, that text exists between one or more results, or that results appear in one order versus another can be used to accomplish various goals in Grooper.

The following Collation Providers are available in Grooper:

  • Individual - The Collation property is set to Individual by default. Each result is returned individually.
  • Combine - Takes multiple extracted instances and combines them into a single result.
  • AND - Returns a result only when each of the Data Type's extractors gets at least one hit. Useful in Classification.
  • Key-Value Pair - Matches a "Key" with a "Value" on a document to collect information following a label. This Collation method is less commonly used now because the Labeled Value extractor is an easier way to get similar results.
  • Key-Value List - Returns a list of results based on the layout relationship to a "Key" or label.
  • Array - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Can be useful for Row Match Table Extraction.
  • Ordered Array - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Unlike Array, the order in which the individual values are extracted matters, and each extractor must return a value. Also useful in Row Match Table Extraction.
  • Split - Separates a data instance at each match returned by a Data Type. Useful for splitting up a document into smaller segments for more accurate extraction.
  • Pattern-Based - Uses regular expressions to sequence returned results into a final result set.
  • Multi-Column - Combines multiple columns on a page into a single column result for easier and more accurate extraction.
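
To make the difference between a few of these providers more concrete, here is a simplified Python sketch of three of them: Individual, Combine, and Key-Value Pair. It is a conceptual model only and does not reflect how Grooper is implemented or configured. The function names, the sample text, and the `max_gap` parameter (a character distance used here as a stand-in for a layout relationship) are assumptions invented for this example.

```python
import re

# Hypothetical sample text standing in for a document's text data.
TEXT = "Name: Jane Doe  Phone: 555-0100  Fax: 555-0199"


def find_all(pattern, text):
    """Return raw matches as (value, start_offset, end_offset) tuples."""
    return [(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


def collate_individual(results):
    """Individual: every raw match becomes its own final result."""
    return [value for value, _, _ in results]


def collate_combine(results, separator=" "):
    """Combine: merge all raw matches, in reading order, into a single result."""
    ordered = sorted(results, key=lambda r: r[1])
    return [separator.join(r[0] for r in ordered)] if ordered else []


def collate_key_value(keys, values, max_gap=3):
    """Key-Value Pair: pair each key with the nearest value that follows it.

    Assumption: "follows" is modelled as starting within max_gap characters
    after the key ends. A real layout-aware check would use page positions.
    """
    pairs = []
    for key, _, key_end in keys:
        following = [v for v in values if 0 <= v[1] - key_end <= max_gap]
        if following:
            nearest = min(following, key=lambda v: v[1])
            pairs.append((key, nearest[0]))
    return pairs


phones = find_all(r"\d{3}-\d{4}", TEXT)   # raw "Value" matches
labels = find_all(r"Phone:", TEXT)        # raw "Key" matches

individual = collate_individual(phones)
combined = collate_combine(phones, separator="; ")
paired = collate_key_value(labels, phones)
```

The same raw matches produce three different final result sets: `individual` keeps both numbers as separate results, `combined` merges them into one, and `paired` keeps only the number that directly follows the "Phone:" label.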