Array (Collation Provider)

From Grooper Wiki
This article is about the current version of Grooper.

Note that some content may still need to be updated.

Array is a Collation Provider option for Data Type extractors. It matches a list of values arranged in horizontal, vertical, or text-flow order and combines the qualifying instances into a single result.

About

Array is one of the Collation Providers available on a Data Type, and it is useful when the values you want to extract are organized in a recognizable layout on the document. Every Collation Provider except Individual takes multiple results and combines them into one. Array returns a combined result only when the individual results share a layout relationship. If the data is lined up horizontally or vertically, you must enable the corresponding layout property for Grooper to return the information.

Essentially, an Array collated result is a collection of results that share a layout relationship and are all lined up together (either horizontally, vertically, or in the left/right and top/bottom text flow of the document).

The Array collation differs from the Ordered Array collation in one significant way. In an Ordered Array the order of the data matters. For the Array Collation Provider the data can be in any order and will all be returned as one result.
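
The difference can be pictured with a small Python sketch. This is a conceptual model only, not Grooper's implementation: the function names `collate_array` and `collate_ordered_array`, and the idea of comparing hits against an expected order, are assumptions made for illustration.

```python
def collate_array(hits, separator=""):
    """Combine hits in whatever order they were found (order does not matter)."""
    return separator.join(hits) if hits else None


def collate_ordered_array(hits, expected_order, separator=""):
    """Combine hits only when they appear in the expected order (assumed model)."""
    if list(hits) != list(expected_order):
        return None
    return separator.join(hits)


# Hypothetical data: the same four items, found out of order on the page.
found = ["Milk", "Bread", "Butter", "Eggs"]
expected = ["Bread", "Milk", "Eggs", "Butter"]

unordered = collate_array(found, "|")                     # one combined result
ordered = collate_ordered_array(found, expected, "|")     # no result: order differs
```

In this model the unordered collation still yields one combined result, while the ordered variant yields nothing because the items are not in the expected sequence.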

How To

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2024). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

Setting the Collation Property

  1. To begin, we need an object to set an extractor on. We are going to configure the Array on a Data Type.
    • Select an existing Data Type or create a new one in your Project.
  2. For the Local Extractor property, we are going to use a simple List Match. Click the drop-down and set it to List Match, then click the ellipsis button.


  1. On the document provided with this article, we have several grocery items and one entry that does not match the rest. Let's say we want to just return the grocery items and not the item that doesn't belong.
  2. The following values are entered in the local entries:
Bread
Milk
Eggs
Butter
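
As a rough mental model of what this list does, the Python sketch below finds each entry as a separate hit in a line of text. It is not how Grooper's List Match is implemented. The sample text (including the odd item "Hammer"), the `list_match` helper, and the choice of case-insensitive whole-word matching are all assumptions for illustration.

```python
import re

# The four entries from the list above.
entries = ["Bread", "Milk", "Eggs", "Butter"]

# Assumption: whole-word, case-insensitive matching of any entry.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
    re.IGNORECASE,
)


def list_match(text):
    """Return (matched text, start index) for every entry found in text."""
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


# Hypothetical document text with one item that is not in the list.
sample = "Bread Milk Hammer Eggs Butter"
hits = list_match(sample)
```

Each entry comes back as its own hit and "Hammer" is skipped, which mirrors the individual results seen before any collation is applied.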


  1. The entries of the List Match extractor are returning the desired results, which can be seen in the List View. Let's configure this Data Type to combine the individual results into one.
  2. Click the hamburger icon next to the Collation property to access the drop-down menu.
  3. Select Array from the drop-down menu.


  1. With the Collation set to Array...
  2. The List View will not display any results.


The Horizontal Layout Property

  1. The data we want to extract is horizontally aligned on the given document.
  2. The sub-properties of the Collation property need to be configured. Expand the sub-properties of the Collation property and enable the Horizontal Layout property.
  3. The List View will display a single result, with no spaces between the captured words.
  4. Configuring the Result Separator property can help to make the result easier to read.


  1. The Result Separator can accept any string. Try putting in a pipe delimiter |
  2. The List View will still show a single result, but each item in it is now separated by a pipe delimiter |
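
The effect of the Result Separator is essentially a string join. A minimal Python sketch, using a hypothetical `combine` helper:

```python
# The four individual hits, in the order they appear on the document.
hits = ["Bread", "Milk", "Eggs", "Butter"]


def combine(hits, result_separator=""):
    """Join individual hits into one result using the given separator."""
    return result_separator.join(hits)


no_separator = combine(hits)         # "BreadMilkEggsButter"
piped = combine(hits, "|")           # "Bread|Milk|Eggs|Butter"
```

An empty separator runs the words together, which is why the result is hard to read until a separator is supplied.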


The Vertical Layout Property

  1. The second document in the provided Batch has an example of the previous list, but in a different configuration.
  2. We can see that we have the same list of items, but they are aligned vertically instead of horizontally.
  3. The current configuration of our Data Type will not return a result, and the List View will be empty.


  1. Disable the Horizontal Layout property, and instead enable the Vertical Layout property.
  2. You will now see a result in the List View.
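
To make the difference between the two layouts concrete, the sketch below treats each hit as a bounding box and checks whether neighboring hits share a row or a column. This is an illustrative model under stated assumptions, not Grooper's actual alignment logic: the `Hit` class, the overlap test, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    text: str
    left: float
    top: float
    right: float
    bottom: float


def overlaps(a_start, a_end, b_start, b_end):
    """True when two 1-D ranges overlap."""
    return a_start < b_end and b_start < a_end


def horizontally_aligned(a, b):
    # Same row: the vertical extents of the two boxes overlap.
    return overlaps(a.top, a.bottom, b.top, b.bottom)


def vertically_aligned(a, b):
    # Same column: the horizontal extents of the two boxes overlap.
    return overlaps(a.left, a.right, b.left, b.right)


def collate(hits, aligned, separator="|"):
    """Combine hits into one result if every consecutive pair is aligned."""
    if len(hits) < 2:
        return None
    if all(aligned(a, b) for a, b in zip(hits, hits[1:])):
        return separator.join(h.text for h in hits)
    return None


# Hypothetical coordinates: one document lists the items in a row, the other in a column.
row = [Hit("Bread", 0, 0, 50, 10), Hit("Milk", 60, 0, 100, 10),
       Hit("Eggs", 110, 0, 150, 10), Hit("Butter", 160, 0, 220, 10)]
column = [Hit("Bread", 0, 0, 50, 10), Hit("Milk", 0, 20, 40, 30),
          Hit("Eggs", 0, 40, 40, 50), Hit("Butter", 0, 60, 60, 70)]

row_horizontal = collate(row, horizontally_aligned)      # combined result
row_vertical = collate(row, vertically_aligned)          # None
column_vertical = collate(column, vertically_aligned)    # combined result
column_horizontal = collate(column, horizontally_aligned)  # None
```

The row only collates with the horizontal check and the column only with the vertical check, matching the behavior seen when switching between the two layout properties.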

The Flow Layout Property

The Flow Layout Property should be used when the information you want to extract is contained within the flow of a paragraph or text. You will commonly find this in unstructured documents.

  1. Using the third document in the supplied Batch, we can see a document that has language in a natural flow, instead of in a table.
  2. All the words in the list are present in this document but in a different layout.
  3. With the current configuration of the Data Type, only the words milk and eggs will return because they happen to exist directly on top of each other in the flow of language of the given document format.


  1. Disable the Vertical Layout property, and instead enable the Flow Layout property.
  2. With only this property enabled, there will be no results displayed in the List View.
  3. One of the sub-properties of the Flow Layout property (Separator Expression or Maximum Character Distance) needs to be configured to produce a result.


  1. For the purposes of this example, in the supplied document, set the Maximum Character Distance to 500
  2. There are fewer than 500 characters between bread and milk, fewer than 500 between milk and eggs, and finally there are fewer than 500 characters between eggs and butter
  3. Considering this, the result will now display in the List View as desired.
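
The character-distance check can be sketched in Python as follows. This is a simplified model, not Grooper's implementation: the `flow_collate` function, the sample sentence, and the rule that a gap equal to the maximum still counts as close enough are all assumptions.

```python
import re


def flow_collate(text, entries, max_distance, separator="|", minimum_elements=2):
    """Group consecutive hits whose gap in characters is at most max_distance."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    groups, current = [], []
    for m in pattern.finditer(text):
        # Start a new group when the gap since the previous hit is too large.
        if current and m.start() - current[-1].end() > max_distance:
            groups.append(current)
            current = [m]
        else:
            current.append(m)
    if current:
        groups.append(current)
    return [separator.join(m.group(0) for m in g)
            for g in groups if len(g) >= minimum_elements]


entries = ["Bread", "Milk", "Eggs", "Butter"]
# Hypothetical paragraph-style text.
text = ("Pick up bread on the way home. We are also out of milk, "
        "and the recipe needs eggs and butter.")

wide = flow_collate(text, entries, max_distance=500)
narrow = flow_collate(text, entries, max_distance=10)
```

With a generous distance all four words chain into one result. With a distance of 10, only "eggs" and "butter" are close enough to be combined.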


  1. Clear the Maximum Character Distance property and instead set the Separator Expression property to .*
    • This is a simple regular expression to indicate "any character" represented by the period . and "0 to many" of those characters represented by the asterisk *
  2. With this configuration the result will now be displayed in the List View.
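
One way to picture the Separator Expression is as a pattern that the text between two neighboring hits must match in full. The Python sketch below follows that reading. It is an assumption about the behavior rather than a description of Grooper's internals, and the function name, the sample sentence, and the use of `re.DOTALL` (so that `.` also matches line breaks) are illustrative choices.

```python
import re


def flow_collate_by_separator(text, entries, separator_expression,
                              result_separator="|", minimum_elements=2):
    """Group consecutive hits when the text between them matches the separator."""
    hit_pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    separator = re.compile(separator_expression, re.DOTALL)
    groups, current = [], []
    for m in hit_pattern.finditer(text):
        gap = text[current[-1].end():m.start()] if current else ""
        # Break the chain when the gap is not a full match for the separator.
        if current and not separator.fullmatch(gap):
            groups.append(current)
            current = [m]
        else:
            current.append(m)
    if current:
        groups.append(current)
    return [result_separator.join(m.group(0) for m in g)
            for g in groups if len(g) >= minimum_elements]


entries = ["Bread", "Milk", "Eggs", "Butter"]
text = "bread, milk, eggs and butter"   # hypothetical sample text

anything = flow_collate_by_separator(text, entries, r".*")
commas_only = flow_collate_by_separator(text, entries, r",\s*")
```

Because `.*` accepts any gap, every hit joins the same result. A stricter expression such as `,\s*` stops the chain at " and ", so "butter" is left out.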

The Minimum Elements Property

You may have noticed that when we first switched to the third document in the Batch with Vertical Layout enabled, two words were returned because they just happened to be stacked on top of one another. Grooper returned a result where it found two of the words together, but not for any single word on its own. This is because the Minimum Elements property is set to 2 by default. The following screenshots will explain how this property works.

  1. Go back to a previous configuration and enable only Vertical Layout.
  2. The Minimum Elements property indicates how many "hits" Grooper has to get to return a result.
  3. With Minimum Elements set to its default of 2, we are getting milk and eggs because there were two items found in a vertical layout.
  4. We are not getting bread because there is not a second word next to it in a vertical layout.


  1. Change the Minimum Elements property to 1
  2. The List View will now display 3 results. The words milk and eggs return as one result because they are stacked on top of each other in a vertical layout. The individual words bread and butter each return as a separate result.


  1. Change the Minimum Elements property to 3
  2. This will now prevent any results from being returned, and the List View will now be blank.
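
The three outcomes above can be reproduced with a short Python sketch that filters groups of aligned hits by size. The grouping itself is taken as given here, and the `apply_minimum_elements` helper is hypothetical.

```python
def apply_minimum_elements(groups, minimum_elements=2, separator="|"):
    """Keep only groups with at least minimum_elements hits, joined into one result."""
    return [separator.join(g) for g in groups if len(g) >= minimum_elements]


# Groups found with Vertical Layout on the third document:
# milk and eggs are stacked; bread and butter stand alone.
groups = [["bread"], ["milk", "eggs"], ["butter"]]

default_results = apply_minimum_elements(groups)                      # minimum of 2
single_results = apply_minimum_elements(groups, minimum_elements=1)
strict_results = apply_minimum_elements(groups, minimum_elements=3)
```

A minimum of 2 keeps only the milk and eggs pair, a minimum of 1 lets the lone words through as their own results, and a minimum of 3 filters everything out.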

A Practical Example

Although the above examples make it easier to see how Array works, they are not something you would likely see in the real world. The following screenshots give an example of how you might use a collation property in practice.




Maximum Distance


Enforce Line Boundaries