Combine (Collation Provider)

From Grooper Wiki

This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.

Combine is a Collation Provider option for Data Type extractors. It combines instances from the returned results based on a specified grouping, controlling how extractor results are assembled for output.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

The Combine Collation Provider is helpful when a document contains multiple text segments that you want to return as one result. How the results are combined depends on the Combine Method you choose.

There are five different Combine Methods:

  • Individual
  • Sum
  • Flow
  • Geometric
  • Group

Which method you choose depends on the documents you are extracting from and what data you want to collect.


Individual

Individual is the default Combine Method when Combine is selected as the Collation Provider. Grooper takes the individual results from the Data Type's child objects and places them one after another in a single result.


Sum

The Sum Combine Method takes numeric results from a Data Type's child objects and adds them up. The sum of those numbers is returned as a single result.

There are more practical and efficient ways to sum numeric data from a document, such as the Calculated Value function. Using the Sum Combine Method is not advised unless absolutely necessary; it is primarily used for repositories upgraded from previous versions of Grooper.
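As a rough illustration of what summing child results means, here is a minimal Python sketch. It is not Grooper's API or implementation: the function name `combine_sum`, the sample values, and the choice to strip thousands separators before adding are all assumptions made for this example.

```python
from decimal import Decimal


def combine_sum(results):
    """Add up numeric strings and return the total as a single string.

    Hypothetical helper: removing thousands separators is an assumption
    of this sketch, not documented Grooper behavior.
    """
    cleaned = (r.replace(",", "") for r in results)
    return str(sum(Decimal(c) for c in cleaned))


# Hypothetical values returned by three child extractors.
child_results = ["100.50", "25.25", "1,000.00"]
print(combine_sum(child_results))  # 1125.75
```

Decimal is used instead of float so that currency-like values add up without binary rounding noise.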


Flow

The Flow Combine Method returns everything in the text "flow" of the document between the results of the Data Type's first and second child objects. The full text is returned as a single result.
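The idea of capturing everything between a starting and an ending result can be sketched in Python as follows. This is a conceptual model only, not Grooper's implementation: `combine_flow`, the sample text, and both patterns are hypothetical, and the sketch assumes the start and end matches are themselves included in the output.

```python
import re


def combine_flow(text, start_pattern, end_pattern):
    """Return the text from the first start match through the next end match.

    Hypothetical helper. Returns None if either anchor is not found.
    """
    start = re.search(start_pattern, text)
    if start is None:
        return None
    # Look for the end anchor only after the start anchor, in reading order.
    end = re.compile(end_pattern).search(text, start.end())
    if end is None:
        return None
    return text[start.start():end.end()]


# Hypothetical transcript-like sample.
sample = "Header\nFall 2020\nMATH 101 A\nENGL 102 B\nTerm GPA 3.5\nFooter"
print(combine_flow(sample, r"Fall 2020", r"Term GPA \d\.\d"))
```

Here the two patterns play the role of the first and second child extractors, and the lines between them come along because they sit between the anchors in the text flow.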


Geometric

The Geometric Combine Method requires multiple child objects that return text in different areas of the page. When the Combine Method is set to Geometric, everything within the bounds of those extracted results is returned.
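One way to picture the geometric region is as the smallest rectangle that encloses all of the child results, with every word inside that rectangle returned. The Python sketch below models that idea. It is an assumption-laden illustration, not Grooper's implementation: the `Box` type, the function names, the coordinates, and the rule that a word must lie fully inside the region are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float


def union(boxes):
    """Smallest rectangle enclosing every box."""
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


def contains(outer, inner):
    """True when inner lies entirely within outer."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def combine_geometric(words, child_boxes):
    """Return all words inside the region spanned by the child results."""
    region = union(child_boxes)
    inside = [(box, text) for text, box in words if contains(region, box)]
    inside.sort(key=lambda item: (item[0].top, item[0].left))  # reading order
    return " ".join(text for _, text in inside)


# Hypothetical page: a left-hand section and a right-hand section.
words = [
    ("Name:", Box(10, 10, 50, 20)), ("Jane", Box(60, 10, 90, 20)),
    ("Phone:", Box(10, 30, 55, 40)), ("555-0100", Box(60, 30, 120, 40)),
    ("Employer:", Box(300, 10, 360, 20)), ("Acme", Box(370, 10, 410, 20)),
]
# Two child results marking the top-left and bottom-right of the region.
anchors = [Box(10, 10, 50, 20), Box(60, 30, 120, 40)]
print(combine_geometric(words, anchors))  # Name: Jane Phone: 555-0100
```

The words on the right-hand side are excluded because their boxes fall outside the rectangle defined by the two anchors.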


Group

The Group Combine Method allows you to choose one element from your extraction to be returned. If you have three child objects extracting different text segments, you can select just one of them to return a result instead of all three.
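Selecting a single element from several child results can be modeled very simply. The Python sketch below is only a conceptual illustration, not Grooper's implementation: `combine_group`, the element names, and the sample values are hypothetical, with `output_element` standing in for the Output Element property.

```python
def combine_group(child_results, output_element):
    """Return the value of the named element, or None if it is absent.

    Hypothetical helper: child_results is a list of (name, value) pairs.
    """
    for name, value in child_results:
        if name == output_element:
            return value
    return None


# Hypothetical child results for a date split into three parts.
child_results = [("Month", "12"), ("Day", "25"), ("Year", "2023")]
print(combine_group(child_results, "Year"))  # 2023
```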

How To

The Individual Combine Method

In the example below, we want to collect a date. However, on the document the month, day, and year are listed separately. The goal will be to take the month, day, and year and combine the values to return a single date for extraction.

  1. Create a Data Type with child objects extracting the information you want to combine.
  2. By default, the Data Type's Collation property is set to Individual.
  3. With the Collation property set to Individual, each child object returns an individual result.


  1. Click the hamburger icon to the right of the Collation property.
  2. Select Combine from the drop-down menu.


  1. By default, the Combine Method property is set to Individual.
  2. Now the extracted text for all child objects has been combined into one result.

However, this value is just a set of numbers and does not immediately look like a date. The next section will solve this problem.


The Result Separator

Sometimes just combining all of the extracted text into one result is not enough. It can be difficult to read or it may lack syntactic context (see the Data Context article) to give us an idea of what information the text is conveying.

By using a Result Separator, we can add spaces, dashes, slashes, or any character to separate out the text in the returned result. In this case we are going to add dashes to make our result easier to identify as a date.

  1. Enter your Result Separator. It can be any character or string. To make the result in our example look more like a date, we have entered a dash (-) for the Result Separator.
  2. In the returned result, your separator will appear between each individually returned item.
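The effect of the Individual Combine Method with and without a Result Separator resembles joining strings. The Python sketch below is a conceptual model, not Grooper's implementation: `combine_individual` and the sample month, day, and year values are hypothetical.

```python
def combine_individual(results, separator=""):
    """Join child results in order, placing the separator between them."""
    return separator.join(results)


# Hypothetical month, day, and year returned by three child extractors.
child_results = ["12", "25", "2023"]
print(combine_individual(child_results))       # 12252023
print(combine_individual(child_results, "-"))  # 12-25-2023
```

The separator appears only between items, never before the first or after the last, which matches the behavior described in the steps above.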


The Sum Combine Method

For this example we have three numbers in a table and we want Grooper to add them up and return the result.

  1. Create a Data Type with child objects that collect the individual results you want to sum up.
  2. The Collation property is set to Individual by default.
  3. With the Collation property set to Individual, the text returned by each child object is returned as a separate result.


  1. Set the Collation property to Combine.
  2. Without any extra configuration, Grooper returns all of the numeric values together as one combined result, without adding them up.


  1. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  2. Select Sum from the drop-down menu.


  1. Now the sum of the numbers is returned as a single result.


The Flow Combine Method

For this example, we have a text sample that looks similar to a college transcript. We want to collect the whole block of text to capture all of the semester information. We can use the Flow Combine Method to do this.

  1. Create a Data Type with child extractors that return text at the start and end of the section of text you want to capture.
  2. By default, the Collation property is set to "Individual".
  3. With the Collation property set to "Individual", each text section extracted by the child objects is returned as an individual result.


  1. Set the Collation property to Combine.
  2. While you will now see a larger green box on the Document Viewer encompassing a lot more than just the two values returned by the child objects...
  3. ... only the two values from the child objects are actually being returned.


  1. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  2. Select Flow from the drop-down menu.


  1. The green box in the Document Viewer in our example has expanded to include all text within the "flow" of the document between the start and end of the section.
  2. The result has expanded to include more than the text extracted in the child objects.
  3. To see what Grooper is extracting, select the result then click the inspection icon that looks like a flashlight in the bottom right hand corner of the Document Viewer.


  1. In the inspector window, we can see that Grooper is now extracting everything within the text flow of the document between our starting and ending extractors.


The Geometric Combine Method

In this example we have two sections of text, but we only want to collect the Personal Information section on the left. We can use the Geometric Combine Method to do so.

  1. Create a Data Type with child objects that extract information spanning the height and width of the geometric region where you want to extract data.
  2. By default the Collation property is set to "Individual".
  3. With the Collation property set to "Individual", each child object will return an individual result.


  1. Set the Collation property to Combine.
  2. Grooper will combine all of the results into one. It will not return anything more than what the child objects are extracting.


  1. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  2. Select Geometric from the drop-down menu.


  1. Now Grooper is collecting everything in the geometric region determined by the extraction objects.
  2. Click on the inspection icon in the bottom right hand corner of the Document Viewer to view the full text Grooper is extracting.


  1. In the inspection window, we can see that all of the text in the geometric location determined by the child extractors is returned.


The Group Combine Method

For this example, we are going to revisit the text we collected for the Individual Combine Method. Instead of collecting the full date, in this case we only want to collect the year.

  1. Create a Data Type with multiple child objects that extract different text segments on the page.
  2. By default, the Collation property is set to Individual.
  3. With the Collation property set to Individual, each child object's text is returned as its own result.


  1. Set the Collation property to Combine.
  2. Grooper will combine all of the child extractor results into one result.


  1. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  2. Select Group from the drop-down menu.


  1. Click the hamburger icon to the right of the Output Element property to access the drop-down menu.
  2. Select the element you want to be returned from the drop-down menu.


  1. Only the selected element will be returned as a result.