2023.1:Combine (Collation Provider): Difference between revisions

From Grooper Wiki
 
{{AutoVersion}}


<blockquote>{{#lst:Glossary|Combine}}</blockquote>
<blockquote>
The '''''Combine''''' [[Collation Provider|collation provider]] takes individual results and "combines" them into one. There are several methods by which Grooper can obtain these results and combine them.
</blockquote>


{|class="download-box"
|
You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more '''Batches''' of sample documents. The second contains one or more '''Projects''' with resources used in examples throughout this article.
* [[Media:2023.1 Wiki Combine Batch.zip]]
* [[Media:2023.1 Wiki Combine Project.zip]]
|}


== About ==
The '''''Combine''''' '''''Collation Provider''''' is helpful when there are multiple text segments on a document that you wish to return as one result. There are many different ways to combine your results through different '''''Combine Methods'''''.


Which method you choose depends on the documents you are extracting from and what data you want to collect.


<big>'''[[#The Individual Combine Method|Individual]]'''</big>

The '''''Combine Method''''' is set to ''Individual'' by default when using '''''Combine''''' for your '''''Collation Provider'''''. Grooper simply takes the individual results from a '''[[image:GrooperIcon_DataType.png]]Data Type's''' child objects and puts them together, one right after the other, into one result.

<big>'''[[#The Sum Combine Method|Sum]]'''</big>

The ''Sum'' '''''Combine Method''''' takes numeric results from a '''Data Type's''' child objects and adds them up. The sum of those numbers is returned as a single result.

There are more practical and efficient ways to sum numeric data from a document, such as the '''''Calculated Value''''' function. Using the ''Sum'' '''''Combine Method''''' is not advised unless absolutely necessary; it is primarily used in repositories upgraded from previous versions of Grooper.


<big>'''[[#The Flow Combine Method|Flow]]'''</big>

The ''Flow'' '''''Combine Method''''' returns everything within the "flow" of the document's text between the results of the '''Data Type's''' first and second child objects. The full text is returned as a single result.


<big>'''[[#The Geometric Combine Method|Geometric]]'''</big>

The ''Geometric'' '''''Combine Method''''' requires multiple child objects that return text in different areas of the page. When the '''''Combine Method''''' is set to ''Geometric'', all text within the region bounded by those results is returned.


<big>'''[[#The Group Combine Method|Group]]'''</big>

The ''Group'' '''''Combine Method''''' lets you choose one element from your extraction to be returned. If you have three child objects extracting different text segments, you can select just one of them to return a result instead of all three.

== How To ==


=== The Individual Combine Method ===

In the example below, we want to collect a date. However, on the document the month, day, and year are listed separately. The goal is to combine the month, day, and year values and return a single date for extraction.

# Create a '''Data Type''' with child objects extracting the information you want to combine.
# By default, the '''Data Type''''s '''''Collation''''' property is set to ''Individual''.
# With the '''''Collation''''' property set to ''Individual'', each child object returns an individual result.

[[File:2023.1 Combine-(Collation-Provider) 02 01 Combine 01.png]]

#<li value=4> Click the hamburger icon to the right of the '''''Collation''''' property.
# Select ''Combine'' from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 01 Combine 02.png]]

#<li value=6> By default, the '''''Combine Method''''' property is set to ''Individual''.
# Now the extracted text for all child objects has been combined into one result.
However, this value is just a set of numbers and does not immediately look like a date. The next section will solve this problem.

[[File:2023.1 Combine-(Collation-Provider) 02 01 Combine 03.png]]


==== The Result Separator ====

Sometimes just combining all of the extracted text into one result is not enough. The result can be difficult to read, or it may lack the syntactic context (see the [[Data Context]] article) that tells us what information the text conveys.

By using a '''''Result Separator''''', we can add spaces, dashes, slashes, or any other characters between the combined values in the returned result. In this case, we are going to add dashes to make our result easier to identify as a date.

# Enter your '''''Result Separator'''''. It can be any text. To make the result in our example look more like a date, we entered a dash (-).
# In the returned result, your separator appears between each individually returned item.

[[File:2023.1 Combine-(Collation-Provider) 02 02 Result-Separator 01.png]]


=== The Sum Combine Method ===
For this example, we have three numbers in a table, and we want Grooper to add them up and return the result.
# Create a '''Data Type''' with child objects that collect the individual results you want to sum up.
# The '''''Collation''''' property is set to ''Individual'' by default.
# With the '''''Collation''''' property set to ''Individual'', the text returned by each child object is returned as a separate result.

[[File:2023.1 Combine-(Collation-Provider) 02 03 Sum 01.png]]

#<li value=4> Set the '''''Collation''''' property to ''Combine''.
# Without any extra configuration, Grooper returns all of the numeric values together as one result.

[[File:2023.1 Combine-(Collation-Provider) 02 03 Sum 02.png]]

#<li value=6> Click the hamburger icon to the right of the '''''Combine Method''''' property to access the drop-down menu.
# Select ''Sum'' from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 03 Sum 03.png]]

#<li value=8> Now the sum of the numbers is returned as a single result.

[[File:2023.1 Combine-(Collation-Provider) 02 03 Sum 04.png]]


=== The Flow Combine Method ===
For this example, we have a text sample that looks similar to a college transcript. We want to collect the whole block of text to capture all of the semester information. We can use the ''Flow'' '''''Combine Method''''' to do this.
# Create a '''Data Type''' with child extractors that return text at the start and end of the section of text you want to capture.
# By default, the '''''Collation''''' property is set to ''Individual''.
# With the '''''Collation''''' property set to ''Individual'', each text section extracted by the child objects is returned as an individual result.

[[File:2023.1 Combine-(Collation-Provider) 02 04 Flow 01.png]]

#<li value=4> Set the '''''Collation''''' property to ''Combine''.
# While you will now see a larger green box in the Document Viewer encompassing a lot more than just the two values returned by the child objects...
# ... only the two values from the child objects are actually being returned.

[[File:2023.1 Combine-(Collation-Provider) 02 04 Flow 02.png]]

#<li value=7> Click the hamburger icon to the right of the '''''Combine Method''''' property to access the drop-down menu.
# Select ''Flow'' from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 04 Flow 03.png]]

#<li value=9> The green box in the Document Viewer in our example has expanded to include all text within the "flow" of the document between the start and end of the section.
# The result has expanded to include more than the text extracted by the child objects.
# To see what Grooper is extracting, select the result, then click the inspection icon (it looks like a flashlight) in the bottom right-hand corner of the Document Viewer.

[[File:2023.1 Combine-(Collation-Provider) 02 04 Flow 04.png]]

#<li value=12> In the inspector window, we can see that Grooper is now extracting everything within the text flow of the document between our starting and ending extractors.

[[File:2023.1 Combine-(Collation-Provider) 02 04 Flow 05.png]]


=== The Geometric Combine Method ===
In this example, we have two sections of text, but we only want to collect the Personal Information section on the left. We can use the ''Geometric'' '''''Combine Method''''' to do so.
# Create a '''Data Type''' with child objects whose results span the height and width of the region you want to extract data from.
# By default, the '''''Collation''''' property is set to ''Individual''.
# With the '''''Collation''''' property set to ''Individual'', each child object will return an individual result.

[[File:2023.1 Combine-(Collation-Provider) 02 05 Geometric 01.png]]

#<li value=4> Set the '''''Collation''''' property to ''Combine''.
# Grooper will combine all of the results into one. It will not return anything more than what the child objects are extracting.

[[File:2023.1 Combine-(Collation-Provider) 02 05 Geometric 02.png]]

#<li value=6> Click the hamburger icon to the right of the '''''Combine Method''''' property to access the drop-down menu.
# Select ''Geometric'' from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 05 Geometric 03.png]]

#<li value=8> Now Grooper is collecting everything in the geometric region determined by the extraction objects.
# Click the inspection icon in the bottom right-hand corner of the Document Viewer to view the full text Grooper is extracting.

[[File:2023.1 Combine-(Collation-Provider) 02 05 Geometric 04.png]]

#<li value=10> In the inspection window, we can see that all of the text in the geometric location determined by the child extractors is returned.

[[File:2023.1 Combine-(Collation-Provider) 02 05 Geometric 05.png]]


=== The Group Combine Method ===
For this example, we are going to revisit the text we collected for the ''Individual'' '''''Combine Method'''''. Instead of collecting the full date, in this case we only want to collect the year.
# Create a '''Data Type''' with multiple child objects that extract different text segments on the page.
# By default, the '''''Collation''''' property is set to ''Individual''.
# With the '''''Collation''''' property set to ''Individual'', each child object's result is returned on its own.

[[File:2023.1 Combine-(Collation-Provider) 02 06 Group 01.png]]

#<li value=4> Set the '''''Collation''''' property to ''Combine''.
# Grooper will combine all of the child extractor results into one result.

[[File:2023.1 Combine-(Collation-Provider) 02 06 Group 02.png]]

#<li value=6> Click the hamburger icon to the right of the '''''Combine Method''''' property to access the drop-down menu.
# Select ''Group'' from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 06 Group 03.png]]

#<li value=8> Click the hamburger icon to the right of the '''''Output Element''''' property to access the drop-down menu.
# Select the element you want to be returned from the drop-down menu.

[[File:2023.1 Combine-(Collation-Provider) 02 06 Group 04.png]]

#<li value=10> Only the selected element will be returned as a result.

[[File:2023.1 Combine-(Collation-Provider) 02 06 Group 05.png]]

Latest revision as of 12:49, 21 November 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Combine is a Collation Provider option for Data Type extractors. It combines instances from returned results based on a specified grouping, controlling how extractor results are assembled for output.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

The Combine Collation Provider is helpful when there are multiple text segments on a document that you wish to return as one result. There are many different ways to combine your results through different Combine Methods.

There are five different Combine Methods:

  • Individual
  • Sum
  • Flow
  • Geometric
  • Group

Which method you choose depends on the documents you are extracting from and what data you want to collect.


Individual

The Combine Method is set to Individual by default when using Combine for your Collation Provider. Grooper simply takes the individual results from a Data Type's child objects and puts them together, one right after the other, into one result.


Sum

The Sum Combine Method takes numeric results from a Data Type's child objects and adds them up. The sum of those numbers is returned as a single result.

There are more practical and efficient ways to sum numeric data from a document, such as the Calculated Value function. Using the Sum Combine Method is not advised unless absolutely necessary; it is primarily used in repositories upgraded from previous versions of Grooper.


Flow

The Flow Combine Method returns everything within the "flow" of the document's text between the results of the Data Type's first and second child objects. The full text is returned as a single result.


Geometric

The Geometric Combine Method requires multiple child objects that return text in different areas of the page. When the Combine Method is set to Geometric, all text within the region bounded by those results is returned.


Group

The Group Combine Method allows you to choose one element from your extraction to be returned. If you have three child objects extracting different text segments, you can select just one of them to return a result instead of all three.

How To

The Individual Combine Method

In the example below, we want to collect a date. However, on the document the month, day, and year are listed separately. The goal is to combine the month, day, and year values and return a single date for extraction.

  1. Create a Data Type with child objects extracting the information you want to combine.
  2. By default, the Data Type's Collation property is set to Individual.
  3. With the Collation property set to Individual, each child object returns an individual result.

  4. Click the hamburger icon to the right of the Collation property.
  5. Select Combine from the drop-down menu.

  6. By default, the Combine Method property is set to Individual.
  7. Now the extracted text for all child objects has been combined into one result.

However, this value is just a set of numbers and does not immediately look like a date. The next section will solve this problem.


The Result Separator

Sometimes just combining all of the extracted text into one result is not enough. It can be difficult to read or it may lack syntactic context (see the Data Context article) to give us an idea of what information the text is conveying.

By using a Result Separator, we can add spaces, dashes, slashes, or any character to separate out the text in the returned result. In this case we are going to add dashes to make our result easier to identify as a date.

  1. Enter your Result Separator. It can be any text. To make the result in our example look more like a date, we entered a dash (-).
  2. In the returned result, your separator appears between each individually returned item.
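
The effect of the Individual Combine Method and the Result Separator can be pictured with a small Python sketch. This is only a conceptual illustration, not Grooper's implementation or API: the function name `combine_individual` and the sample month, day, and year values are assumptions made up for this example.

```python
def combine_individual(results, separator=""):
    """Join child results in order, placing `separator` between them.

    Conceptual stand-in for the Individual Combine Method; the name and
    signature are hypothetical and do not come from Grooper.
    """
    return separator.join(results)


# Hypothetical month, day, and year values returned by three child extractors.
parts = ["11", "21", "2024"]

print(combine_individual(parts))       # no separator: 11212024
print(combine_individual(parts, "-"))  # dash separator: 11-21-2024
```

With an empty separator the values run together, which is why the first result does not look like a date. Supplying a dash yields a value that is easier to recognize as one.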


The Sum Combine Method

For this example we have three numbers in a table and we want Grooper to add them up and return the result.

  1. Create a Data Type with child objects that collect the individual results you want to sum up.
  2. The Collation property is set to Individual by default.
  3. With the Collation property set to Individual, the text returned by each child object is returned as a separate result.

  4. Set the Collation property to Combine.
  5. Without any extra configuration, Grooper returns all of the numeric values together as one result.

  6. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  7. Select Sum from the drop-down menu.

  8. Now the sum of the numbers is returned as a single result.
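
As a rough picture of what the Sum Combine Method does with its child results, the Python sketch below parses each extracted value as a number and returns the total. It is a conceptual illustration only: `combine_sum` and the sample values are hypothetical, and it assumes every child result is a plain decimal string.

```python
from decimal import Decimal


def combine_sum(results):
    """Parse each child result as a decimal number and return the total as text.

    Hypothetical helper for illustration; assumes plain numeric strings
    such as "100.50" with no currency symbols or thousands separators.
    """
    total = sum((Decimal(r) for r in results), Decimal("0"))
    return str(total)


# Hypothetical values returned by three child extractors.
values = ["100.50", "200.25", "50.25"]
print(combine_sum(values))  # 351.00
```

`Decimal` is used instead of `float` so that currency-style values add up exactly.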


The Flow Combine Method

For this example, we have a text sample that looks similar to a college transcript. We want to collect the whole block of text to capture all of the semester information. We can use the Flow Combine Method to do this.

  1. Create a Data Type with child extractors that return text at the start and end of the section of text you want to capture.
  2. By default, the Collation property is set to Individual.
  3. With the Collation property set to Individual, each text section extracted by the child objects is returned as an individual result.

  4. Set the Collation property to Combine.
  5. While you will now see a larger green box in the Document Viewer encompassing a lot more than just the two values returned by the child objects...
  6. ... only the two values from the child objects are actually being returned.

  7. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  8. Select Flow from the drop-down menu.

  9. The green box in the Document Viewer in our example has expanded to include all text within the "flow" of the document between the start and end of the section.
  10. The result has expanded to include more than the text extracted by the child objects.
  11. To see what Grooper is extracting, select the result, then click the inspection icon (it looks like a flashlight) in the bottom right-hand corner of the Document Viewer.

  12. In the inspector window, we can see that Grooper is now extracting everything within the text flow of the document between our starting and ending extractors.
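
The idea behind the Flow Combine Method can be sketched in Python as "take the text from the starting match through the ending match, in reading order". This is a conceptual illustration, not Grooper's implementation: `combine_flow`, the literal start and end strings, and the sample transcript text are all assumptions, and the sketch assumes the start and end results themselves are included in the output.

```python
def combine_flow(text, start, end):
    """Return the text from the first `start` match through the next `end` match.

    Hypothetical helper; real child extractors would typically be patterns
    rather than literal strings. Returns None if either anchor is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    stop = text.find(end, begin + len(start))
    if stop == -1:
        return None
    return text[begin:stop + len(end)]


# Hypothetical transcript-like page text.
page = (
    "Student: Jane Doe\n"
    "Fall 2023\n"
    "MATH 101 A\n"
    "ENGL 102 B\n"
    "Term GPA 3.5\n"
    "Advisor: J. Smith"
)

print(combine_flow(page, "Fall 2023", "Term GPA 3.5"))
```

Everything between the two anchors is returned as one result, while the lines before the start and after the end are left out.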


The Geometric Combine Method

In this example we have two sections of text, but we only want to collect the Personal Information section on the left. We can use the Geometric Combine Method to do so.

  1. Create a Data Type with child objects whose results span the height and width of the region you want to extract data from.
  2. By default, the Collation property is set to Individual.
  3. With the Collation property set to Individual, each child object will return an individual result.

  4. Set the Collation property to Combine.
  5. Grooper will combine all of the results into one. It will not return anything more than what the child objects are extracting.

  6. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  7. Select Geometric from the drop-down menu.

  8. Now Grooper is collecting everything in the geometric region determined by the extraction objects.
  9. Click the inspection icon in the bottom right-hand corner of the Document Viewer to view the full text Grooper is extracting.

  10. In the inspection window, we can see that all of the text in the geometric location determined by the child extractors is returned.
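
One way to picture the Geometric Combine Method is as a bounding rectangle drawn around the child results, with every word inside that rectangle returned. The Python sketch below illustrates that idea under stated assumptions: the `Word` class, the `combine_geometric` function, and the page coordinates are hypothetical, and Grooper's actual region logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word on the page with its bounding box (hypothetical structure)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def combine_geometric(anchors, words):
    """Return the words that fall inside the rectangle enclosing all anchors."""
    left = min(a.left for a in anchors)
    top = min(a.top for a in anchors)
    right = max(a.right for a in anchors)
    bottom = max(a.bottom for a in anchors)
    inside = [
        w for w in words
        if w.left >= left and w.top >= top and w.right <= right and w.bottom <= bottom
    ]
    # Read top to bottom, then left to right.
    inside.sort(key=lambda w: (w.top, w.left))
    return " ".join(w.text for w in inside)


# Hypothetical page: a left-hand section and an unrelated right-hand section.
words = [
    Word("Personal", 10, 10, 60, 20),
    Word("Information", 65, 10, 130, 20),
    Word("Name:", 10, 30, 45, 40),
    Word("Jane", 50, 30, 80, 40),
    Word("Phone:", 10, 50, 50, 60),
    Word("555-0100", 55, 50, 130, 60),
    Word("Employment", 200, 10, 270, 20),
    Word("Acme", 200, 30, 235, 40),
]

# Two child results mark the top-left and bottom-right of the wanted region.
anchors = [words[0], words[5]]
print(combine_geometric(anchors, words))
# Personal Information Name: Jane Phone: 555-0100
```

The words in the right-hand section are skipped because they fall outside the rectangle spanned by the two anchor results.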


The Group Combine Method

For this example, we are going to revisit the text we collected for the Individual Combine Method. Instead of collecting the full date, in this case we only want to collect the year.

  1. Create a Data Type with multiple child objects that extract different text segments on the page.
  2. By default, the Collation property is set to Individual.
  3. With the Collation property set to Individual, each child object's result is returned on its own.

  4. Set the Collation property to Combine.
  5. Grooper will combine all of the child extractor results into one result.

  6. Click the hamburger icon to the right of the Combine Method property to access the drop-down menu.
  7. Select Group from the drop-down menu.

  8. Click the hamburger icon to the right of the Output Element property to access the drop-down menu.
  9. Select the element you want to be returned from the drop-down menu.

  10. Only the selected element will be returned as a result.
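
The selection step in the Group Combine Method can be pictured as picking one named element out of the combined set. The Python sketch below is a conceptual illustration only: `combine_group` and the element names Month, Day, and Year are hypothetical, standing in for whatever child objects the Output Element property lists.

```python
def combine_group(results, output_element):
    """Return only the value of the chosen element, or None if it is absent.

    `results` maps a hypothetical element name to its extracted text.
    """
    return results.get(output_element)


# Hypothetical child results from the date example.
results = {"Month": "11", "Day": "21", "Year": "2024"}
print(combine_group(results, "Year"))  # 2024
```

All three child objects still take part in the extraction, but only the selected element's value is returned.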