2023:Array (Collation Provider): Difference between revisions

From Grooper Wiki
No edit summary
No edit summary
Tag: Manual revert
 
(8 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{AutoVersion}}
{{AutoVersion}}


<blockquote>
<blockquote>{{#lst:Glossary|Array}}</blockquote>
'''''Array''''' is one of many '''''Collation Providers''''' you can use in Grooper to combine or organize extracted data based on the data's layout relationship.
</blockquote>


== About ==
== About ==
'''''Array''''' is one  of the '''''Collation Providers''''' and can be used for data organization depending on what you want to extract from your documents. All of the '''''Collation Providers''''' (except for '''''Individual''''') essentially take multiple results and combine them into one. '''''Array''''' specifically only returns results based on the orientation of the information. If the data is lined up horizontally or vertically, you must select the corresponding layout property for Grooper to return the information.  
''Array'' is one  of the '''''Collation Providers''''' and can be used for data organization depending on what you want to extract from your documents. All of the '''''Collation Providers''''' (except for ''Individual'') essentially take multiple results and combine them into one. ''Array'' specifically only returns results based on the orientation of the information. If the data is lined up horizontally or vertically, you must select the corresponding layout property for Grooper to return the information.  


Essentially, an '''''Array''''' collated result is a collection of results who share a layout relationship that are all lined up together (either horizontally, vertically, or in the left/right and top/bottom text flow of the document).
Essentially, an ''Array'' collated result is a collection of results who share a layout relationship that are all lined up together (either horizontally, vertically, or in the left/right and top/bottom text flow of the document).


{|class="attn-box"
{|class="attn-box"
Line 14: Line 12:
|⚠
|⚠
|
|
The '''''Array''''' collation differs from the '''''Ordered Array''''' collation in one significant way. In an '''''Ordered Array''''' the order of the data matters. For the '''''Array''''' provide' the data can be in any order and will all be returned as one result.
The ''Array'' collation differs from the ''Ordered Array'' collation in one significant way. In an ''Ordered Array'' the order of the data matters. For the ''Array'' '''''Collation Provider''''' the data can be in any order and will all be returned as one result.
|}
|}


== How To ==
== How To ==
{|class="download-box"
|
[[File:Asset 22@4x.png]]
|
You may download the ZIP(s) below and upload it into your own Grooper environment (version 2023).  The first contains a '''Project''' with resources used in examples throughout this article.  The second contains one or more '''Batches''' of sample documents.
* [[Media:2023 Wiki Array-(Collation-Provider) Project.zip]]
* [[Media:2023 Wiki Array-(Collation-Provider) Batches.zip]]
|}


=== Setting the Collation Property ===
=== Setting the Collation Property ===
Line 50: Line 57:
==== The Flow Layout Property ====
==== The Flow Layout Property ====


The '''''Flow Layout''''' property should be used when the information you want to extract is contained within the flow of a paragraph or text. You will commonly find this in unstructured documents.  
The Flow Layout Property should be used when the information you want to extract is contained within the flow of a paragraph or text. You will commonly find this in unstructured documents.  




Line 66: Line 73:
=== The Minimum Elements Property ===
=== The Minimum Elements Property ===


You may have noticed when we first switched to the third document in the '''Batch''' with '''''Vertical Layout''''' two words were returned because there were two words that just happened to be stacked on top of one another. It returned when Grooper found two of the words, but not just a single word. The reason is because the '''''Minimum Elements''''' property is set to 2 by default. The following screenshots will explain how this property works.  
You may have noticed when we first switched to the third document in the '''Batch''' with Vertical Layout two words were returned because there were two words that just happened to be stacked on top of one another. It returned when Grooper found two of the words, but not just a single word. The reason is because the Minimum Elements Property is set to 2 by default. The following screenshots will explain how this property works.  




Line 79: Line 86:
=== A Practical Example ===
=== A Practical Example ===


Although the above examples make it easier to see how '''''Array''''' works, it is not something you'd probably see in the real world. The following screenshots give an example of how you might use a '''''Collation Provider''''' in the real world.  
Although the above examples make it easier to see how Array works, it is not something you'd probably see in the real world. The following screenshots give an example of how you might use a collation property in the real world.  


[[File:2023 Arrays - 2023 01 How To 06 Practical Example 01.png]]
[[File:2023 Arrays - 2023 01 How To 06 Practical Example 01.png]]

Latest revision as of 11:47, 21 November 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Array is a Collation Provider option for Data Type extractors. Array matches a list of values arranged in horizontal, vertical, or text-flow order, combining instances that qualify into a single result.

About

Array is one of the Collation Providers. Use it to organize extracted data according to how the values are laid out on the document. All of the Collation Providers (except for Individual) take multiple results and combine them into one. Array returns results based only on the orientation of the information: if the data is lined up horizontally or vertically, you must select the corresponding layout property for Grooper to return it.

An Array collated result is a collection of results that share a layout relationship: they are all lined up together (either horizontally, vertically, or in the left-to-right, top-to-bottom text flow of the document).
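
To make the idea of a layout relationship concrete, here is a minimal Python sketch of grouping results that line up. It is not Grooper's implementation: the `Hit` class, the `collate` function, and the `tolerance` parameter are hypothetical names invented for this illustration, and real alignment logic is certainly more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A hypothetical extracted result with the position of its top-left corner."""
    text: str
    x: float  # left edge
    y: float  # top edge


def collate(hits, layout, tolerance=2.0):
    """Group hits that line up in the requested layout.

    "horizontal" groups hits that share a row (similar y).
    "vertical" groups hits that share a column (similar x).
    The tolerance value is an assumption made for this sketch.
    """
    if layout == "horizontal":
        align, order = (lambda h: h.y), (lambda h: h.x)
    elif layout == "vertical":
        align, order = (lambda h: h.x), (lambda h: h.y)
    else:
        raise ValueError(f"unsupported layout: {layout}")

    groups = []
    for hit in sorted(hits, key=align):
        # Compare against the first hit of the current group.
        if groups and abs(align(hit) - align(groups[-1][0])) <= tolerance:
            groups[-1].append(hit)
        else:
            groups.append([hit])
    # Within each group, put the hits in reading order.
    return [sorted(group, key=order) for group in groups]


hits = [Hit("A", 10, 100), Hit("B", 50, 101), Hit("C", 10, 200)]
rows = collate(hits, "horizontal")
columns = collate(hits, "vertical")
```

With these sample hits, "A" and "B" sit on the same row, while "A" and "C" sit in the same column, so the two layouts produce different groupings from the same data. The sketch keeps single-element groups; it only shows the alignment step.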

The Array collation differs from the Ordered Array collation in one significant way. In an Ordered Array, the order of the data matters. With the Array Collation Provider, the data can be in any order and will all be returned as one result.
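
The difference in ordering can be illustrated with a short Python sketch. This is only a conceptual model under simplifying assumptions: the function names and the idea of comparing lists of labels are hypothetical, and the sketch assumes the same set of elements is expected in both cases.

```python
from collections import Counter


def matches_any_order(found, expected):
    """Array-style check (assumed model): same elements, order ignored."""
    return Counter(found) == Counter(expected)


def matches_in_order(found, expected):
    """Ordered-Array-style check (assumed model): same elements, same order."""
    return list(found) == list(expected)


expected = ["Name", "City", "State"]
found = ["City", "Name", "State"]  # same elements, different order

any_order_ok = matches_any_order(found, expected)
in_order_ok = matches_in_order(found, expected)
```

Here the order-insensitive check accepts the data while the order-sensitive check rejects it, which mirrors the distinction described above.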

How To

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

Setting the Collation Property





The Horizontal Layout Property



The Vertical Layout Property


The Flow Layout Property

Use the Flow Layout property when the information you want to extract sits within the flow of a paragraph or other running text. You will commonly find this in unstructured documents.
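
As a rough illustration of what "flow" means here, the following Python sketch puts results into reading order (top to bottom, then left to right) and joins them into one value. The tuple layout `(text, line_index, x)` and the function name are assumptions made for this example, not part of Grooper.

```python
def collate_in_flow_order(hits):
    """Join hits in reading order.

    Each hit is a hypothetical (text, line_index, x) tuple, where line_index
    is the text line the hit sits on and x is its left edge on that line.
    """
    ordered = sorted(hits, key=lambda hit: (hit[1], hit[2]))
    return " ".join(text for text, _, _ in ordered)


# The date wraps from the end of one line onto the start of the next.
hits = [("2024", 1, 5.0), ("January", 0, 80.0), ("15,", 0, 140.0)]
combined = collate_in_flow_order(hits)
```

The hits are neither on one row nor in one column, but they follow one another in the text flow, so a flow-based layout can still treat them as a single result.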





The Minimum Elements Property

You may have noticed that when we first switched to the third document in the Batch with Vertical Layout, two words were returned because they happened to be stacked on top of one another. Grooper returned a result when it found two aligned words, but not for a single word. This is because the Minimum Elements property is set to 2 by default. The following screenshots explain how this property works.
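
The effect of this property can be modeled as a simple filter over candidate groups. The sketch below is a conceptual illustration only: the function name and the list-of-lists representation are hypothetical, and only the default value of 2 comes from this article.

```python
def apply_minimum_elements(groups, minimum_elements=2):
    """Keep only the groups that contain at least minimum_elements items.

    The default of 2 matches the default described in this article; the
    rest of the model is an assumption made for illustration.
    """
    return [group for group in groups if len(group) >= minimum_elements]


candidate_groups = [["alpha", "beta"], ["gamma"]]

default_result = apply_minimum_elements(candidate_groups)
relaxed_result = apply_minimum_elements(candidate_groups, minimum_elements=1)
```

With the default, the lone word is dropped and only the two stacked words are returned. Lowering the value to 1 in this model would also let single, unaligned words through.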




A Practical Example

Although the above examples make it easier to see how Array works, they are not something you would likely see in the real world. The following screenshots give an example of how you might use a collation property in practice.




Maximum Distance


Enforce Line Boundaries