2023.1:Collation Provider (Property): Difference between revisions


Revision as of 09:34, 26 April 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

The Collation property of a Data Type defines the method for converting its raw results into a final result set. It is configured by selecting a Collation Provider. The Collation Provider governs how initial matches from the Data Type's extractor(s) are combined and interpreted to produce the Data Type's final output.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

Data Type extractors in Grooper use regular expressions to match a document's text data in order to return a particular piece of information. Extractors serve a variety of purposes. They can be used to populate fields in a Data Model, to separate and classify documents, to break up a document into sections, and more. For the most part, any time part of a document's text data is needed or useful to do something, you need an extractor to find and return it.

Often, this requires something more complex than returning a single result. The relationships between multiple extraction results are often important. The fact that results are physically related to each other on the page, that text exists between results, or that results appear in one order versus another can be used to accomplish various goals in Grooper.

The following Collation Providers are available in Grooper:

Individual - The Collation property is set to Individual by default. Each result is returned individually.
Combine - Takes multiple extracted instances and combines them into one result.
AND - Returns a result when the extractor gets at least one hit. Useful in Classification.
Key-Value Pair - Matches a "Key" with a "Value" on a document to collect information following a label. This Collation method is less commonly used now because the Labeled Value extractor is an easier way to get similar results.
Key-Value List - Returns a list of results based on the layout relationship to a "Key" or label.
Array - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Can be useful for Row Match Table Extraction.
Ordered Array - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Unlike Array, however, the order in which the individual values are extracted matters, and each extractor must return a value. Also useful in Row Match Table Extraction.
Split - Separates a data instance at each match returned by a Data Type. Useful for splitting up a document into smaller segments for more accurate extraction.
Pattern-Based - Uses regular expressions to sequence returned results into a final result set.
Multi-Column - Combines multiple columns on a page into a single column result for easier and more accurate extraction.
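
The general idea of collation (taking raw regular expression matches and turning them into a final result set) can be illustrated outside of Grooper. The following is a minimal conceptual sketch in Python, not Grooper's implementation. Every name in it (Match, find_matches, the collate_* functions, max_gap, SAMPLE) is hypothetical, and it assumes character offsets as a stand-in for the page layout that Grooper actually uses. It loosely mimics four of the providers listed above: Individual, Combine, Split, and Key-Value Pair.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Match:
    """A raw extractor hit: the matched text and its character offsets."""
    text: str
    start: int
    end: int


def find_matches(pattern: str, text: str) -> List[Match]:
    """Run a regular expression and return every hit as a Match."""
    return [Match(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


def collate_individual(matches: List[Match]) -> List[str]:
    """Individual-style: each raw hit becomes its own result."""
    return [m.text for m in matches]


def collate_combine(matches: List[Match], separator: str = " ") -> List[str]:
    """Combine-style: all raw hits are merged into a single result."""
    if not matches:
        return []
    ordered = sorted(matches, key=lambda m: m.start)
    return [separator.join(m.text for m in ordered)]


def collate_split(matches: List[Match], text: str) -> List[str]:
    """Split-style (assumption): start a new segment at each hit.

    Text before the first hit is dropped in this sketch.
    """
    starts = sorted(m.start for m in matches)
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def collate_key_value_pair(
    keys: List[Match], values: List[Match], max_gap: int = 3
) -> List[Tuple[str, str]]:
    """Key-Value-Pair-style (assumption): pair each key with the nearest
    value that begins within max_gap characters after the key ends."""
    pairs = []
    for key in keys:
        candidates = [v for v in values if 0 <= v.start - key.end <= max_gap]
        if candidates:
            nearest = min(candidates, key=lambda v: v.start)
            pairs.append((key.text, nearest.text))
    return pairs


# Hypothetical sample text standing in for a document's text data.
SAMPLE = "Invoice No: 1001 Date: 01/02/2024 Invoice No: 1002 Date: 03/04/2024"

if __name__ == "__main__":
    dates = find_matches(r"\d{2}/\d{2}/\d{4}", SAMPLE)
    labels = find_matches(r"Invoice No:", SAMPLE)
    numbers = find_matches(r"\b\d{4}\b", SAMPLE)

    print(collate_individual(dates))
    print(collate_combine(dates))
    print(collate_split(labels, SAMPLE))
    print(collate_key_value_pair(labels, numbers))
```

The same raw matches yield different final results depending only on the collation function applied, which is the role the Collation property plays for a Data Type. The layout-based providers (Key-Value List, Array, Ordered Array, Multi-Column) depend on where results sit on the page, which this character-offset sketch does not model.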

@@ Line 1: / Line 1: @@
-{{Migrated2023}}
+{{AutoVersion}}
-{{2023:{{PAGENAME}}}}
+{|class="wip-box"
+|
+'''WIP'''
+|
+This article is a work-in-progress or created as a placeholder for testing purposes.  This article is subject to change and/or expansion.  It may be incomplete, inaccurate, or stop abruptly.
+This tag will be removed upon draft completion.
+|}
+<blockquote>{{#lst:Glossary|Collation Provider}}</blockquote>
+{|class="download-box"
+|
+[[File:Asset 22@4x.png]]
+|
+You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more '''Batches''' of sample documents. The second contains one or more '''Projects''' with resources used in examples throughout this article.
+* [[Media:2023.1_Wiki_Collation-Provider_Batch.zip]]
+* [[Media:2023.1_Wiki_Collation-Provider_Project.zip]]
+|}
+== About ==
+'''[[Data Type]]''' extractors in Grooper use regular expressions to match a document's text data in order to return a particular piece of information. Extractors serve a variety of purposes. They can be used to populate fields in a [[Data Model]], to separate and classify documents, to break up a document into sections, and more. For the most part, any time part of a document's text data is needed or useful to do something, you need an extractor to find and return it.
+Often, this requires something more complex than returning a single result. The relationships between multiple extraction results are often important. The fact that results are physically related to each other on the page, that text exists between results, or that results appear in one order versus another can be used to accomplish various goals in Grooper.
+The following '''''Collation Providers''''' are available in Grooper:
+* Individual - The '''''Collation''''' property is set to ''Individual'' by default. Each result is returned individually.
+* [[Combine (Collation Provider)|Combine]] - Takes multiple extracted instances and combines them into one result.
+* [[AND (Collation Provider)|AND]] - Returns a result when the extractor gets at least one hit. Useful in Classification.
+* [[Key-Value Pair (Collation Provider)|Key-Value Pair]] - Matches a "Key" with a "Value" on a document to collect information following a label. This '''''Collation''''' method is less commonly used now because the [[Labeled Value (Extractor Type)|Labeled Value]] extractor is an easier way to get similar results.
+* [[Key-Value List (Collation Provider)|Key-Value List]] - Returns a list of results based on the layout relationship to a "Key" or label.
+* [[Array (Collation Provider)|Array]] - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Can be useful for [[Row Match (Table Extract Method)|Row Match]] Table Extraction.
+* [[Ordered Array (Collation Provider)|Ordered Array]] - Combines a list of values that are arranged in a horizontal, vertical, or flow layout into a single result. Unlike Array, however, the order in which the individual values are extracted matters, and each extractor must return a value. Also useful in [[Row Match (Table Extract Method)|Row Match]] Table Extraction.
+* [[Split (Collation Provider)|Split]] - Separates a [[Data Instance (Concept)|data instance]] at each match returned by a [[image:GrooperIcon_DataType.png]][[Data Type (Object)|Data Type]]. Useful for splitting up a document into smaller segments for more accurate extraction.
+* [[Pattern-Based (Collation Provider)|Pattern-Based]] - Uses regular expressions to sequence returned results into a final result set.
+* [[Multi-Column (Collation Provider)|Multi-Column]] - Combines multiple columns on a page into a single column result for easier and more accurate extraction.