2.90:Data Extractor (Concept): Difference between revisions

From Grooper Wiki
m Randallkinard moved page 2.90:Data Extractor to 2.90:Data Extractor (Concept) without leaving a redirect: new naming convention
 
{{AutoVersion}}
<section begin="glossary" />
<blockquote>
Data Extractors are Grooper objects or property configurations used to isolate and return information from text data on a page.  
</blockquote>
<section end="glossary" />
Data extractors (or simply "extractors") are used in a variety of ways, including (but not limited to):


* Classify documents
* Find data on a page you wish to store outside of Grooper
* Separate documents


== About ==
 
There are three types of Data Extractor ''objects'':


* '''[[Data Type]]'''
* '''[[Data Format]]''' (These extractors are ''only'' created as child objects of a '''Data Type''')
* '''[[Field Class]]'''


All three have configurable properties to leverage information on the page, control how data is extracted, collate extraction results, work around imperfect text data, and format the results that extraction returns.
Furthermore, certain objects contain ''properties'' whose configuration extracts data.  These include:
Simple patterns:
* ''[[Internal]]''
* ''[[Text Pattern]]''
''Author's note: These two options are interchangeable. Some properties list this option as "Internal"; others list it as "Text Pattern". Both behave the same way: they use a Pattern Editor to write and configure a regular expression pattern to extract data (similar to how a '''Data Format''' or the "Pattern" property of a '''Data Type''' does).''
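
As a rough illustration of what a regular expression pattern does when it extracts data, the sketch below runs a date pattern over a page's text and returns each match along with its character position. It is written in plain Python and is not Grooper's API or the Pattern Editor itself; the names <code>Hit</code>, <code>extract</code>, <code>DATE_PATTERN</code>, and the sample page text are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    """One extraction result: the matched text and where it was found."""
    value: str
    start: int
    end: int


def extract(pattern: str, text: str) -> list[Hit]:
    """Return every non-overlapping match of `pattern` in `text`."""
    return [Hit(m.group(0), m.start(), m.end()) for m in re.finditer(pattern, text)]


# Hypothetical page text and pattern, for illustration only.
page_text = "Invoice Date: 04/16/2024\nDue Date: 05/16/2024"
DATE_PATTERN = r"\b\d{2}/\d{2}/\d{4}\b"

hits = extract(DATE_PATTERN, page_text)
for hit in hits:
    print(hit.value, hit.start, hit.end)
```

Each hit carries both the value and its position in the text, which is the kind of information a downstream step would need in order to collate or format results.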
Zonal Extractors:
* ''[[Read Zone]]''
* ''[[Highlight Zone]]''
OMR Extractors:
* ''[[Labeled OMR]]''
* ''[[Ordered OMR]]''
* ''[[Zonal OMR]]''
Barcode Extractors:
* ''[[Find Barcode]]''
* ''[[Read Barcode]]''
Each of these properties also has a ''Reference'' option, which references a data extractor object.

Latest revision as of 14:10, 16 April 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

