Data Extraction (Concept)
Revision as of 15:06, 3 May 2024
STUB: This article is a stub. It contains minimal information on the topic and should be expanded.
Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.
For more/related information please visit the following articles:
Glossary
Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.
- Batch Folders are frequently referred to simply as "documents" or "folders", depending on how they are used in the Batch.
Data Context: Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.
Data Element: Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.
Data Extraction: Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.
Data Extractor: Data Extractor (or just "extractor") refers to all Value Extractors and Extractor Nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).
Data Instance: A Data Instance is an encapsulation of text data within a document, returned by Grooper's extractors. Together, Data Instances form the hierarchy of text data that the extractors create.
Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.
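Example (illustrative only): The glossary describes using a label as Data Context to locate a value, and returning unstructured text in a structured format. The sketch below shows that general idea in plain Python with regular expressions. It is not Grooper's API or configuration. The sample text, the function names (extract_labeled_value, extract_fields), and the field names are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample text standing in for a document's text content.
DOCUMENT_TEXT = """Service Agreement
Agreement Date: 03/05/2024
Invoice Total: $1,250.00
"""


def extract_labeled_value(text, label, value_pattern):
    """Return the first value that follows `label` on the same line, or None.

    The label plays the role of contextual information: it tells us which
    of the possibly many matching values in the text is the one we want.
    """
    pattern = re.compile(
        re.escape(label) + r"[ \t]*:?[ \t]*(" + value_pattern + r")"
    )
    match = pattern.search(text)
    return match.group(1) if match else None


def extract_fields(text):
    """Collect labeled values from free text into a structured dictionary."""
    return {
        # A specific date: the one next to the "Agreement Date" label.
        "agreement_date": extract_labeled_value(
            text, "Agreement Date", r"\d{2}/\d{2}/\d{4}"
        ),
        # A currency amount next to the "Invoice Total" label.
        "invoice_total": extract_labeled_value(
            text, "Invoice Total", r"\$[\d,]+\.\d{2}"
        ),
    }


if __name__ == "__main__":
    print(extract_fields(DOCUMENT_TEXT))
```

A pattern such as \d{2}/\d{2}/\d{4} on its own would match any date in the text (general data). Anchoring it to a label narrows the result to one particular date (specific data), which mirrors the distinction drawn in the Data Extractor entry.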