Data Extraction (Concept): Difference between revisions

Revision as of 15:06, 3 May 2024

STUB

This article is a stub. It contains minimal information on the topic and should be expanded.

Would you like to see this article expanded? Let us know at groopereducation@bisok.com.

Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.
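As a rough illustration of the idea (not Grooper's API), the following Python sketch shows how a general pattern such as a date differs from a specific value that is anchored to a label. The names `DATE_PATTERN`, `ExtractedValue`, and `extract_labeled_date`, as well as the sample text, are hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical names for illustration only; none of this is Grooper's API.
DATE_PATTERN = r"\d{1,2}/\d{1,2}/\d{4}"


@dataclass
class ExtractedValue:
    field: str   # name of the structured field being filled
    value: str   # text captured from the document
    start: int   # character offset of the value in the source text


def extract_labeled_date(text: str, label: str, field: str) -> Optional[ExtractedValue]:
    """Return the first date that directly follows `label`, or None if absent."""
    pattern = re.compile(re.escape(label) + r"\s*:?\s*(" + DATE_PATTERN + r")")
    match = pattern.search(text)
    if match is None:
        return None
    return ExtractedValue(field=field, value=match.group(1), start=match.start(1))


# Made-up sample text standing in for a document's text content.
document_text = "Invoice Date: 04/12/2024\nAgreement Date: 05/03/2024\nTotal: $1,200.00"

# General data: every date in the text, with no context about what each one means.
all_dates = re.findall(DATE_PATTERN, document_text)

# Specific data: the one date identified by the "Agreement Date" label.
agreement = extract_labeled_date(document_text, "Agreement Date", "agreement_date")
```

Here `all_dates` holds both dates found in the text, while `agreement` holds only the date next to its label, which is the kind of structured result an extractor aims to produce.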

For more or related information, please visit the following articles:

Glossary

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Data Context: Data Context refers to contextual information used to extract data, such as a label that identifies the value you want to collect.

Data Element: Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Extraction: Data Extraction involves identifying and capturing specific information from documents (represented by Batch Folders in Grooper). Extraction is performed by configurable Data Extractors, which transform unstructured or semi-structured data into a structured, usable format for processing and analysis.

Data Extractor: Data Extractor (or just "extractor") refers to all Value Extractors and Extractor Nodes. Extractors define the logic used to return data from a document's text content, including general data (such as a date) and specific data (such as an agreement date on a contract).

Data Instance: A Data Instance is a unit of data within a document. Data Instances form a hierarchy defined by the document’s Data Model, from the document level down to individual Data Fields. They store extracted, entered, or calculated values along with associated metadata such as location and confidence.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.
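The Data Instance entry above describes a hierarchy of values that carry metadata such as location and confidence. A minimal Python sketch of that shape follows. It is an assumption-laden illustration, not Grooper's object model: the `DataInstance` class, its field names, the confidence scale, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class DataInstance:
    """Hypothetical stand-in for one unit of data in a document's hierarchy."""
    name: str
    value: Optional[str] = None                 # extracted, entered, or calculated
    confidence: Optional[float] = None          # illustrative 0.0 to 1.0 scale
    location: Optional[Tuple[int, int]] = None  # illustrative (page, character offset)
    children: List["DataInstance"] = field(default_factory=list)

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, "DataInstance"]]:
        """Yield (path, instance) pairs from this level down to the leaves."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        yield path, self
        for child in self.children:
            yield from child.walk(path)


# Made-up sample hierarchy: document level, a nested section, and individual fields.
document = DataInstance(
    name="Contract",
    children=[
        DataInstance("Agreement Date", "05/03/2024", 0.97, (1, 41)),
        DataInstance("Parties", children=[
            DataInstance("Buyer", "Acme Corp", 0.88, (1, 120)),
        ]),
    ],
)

# Example use of the metadata: list the fields that might need review.
low_confidence = [
    path for path, inst in document.walk()
    if inst.confidence is not None and inst.confidence < 0.9
]
```

Keeping the value and its metadata on the same node is one simple way to let later steps, such as review or validation, filter by confidence without going back to the source text.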

@@ Line 9: / Line 9: @@
 * [[Data Instance (Concept)|Data Instance]]
 * '''''[[Extract (Activity)|Extract]]'''''
+== Glossary ==
+<u><big>'''Batch Folder'''</big></u>: {{#lst:Glossary|Batch Folder}}
+<u><big>'''Data Context'''</big></u>: {{#lst:Glossary|Data Context}}
+<u><big>'''Data Element'''</big></u>: {{#lst:Glossary|Data Element}}
+<u><big>'''Data Extraction'''</big></u>: {{#lst:Glossary|Data Extraction}}
+<u><big>'''Data Extractor'''</big></u>: {{#lst:Glossary|Data Extractor}}
+<u><big>'''Data Instance'''</big></u>: {{#lst:Glossary|Data Instance}}
+<u><big>'''Extract'''</big></u>: {{#lst:Glossary|Extract}}