Extract (Activity)

From Grooper Wiki

Revision as of 13:50, 21 November 2025

This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

What is the Extract Activity?

The Extract Activity in Grooper is a core step in document processing that performs data extraction from documents in a Batch. Its main purpose is to populate the Data Model with extracted information, making it available for review, validation, and export.

Extraction is defined by Data Elements in the Node Tree and organized within the Data Model as a Data Hierarchy. These Data Elements (child objects of the Data Model) are:

  • Data Field: Captures a single value from a document, such as an invoice number or date.
  • Data Table: Captures tabular, repeating data, such as line items on an invoice. Each table consists of columns (fields) and rows.
  • Data Section: Groups related fields and tables, allowing for logical organization and extraction of complex document structures.

When the Extract Activity runs, it uses these elements to extract data from each document in the Batch, populating the corresponding Data Model for each document.

Value Extractors

After defining which Data Elements you want to extract, you need to define how those fields, tables, and sections are populated with data. This is done with Value Extractors, often called simply "extractors".

Value Extractors are configured as properties on a Data Field, Data Table, and/or Data Section.

Data Hierarchy

As described above, you can create hierarchical relationships within a single Data Model using Data Sections and Data Tables. As a direct child of a Data Model, a Data Field executes against the entire document. As a child of a Data Section, however, a Data Field executes only against the portion of the document described by that Data Section.
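
The Data Hierarchy is configured in Grooper Design Studio rather than in code, but the scoping rule can be illustrated with a small conceptual model. The Python sketch below is hypothetical: the classes `DataField` and `DataSection`, the `run_model` function, and the use of regular expressions as stand-in Value Extractors are assumptions for illustration only, not Grooper's API. A "Phone" field placed directly on the model picks up the first phone number in the whole document, while the same field placed under a "Vendor Information" section sees only that section's text.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataField:
    """Hypothetical stand-in for a Data Field; a regex plays the role of its Value Extractor."""
    name: str
    pattern: str  # must contain one capture group holding the value

    def extract(self, text: str) -> Optional[str]:
        # Return the first captured value found in the text this field is scoped to.
        match = re.search(self.pattern, text)
        return match.group(1) if match else None


@dataclass
class DataSection:
    """Hypothetical stand-in for a Data Section: it narrows the text its child fields see."""
    name: str
    start: str  # regex marking where the section begins (assumption)
    end: str    # regex marking where the section ends (assumption)
    fields: list = field(default_factory=list)

    def scope(self, text: str) -> str:
        # Slice out only the portion of the document this section describes.
        m = re.search(f"{self.start}(?P<body>.*?){self.end}", text, re.S)
        return m.group("body") if m else ""


def run_model(text, fields, sections):
    results = {}
    # Direct children of the Data Model run against the entire document.
    for f in fields:
        results[f.name] = f.extract(text)
    # Children of a Data Section run only against that section's text.
    for s in sections:
        section_text = s.scope(text)
        results[s.name] = {f.name: f.extract(section_text) for f in s.fields}
    return results


doc = "Remit To\nPhone: 555-0100\nVendor\nPhone: 555-0199\nEnd Vendor"
phone = DataField("Phone", r"Phone:\s*(\S+)")
vendor = DataSection("Vendor Information", r"Vendor\n", r"End Vendor",
                     [DataField("Phone", r"Phone:\s*(\S+)")])
result = run_model(doc, [phone], [vendor])
print(result)
# {'Phone': '555-0100', 'Vendor Information': {'Phone': '555-0199'}}
```

The same field definition returns different values depending only on where it sits in the hierarchy, which is the point of scoping fields under a section.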

Data Models also benefit from a Content Model's inheritance structure. For example, the Content Model itself may have a Data Model, and a Document Type may also have its own Data Model. The Document Type, as a child of the Content Model, inherits all Data Elements from the parent Content Model's Data Model.
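
The inheritance behavior can be illustrated with a small conceptual model. In this hypothetical Python sketch, `ContentTypeNode` stands in for a Content Model or Document Type, and the node names ("Invoices", "Acme Invoice") and Data Element names are invented for illustration. It only shows inherited elements being combined with a node's own elements; how Grooper handles a child element with the same name as an inherited one is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentTypeNode:
    """Hypothetical stand-in for a Content Model or Document Type in the Node Tree."""
    name: str
    data_elements: list = field(default_factory=list)  # elements in this node's own Data Model
    parent: Optional["ContentTypeNode"] = None

    def effective_data_elements(self) -> list:
        # Walk up to the root first, so inherited elements come before this node's own.
        inherited = self.parent.effective_data_elements() if self.parent else []
        return inherited + self.data_elements


invoices = ContentTypeNode("Invoices", ["Invoice Number", "Invoice Date"])  # Content Model
acme = ContentTypeNode("Acme Invoice", ["Purchase Order"], parent=invoices)  # Document Type
print(acme.effective_data_elements())
# ['Invoice Number', 'Invoice Date', 'Purchase Order']
```

Because the lookup is recursive, a Document Type nested several levels deep would collect elements from every ancestor in the same way.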

Why use the Extract Activity?

The Extract Activity is essential because it is part of the Collect Phase in Grooper’s five-phase processing model. This phase is where data is gathered from documents in a Batch. Without extraction, there would be no data to review, validate, or export in later phases.

Key reasons to use the Extract Activity:

  • It collects structured data from documents, enabling review (if necessary) and export.
  • It ensures that the Data Model is populated, making extracted data available for business processes.
  • It is a required step for getting data from documents.

How to add the Extract Activity

Follow these steps to add and configure the Extract Activity in a Batch Process:

  1. Open the desired Batch Process in Grooper Design Studio.
  2. Right-click on the process tree and select "Add Activity".
  3. In the activity type list, choose "Extract" and click "OK".
  4. Select the new Extract Activity node.
  5. Configure properties if need be.

Configuring the Extract Activity

If you need to configure the Extract Activity beyond its default properties, be aware of each property and how it affects extraction. These properties are:

  • Mode: Choose how extraction handles existing data (Normal, Additive, or Recalculate).
  • Default Content Type: Set a fallback Content Type for unclassified folders.
  • Content Type Filter: Optionally restrict extraction to specific Content Types.
  • Data Element Filter: Optionally restrict extraction to specific Data Elements.
  • Rules: Add any Data Rules for post-processing or validation.
  • Flag Invalid Items: Enable to flag folders with validation errors.
  • Purge Alternate Candidates: Enable to remove alternate field values before saving.
  • Purge Empty Fields: Enable to remove empty fields before saving.
  • Stats Logging: Set the level of extraction statistics to record.

These properties can be configured either in the Activity Properties panel or by expanding the Activity property within the Step Properties panel.


To test the Extract Activity on the Batch Process Step:

  1. Open your Batch Process in the Node Tree.
  2. Select the Extract Step.
  3. Go to the Activity Tester tab.
  4. Choose a Batch whose documents you want to test extraction on.
  5. Click the play button.
    • Make sure that the document has been OCR'd and has a Content Type assigned to it before testing extraction.
  6. To see the results, select the "View Diagnostics" button.
  7. The diagnostics will open in a new tab.

Extraction example

Suppose you have a Batch of invoices and want to extract key data for review and export. Your Data Model might include:

  • Data Field: "Invoice Number"
  • Data Field: "Invoice Date"
  • Data Table: "Line Items" (with columns for Description, Quantity, Unit Price, and Line Total)
  • Data Section: "Vendor Information" (with fields for Vendor Name, Address, and Phone)

By configuring the Extract Activity in your Batch Process, Grooper will automatically extract these values from each invoice, populate the Data Model, and make the data available for validation and export.
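
To picture what populating the Data Model means for a single invoice, the Data Model listed above can be written out as a typed record. This Python sketch is hypothetical: the class and attribute names simply mirror the Data Elements in the list, the values are made up, and serializing to JSON is only a stand-in for whatever review or export steps the Batch Process performs. It is not how Grooper stores extracted data internally.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineItem:
    # Columns of the "Line Items" Data Table (one instance per row).
    description: str
    quantity: int
    unit_price: float
    line_total: float


@dataclass
class VendorInformation:
    # Fields of the "Vendor Information" Data Section.
    vendor_name: str = ""
    address: str = ""
    phone: str = ""


@dataclass
class InvoiceDataModel:
    # Top-level Data Fields, plus the Data Table and Data Section.
    invoice_number: str = ""
    invoice_date: str = ""
    line_items: list = field(default_factory=list)
    vendor_information: VendorInformation = field(default_factory=VendorInformation)


# Made-up values for one extracted invoice; each document would get its own instance.
invoice = InvoiceDataModel(
    invoice_number="INV-1001",
    invoice_date="2025-01-15",
    line_items=[LineItem("Widget", 2, 5.00, 10.00)],
    vendor_information=VendorInformation("Example Supply Co.", "123 Main St", "555-0100"),
)
record = asdict(invoice)  # nested dataclasses become nested dicts
print(json.dumps(record, indent=2))
```

Nesting the table rows and section fields under the top-level record mirrors the Data Hierarchy: one record per document, with repeating rows and grouped fields inside it.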

In this example, we will look at running the Extract Activity on a Batch of invoices. We'll start by testing the activity, and then see it in action during a Batch Process.


See also