What's New in Grooper 2021

From Grooper Wiki


Welcome to Grooper 2021!

Introducing... Behaviors!

Behaviors are a new set of features designed to centralize the Content Model as the main hub controlling various aspects of document processing. Behaviors are born of the idea that consolidating the flow of document data to the objects most relevant to its collection and delivery makes for a more streamlined and effective Grooper experience.

This allows a Content Model (and its component Content Types) to wrest control from various other disparate Activities, centralizing command of how documents and their data are modeled and what happens to that data once collected. The result is more focused control around how document data is imported, organized, collected, and exported by a Content Model. In other words, how it "behaves".

The following Behavior Types are introduced in 2021:

  • Import Behavior
  • Export Behavior
  • Labeling Behavior
  • PDF Data Mapping
  • Text Rendering

Introducing... Label Sets!

The Labeling Behavior functionality represents a huge change in how document content can be modeled and collected for structured and semi-structured document sets. It capitalizes on the utility labels provide to understand a document and its data. Grooper collects and uses "Label Sets" for each Document Type for a variety of document processing purposes, including:

  • Document classification - Using the Labelset-Based Classification Method
  • Field-based data extraction - Primarily using the Labeled Value Extractor Type
  • Tabular data extraction - Primarily using a Data Table object's Tabular Layout Extract Method
  • Sectional data extraction - Primarily using a Data Section object's Transaction Detection Extract Method

"Label Sets" offer vast improvements to these areas, both simplifying setup and allowing for quicker onboarding of new Document Types for structured and semi-structured forms.
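
As a rough illustration of why labels make useful anchors for field extraction, the Python sketch below finds each value by locating its label text and reading what follows on the same line. This is not Grooper's Labeled Value implementation. The function name `extract_labeled_values`, the label set, and the sample text are all hypothetical and exist only for this example.

```python
import re


def extract_labeled_values(text, labels):
    """Return {field: value} for each label found at the start of a line.

    Hypothetical helper for illustration only; not a Grooper API.
    """
    results = {}
    for field, label in labels.items():
        # Match the label, an optional colon, then capture the rest of the line.
        pattern = re.compile(
            r"^\s*" + re.escape(label) + r"\s*:?\s*(.+?)\s*$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(text)
        if match:
            results[field] = match.group(1)
    return results


# Hypothetical label set for a single Document Type.
invoice_labels = {"invoice_number": "Invoice No", "total": "Amount Due"}

sample_text = """ACME Supply
Invoice No: 10482
Amount Due   $1,250.00
"""

values = extract_labeled_values(sample_text, invoice_labels)
```

In this toy version, onboarding a new Document Type only means supplying a different label dictionary, which is the kind of setup simplification described above.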

Introducing... PDF Data Mapping!

The PDF Data Mapping functionality is part of the foundation for Grooper's "Smart PDF" architecture. The "Smart PDF" architecture's goal is to unify document content into a single source. Too often, document content is divided in two, with the image-based and text content represented as a PDF file and the data content living in a database or other content management platform.

PDF Data Mapping allows Grooper to store data content directly in the PDF itself, including separation and classification data as well as Data Fields from a Data Model. This way, even if you do store document data in a database, the PDF also retains all the information Grooper collected.

The PDF Data Mapping functionality includes the ability to embed PDFs with the following data:

  • Metadata
  • Bookmarks
  • Annotations

Introducing... Data Rules!

The Data Rule is a new object available in Grooper 2021. Data Rules allow for complex validation and manipulation of Data Elements in a Data Model. This allows users to create a conditional hierarchy of actions to take if certain conditions are met. This includes clearing, copying, appending, parsing, and calculating values based on a series of expression-based conditions. Data Rules expand on the simpler validation and calculation methods available to Data Element objects, allowing for simpler setup and net new capabilities in more complicated data normalization projects.
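
The idea of a conditional hierarchy of actions can be sketched in a few lines of Python. This is a conceptual model only, not how Grooper implements Data Rules. The `Rule` class, the action helpers, and the field names (`total`, `ship_to`, `po_number`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class Rule:
    """A condition, actions to run when it holds, and child rules to try next."""
    condition: Callable[[Record], bool]
    actions: List[Callable[[Record], None]] = field(default_factory=list)
    children: List["Rule"] = field(default_factory=list)

    def apply(self, record: Record) -> None:
        # Child rules are only evaluated when the parent condition is met,
        # which is what makes the rule set hierarchical.
        if not self.condition(record):
            return
        for action in self.actions:
            action(record)
        for child in self.children:
            child.apply(record)


def copy_value(source: str, target: str) -> Callable[[Record], None]:
    def action(record: Record) -> None:
        record[target] = record[source]
    return action


def clear_value(name: str) -> Callable[[Record], None]:
    def action(record: Record) -> None:
        record[name] = ""
    return action


def calculate_total(record: Record) -> None:
    total = float(record["quantity"]) * float(record["unit_price"])
    record["total"] = f"{total:.2f}"


# Hypothetical rule: when the total is missing, calculate it, then check
# two nested conditions that copy or clear other fields.
rule = Rule(
    condition=lambda r: r["total"] == "",
    actions=[calculate_total],
    children=[
        Rule(lambda r: r["ship_to"] == "", [copy_value("bill_to", "ship_to")]),
        Rule(lambda r: r["po_number"] == "N/A", [clear_value("po_number")]),
    ],
)

record = {
    "quantity": "3",
    "unit_price": "4.50",
    "total": "",
    "bill_to": "12 Main St",
    "ship_to": "",
    "po_number": "N/A",
}
rule.apply(record)
```

Because the nested rules sit under the parent condition, a record that already has a total is left untouched, including its `ship_to` and `po_number` values.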

There are also two new Batch Processing Activities that apply Data Rules:

  • Apply Rules
  • Convert Data

Introducing... API!

Beginning in 2021, Grooper offers a RESTful Document Ingestion API. The API provides the ability to create and populate batches, monitor the status of batch processes, and retrieve results. It allows users to create dashboards or portals that interface with existing processes, such as portals that feed documents into a Grooper process or dashboards that display and change extracted values.

The API has some other capabilities, such as the ability to ingest compressed archives of Grooper notes (which could assist in automation of new repository population) and the ability to query certain pieces of information from the repository.

Data Extraction Improvements

Goodbye Data Formats... Hello Value Reader!

The Value Reader is a new extraction object introduced in Grooper 2021 to replace and improve on the Data Format object. The Value Reader extractor combines over a dozen extractor types into a single extractor for increased functionality and ease of use.

It is designed to expand on the extractor functionality of Grooper's regular expression pattern matching capabilities to include newer extraction capabilities, such as extracting values next to OMR (optical mark recognition) checkboxes and barcode values. In previous versions, this functionality was split across multiple objects (or properties of multiple objects). The Value Reader extractor combines these disparate functionalities into a single extractor object with increased functionality.

Vertical Wrap and Constrained Wrap

As part of Grooper's switch to the Value Reader, extracting data that wraps across multiple lines is simplified in certain situations.

New and Improved Table Extraction

In 2021, Grooper offers the following improvements to existing Extract Methods for Data Table objects.

  • Grid Layout
    • Formerly called Infer Grid, this method provides under-the-hood improvements to how line locations determine the grid-like structure of tables on digital and image-based documents.
  • Header-Value
    • Improved header width detection.
  • Delimited Extract
    • This method offers an improvement upon CSV Extract, allowing simplified extraction of delimiter-separated text files. Previously, Grooper could only extract CSV files with a Data Table object. With Delimited Extract, both CSV files and delimited text files using other separator characters can easily be extracted.
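
To show what generalizing from CSV to arbitrary delimiters means in practice, here is a minimal Python sketch using the standard library `csv` module. It does not reflect Grooper's Delimited Extract internals. The function name `read_delimited` and the sample data are assumptions made for this example.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimiter-separated text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# The same table, once comma-separated and once pipe-separated.
comma_text = "item,qty\nwidget,4\ngasket,12\n"
pipe_text = "item|qty\nwidget|4\ngasket|12\n"

comma_rows = read_delimited(comma_text, ",")
pipe_rows = read_delimited(pipe_text, "|")
```

Both calls yield the same rows, so the only thing that changes between file types is the separator character passed in.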

2021 also introduces three brand new table extraction methods.

  • Tabular Layout
    • This method brings the most advanced improvements to table extraction to date. Building on the best parts of Header-Value and Infer Grid, this method returns highly accurate results with simplified initial setup but enough configurability to target a wide variety of table structures.
    • This method was also built with "Label Sets" in mind, further simplifying its setup when using that feature.
  • Fluid Layout
    • This method combines Tabular Layout and Row Match to target document sets with highly variable table structures. Users configure Tabular Layout, with the Row Match method as a fallback if Tabular Layout fails.
  • Fixed Width
    • This new method allows for extraction of tabular data in fixed-width text files.
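
For readers unfamiliar with fixed-width files, the Python sketch below shows the general technique: each column occupies a known range of character positions, so rows are recovered by slicing. It is a generic illustration rather than Grooper's Fixed Width method. The `parse_fixed_width` function, the column offsets, and the sample data are hypothetical.

```python
def parse_fixed_width(text, columns):
    """Slice each line into fields using (name, start, end) character offsets."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        rows.append({name: line[start:end].strip() for name, start, end in columns})
    return rows


# Hypothetical layout: item in columns 0-9, qty in 10-14, price in 15-22.
columns = [("item", 0, 10), ("qty", 10, 15), ("price", 15, 23)]

# Build sample lines with format specifiers so the widths match the layout.
fixed_text = "".join(
    f"{item:<10}{qty:>5}{price:>8.2f}\n"
    for item, qty, price in [("widget", 4, 2.5), ("gasket", 12, 10.0)]
)

rows = parse_fixed_width(fixed_text, columns)
```

Unlike delimited files, no separator character exists here, so the column offsets carry all of the structural information.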

New and Improved Data Section Methods

Grooper 2021 introduces two new Extract Methods for Data Section objects.

  • Transaction Detection
    • This extraction method automatically detects sections in a document using the locations of a Data Section's Data Fields and analyzing the similarities of the lines surrounding them. Transaction Detection is useful for certain semi-structured documents containing multiple sections that are themselves very structured, repeating the same (or at least very similar) field or table data.
    • Transaction Detection has additional functionality when used in combination with Label Sets.
  • Nested Table
    • This is a specialized extraction method for sections with table data nested within each section. The Nested Table method divides a document into sections by extracting table data within those sections.
    • This method is heavily reliant on Label Sets in order to function.

Changes to Document Export and Database Export

Goodbye Document Export and Database Export... Hello Export!

In 2021, we heavily reworked Grooper's document and data export functionality to improve the process and allow for new functionality. As part of this process, we unified Document Export and Database Export into a single Activity: Export.

Export is now the single Activity driving all export operations in Grooper. Whether exporting PDFs to a content management system, exporting data to a database, or any content to any external storage platform, Export is your way to go.

Goodbye CMIS Content Types... Hello Import and Export Behaviors!

One big change from versions before 2021 is how data is mapped, according to its Data Model structure, to or from an external storage platform upon document import or export. Previously, these mappings were configured using CMIS Content Type objects, created as children of a CMIS Connection.

In 2021, the CMIS Connection object purely serves the function of integrating Grooper with an external storage platform. Import and export mappings are defined using Import or Export Behaviors. This removes some unnecessary object bloat around the CMIS Connection object and lets the Content Model and Document Types drive their associated Data Model mappings.

Import and Export Behaviors are configurable via:

  • Content Models, Content Categories, or Document Types
  • The Export Activity (in the case of export-related mappings only)

Install and Setup Changes

The Grooper Config application's interface was dramatically altered in version 2021. This was done to simplify repository configuration, Grooper services, and, most notably, product licensing.

Please visit the following articles for a more detailed explanation of Grooper Config and the current Grooper installation and setup instructions:

Miscellaneous

  • Changes to Content Action
  • Document Viewer improvements
  • Text file processing improvements