What's New in Grooper 2021

From Grooper Wiki
Latest revision as of 13:05, 22 December 2023


Welcome to Grooper 2021!


Grooper version 2021 is here! There's a slew of new features, "under-the-hood" architecture improvements, and simplified redesigns that make this version both the easiest to use and the most accurate at capture to date.

Below you will find brief descriptions of new and changed features. Where available, follow the links to extended articles on a topic.

Introducing... Behaviors!

Behaviors are a new set of features designed to centralize the Content Model as the main hub controlling various aspects of document processing. Behaviors are born of the idea that consolidating the flow of document data to the objects most relevant to its collection and delivery makes for a more streamlined and effective Grooper experience.

This allows a Content Model (and its component Content Types) to wrest control from various other disparate Activities, centralizing command of how documents and their data are modeled and what happens to that data once collected. The result is more focused control around how document data is imported, organized, collected, and exported by a Content Model. In other words, how it "behaves".

The following Behavior Types are introduced in 2021:

  • Import Behavior
  • Export Behavior
  • Labeling Behavior
  • PDF Data Mapping
  • Text Rendering


Introducing... Label Sets!

The Labeling Behavior functionality represents a huge change in how document content can be modeled and collected for structured and semi-structured document sets. It capitalizes on the utility labels provide to understand a document and its data. Grooper collects and uses "Label Sets" for each Document Type for a variety of document processing purposes, including:

  • Document classification - Using the Labelset-Based Classification Method
  • Field based data extraction - Primarily using the Labeled Value Extractor Type
  • Tabular data extraction - Primarily using a Data Table object's Tabular Layout Extract Method
  • Sectional data extraction - Primarily using a Data Section object's Transaction Detection Extract Method

"Label Sets" offer vast improvements to these areas, both simplifying setup and allowing for quicker onboarding of new Document Types for structured and semi-structured forms.


Introducing... PDF Data Mapping!

The PDF Data Mapping functionality is part of the foundation for Grooper's "Smart PDF" architecture. The goal of the "Smart PDF" architecture is to unify document content into a single source. Too often, document content is divided in two, with the image-based and text content represented as a PDF file and the data content living in a database or other content management platform.

PDF Data Mapping allows Grooper to store data content directly in the PDF, including separation and classification data as well as Data Fields from a Data Model, through document metadata. This way, even if you do store document data in a database, the PDF still retains all the information Grooper collected.

The PDF Data Mapping functionality includes the ability to embed PDFs with the following data:

  • Metadata
  • Bookmarks
  • Annotations


Introducing... Data Rules!

The Data Rule is a new object available in Grooper 2021. Data Rules allow for complex validation and manipulation of Data Elements in a Data Model. This allows users to create a conditional hierarchy of actions to take if certain conditions are met. This includes clearing, copying, appending, parsing and calculating values based on a series of expression based conditions. Data Rules expand on simpler validation and calculation methods available to Data Element objects, and allow for more simplified setup and net new capabilities for more complicated data normalization projects.

There are also two new Batch Processing Activities that apply Data Rules:

  • Apply Rules
  • Convert Data


Introducing... API!

Beginning in 2021, Grooper offers a RESTful Document Ingestion API. The API provides the ability to create and populate batches, monitor the status of batch processes, and retrieve results. It allows users to create dashboards or portals that interface with existing processes, such as portals that feed documents into a Grooper process or dashboards that display and change extracted values.

The API has some other capabilities, such as the ability to ingest compressed archives of Grooper notes (which could assist in automation of new repository population) and the ability to query certain pieces of information from the repository.


Data Extraction Improvements

Goodbye Data Formats... Hello Value Reader!

The Value Reader is a new extraction object introduced in Grooper 2021 to replace and improve on the Data Format object. The Value Reader extractor combines over a dozen extractor types into a single extractor for increased functionality and ease of use.

It is designed to expand on the extractor functionality of Grooper's regular expression pattern matching capabilities to include newer extraction capabilities, such as extracting values next to OMR (optical mark recognition) checkboxes and barcode values. In previous versions, this functionality was split across multiple objects (or properties of multiple objects). The Value Reader extractor combines these disparate functionalities into a single extractor object with increased functionality.

The Value Reader also adds brand new extraction capabilities through new Extractor Types. This includes the Labeled Value extractor, offering an improvement on Key-Value Pair collated Data Type extraction. On top of that, Labeled Value interacts with Grooper's new Label Set functionality to vastly simplify and improve Data Field extraction for those projects using the Labeling Behavior functionality.

Vertical Wrap and Constrained Wrap

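As part of Grooper's switch to the Value Reader object, extracting data that wraps across multiple lines is simpler in certain situations.

  • Vertical Wrap for easier stacked label matching.
  • Constrained Wrap for easier pattern matching for data constrained in a box (think table cells).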

OMR Improvements

We've made dramatic improvements to the Labeled OMR extractor. For the first time ever, this extractor can return labels next to radio buttons!

  • Note: The other two OMR-based extractors, Ordered OMR and Zonal OMR, still cannot target circular, radio-button-style checkboxes.

New and Improved Table Extraction

In 2021, Grooper offers the following improvements to existing Extract Methods for Data Table objects.

  • Grid Layout
    • Formerly called Infer Grid, this method provides under-the-hood improvements to how line locations determine the grid-like structure of tables on digital and image-based documents.
  • Header-Value
    • Improved header width detection.
  • Delimited Extract
    • This method offers an improvement upon CSV Extract, allowing simplified extraction of delimiter separated text files. Previously, Grooper could only extract CSV files with a Data Table object. With Delimited Extract, both CSV files and delimited text files using other character separators can easily be extracted.
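To make the difference between CSV and other delimiter-separated files concrete, here is a minimal Python sketch using the standard library's `csv` module. It only illustrates the general idea of parsing the same rows with different separator characters; it does not show how Delimited Extract is configured or implemented, and the sample data and the `read_delimited` helper are made up for the example.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimiter-separated text into a list of row dictionaries keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# The same two rows, once comma-separated and once pipe-separated (sample data).
csv_text = "Invoice,Amount\n1001,25.00\n1002,40.50\n"
pipe_text = "Invoice|Amount\n1001|25.00\n1002|40.50\n"

csv_rows = read_delimited(csv_text)
pipe_rows = read_delimited(pipe_text, delimiter="|")

# Both inputs produce identical rows once the separator is known.
print(csv_rows == pipe_rows)
```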

2021 also introduces three brand new table extraction methods.

  • Tabular Layout
    • This method brings the most advanced improvements to table extraction to date. Building on the best parts of Header-Value and Infer Grid, this method returns highly accurate results with simplified initial setup but enough configurability to target a wide variety of table structures.
    • This method also was built with "Label Sets" in mind, further simplifying its set up when using that feature.
  • Fluid Layout
    • This method leverages Tabular Layout and Row Match to target document sets with highly variable table structures, allowing users to configure Tabular Layout while falling back on the Row Match method if it fails.
  • Fixed Width
    • This new method allows for extraction of tabular data in fixed width text files.
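For readers unfamiliar with fixed width text files, the following Python sketch shows the general technique of cutting each line at known character positions. It is an illustration under assumed inputs, not Grooper's Fixed Width implementation: the column names, the column boundaries, and the `read_fixed_width` helper are hypothetical.

```python
def read_fixed_width(text, columns):
    """Slice each non-blank line at fixed character positions.

    `columns` is a list of (name, start, end) tuples using Python slice positions.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        rows.append({name: line[start:end].strip() for name, start, end in columns})
    return rows


# Hypothetical layout: invoice in characters 0-7, date in 8-19, amount in 20-29.
COLUMNS = [("invoice", 0, 8), ("date", 8, 20), ("amount", 20, 30)]

# Build sample lines with padding so every value lands in its column.
sample = "\n".join(
    f"{invoice:<8}{date:<12}{amount:>10}"
    for invoice, date, amount in [
        ("1001", "2021-01-15", "25.00"),
        ("1002", "2021-02-03", "40.50"),
    ]
)

rows = read_fixed_width(sample, COLUMNS)
print(rows)
```

Unlike delimited text, nothing in the file marks where one value ends and the next begins, so the column positions have to be supplied up front.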

New and Improved Data Section Methods

Grooper 2021 introduces two new Extract Methods for Data Section objects.

  • Transaction Detection
    • This extraction method automatically detects sections in a document using a Data Section's Data Field locations and analyzing the similarities of the lines surrounding them. Transaction Detection is useful for certain semi-structured documents that contain multiple sections which are themselves very structured, repeating the same (or at least very similar) field or table data.
    • Transaction Detection has additional functionality when used in combination with Label Sets.
  • Nested Table
    • This is a specialized extraction method for sections with table data nested within each section. The Nested Table method divides a document into sections by extracting table data within those sections.
    • This method is heavily reliant on Label Sets in order to function.

Introducing... Lambda Expressions!

Grooper 2021 allows use of lambda functions in expression-based functionality. This includes default value, calculated value, and validation expressions for Data Elements as well as for trigger conditions for hierarchical Data Rule execution orders. In many cases, this cuts down on the need for custom code to perform custom data validation in Grooper (particularly when leveraged with Data Rules).

OCR Improvements

We've made several advancements to our OCR Synthesis functionality to return better OCR results by segmenting an image into distinct regions and OCR'ing them independently of one another.

We've also added an OCR Cleanup IP Command to our suite of IP Profile image processing commands. This is an exceptionally powerful tool to clean up images prior to OCR with a single IP Command.

Changes to Document Export and Database Export

Goodbye Document Export and Database Export... Hello Export!

In 2021, we heavily reworked Grooper's document and data export functionality to improve the process and allow for new functionality. As part of this process, we unified Document Export and Database Export into a single Activity: Export.

Export is now the single Activity driving all export operations in Grooper. Whether exporting PDFs to a content management system, exporting data to a database, or any content to any external storage platform, Export is your way to go.

Goodbye CMIS Content Types... Hello Import and Export Behaviors!

One big change to how things were done before 2021 is how data is mapped according to its Data Model structure to or from an external storage platform upon document import or export. Previously, these mappings were configured using CMIS Content Type objects, created as children of a CMIS Connection.

In 2021, the CMIS Connection object purely serves the function of integrating Grooper with an external storage platform. Import and export mappings are defined using Import or Export Behaviors. This removes some unnecessary object bloat around the CMIS Connection object and lets the Content Model and Document Types drive their associated Data Model mappings.

  • Import and Export Behaviors are configurable via:
    • Content Models or Content Categories or Document Types
    • The Export Activity (in the case of export related mappings only)

Improved Database Lookups and Export

We now offer improved integration with PostgreSQL, Db2, MySQL, and Oracle.

Install and Setup Changes

The Grooper Config application's interface was dramatically altered in version 2021. This was done to simplify repository configuration, Grooper services, and, most notably, product licensing.

Please visit the following articles for a more detailed explanation of Grooper Config and the current Grooper installation and setup instructions:

  • Grooper Config
  • Install and Setup

Improved Upgrade Process!

Grooper can now upgrade repositories directly to 2021 from versions 2.70, 2.72, 2.80, or 2.90.

  • Versions older than 2.70 must upgrade to 2.70 before upgrading to 2021.
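The upgrade rule above can be written out as a small Python check. This is only a restatement of the two sentences above for illustration, not a Grooper tool: the `upgrade_path` function is hypothetical, and it deliberately refuses to guess for any version the text does not mention.

```python
DIRECT_UPGRADE_VERSIONS = ("2.70", "2.72", "2.80", "2.90")


def version_key(version):
    """Turn a 'major.minor' string such as '2.72' into a comparable tuple such as (2, 72)."""
    major, minor = version.split(".")
    return (int(major), int(minor))


def upgrade_path(current):
    """Return the sequence of versions to pass through on the way to 2021."""
    if current in DIRECT_UPGRADE_VERSIONS:
        return [current, "2021"]
    if version_key(current) < version_key("2.70"):
        # Older repositories must stop at 2.70 first.
        return [current, "2.70", "2021"]
    # Anything else is not covered by the text above, so do not guess.
    raise ValueError(f"No documented upgrade path for version {current}")


print(upgrade_path("2.90"))
print(upgrade_path("2.60"))
```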

Improved PDF Splitting

Version 2021 makes significant changes to the Content Action Activity.

In previous versions of Grooper, multipage PDF documents could have individual child Batch Page objects created from the PDF file attached to the parent Batch Folder, using the Content Action Activity and the Split action. This functionality is now performed using a separate Activity called Split Pages. The Split Pages activity offers many new capabilities not previously available.

Furthermore, processing time for the split operation can be dramatically improved. In previous versions, splitting out a PDF's pages could be time consuming for longer documents while Grooper fully rendered each page. This could cause a bottleneck in Batch Processing while the user waits for Grooper to split large PDF files. Now, you can make better use of Grooper's parallel thread processing capabilities by rendering each page multi-threaded with a new Rasterize command for the Execute Activity.
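The benefit of rendering pages in parallel rather than one after another can be sketched with Python's standard `concurrent.futures` module. This is a generic illustration of the idea, not Grooper's Rasterize command: `render_page` is a hypothetical stand-in that returns a file name instead of doing real rendering work.

```python
from concurrent.futures import ThreadPoolExecutor


def render_page(page_number):
    """Stand-in for an expensive per-page render; returns a label instead of an image."""
    return f"page-{page_number:03d}.png"


def render_all(page_count, workers=4):
    """Render every page using a pool of worker threads, keeping results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands pages to the workers and yields results in input order.
        return list(pool.map(render_page, range(1, page_count + 1)))


print(render_all(3))
```

Because each page is independent of the others, the work divides cleanly across workers, so a long document no longer has to be rendered strictly one page at a time.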

Miscellaneous

  • Document Viewer improvements
  • Text file processing improvements
  • Re-organization of the Global Resources folder.
    • The Global Resources folder now houses all globally accessible Grooper objects, including Data Types, CMIS Connections, and Data Connections.
    • Furthermore, you can organize these assets however you see fit by creating sub folders. For example, IP Profiles can exist in any folder in the Global Resources folder, not just an IP Profiles sub-folder.