2023.1:PDF Data Mapping (Behavior)

From Grooper Wiki
Revision as of 16:35, 26 February 2021 by Dgreenwood (talk | contribs)

2021

This article is in development for the upcoming version of Grooper, Grooper 2021. PDF Generate is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release.

PDF Generate is a Content Type Behavior designed to create an exportable PDF file using the classification and extraction content of a Batch Folder. Its capabilities include exporting extracted data as PDF metadata, inserting bookmarks, and creating PDF annotations, such as highlighting, checkboxes, and signature widgets.

About

The PDF Generate Behavior allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format in Grooper uses the page image files and their text data to create a multipage PDF file for each document folder upon Export. However, this is just the "display information" required to open and read the document. A PDF can be much more than a multipage document with page images and machine-readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!

PDF Generate creates an exportable PDF file that includes some of this additional content available to the PDF format. This is part of Grooper's evolving "Smart PDF Architecture", a design philosophy that strives to more fully utilize the capabilities of the PDF file type and merge them with Grooper's own document processing capabilities.

The expanded PDF Generate Behavior functionality can be divided into three categories:

  • Annotations
  • Bookmarks
  • Metadata

Annotations

Annotations are additional objects you can add to PDF documents. Grooper uses information from Data Elements in a Data Model, collected during the Extract activity, to add these annotations (also called "widgets"). Annotations can make a document easier to read and can add components the reader interacts with, such as checkboxes and signature boxes.

The kinds of annotations you can add are:

  1. Highlighting
  2. Radio group buttons
  3. Checkboxes
  4. Signature boxes
  5. Editable text boxes

Grooper uses the data instance information from extracted Data Fields to insert these annotations. For example, here we set up a Content Model with a Data Field named "Last Name". After the document's data was collected during the Extract activity, Grooper has a data instance it can associate with the "Last Name" Data Field, including its size and location coordinates on the document. We then used the Highlight Annotation to highlight the extracted last name on the document in yellow.

The size of all these annotations can also be adjusted using a Padding property if the size of the extracted data instance is too small for your needs.
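
As a rough illustration of what padding does to an annotation's bounds, the sketch below grows a rectangle outward by a fixed amount on every side. The `Rect` type, the `pad_rect` function, the sample coordinates, and the assumption that padding applies equally to all four sides are hypothetical and for illustration only. They are not Grooper's API, and Grooper's Padding property may behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; units are arbitrary (for example, inches)."""
    left: float
    top: float
    width: float
    height: float


def pad_rect(rect: Rect, padding: float) -> Rect:
    """Grow `rect` by `padding` on every side (assumed uniform padding)."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    return Rect(
        left=rect.left - padding,
        top=rect.top - padding,
        width=rect.width + 2 * padding,
        height=rect.height + 2 * padding,
    )


# Hypothetical bounds of an extracted "Last Name" data instance.
last_name = Rect(left=1.0, top=2.0, width=1.5, height=0.25)

# The padded rectangle is what a highlight annotation would cover.
padded = pad_rect(last_name, 0.125)
```

Because the rectangle grows on both sides of each axis, the width and height each increase by twice the padding value while the center of the annotation stays where the extracted data instance was found.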

Bookmarks

Bookmarks allow easy navigation for multipage PDF documents. When exporting a single PDF composed of multiple child sub-documents, you can create a bookmark for each child document. This way, you can keep all the documents together in a single PDF file and easily navigate from one section of the document to another.

For example, this document is an application packet for a study abroad program. Each document in the packet was separated and classified as a child document folder of one Document Type or another. The PDF Generate Behavior was used to export the packet as a single PDF, and a bookmark was inserted for each sub-document and named after its Document Type.

Grooper can create bookmarks from extracted Data Fields in the document as well.
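
To make the idea concrete, the sketch below computes where each bookmark would point when several child documents are merged into one PDF. Each bookmark is titled with the child's Document Type and targets the first page of that child. The `ChildDocument` and `Bookmark` types and the `build_bookmarks` function are hypothetical names used for illustration, not Grooper's API, and the Document Type names and page counts are made-up sample values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChildDocument:
    document_type: str  # name of the classified Document Type
    page_count: int     # number of pages in this child document


@dataclass
class Bookmark:
    title: str
    page_index: int  # zero-based page index in the merged PDF


def build_bookmarks(children: List[ChildDocument]) -> List[Bookmark]:
    """Create one bookmark per child, pointing at that child's first page."""
    bookmarks = []
    next_page = 0
    for child in children:
        bookmarks.append(Bookmark(child.document_type, next_page))
        next_page += child.page_count  # the next child starts after this one
    return bookmarks


# Hypothetical application packet with three child documents.
packet = [
    ChildDocument("Application Form", 2),
    ChildDocument("Essay", 3),
    ChildDocument("Transcript", 1),
]
bookmarks = build_bookmarks(packet)
```

Here the bookmarks land on pages 0, 2, and 5 of the merged file, which is the running total of the page counts that precede each child document.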

Metadata

Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Prior to implementing the PDF Generate Behavior functionality, Grooper only had access to edit minimal PDF metadata, notably the file's name upon export. The PDF Generate Behavior allows Grooper to alter and store additional collected metadata as well, including Data Field values collected during the Extract activity. This means Grooper can now create a viewable document with all the extracted data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system).

This metadata can be accessed by opening a PDF in a PDF viewer application, such as Adobe Acrobat, and opening the "Document Properties" window from the File menu.

There are several pieces of metadata Grooper has access to.

  1. All of the fields highlighted here can be created from Grooper, using an expression-based syntax to access data extracted from the document and system information.
  2. Note that this gives Grooper the capability to generate and insert keywords into the PDF's "Keywords" field.
    • In this case, Grooper has created a keyword based on the word count of the essay in this study abroad application packet.
  3. Extracted Data Field values can also be exported as PDF metadata. This information can be viewed either using the "Custom" tab or the "Additional Metadata..." window.

  1. In the "Custom" tab...
  2. You can see all the Data Fields Grooper extracted and their values as custom metadata for this document.


Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

You may run into an issue upon export if Data Fields in your Data Model share one of these names. If you are using the metadata creation capabilities of the PDF Generate Behavior, consider these names "taken" and rename the Data Field to something different. For example, in this case a Data Field returning the title of the proposal listed on the application was renamed from "Title" to "Title of Proposal".
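
A check for such name collisions before export could look like the sketch below. It compares a list of Data Field names against the standard PDF metadata field names listed above. The `find_reserved_conflicts` function and the sample field list are hypothetical and not part of Grooper. The comparison ignores case and surrounding whitespace, which is a deliberately conservative assumption rather than documented Grooper behavior.

```python
from typing import Iterable, List

# Standard PDF metadata field names, as listed in the text above.
RESERVED_PDF_FIELDS = frozenset({
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
})


def find_reserved_conflicts(field_names: Iterable[str]) -> List[str]:
    """Return the Data Field names that collide with a reserved PDF name.

    Matching is case-insensitive and ignores surrounding whitespace
    (a conservative assumption, not confirmed Grooper behavior).
    """
    reserved_lower = {name.lower() for name in RESERVED_PDF_FIELDS}
    return [
        name for name in field_names
        if name.strip().lower() in reserved_lower
    ]


# Hypothetical Data Field names from a Data Model.
data_fields = ["Last Name", "Title", "Essay Word Count", "Subject"]
conflicts = find_reserved_conflicts(data_fields)
```

In this sample, "Title" and "Subject" are flagged, while a renamed field such as "Title of Proposal" passes the check.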

As a Behavior, PDF Generate is configured on a Content Type object, commonly a Content Model or a Document Type.

  1. Here, we have selected a Content Model in the Node Tree.
  2. To add a Behavior, select the Behaviors property and press the ellipsis button at the end.
  3. This will bring up a dialog window to add various Behaviors to the Content Model, including the PDF Generate Behavior.
  4. Add the PDF Generate Behavior to the list using the "Add" button.
  5. Select PDF Generate Behavior from the listed options.

  1. Once added, you will see a PDF Generate Behavior item added to the Behaviors list.
  2. Selecting this Behavior, you will see property options to configure PDF creation.



Before we get into what these properties do, how to configure them, and how they affect the exported PDF, there's one key thing to keep in mind when using the PDF Generate Behavior.

Along with the PDF Generate Behavior, you will also need an Export Behavior configured to export a PDF formatted file. The PDF Generate Behavior does the job of configuring all the extra content (metadata, bookmarks and/or annotations) you want to add to the exported PDF. The Export Behavior does the job of actually creating the PDF (with the content configuration information supplied by the PDF Generate Behavior) and sending it off to an external storage platform.

Export Behaviors can be added to Content Types, such as the Content Model here.

  1. To add an Export Behavior, press the "Add" button in a Behaviors list collector.
  2. Select Export Behavior.


FYI

Export Behaviors can also be configured locally on the Export activity, as part of the activity's own configuration.

The benefit of adding it to a Content Model is that you will often use information collected from a Content Model when exporting your documents, such as a document folder's classified Document Type or data collected from a Data Model for field mapping purposes. You might as well add it to the Content Model now, while you're adding the PDF Generate Behavior.

Once the Export Behavior is added, you will need to add an Export Definition. This controls how the file is exported, most notably where it is exported. Whether you are exporting to a Windows file system, an IMAP email mailbox, or a CMIS content management system, Grooper needs to know where to put the file. An Export Definition is how Grooper knows where the file goes.

Importantly for the PDF Generate Behavior, you will also use an Export Definition to define what type(s) of file you want to export. For whichever Export Definition you choose, you will need to ensure you've configured an Export Format for a PDF formatted file in order to export the generated PDF.

  1. To add an Export Definition, select the property and press the ellipsis button at the end.
  2. This will bring up an Export Definitions list collector window.
  3. Here, we've added a CMIS Export definition, using a CMIS Connection to a local NTFS folder.
    • The Export Definition is up to you and your needs. There are many different external storage platforms Grooper can export to.
  4. Note, we've added a PDF Format configuration to the Export Formats property.

We will review some specifics of the PDF Format option's configuration later. For now, just be aware adding a PDF Export Format is a necessary step to export the PDF file generated by the PDF Generate Behavior.

Version Differences

Behaviors are new functionality in Grooper 2021. Much of the PDF Generate Behavior functionality was not available in previous versions. Prior to version 2021, only annotation creation was possible, using the Generate PDF activity. In version 2021, this activity has been replaced by the PDF Generate Behavior, which expands its capabilities to generate bookmarks and document metadata as well.