2023.1:PDF Data Mapping (Behavior)

From Grooper Wiki

WIP

This article is a work-in-progress or created as a placeholder for testing purposes. This article is subject to change and/or expansion. It may be incomplete, inaccurate, or stop abruptly.

This tag will be removed upon draft completion.

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.

PDF Data Mapping builds a data rich "Smart PDF" from a document folder's content. Classification results, extracted data, and more can be used to insert native PDF elements into the generated PDF.

PDF elements that can be mapped from Grooper generated results include:

  • Bookmarks
  • Metadata
  • PDF Annotations (such as text highlighting, checkbox widgets and signature widgets)

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

The PDF Data Mapping behavior allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format in Grooper will use the page image files and their text data to create a multipage PDF file for each document folder upon Export.

However, this is just the "display information" required to open and read the document. There's a lot more to what a PDF can be than just a multipage document with page images and machine readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!

PDF Data Mapping expands Grooper's standard PDF generation capabilities. It creates an exportable PDF file that includes additional content available to the PDF file type. PDF Data Mapping merges Grooper-collected data, like classification results and extracted data, into the PDF by mapping these values to native PDF elements like bookmarks and annotations.

The expanded PDF Data Mapping functionality can be divided into three categories:

  • Annotations
  • Bookmarks
  • Metadata

Annotations

Annotations are additional objects you can add to PDF documents.

  • These annotations can improve readability, such as using a highlight annotation to call out important information.
  • These annotations can add components for the reader to interact with the document, such as checkboxes and signature widgets.


PDF Data Mapping can add the following kinds of annotations:

  1. Highlighting
  2. Radio group buttons
  3. Checkboxes
  4. Signature boxes
  5. Editable text boxes


Grooper uses information from Data Elements in a Data Model collected during the Extract activity to add these annotations.

  • For example, if Grooper extracts a "Name" field and you want that highlighted on the output PDF, you can use the "Highlight Annotation" to highlight the name Grooper extracted on the document.

FYI

The size of all these annotations can also be adjusted using a Padding property if the size of the extracted data instance is too small for your needs.

Bookmarks

Bookmarks provide easy navigation for multipage PDF documents. PDF Data Mapping can generate bookmarks in one of two ways:

  1. Bookmarks can be generated for extracted Data Field locations.
  2. When exporting a document folder that has child document folders, bookmarks can be generated for each "sub-document".
    • This is the default bookmarking behavior and requires no configuration. Bookmarks will be named however the child document folders are named.


In this example, this document is an application packet for a study abroad program. It has both kinds of bookmarks.

  • The "Signature" bookmark is from an extracted Data Field. It will take the reader to a signature location on the PDF.
  • The rest were generated for each child document in the document folder (Batch Folder) that was exported. PDF Data Mapping inserted a bookmark for each sub-document. The selected "Resume (4)" bookmark in the image took the reader to the resume page in the PDF.

FYI

Bookmarks generated for child document folders will be named whatever the documents are named.

  • A document folder's (Batch Folder) name defaults to its classified Document Type and document number. Here, "Application (2)", "Proposal Summary (3)", "Resume (4)", and so on.
  • A document folder's name can be changed if you edit the Document Type's Caption property. This will then change the bookmark's name.
    • Be aware, the document must be extracted for the Caption to be applied and its name changed.
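
As a rough illustration of the default naming described above, the following Python sketch builds bookmark titles for child document folders. `ChildDocument`, `bookmark_title`, and `bookmark_titles` are hypothetical names invented for this example; they are not part of Grooper. The sketch also assumes that a name produced by a Caption simply replaces the default name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChildDocument:
    """Hypothetical stand-in for a child document folder (not a Grooper class)."""
    document_type: str
    number: int
    caption: Optional[str] = None  # name produced by a Caption, if one was applied


def bookmark_title(doc: ChildDocument) -> str:
    # Assumption: an applied Caption replaces the default name outright.
    if doc.caption:
        return doc.caption
    # Default name: classified Document Type plus document number.
    return f"{doc.document_type} ({doc.number})"


def bookmark_titles(children: List[ChildDocument]) -> List[str]:
    # One bookmark per child document folder, in folder order.
    return [bookmark_title(child) for child in children]
```

For example, `bookmark_title(ChildDocument("Resume", 4))` produces the "Resume (4)" title seen in the example packet.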

Metadata

Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Prior to implementing the PDF Data Mapping functionality, Grooper only had access to edit minimal PDF metadata upon export (notably the PDF's file name).

PDF Data Mapping allows Grooper to alter and store additional metadata, including:

  1. The PDF's default metadata fields, including its Title, Author, Subject values and more.
  2. Keywords
  3. Custom metadata fields
    • Custom metadata allows Grooper to embed any single instance Data Field's value directly to the PDF.


This gives Grooper a mechanism to create a viewable document with all extracted (single instance) data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system).

FYI

This metadata can be accessed in Adobe Acrobat by opening the "Document Properties" window from the File menu.

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

  • Consider these names reserved.
  • If you are attempting to export Data Field values as custom PDF metadata, they cannot share any reserved names. You will need to rename the Data Field in Grooper to a unique name.
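
Before mapping Data Fields to custom metadata, it can help to check their names against the reserved list. The Python sketch below is one way to do that outside of Grooper; `find_reserved_conflicts` is a hypothetical helper, and the case-insensitive comparison is a cautious assumption, since this article does not say whether the match is case-sensitive.

```python
# Standard PDF metadata field names listed above; treat these as reserved.
RESERVED_PDF_KEYS = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


def find_reserved_conflicts(field_names):
    """Return the Data Field names that collide with a reserved PDF metadata name.

    Assumption: names are compared case-insensitively to err on the safe side.
    """
    reserved_lower = {key.lower() for key in RESERVED_PDF_KEYS}
    return [name for name in field_names if name.strip().lower() in reserved_lower]
```

Any name this returns would need to be renamed in the Data Model before it is used as a custom metadata field.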

How To: Add a PDF Data Mapping Behavior

Like all Behaviors, PDF Data Mapping is configured on a Content Type node, commonly a Content Model or a Document Type.


  1. Here, we have selected a Content Model in the Node Tree.
  2. To add a Behavior, select the Behaviors property and click the ellipsis button at the end.
  3. This will bring up a dialogue window to add various behaviors to the Content Model, including PDF Data Mapping.
  4. Add PDF Data Mapping to the list by clicking on the "+" button.
  5. Select PDF Data Mapping from the listed options.


  1. Once added, you will see a PDF Data Mapping item added to the Behaviors list.
  2. Selecting this Behavior, you will see property options to configure PDF creation.
  3. Press "OK" when finished configuring PDF Data Mapping.
  4. Don't forget to save changes to the Content Model.

About the documents used in these tutorials

The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to detail a more specific setup for PDF Data Mapping. This is a packet of documents from a single applicant containing a cover page and five different kinds of documents.

By the end of this tutorial we will have taken a source application packet, used Grooper to process it, and exported a single PDF with:

  • Metadata collected from Grooper
  • New annotations and widgets
  • Easily navigable bookmarks

Cover Page and Application

This is an application for a traveling abroad scholarship.

Primarily, the cover page and application document will allow us to demonstrate the annotations and widgets PDF Data Mapping can generate. We will use its Annotations settings to add the following annotations:

  • Text Annotation
  • Highlight Annotation
  • Checkbox Widget
  • Radio Group Widget
  • Signature Widget
  • Textbox Widget

Secondarily, we will use data collected from this form to generate and store default and custom metadata. We will use the Metadata settings to do this.

Lastly, we will embed a bookmark that will take the PDF's reader to the signature field on the document. We will use the Bookmarking settings to do this.

Essay

This application also includes an essay from the student.

This document will demonstrate how to add Keywords to the PDF's metadata. Using the Metadata settings we will configure a code expression to insert "long essay", "normal essay", or "short essay" depending on the essay's length.
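
The decision logic behind such an expression can be sketched in a few lines. The Python below only illustrates the idea and is not Grooper's expression syntax; the word-count thresholds (`short_max` and `long_min`) are invented for this example, and measuring length in words is itself an assumption.

```python
def essay_length_keyword(essay_text: str, short_max: int = 300, long_min: int = 1000) -> str:
    """Pick a keyword from the essay's word count.

    short_max and long_min are hypothetical thresholds chosen for illustration.
    """
    word_count = len(essay_text.split())
    if word_count <= short_max:
        return "short essay"
    if word_count >= long_min:
        return "long essay"
    return "normal essay"
```

Whatever thresholds are chosen, the expression returns exactly one of the three keywords for each essay.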

Other Documents

This packet contains three other kinds of documents as well:

  • a proposal summary
  • the applicant's resume
  • and a letter of recommendation.

For these documents (as well as the rest) we will insert bookmarks into the generated PDF, taking the reader to each document in the larger file. We will use Bookmarking settings to do this.

Notes on how this source file was separated in Grooper

The original document was imported as a single document into Grooper. We have separated it into child documents which will allow us to insert bookmarks for each separated document.

  1. The PDF Data Mapping Behavior will be applied to the Batch Folders at folder-level one.
    • The attached file is the source application packet.
  2. The Split Pages activity was applied to split the packet into pages. Then, those pages were separated into classified document folders at folder-level two.
  3. PDF Data Mapping can create a bookmark in the generated PDF for each of these five sub-documents by enabling the Bookmarking property.


By creating bookmarks for each child document, there is no need to export individual PDFs for each one. Instead, we will use PDF Data Mapping to generate one PDF for the whole application packet and use the bookmarks to navigate between each document.

How To: Configure Annotations

In this tutorial we will configure at least one example of each Annotation option.

  • Text Annotation
  • Highlight Annotation
  • Radio Group Widget
  • Checkbox Widget
  • Signature Widget
  • Textbox Widget

Prereqs: Data Fields and extracted data

For PDF Data Mapping to work, Grooper needs to have data to map.

  • For Annotations this means extracted Data Fields.
  • The Extract activity must run before Merge or Export generates the PDF.


Each of the Annotation Types references a Data Field in a Data Model as part of their configuration. If the Data Field does not collect data during the Extract activity, the PDF Data Mapping won't know where to place the annotation.
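
The rule above can be pictured as a simple filter: only selected fields that actually produced a located result get an annotation. The Python sketch below is a conceptual model only. `FieldResult`, `Location`, and `fields_to_annotate` are hypothetical names, and the location layout (page number plus a rectangle in inches) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (page number, left, top, right, bottom) in inches; a hypothetical layout.
Location = Tuple[int, float, float, float, float]


@dataclass
class FieldResult:
    """Hypothetical stand-in for an extracted Data Field result."""
    value: str
    location: Optional[Location]


def fields_to_annotate(selected: List[str], extracted: Dict[str, FieldResult]) -> List[str]:
    """Return the selected fields that have a result with a page location.

    Fields that were never extracted, or that have no location, are skipped,
    mirroring the rule that no annotation can be placed for them.
    """
    return [
        name for name in selected
        if name in extracted and extracted[name].location is not None
    ]
```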

About the Data Model used for this tutorial

The Data Model we're working with has several Data Fields that will allow PDF Data Mapping to place annotations and widgets.

The "Last Name", "First Name" and "Middle Initial" Data Fields (in the "Applicant Information" Data Section) will demonstrate the Highlight Annotation.

  • These fields use Labeled Value to extract field values next to a label.
  • Be aware, nearly any extractor type can be used to insert a highlight annotation. Grooper just needs a location on the document to draw the highlight boundaries.

The "US Citizen" Data Field will demonstrate the Radio Group Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where only one may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the radio group widget as long as its Check Mode is set to CheckOne.

The "Checklist" Data Field will demonstrate the Checkbox Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where one or more may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the checkbox widget.

The "Signature" Data Field will demonstrate the Signature Widget.

  • This field uses Detect Signature to detect whether or not a signature is present on the document.
  • Be aware, any zonal extractor (Read Zone, Highlight Zone or Detect Signature) would be able to insert the signature widget.

The "Signature Date" Data Field will demonstrate the Textbox Widget.

  • Textbox Widget adds a text-editable form field to the PDF to store a field value.
    • Compare this to a Text Annotation which simply adds a text comment to the PDF.
  • This field uses Labeled Value to extract the date the application was signed.
  • Be aware, any extractor type that returns a single-instance value would be able to insert a populated textbox widget. For a blank textbox, use Highlight Zone to place a blank zone where the field should be inserted.

The "IsProcessed" Data Field will demonstrate the Text Annotation.

  • Text Annotation inserts a text comment in the PDF.
    • Compare this to a Textbox Widget which adds an actual form field to the PDF to store a field value.
    • We will use this field and annotation to print the word "PROCESSED" on the output PDF
  • This field uses Highlight Zone to draw an extraction zone for the field and the Data Field's Default Value to determine what's printed.
    • This is a technique common to Text Annotation use cases and will be explained in further depth below.

Adding Annotations

PDF Data Mapping inserts various types of PDF annotations and widgets based on how its Annotations property is configured. Users can add one or more Annotation Types to the Annotations list. Adding a new Annotation to the list is simple.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Select the Annotations property and press the ellipsis button at the end.
  3. This will bring up the Annotations editor.
  4. Press the "+" button.
  5. Select the Annotation Type you want to add from the dropdown list.


  1. Once added, you will see the Annotation Type added to the Annotations list.
  2. All Annotation Types will have a set of General properties to configure.
  3. Some Annotation Types have additional properties you can configure.
    • For example, the Highlight Annotation has Appearance properties you can configure to adjust the highlight's color and other appearance properties.
  4. Press "OK" when finished.

Notes on shared properties

All Annotation Types share a set of General properties.

  • Fields
    • The Fields property is required.
    • Use this property to select the Data Fields to map to the PDF annotation. If you don't select any Data Fields or the selected Data Fields are not extracted, PDF Data Mapping will not insert an annotation in the output PDF.
    • Be aware, all Data Fields are selected by default.
  • Padding
    • The Padding property can adjust the size of the annotation.
    • Grooper uses a Data Field's result instance to draw the annotation's boundaries.
      • The size of the Data Field's instance may be too small for what you want to appear on the output PDF. Use Padding to increase the annotation's size on the PDF generated by PDF Data Mapping.
  • Allow Edit
    • Allow Edit refers to a reader's ability to edit the annotation as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the annotation (or widget).
    • Enabling this property (turning it True) will allow users to fully adjust the annotation in the PDF, including its size, location and other properties.
    • Be aware, even when False, users will still be able to interact with widgets, such as the Checkbox Widget or Textbox Widget.
  • Print
    • In a PDF viewing application, like Adobe Acrobat, all annotations and widgets PDF Data Mapping generates will be visible. The Print property determines whether or not the annotation is visible when the PDF is printed.
    • Be aware, the default is False.
      • Grooper presumes the "Smart PDF" output by PDF Data Mapping will be opened in a PDF viewer (where all annotations will be visible).
      • Grooper also presumes if you want to print the PDF, you want something more like the original document printed, not the one with additional PDF elements Grooper inserts. If you do want those annotations and widgets visible when the PDF is printed, turn Print to True.
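
To make the Padding property concrete, the Python sketch below grows a rectangle outward by a fixed amount. It is a geometric illustration only: `Rect` and `apply_padding` are hypothetical names, the same padding is assumed on every side, and the inch units and top-left origin are assumptions rather than documented Grooper behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Rectangle in inches, measured from the page's top-left corner (assumed)."""
    left: float
    top: float
    right: float
    bottom: float


def apply_padding(rect: Rect, padding: float) -> Rect:
    # Grow the rectangle outward by the same amount on every side.
    return Rect(
        left=rect.left - padding,
        top=rect.top - padding,
        right=rect.right + padding,
        bottom=rect.bottom + padding,
    )
```

A padding of 0.1 on a result instance therefore widens the annotation by 0.2 overall and makes it 0.2 taller.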

Annotation Types

Highlight Annotation

The Highlight Annotation overlays a colored rectangle with adjustable transparency on a Data Field's extracted location. In other words, it can highlight extraction results.

  • Use this to highlight important values extracted by Grooper.
  • Like all Annotations, this highlight can be printable or not. When the Print property is False, the highlight will show up when viewed in a PDF viewer but not if the PDF is printed.


In this example, we will use the Highlight Annotation to highlight the extracted "Last Name", "First Name" and "Middle Initial" fields from the application form. To configure this Annotation we will:

  • Select the Data Fields we wish to highlight.
  • Adjust how we want the highlight to look.

Before Annotation

After Annotation

With a Highlight Annotation added to the Annotations list:

  1. Use the Fields property to select the Data Fields you wish to highlight.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Fields you wish to highlight.
    • In our case, we are choosing the "Last Name", "First Name", and "Middle Initial" Data Fields.
    • Be aware, these fields must be extracted by the Extract activity or nothing will be highlighted.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
    • Adjusting Padding for Highlight Annotations is common. In this example, we increased the highlight's size by 0.1 in on each side.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to prevent the users from being able to adjust the annotation and prevent it from being visible when printed.


  1. Adjust the highlight's appearance, as desired, using the Appearance properties.
  2. Most commonly, users will adjust the Fill Color.
    • Use the dropdown to select from a list of system colors.
    • Or, enter an RGB value using the format #, #, #
    • This property defaults to the "Grooper green" highlight seen in Review's Data View. In this example, we've changed it to Yellow.
  3. Press "OK" when finished (or continue adding more Annotations).
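
The "#, #, #" format mentioned above is a comma-separated red, green, blue triple. The Python sketch below shows how such a string could be validated before it is typed into the property. `parse_rgb` is a hypothetical helper, and the 0 to 255 range per channel is an assumption based on common RGB conventions.

```python
def parse_rgb(text: str):
    """Parse a color written as "R, G, B" into a tuple of three integers.

    Assumes each channel is an integer from 0 to 255.
    """
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {len(parts)}")
    channels = tuple(int(part) for part in parts)
    if any(channel < 0 or channel > 255 for channel in channels):
        raise ValueError("each channel must be between 0 and 255")
    return channels
```

For instance, `parse_rgb("255, 255, 0")` yields the triple for yellow.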

Radio Group Widget

The Radio Group Widget overlays a group of radio button PDF elements on top of where a Grooper extractor finds OMR checkboxes on a document.

  • Radio buttons are common PDF elements used to indicate a single choice from multiple options in a list.
    • Note radio buttons (inserted by Radio Group Widget) differ from checkboxes (inserted by Checkbox Widget). For radio buttons, only one choice out of a group may be selected. For checkboxes, any number of choices may be selected.
  • The Data Field(s) this annotation references must use an OMR extractor to return results: Labeled OMR, Ordered OMR or Zonal OMR
    • This extractor must also have its Mode set to CheckOne (only one box out of many may be checked/selected).
  • PDF Data Mapping will insert one radio button for each checkbox the extractor locates.

Before Annotation

After Annotation

With a Radio Group Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the group of radio buttons.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "US Citizen" Data Field.
    • Be aware, this field must (1) use an OMR extractor to return results (2) with its Mode set to CheckOne (3) have already been extracted by the Extract activity and (4) have located checkboxes during extraction or no radio buttons will be placed.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the widget (press a radio button).
  3. Press "OK" when finished (or continue adding more Annotations).
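
The four prerequisites listed in step 2 above lend themselves to a quick checklist. The Python sketch below is a troubleshooting aid written for this article, not Grooper code: `radio_group_problems` is a hypothetical function, and its arguments are plain values standing in for a Data Field's configuration and extraction results.

```python
OMR_EXTRACTORS = {"Labeled OMR", "Ordered OMR", "Zonal OMR"}


def radio_group_problems(extractor: str, mode: str, was_extracted: bool, boxes_found: int):
    """List which of the four Radio Group Widget prerequisites are not met."""
    problems = []
    if extractor not in OMR_EXTRACTORS:
        problems.append("field does not use an OMR extractor")
    if mode != "CheckOne":
        problems.append("Mode is not CheckOne")
    if not was_extracted:
        problems.append("field has not been extracted")
    if boxes_found < 1:
        problems.append("no checkboxes were located")
    return problems
```

An empty list means all four conditions hold; otherwise each entry names a reason no radio buttons would be placed.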

Be Aware: Annotations are overlaid on a page's image

BE AWARE: The Radio Group Widget overlays radio buttons on a page's image. Any printed checkbox on the original page will persist (behind the widget), unless removed by the Image Processing activity.

  • Notice the original image for this document used checkboxes, not radio buttons. We see an "X" inside of a square box.

You can actually see the edges of the square box persist in the generated PDF (Here, highlighted in yellow for your viewing pleasure).

  • In this case, the boxes were detected by the "detection only" Box Detection IP command and not removed by the "detection and removal" Box Removal command.
  • Box Detection finds and stores the checkbox locations and check states but does not actually alter the image in any way.

Maybe you care about this, and maybe you don't. If you do, use Box Removal instead.

  • Box Removal will also find and store the checkbox locations and their check states, but it will also digitally remove the checkboxes from the document's image. This will allow Grooper to extract the checkboxes and allow PDF Data Mapping to overlay the radio buttons on a field of blank pixels.
  • Run Box Removal in an IP Profile using the Image Processing activity prior to running the Extract activity to do this.

Checkbox Widget

The Checkbox Widget inserts one or more form-fillable checkboxes into the PDF on top of where a Grooper extractor finds OMR checkboxes.

  • Checkboxes are common PDF elements used to indicate a choice from one or many options.
    • Note checkboxes (inserted by Checkbox Widget) differ from radio buttons (inserted by Radio Group Widget). For radio buttons, only one choice out of a group may be selected. For checkboxes, any number of choices may be selected.
  • The Data Field(s) this annotation references must use an OMR extractor to return results: Labeled OMR, Ordered OMR, or Zonal OMR
  • However, this extractor may use any of the OMR Modes (CheckOne, CheckMulti or Boolean).
  • PDF Data Mapping will insert a simple checkbox PDF element for each checkbox the extractor locates.


In this example, we will create a Checkbox Widget for the checkboxes extracted using the "Checklist" Data Field. This is a Labeled OMR extractor that uses the CheckMulti Mode, indicating any number of checkboxes may be checked. Checked or not, the Checkbox Widget will insert a checkbox element into the generated PDF.

Before Annotation

After Annotation

With a Checkbox Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the checkboxes.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "Checklist" Data Field.
    • Be aware, this field must (1) use an OMR extractor to return results (2) have already been extracted by the Extract activity and (3) have located checkboxes during extraction or no checkboxes will be placed.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the widget (check the checkboxes).
  3. Press "OK" when finished (or continue adding more Annotations).

BE AWARE: The Checkbox Widget overlays checkboxes on a page's image. Any printed checkbox on the original page will persist (behind the widget), unless removed by the Image Processing activity.

For more information, see above.

Signature Widget

The Signature Widget inserts a signature block into the PDF.

  • Signature blocks allow PDFs to capture digital signatures. This allows you to create a document that can be digitally signed straight from Grooper on export.
  • The Data Field(s) this annotation references will typically use a zonal extractor to define where the signature block should be: Detect Signature or Highlight Zone most commonly
  • Other extractor types may work, but these are most typical. PDF Data Mapping will insert the signature block using the geometric boundaries of the extraction instance. Zonal extractors are well suited to define fixed boundaries of extraction results.


In this example, we will create a Signature Widget annotation for the signature line on the application form, using the "Signature" Data Field of our Data Model. The Signature Widget will insert an interactable signature element into the generated PDF.

Before Annotation

After Annotation

With a Signature Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the signature block.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "Signature" Data Field.
    • Be aware, this field must (1) have already been extracted by the Extract activity and (2) have drawn a zone defining the location and size of the signature block (most commonly, Detect Signature or Highlight Zone is used to do this).
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the element (submit a signature).
  3. Press "OK" when finished (or continue adding more Annotations).

BE AWARE: The Signature Widget overlays a signature block on a page's image. If present, any printed signature on the original page will persist (behind the widget), unless removed by the Image Processing activity.

For more information, see above.

Textbox Widget

The Textbox Widget inserts text-editable form fields into the generated PDF.

  • Form fields allow PDFs to collect and store data entered by a user.
  • Users can configure a Textbox Widget to create blank form fields or form fields with a value Grooper extracts already populated.
    • For blank form fields, the Data Field(s) this annotation references should use Highlight Zone to place a blank zone where the field should be inserted.
    • For populated form fields, the Data Field(s) this annotation references can use any extractor type that returns a single-instance value (most typically Labeled Value).
      • This allows Grooper to not only generate a PDF with form fields where they weren't present in the source document, but prefill them with data Grooper collects.
  • Be aware, a Textbox Widget differs from a Text Annotation. Where Textbox Widget will insert a text-editable form field, Text Annotation adds a text comment to the PDF.

Before Annotation

After Annotation

In this example, we will use the Textbox Widget to insert a form field for the "Signature Date" Data Field. This used Labeled Value to extract the date. PDF Data Mapping will overlay the form field on top of the extraction result.

  • FYI: We will also adjust the generated widget's size using the Padding property. This is common when configuring Textbox Widgets when the font size you want to use for the form field is larger than the printed typeface on the document.


With a Textbox Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field(s) you wish to use to create text-editable form fields.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Field(s) you wish to select.
    • In our case, we are choosing the "Signature Date" Data Field.
    • Be aware, these fields must be extracted by the Extract activity or no textbox will be generated.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
    • Adjusting Padding for Textbox Widgets is common if the desired font size in the textbox differs from that printed on the source document. In this example, we increased the textbox's size by 0.1 in on each side.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to edit the value inside the textbox. To configure that, use the Read Only property.


  1. Adjust the textbox's other properties as desired.
    • These properties give you the ability to adjust the font and font size inside the textbox.
    • Please note: If you want to prevent a reader from editing the Grooper collected value inside the textbox, turn Read Only to True.
  2. Press "OK" when finished (or continue adding more Annotations).
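
Because Allow Edit and Read Only are easy to confuse, the following Python sketch restates the distinction as two independent switches. `reader_permissions` is a hypothetical function written for this article; it summarizes the behavior described above and does not reflect how Grooper stores these settings.

```python
def reader_permissions(allow_edit: bool, read_only: bool) -> dict:
    """Describe what a reader can do with a textbox, per the two properties.

    Allow Edit governs the widget as a PDF element; Read Only governs its value.
    """
    return {
        "can_move_or_resize_widget": allow_edit,
        "can_change_text_value": not read_only,
    }
```

With Allow Edit set to False and Read Only set to False, a reader cannot move the textbox but can still type a new value into it.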

Text Annotation

The Text Annotation inserts a text comment in the PDF.

  • This has two primary uses:
    • Insert comments into the PDF that are viewable when opening the PDF in a PDF viewer, but not printable.
    • Print a simple text note on a page.
      • Commonly, users will want to print a word like "PROCESSED" on the output PDF. This notes the document has been processed through Grooper.
  • The Data Field(s) may use any kind of extractor as long as it produces a result with (1) a location on the page to place the comment and (2) a text value to add to the comment.
  • Be aware, a Textbox Widget differs from a Text Annotation. Where Textbox Widget will insert a text-editable form field, Text Annotation adds a text comment to the PDF.


In this example, we will use a Text Annotation to print the word "PROCESSED" on the first page of the PDF generated by PDF Data Mapping.

  • We will use the "IsProcessed" Data Field to do this. The extraction logic to make this happen requires a less-than-common technique. We will show you how we build this Data Field in the Technique: "IsProcessed" Data Field section below.

Before Annotation

After Annotation

With a Text Annotation added to the Annotations list:

  1. Use the Fields property to select the Data Field(s) you wish to use to insert the text comment.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Fields you wish to select.
    • In our case, we are choosing the "IsProcessed" Data Field.
    • Be aware, these fields must (1) be extracted by the Extract activity and (2) hold a location and value or no comment will be added.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Keep the defaults to prevent readers from adjusting the annotation and to keep it from appearing when the PDF is printed.
    • In our case, we do want this comment printed when the document is printed. So, we've changed Print to True.


  1. Adjust the comment's appearance, as desired, using the Appearance properties.
    • Users may change the comment's font and font size with the Font Name and Font Size properties.
    • Users may select a Fill Color and Text Color in one of two ways:
      • Using the dropdown to select from a list of system colors
      • Or, entering an RGB value using the format #, #, #
      • Be aware, there is no true "transparent" Fill Color option. The selectable Transparent option is a system color that equates to "white".
  2. Press "OK" when finished (or continue adding more Annotations).
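
To illustrate the "#, #, #" color format mentioned above, here is a minimal Python sketch that parses such a string into red, green, and blue channels. This is not Grooper code. The function name `parse_rgb` is hypothetical, and the 0 to 255 range check is an assumption based on conventional 8-bit RGB values.

```python
from typing import Tuple


def parse_rgb(value: str) -> Tuple[int, int, int]:
    """Parse a color written as '#, #, #' into an (R, G, B) tuple.

    Assumption: each channel is an integer from 0 to 255.
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated numbers, got {value!r}")
    red, green, blue = (int(part) for part in parts)
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range in {value!r}")
    return (red, green, blue)


# Example: a pure red Text Color.
print(parse_rgb("255, 0, 0"))
```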

Technique: "IsProcessed" Data Field

To print the word "PROCESSED" on the PDF, we used a specific technique. A Text Annotation just needs two things from a Data Field to insert the annotation: (1) a location on the page to place the comment and (2) a text value to add to the comment. The word "PROCESSED" did not exist on the source PDF. So, we had to figure out a way to use a Data Field to generate a result rather than extract it.

We did this in essentially two steps:

  1. Use the Highlight Zone extractor to define where the annotation should be printed.
  2. Use a Calculated Value to define the text we want to print (the word "PROCESSED").


This gives a Text Annotation everything it needs to insert the comment: (1) a location and (2) some text.
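
The two-step idea can be modeled in a few lines of Python. This is only a conceptual sketch, not Grooper's object model: `Zone`, `FieldResult`, `build_is_processed`, and `can_annotate` are hypothetical names, and the coordinates are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zone:
    """A rectangle on a page (hypothetical stand-in for a Highlight Zone result)."""
    page: int
    x: float
    y: float
    width: float
    height: float


@dataclass
class FieldResult:
    """What a Data Field hands to the annotation: where, and what text."""
    location: Optional[Zone]
    value: Optional[str]


def build_is_processed(zone: Zone) -> FieldResult:
    # Step 1: the zone supplies the location.
    # Step 2: a constant supplies the text (stand-in for the Calculated Value).
    return FieldResult(location=zone, value="PROCESSED")


def can_annotate(result: FieldResult) -> bool:
    # A comment is only added when both a location and a value are present.
    return result.location is not None and bool(result.value)


stamp = build_is_processed(Zone(page=1, x=0.5, y=0.5, width=2.0, height=0.4))
print(can_annotate(stamp))
```

A result with text but no location (or a location but no text) fails the check, which mirrors why the field needs both parts.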

How To: Configure Bookmarks

Bookmarks in PDFs aid readers when navigating through multipage documents. PDF Data Mapping can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):

  1. Using a document folder's child folders (both are Batch Folders).
  2. Using a document folder's extracted Data Fields.

In this tutorial we take an application packet separated into component child documents and use PDF Data Mapping's Bookmarking property to create bookmarks for each one.

The application packet as a whole consists of five separate and distinguishable documents.

  1. The application itself (and a coversheet)
  2. A proposal summary
  3. The student's resume
  4. A letter of recommendation
  5. An essay

Our goal is to create a bookmark in the generated PDF file for each of these component documents (child documents).

Rather than exporting five separate PDF files for each component document, we will export a single PDF for the whole packet with navigable bookmarks.


We will also demonstrate how to use Data Fields for bookmarking. This allows us to insert PDF bookmarks at the locations of extracted data.

  • The "Signature" bookmark in this example would take the reader to the signature line of the PDF, using the location extracted by the "Signature" Data Field in our Data Model.

Prereqs: Bookmarking Option 1 - Separated child documents

For PDF Data Mapping to work, Grooper needs to have data to map.

If enabled, Bookmarking will automatically add bookmarks to a PDF if a document has child documents in the Batch's folder hierarchy.

  • If a document at folder level 1 is exported and has two child documents, the generated PDF will have two bookmarks.
  • Clicking on the bookmark will take the reader to that child document's page in the PDF.


For this to work:

  • The parent document folder must have separable child pages.
    • Either from scanning pages in with a scanner or using the Split Pages activity to generate pages from an imported PDF.
  • These child pages must then be separated into child folders.
    • Either using a Separation Profile when scanning or using the Separate activity.


Technically speaking, that's all you need. PDF Data Mapping will add a PDF bookmark for every child document and name each one using its child folder's name.

  • Be aware, without classifying the child documents, these names will just be "Folder (1)", "Folder (2)", "Folder (3)" and so on.

Not separated

No child folders

Separated

Has child folders

Separated and classified

Has child document folders.

What about Page 1 there?

Is it in a folder? No. Then it won't get a bookmark.
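
The bookmarking rule described above can be sketched in Python. This models the behavior only and is not how Grooper implements it: `LoosePage`, `ChildFolder`, and `child_bookmarks` are hypothetical names, and the sketch assumes each bookmark targets the first page of its child folder.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class LoosePage:
    """A page sitting directly in the parent folder (not in a child folder)."""


@dataclass
class ChildFolder:
    """A child document folder with a name and a number of pages."""
    name: str
    page_count: int


def child_bookmarks(
    children: List[Union[LoosePage, ChildFolder]],
) -> List[Tuple[str, int]]:
    """Return (bookmark name, 1-based PDF page) for each child folder.

    Loose pages still take up a page in the PDF but get no bookmark.
    """
    bookmarks = []
    next_page = 1
    for child in children:
        if isinstance(child, ChildFolder):
            bookmarks.append((child.name, next_page))
            next_page += child.page_count
        else:
            next_page += 1
    return bookmarks


packet = [
    LoosePage(),
    ChildFolder("Application", 2),
    ChildFolder("Proposal Summary", 1),
]
print(child_bookmarks(packet))
```

In this made-up packet, the loose first page shifts the first bookmark to page 2 but produces no bookmark of its own.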

Prereqs: Bookmarking Option 2 - Data Fields and extracted data

For PDF Data Mapping to work, Grooper needs to have data to map.

Bookmarking can also insert PDF bookmarks using the locations of extracted data. Data Fields collect results using extractors, which return results from the source document. Bookmarking will use the locations of these results to embed this kind of bookmark.

For this to work:

  • You must have these Data Fields defined in a Data Model and configured to return results.
  • The Extract activity must run before Merge or Export generates the PDF.

Adding bookmarks for child documents/folders

PDF Data Mapping will create bookmarks for child documents/folders by default. There is no configuration required besides enabling the Bookmarking property.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Bookmarking property to Enabled.
  3. Press "OK" when finished.


That's it! It's that simple!

As long as the document folder PDF Data Mapping is applied to has child documents/folders, bookmarks will be created for each child document.


Adding bookmarks for Data Fields

How To: Configure Metadata

About

The PDF Data Mapping behavior has the ability to create and insert additional metadata into the generated PDF as well, using information collected during Grooper's document processing. The metadata you are able to create falls into one of three categories:

  1. Editing the PDF's default metadata fields.
    • This includes the following metadata fields that are standard to every PDF file:
      • Title
      • Author
      • Subject
      • Created Date
      • Modified Date
      • Application (Used to establish the "creator" application which created the original file. This can be useful if the original file was created in a different application, like Microsoft Word, and converted to a PDF format by Grooper with a PDF Data Mapping behavior.)
  2. Creating custom metadata fields
    • This is done using extracted Data Field values collected during the Extract activity.
  3. Adding "Keywords" to the PDF metadata
    • This can be done using expression based or extraction based methods.

Note that this list does not include the exported document's filename (e.g. "Im_a_file.pdf"). Filename mappings are always configured using an Export Behavior.


Prereqs - Data Extraction

If we're going to insert some metadata into these PDFs, that data has to come from somewhere. In broad terms, the metadata creation is done in one of two ways (or a combination of the two):

  1. Using expression based creation
    • In the case of the default PDF metadata fields and keywords, expressions can be used to populate the metadata. This gives you access to system data, classification information, extracted Data Field results, and various .NET functions to manipulate it.
  2. Using Data Field results
    • In the case of the custom PDF metadata, the custom fields are generated from Data Fields in the document's Data Model and their collected results from the Extract activity.
    • This means the document must be processed by the Extract activity in order to create and populate these custom fields.

Add the Behavior and Enable Metadata

Metadata is one of the configuration options for the PDF Data Mapping behavior. A Content Type Behavior tells an activity how to use the Content Type to do something. In the case of PDF Data Mapping, it tells the Export activity how to use the Content Model's collected Data Fields and other information to edit the generated PDF's metadata.

  1. All Behaviors are added to a Content Type object.
    • We will add the PDF Data Mapping behavior to this Content Model named "PDF Data Mapping - UNESCO Packet".
  2. All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add the PDF Data Mapping behavior.
  3. In the Behaviors editor window that pops up, click the "+" button to add a Behavior.
  4. Choose PDF Data Mapping from the list.

  1. Once added, you will see PDF Data Mapping added to the list on the left. Select it.
  2. To enable the metadata functionality, in the right panel, click the checkbox next to the Metadata property.

Edit Default PDF Metadata

Once enabled, the first six Metadata sub-properties all pertain to the default PDF metadata fields Grooper can edit: Title, Author, Subject, Creation Date, Modified Date, and Creator.

These are edited with code expressions.

  1. The Title property corresponds to the PDF's "Title" field.
    • By default, this expression is set to CurrentDocument.ContentTypeName
      • This will make the title whatever the document's Document Type classification is.
      • In our case, these document folders are assigned the "UNESCO Application Packet" Document Type of our Content Model.
  2. The Author property corresponds to the PDF's "Author" field.
    • By default, this expression is set to LDAP.CurrentUserDisplayName
      • This will make the author the display name of whatever user is logged into the machine exporting the documents.
    • We've changed this to Candidate
      • This will make the author the result of the "Candidate" Data Field (which is "Dog O Doggerson" for our example document).
  3. The Creator property corresponds to the PDF's "Application" field.
    • This field is intended to be used when generating PDFs from different file types. For example, if the file was originally a Microsoft Word document, you might enter "Microsoft Word" to fill this field.
    • This field is blank by default, and we have left it so.
  4. The Subject property corresponds to the PDF's "Subject" field.
    • This field is blank by default.
    • We've decided to populate this field with the extracted proposal title, using the results of the "Title of Proposal" Data Field and the expression Title_of_Proposal
      • Note: Spaces in Data Fields must be replaced with underscores in expressions.
  5. The Creation Date and Modification Date properties correspond to the PDF's "Created" and "Modified" fields.
    • By default, these both use the expression DateTime.Now
      • This will return the current system time of your machine at the time of export.
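
As a rough model of what this configuration produces, the Python sketch below assembles the same six values. It is not Grooper code: `expression_name` and `build_default_metadata` are hypothetical helpers, the dictionary keys simply reuse the property names from the list above, and the proposal title passed in the example is a placeholder rather than the real extracted value.

```python
from datetime import datetime
from typing import Dict, Optional


def expression_name(field_name: str) -> str:
    """Spaces in Data Field names become underscores in expressions."""
    return field_name.replace(" ", "_")


def build_default_metadata(
    content_type_name: str,
    extracted: Dict[str, str],
    now: Optional[datetime] = None,
) -> Dict[str, object]:
    """Model the example configuration for the default metadata fields."""
    values = {expression_name(name): value for name, value in extracted.items()}
    stamp = now if now is not None else datetime.now()
    return {
        "Title": content_type_name,                       # CurrentDocument.ContentTypeName
        "Author": values.get("Candidate", ""),            # expression: Candidate
        "Creator": "",                                    # left blank in the example
        "Subject": values.get("Title_of_Proposal", ""),   # expression: Title_of_Proposal
        "Creation Date": stamp,                           # DateTime.Now
        "Modification Date": stamp,                       # DateTime.Now
    }


metadata = build_default_metadata(
    "UNESCO Application Packet",
    {"Candidate": "Dog O Doggerson", "Title of Proposal": "Placeholder Proposal"},
)
print(metadata["Title"], "|", metadata["Author"], "|", metadata["Subject"])
```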

  1. When we open the document in Adobe Acrobat and view these fields using the "Document Properties" window, you can see the metadata this configuration generated for the PDF.

Add Keywords

Grooper can add keywords into the PDF's "Keywords" field in one of two ways, either using an expression or a referenced extractor's results.

In our case, we're going to use an expression to determine if the word count of the "Essay" document in the application packet is "Long", "Short", or "Normal".

  1. We will use the results of the "Essay Word Count" Data Field of our Data Model to do this.
  2. This Data Field's extraction is configured to count the number of words in the essay.

If the word count is above 600 words, we'll call that a long essay. If it's below 400 words, we'll call that a short essay. And if it's anywhere in between, we'll call it a normal essay.

The expression below uses a series of nested conditional statements using the IIf() function to accomplish this.

IIf(Essay_Information.Essay_Word_Count > 600, "Long Essay", IIf(Essay_Information.Essay_Word_Count > 400, "Normal Essay", "Short Essay"))

If the result is greater than 600, the keyword evaluates to "Long Essay". Otherwise, if the result is greater than 400, the keyword evaluates to "Normal Essay". If neither condition is met, the keyword evaluates to "Short Essay". As written, a word count of exactly 400 evaluates to "Short Essay".
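
For readers less familiar with nested IIf() calls, the same decision logic is restated below as a small Python function. This is an illustration only, with a hypothetical function name. The expression entered in Grooper is the IIf() expression shown above.

```python
def essay_keyword(word_count: int) -> str:
    """Mirror the nested IIf() expression: check the outer condition first."""
    if word_count > 600:
        return "Long Essay"
    if word_count > 400:
        return "Normal Essay"
    return "Short Essay"


# A few word counts on either side of each threshold.
for count in (350, 400, 485, 600, 750):
    print(count, essay_keyword(count))
```

The order of the checks matters: the second comparison only runs for counts of 600 or fewer, so it does not need an explicit upper bound.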

To use this expression to add the keyword to the generated PDF's metadata, we will configure the Keywords property.

  1. In the Metadata sub-properties, select the Keywords property and click the ellipsis button at the end.
  2. In the expression editor that pops up, enter the expression you wish to use to create the keywords.
    • As is the case with any expression editor, Grooper's IntelliSense code completion will aid you when writing your code expressions.
  3. Click "OK" when finished.

  1. When we open the generated PDF in Adobe Acrobat and view the "Document Properties" window, you can see the metadata this configuration generated for the PDF.
    • The keyword "Normal Essay" has been added to the keywords list.
    • The extracted value for the "Essay Word Count" field was 485, which is less than 600 and greater than 400. Evaluated by our Keywords expression, that returns a value of "Normal Essay".

Add Custom Metadata

Last but not least, you can add custom metadata fields to the generated PDF using extraction results from the document's Data Model. A custom metadata field is generated for every Data Field you choose in the Content Type's Data Model.

  1. Remember, we add Behaviors to Content Types (typically a Content Model or a Document Type). In this case, we're adding the PDF Data Mapping behavior to the Content Model.
  2. Content Models and Document Types can have their own Data Models as one of their children. Configuring PDF Data Mapping on the Content Model, we will utilize its Data Model to export this custom metadata.
  3. This Data Model is configured with several Data Fields. These Data Fields will collect information about the "UNESCO Application Packet" and its component documents, such as the applicant's name and information about the proposal.
    • This will be done during the Extract activity. Once collected, PDF Data Mapping can insert the results into the generated PDF, creating one custom metadata field and corresponding result for each Data Field and its extracted result.

To do this, we will use the Export Data Fields option of PDF Data Mapping's Metadata properties.

  1. In the Metadata sub-properties, click the check box next to the Export Data Fields property to change it from False to True
  2. By default, once you enable this property, Grooper will export all Data Fields available to the Content Type on which PDF Data Mapping is configured.
    • You can be more selective about what you want to include using the Field Filter property.
    • This will give you a drop down list of all the Data Field nodes available for custom PDF metadata creation. You can check the box next to which ones you wish to include, leaving those Data Fields you wish to exclude unchecked.

  1. When we open the generated PDF in Adobe Acrobat and view the "Document Properties" window, you can see the custom metadata generated in the "Custom" tab.
  2. The Data Fields' names show up in the "Names" column.
    • Note: Data Fields in Data Sections will have their names appended to the Data Section's name. For example, the "Proposal Title" Data Field in the "Proposal Information" Data Section translates to "Proposal_Information.Proposal_Title".
  3. The Data Fields' results, collected by the Extract activity, show up in the "Value" column.
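
The naming pattern in the note above can be sketched in Python. This is a model of the observed names, not Grooper's implementation: `custom_metadata` is a hypothetical function, nested dictionaries stand in for Data Sections, and the handling of sections nested inside other sections is an assumption.

```python
from typing import Dict, Union

FieldTree = Dict[str, Union[str, "FieldTree"]]


def custom_metadata(data: FieldTree, prefix: str = "") -> Dict[str, str]:
    """Flatten Data Fields (and Data Sections) into custom metadata names.

    Spaces become underscores, and a field inside a section is named
    "Section_Name.Field_Name".
    """
    flat: Dict[str, str] = {}
    for name, value in data.items():
        key = prefix + name.replace(" ", "_")
        if isinstance(value, dict):
            # A Data Section: recurse and prefix its fields with the section name.
            flat.update(custom_metadata(value, key + "."))
        else:
            flat[key] = value
    return flat


example = {
    "Candidate": "Dog O Doggerson",
    "Essay Information": {"Essay Word Count": "485"},
}
print(custom_metadata(example))
```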

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

You may run into an issue upon export if you have Data Fields in your Data Model that share one of these names. If using the Metadata creation capabilities of PDF Data Mapping, consider these names "taken" and rename the Data Field to something different. For example, in this case a Data Field returning the title of the proposal listed on the application was changed from "Title" to "Title of Proposal".
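
A quick way to catch these collisions before export is to compare your Data Field names against the reserved list. The Python sketch below shows one way to do that outside of Grooper. `conflicting_fields` is a hypothetical helper, and matching names case-insensitively is a cautious assumption rather than documented behavior.

```python
from typing import Iterable, List

# Metadata names the PDF format already defines, as listed above.
RESERVED_PDF_NAMES = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}

_RESERVED_LOWER = {name.lower() for name in RESERVED_PDF_NAMES}


def conflicting_fields(field_names: Iterable[str]) -> List[str]:
    """Return the Data Field names that collide with reserved PDF names.

    Assumption: comparison ignores case, to err on the side of caution.
    """
    return [name for name in field_names if name.lower() in _RESERVED_LOWER]


print(conflicting_fields(["Title", "Candidate", "Title of Proposal"]))
```

Here only "Title" is flagged. "Title of Proposal" passes, which matches the rename used in this article's example.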

  1. You can also access this data using the "Additional Metadata..." button in the "Description" tab.
  2. Select the "Advanced" item.
  3. You'll see all the generated custom metadata listed under the "http://ns.adobe.com/pdfx/1.3/" node.

How To: Generate the PDF using Merge or Export