PDF Data Mapping (Behavior)


This article was migrated from an older version and has not been updated for the current version of Grooper.

This tag will be removed upon article review and update.

This article is about the current version of Grooper.

Note that some content may still need to be updated.


PDF Data Mapping is a Content Type Behavior designed to enhance PDF files generated by the Merge or Export activities with metadata, bookmarks, annotations and/or different kinds of widgets.

PDF Data Mapping builds a data-rich "Smart PDF" from a document folder's content. Classification results, extracted data, and more can be used to insert native PDF elements into the generated PDF.

PDF elements that can be mapped from Grooper generated results include:

  • Bookmarks
  • Metadata
  • PDF Annotations (such as text highlighting, checkbox widgets and signature widgets)

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

The PDF Data Mapping behavior allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format (and Merge Format) in Grooper will use the page image files and their text data to create a multipage PDF file for each document folder upon Export (or Merge).

However, this is just the "display information" required to open and read the document. There's a lot more to what a PDF can be than just a multipage document with page images and machine readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!

PDF Data Mapping expands Grooper's standard PDF generation capabilities. It creates an exportable PDF file that includes additional content available to the PDF file type. PDF Data Mapping merges data collected by Grooper into the PDF by mapping these values to native PDF elements like bookmarks and annotations.

The expanded PDF Data Mapping functionality can be divided into three categories:

  • Annotations: Highlight important text, insert comments, and embed interactive widgets like editable form fields and checkboxes.
  • Bookmarks: Organize complex documents with bookmarks linking to child documents and/or extracted Data Fields.
  • Metadata: Alter the PDF's default metadata, add searchable keywords, and export custom metadata using data collected by Grooper.

Annotations

Annotations are native PDF elements used to highlight and comment on text in a PDF file. For PDF Data Mapping, "annotations" also refers to interactable "widgets" such as checkboxes and text form fields. The Annotations functionality allows you to embed many of these native PDF annotations and widgets into Grooper generated PDFs.

Annotations can serve many purposes:

  • Annotations can increase readability, such as using a highlight annotation to call out important information.
  • Annotations can add components for the reader to interact with the document, such as checkboxes and signature widgets.


PDF Data Mapping can add the following kinds of annotations/widgets:

  1. Highlighting
  2. Radio group buttons
  3. Checkboxes
  4. Signature boxes
  5. Editable text boxes
  6. Text comments


Grooper uses information from Data Elements in a Data Model collected during the Extract activity to add these annotations.

  • For example, if Grooper extracts a "Name" field and you want that highlighted on the output PDF, you can use the "Highlight Annotation" to highlight the name Grooper extracted on the document.

FYI

The size of all these annotations can also be adjusted using a Padding property if the size of the extracted data instance is too small for your needs.

Bookmarks

Bookmarks provide easy navigation for multipage PDF documents. PDF Data Mapping can generate bookmarks in one of two ways:

  1. Bookmarks can be generated for extracted Data Field locations.
  2. When exporting a document folder that has child document folders, bookmarks can be generated for each "sub-document".
    • This is the default bookmarking behavior and requires no configuration. Bookmarks will be named however the child document folders are named.


In this example, this document is an application packet for a study abroad program. It has both kinds of bookmarks.

  • The "Signature" bookmark is from an extracted Data Field. It will take the reader to a signature location on the PDF.
  • The rest were generated for each child document in the document folder (Batch Folder) that was exported. PDF Data Mapping inserted a bookmark for each sub-document. The selected "Resume (4)" bookmark in the image took the reader to the resume page in the PDF.

FYI

Bookmarks generated for child document folders will be named whatever the documents are named.

  • A document folder's (Batch Folder) name defaults to its classified Document Type and document number. Here, "Application (2)", "Proposal Summary (3)", "Resume (3)", and so on.
  • A document folder's name can be changed if you edit the Document Type's Caption property. This will then change the bookmark's name.
    • Be aware, the document must be extracted for the Caption to be applied and its name changed.

Metadata

Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Prior to implementing the PDF Data Mapping functionality, Grooper could edit only minimal PDF metadata upon export (notably the PDF's file name).

PDF Data Mapping allows Grooper to alter and store additional metadata, including:

  1. The PDF's default metadata fields, including its "Title", "Author", "Subject" and more.
  2. Keywords
  3. Custom metadata fields
    • Custom metadata allows Grooper to embed any single instance Data Field's value directly to the PDF.


This gives Grooper a mechanism to create a viewable document with all extracted (single instance) data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system).

FYI

This metadata can be accessed in Adobe Acrobat by opening the "Document Properties" window from the File menu.

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

  • Consider these names reserved.
  • If you are attempting to export Data Field values as custom PDF metadata, they cannot share any reserved names. You will need to rename the Data Field in Grooper to a unique name.
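The reserved-name rule above can be checked before export with a small script. The following is a minimal sketch, not part of Grooper: the function name `find_reserved_conflicts` is hypothetical, and comparing names case-insensitively is a conservative assumption rather than documented Grooper behavior.

```python
# Metadata field names the PDF format already defines (listed in this article).
RESERVED_PDF_METADATA_NAMES = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


def find_reserved_conflicts(data_field_names):
    """Return the Data Field names that collide with a reserved PDF name.

    Assumption: names are compared case-insensitively to be safe.
    """
    reserved_lower = {name.lower() for name in RESERVED_PDF_METADATA_NAMES}
    return [
        name for name in data_field_names
        if name.strip().lower() in reserved_lower
    ]


if __name__ == "__main__":
    # "Title" and "author" would need to be renamed; "Applicant Name" is fine.
    print(find_reserved_conflicts(["Title", "Applicant Name", "author"]))
```

Any name this returns would need to be renamed in the Data Model before it is mapped as custom metadata.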

How To: Add a PDF Data Mapping Behavior

Like all Behaviors, PDF Data Mapping is configured on a Content Type node, commonly a Content Model or a Document Type.


  1. Here, we have selected a Content Model in the Node Tree.
  2. To add a Behavior, select the Behaviors property and click the ellipsis button at the end.
  3. This will bring up a dialogue window to add various behaviors to the Content Model, including PDF Data Mapping.
  4. Add PDF Data Mapping to the list by clicking on the "+" button.
  5. Select PDF Data Mapping from the listed options.


  1. Once added, you will see a PDF Data Mapping item added to the Behaviors list.
  2. Selecting this Behavior, you will see property options to configure PDF creation.
  3. Press "OK" when finished configuring PDF Data Mapping.
  4. Don't forget to save changes to the Content Model.

About the documents used in these tutorials

The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to detail a more specific setup for PDF Data Mapping. This is a packet of documents from a single applicant containing a cover page and five different kinds of documents.

By the end of this tutorial we will have taken a source application packet, used Grooper to process it, and exported a single PDF with:

  • Metadata collected from Grooper
  • New annotations and widgets
  • Easily navigable bookmarks

Cover Page and Application

This is an application for a traveling abroad scholarship.

Primarily, the cover page and application document will allow us to demonstrate the annotations and widgets PDF Data Mapping can generate. We will use its Annotations settings to add the following annotations:

  • Text Annotation
  • Highlight Annotation
  • Checkbox Widget
  • Radio Group Widget
  • Signature Widget
  • Textbox Widget

Secondarily, data collected from this form will be used to generate and store default and custom metadata. We will use the Metadata settings to do this.

Lastly, we will embed a bookmark that will take the PDF's reader to the signature field on the document. We will use the Bookmarking settings to do this.

Essay

This application also includes an essay from the student.

This document will demonstrate how to add Keywords to the PDF's metadata. Using the Metadata settings we will configure a code expression to insert "long essay", "normal essay", or "short essay" depending on the essay's length.
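The logic behind such a keyword expression can be sketched as follows. This is an illustration only: Grooper code expressions are not written in Python, the function name `essay_length_keyword` is hypothetical, and both the word-count measure and the thresholds (300 and 1000 words) are assumptions chosen for the example, not values taken from this tutorial.

```python
def essay_length_keyword(essay_text, short_below=300, long_from=1000):
    """Pick a keyword describing the essay's length.

    Assumptions: length is measured in whitespace-separated words, and
    the two thresholds are placeholder values.
    """
    word_count = len(essay_text.split())
    if word_count < short_below:
        return "short essay"
    if word_count >= long_from:
        return "long essay"
    return "normal essay"


if __name__ == "__main__":
    sample = "word " * 500  # a 500-word stand-in for extracted essay text
    print(essay_length_keyword(sample))
```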

Other Documents

This packet contains three other kinds of documents as well:

  • a proposal summary
  • the applicant's resume
  • and a letter of recommendation.

For these documents (as well as the rest) we will insert bookmarks into the generated PDF, taking the reader to each document in the larger file. We will use Bookmarking settings to do this.

Notes on PDF Data Mapping, child documents and bookmarking

The original document was imported as a single document into Grooper. We have separated it into child documents which will allow us to insert bookmarks for each separated document.

  1. The PDF Data Mapping Behavior will be applied to the Batch Folders at folder-level one.
    • The attached file is the source application packet.
  2. The Split Pages activity was applied to split the packet into pages. Then, those pages were separated into classified document folders at folder-level two.
  3. PDF Data Mapping can create a bookmark in the generated PDF for each of these five sub documents by enabling the Bookmarking property.


By creating bookmarks for each child document, there is no need to export individual PDFs for each one. Instead, we will use PDF Data Mapping to generate one PDF for the whole application packet and use the bookmarks to navigate between each document.

How To: Configure Annotations

Annotations are native PDF elements used to highlight and comment on text in a PDF file. For PDF Data Mapping, "annotations" also refers to interactable "widgets" such as checkboxes and text form fields. In this tutorial we will configure at least one example of each Annotation option.

  • Text Annotation - Inserts a text-based comment in the PDF.
  • Highlight Annotation - Highlights text on the PDF.
  • Radio Group Widget - Inserts a group of selectable radio buttons in the PDF.
  • Checkbox Widget - Inserts checkable checkboxes in the PDF.
  • Signature Widget - Inserts a signature block in the PDF.
  • Textbox Widget - Inserts an editable form field in the PDF.

BE AWARE: PDF Data Mapping cannot insert annotations on PDF pages with form fields.

If a PDF page is form-fillable, it is ill advised to insert annotations and widgets on top of these form fields. This can result in a corrupted PDF when it is generated by Merge or Export. PDF Data Mapping will not allow you to insert annotations and widgets on PDF pages with form fields.

Prereqs: Data Fields and extracted data

For PDF Data Mapping to work, Grooper needs to have data to map.

  • For Annotations this means Data Fields.
  • Data must be saved for each Data Field prior to the PDF being generated.
    • The Extract activity must run before Merge or Export generates the PDF.
    • If performing user assisted data review, the Review activity must complete before Merge or Export generates the PDF.


Each of the Annotation Types references a Data Field in a Data Model as part of their configuration. If the Data Field does not collect data during the Extract activity, the PDF Data Mapping won't know where to place the annotation.
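The ordering requirement above can be expressed as a simple check over a list of step names. This is a hedged sketch: the function `check_step_order` and the representation of a Batch Process as a plain list of activity names are assumptions made for illustration, not a Grooper API.

```python
# Activities that generate the PDF, per this article.
PDF_GENERATING_ACTIVITIES = {"Merge", "Export"}


def check_step_order(steps):
    """Return a list of ordering problems for an ordered list of activity names.

    Extract must come before Merge/Export. If a Review step is present,
    it must also come before Merge/Export.
    """
    problems = []
    seen = set()
    has_review = "Review" in steps
    for step in steps:
        if step in PDF_GENERATING_ACTIVITIES:
            if "Extract" not in seen:
                problems.append(f"{step} runs before Extract")
            if has_review and "Review" not in seen:
                problems.append(f"{step} runs before Review")
        seen.add(step)
    return problems


if __name__ == "__main__":
    print(check_step_order(["Split Pages", "Extract", "Review", "Export"]))
    print(check_step_order(["Extract", "Export", "Review"]))
```

An empty list means the order satisfies the prerequisites described in this section.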

About the Data Model used for this tutorial

The Data Model we're working with has several Data Fields that will allow PDF Data Mapping to place annotations and widgets.

The "Last Name", "First Name" and "Middle Initial" Data Fields (in the "Applicant Information" Data Section) will demonstrate the Highlight Annotation.

  • These fields use Labeled Value to extract field values next to a label.
  • Be aware, nearly any extractor type can be used to insert a highlight annotation. Grooper just needs a location on the document to draw the highlight boundaries.

The "US Citizen" Data Field will demonstrate the Radio Group Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where only one may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the radio group widget as long as its Check Mode is set to CheckOne.

The "Checklist" Data Field will demonstrate the Checkbox Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where one or more may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the checkbox widget.

The "Signature" Data Field will demonstrate the Signature Widget.

  • This field uses Detect Signature to detect whether or not a signature is present on the document.
  • Be aware, any zonal extractor (Read Zone, Highlight Zone or Detect Signature) would be able to insert the signature widget.

The "Signature Date" Data Field will demonstrate the Textbox Widget.

  • Textbox Widget adds a text-editable form field to the PDF to store a field value.
    • Compare this to a Text Annotation which simply adds a text comment to the PDF.
  • This field uses Labeled Value to extract the date the application was signed.
  • Be aware, nearly any extractor type that returns a single value with a location on the document can be used to insert the textbox widget. To insert a blank form field, Highlight Zone is typically used instead.

The "IsProcessed" Data Field will demonstrate the Text Annotation.

  • Text Annotation inserts a text comment in the PDF.
    • Compare this to a Textbox Widget which adds an actual form field to the PDF to store a field value.
    • We will use this field and annotation to print the word "PROCESSED" on the output PDF
  • This field uses Highlight Zone to draw an extraction zone for the field and the Data Field's Default Value to determine what's printed.
    • This is a technique common to Text Annotation use cases and will be explained in further depth below.

Adding Annotations

PDF Data Mapping inserts PDF annotations and widgets according to its Annotations property. Users can add one or more Annotation Types to the Annotations list. Adding a new Annotation to the list is simple.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Select the Annotations property and press the ellipsis button at the end.
  3. This will bring up the Annotations editor.
  4. Press the "+" button.
  5. Select the Annotation Type you want to add from the dropdown list.


  1. Once added, you will see the Annotation Type added to the Annotations list.
  2. All Annotation Types will have a set of General properties to configure.
  3. Some Annotation Types have additional properties you can configure.
    • For example, the Highlight Annotation has Appearance properties you can configure to adjust the highlight's color and other appearance properties.
  4. Press "OK" when finished.

Notes on shared properties

All Annotation Types share a set of General properties.

  • Fields
    • Use this property to select the Data Fields to map to the PDF annotation.
    • The Fields property is required.
      • One or more Data Fields must be selected to generate the annotation.
      • If you don't select any Data Fields or the selected Data Fields are not extracted, PDF Data Mapping will not insert an annotation in the output PDF.
      • Be aware, all Data Fields are selected by default.
  • Padding
    • The Padding property can adjust the size of the annotation.
    • Grooper uses a Data Field's result instance to draw the annotation's boundaries.
      • The size of the Data Field's instance may be too small for what you want to appear on the output PDF.
      • If so, use Padding to increase the annotation's size on the PDF generated by PDF Data Mapping.
  • Allow Edit
    • Allow Edit refers to a reader's ability to edit the annotation as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the annotation (or widget).
    • Enabling this property (turning it True) will allow users to fully adjust the annotation in the PDF, including its size, location and other properties.
    • Be aware, even when False, users will still be able to interact with widgets, such as the Checkbox Widget or Textbox Widget.
  • Print
    • In a PDF viewing application, like Adobe Acrobat, all annotations and widgets PDF Data Mapping generates will be visible. The Print property determines whether or not the annotation is visible when the PDF is printed.
    • Be aware, the default is False.
      • Grooper presumes the "Smart PDF" output by PDF Data Mapping will be opened in a PDF viewer (where all annotations will be visible).
      • Grooper also presumes if you want to print the PDF, you want something more like the original document printed, not the one with additional PDF elements Grooper inserts. If you do want those annotations and widgets visible when the PDF is printed, turn Print to True.
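The shared properties can be pictured as a small settings record. The sketch below is a mental model only, not Grooper's object model: the class `AnnotationSettings`, its attribute names, and the function `fields_to_annotate` are hypothetical, and the padding default of 0 is an assumption. It shows the rule that an annotation is only produced for a selected Data Field that actually has an extracted location.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (left, top, right, bottom) of an extracted instance; units are illustrative.
Rect = Tuple[float, float, float, float]


@dataclass
class AnnotationSettings:
    field_names: List[str]              # the Fields property (required)
    padding: float = 0.0                # the Padding property (default assumed)
    allow_edit: bool = False            # the Allow Edit property
    visible_when_printed: bool = False  # the Print property (default False)


def fields_to_annotate(settings: AnnotationSettings,
                       extracted: Dict[str, Optional[Rect]]) -> List[str]:
    """Return the selected fields that have an extracted location.

    Selected fields with no extracted instance produce no annotation.
    """
    return [
        name for name in settings.field_names
        if extracted.get(name) is not None
    ]


if __name__ == "__main__":
    settings = AnnotationSettings(field_names=["Last Name", "First Name"])
    extracted = {"Last Name": (1.0, 1.0, 2.0, 1.2), "First Name": None}
    print(fields_to_annotate(settings, extracted))
```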

Annotation Types

There are currently six types of annotations Grooper can add to the PDF it creates:

Highlight Annotation

The Highlight Annotation overlays a colored rectangle with adjustable transparency on a Data Field's extracted location. In other words, it can highlight extraction results.

  • Use this to highlight important values extracted by Grooper.
  • Like all Annotations, this highlight can be printable or not. When the Print property is False, the highlight will show up when viewed in a PDF viewer but not if the PDF is printed.


In this example, we will use the Highlight Annotation to highlight the extracted "Last Name", "First Name" and "Middle Initial" fields from the application form. To configure this Annotation we will:

  • Select the Data Fields we wish to highlight.
  • Adjust how we want the highlight to look.

Before Annotation

After Annotation

With a Highlight Annotation added to the Annotations list:

  1. Use the Fields property to select the Data Fields you wish to highlight.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Fields you wish to highlight.
    • In our case, we are choosing the "Last Name", "First Name", and "Middle Initial" Data Fields.
    • Be aware, these fields must be extracted by the Extract activity or nothing will be highlighted.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
    • Adjusting Padding for Highlight Annotations is common. In this example, we increased the highlight's size by 0.1 in. on each side.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Use the defaults to prevent the users from being able to adjust the annotation and prevent it from being visible when printed.
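To make the effect of Padding concrete, here is a minimal geometric sketch. The function `pad_rect` is hypothetical, and the assumption that padding is applied equally to all four sides follows the "0.1 in. on each side" example above. Grooper's internal coordinate handling may differ.

```python
def pad_rect(rect, padding):
    """Grow a (left, top, right, bottom) rectangle by `padding` on every side.

    Assumption: coordinates are in inches with the origin at the top left.
    """
    left, top, right, bottom = rect
    return (left - padding, top - padding, right + padding, bottom + padding)


if __name__ == "__main__":
    extracted = (1.0, 2.0, 3.0, 2.25)  # an illustrative extracted instance
    padded = pad_rect(extracted, 0.1)
    width = padded[2] - padded[0]
    height = padded[3] - padded[1]
    # Each dimension grows by twice the padding value.
    print(round(width, 2), round(height, 2))
```

Note that padding each side by 0.1 in. increases both the width and the height of the annotation by 0.2 in. in total.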


  1. Adjust the highlight's appearance, as desired, using the Appearance properties.
  2. Most commonly, users will adjust the Fill Color.
    • Use the dropdown to select from a list of system colors.
    • Or, enter an RGB value using the format #, #, #
    • This property defaults to the "Grooper green" highlight seen in Review's Data View. In this example, we've changed it to Yellow.
  3. Press "OK" when finished (or continue adding more Annotations).
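The "#, #, #" format for Fill Color can be validated before it is typed into the property. This is a hedged sketch: the function `parse_rgb` is hypothetical, and the 0 to 255 range per channel is the usual RGB convention assumed here, not something stated in this article.

```python
def parse_rgb(value):
    """Parse a string like "255, 255, 0" into an (r, g, b) tuple of ints.

    Raises ValueError if there are not exactly three integer channels
    or if a channel is outside the assumed 0-255 range.
    """
    parts = [part.strip() for part in value.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated values, got {len(parts)}")
    channels = tuple(int(part) for part in parts)
    for channel in channels:
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return channels


if __name__ == "__main__":
    # Full red plus full green with no blue is yellow.
    print(parse_rgb("255, 255, 0"))
```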

Radio Group Widget

The Radio Group Widget overlays a group of radio button PDF elements on top of where a Grooper extractor finds OMR checkboxes on a document.

  • Radio buttons are common PDF elements used to indicate a single choice from multiple options in a list.
    • Note radio buttons (inserted by Radio Group Widget) differ from checkboxes (inserted by Checkbox Widget). For radio buttons, only one choice out of a group may be selected. For checkboxes, any number of choices may be selected.
  • The Data Field(s) this annotation references must use an OMR extractor to return results: Labeled OMR, Ordered OMR or Zonal OMR
    • This extractor must also have its Mode set to CheckOne (only one box out of many may be checked/selected).
  • PDF Data Mapping will insert one radio button for each checkbox the extractor locates.

Before Annotation

After Annotation

With a Radio Group Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the group of radio buttons.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "US Citizen" Data Field.
    • Be aware, this field must (1) use an OMR extractor to return results, (2) have its Mode set to CheckOne, (3) have already been extracted by the Extract activity and (4) have located checkboxes during extraction, or no radio buttons will be placed.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the widget (press a radio button).
  3. Press "OK" when finished (or continue adding more Annotations).

Be Aware: Annotations are overlaid on a page's image

BE AWARE: The Radio Group Widget overlays radio buttons on a page's image. Any printed checkbox on the original page will persist (behind the widget), unless removed by the Image Processing activity.

  • Notice the original image for this document used checkboxes, not radio buttons. We see an "X" inside of a square box.

You can actually see the edges of the square box persist in the generated PDF (Here, highlighted in yellow for your viewing pleasure).

  • In this case, the boxes were detected by the "detection only" Box Detection IP command and not removed by the "detection and removal" Box Removal command.
  • Box Detection finds and stores the checkbox locations and check states but does not actually alter the image in any way.

Maybe you care about this, and maybe you don't. If you do, use Box Removal instead.

  • Box Removal will also find and store the checkbox locations and their check states, but it will also digitally remove the checkboxes from the document's image. This will allow Grooper to extract the checkboxes and allow PDF Data Mapping to overlay the radio buttons on a field of blank pixels.
  • Run Box Removal in an IP Profile using the Image Processing activity prior to running the Extract activity to do this.

Checkbox Widget

The Checkbox Widget inserts one or more form-fillable checkboxes into the PDF on top of where a Grooper extractor finds OMR checkboxes.

  • Checkboxes are common PDF elements used to indicate a choice from one or many options.
    • Note checkboxes (inserted by Checkbox Widget) differ from radio buttons (inserted by Radio Group Widget). For radio buttons, only one choice out of a group may be selected. For checkboxes, any number of choices may be selected.
  • The Data Field(s) this annotation references must use an OMR extractor to return results: Labeled OMR, Ordered OMR, or Zonal OMR
  • However, this extractor may use any of the OMR Modes (CheckOne, CheckMulti or Boolean).
  • PDF Data Mapping will insert a simple checkbox PDF element for each checkbox the extractor locates.
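The difference between the two OMR-based widgets comes down to which extractor Modes each accepts. The sketch below restates those rules as a lookup. The function `compatible_widgets` is hypothetical and the Mode names are used as plain strings for illustration.

```python
# Modes accepted by each widget, as described in this article.
RADIO_GROUP_MODES = {"CheckOne"}
CHECKBOX_MODES = {"CheckOne", "CheckMulti", "Boolean"}


def compatible_widgets(omr_mode):
    """Return the widget types that can use an OMR extractor with this Mode."""
    widgets = []
    if omr_mode in CHECKBOX_MODES:
        widgets.append("Checkbox Widget")
    if omr_mode in RADIO_GROUP_MODES:
        widgets.append("Radio Group Widget")
    return widgets


if __name__ == "__main__":
    for mode in ("CheckOne", "CheckMulti", "Boolean"):
        print(mode, compatible_widgets(mode))
```

A CheckOne field can back either widget, while CheckMulti and Boolean fields can only back a Checkbox Widget.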


In this example, we will create a Checkbox Widget for the checkboxes extracted using the "Checklist" Data Field. This is a Labeled OMR extractor that uses the CheckMulti Mode, indicating any number of checkboxes may be checked. Checked or not, the Checkbox Widget will insert a checkbox element into the generated PDF for each checkbox located.

Before Annotation

After Annotation

With a Checkbox Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the checkboxes.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "Checklist" Data Field.
    • Be aware, this field must (1) use an OMR extractor to return results, (2) have already been extracted by the Extract activity and (3) have located checkboxes during extraction, or no checkboxes will be placed.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the widget (check the checkboxes).
  3. Press "OK" when finished (or continue adding more Annotations).

BE AWARE: The Checkbox Widget overlays checkboxes on a page's image. Any printed checkbox on the original page will persist (behind the widget), unless removed by the Image Processing activity.

For more information, see above.

Signature Widget

The Signature Widget inserts a signature block into the PDF.

  • Signature blocks allow PDFs to capture digital signatures. This allows you to create a document that can be digitally signed straight from Grooper on export.
  • The Data Field(s) this annotation references will typically use a zonal extractor to define where the signature block should be, most commonly Detect Signature or Highlight Zone.
  • Other extractor types may work, but these are most typical. PDF Data Mapping will insert the signature block using the geometric boundaries of the extraction instance. Zonal extractors are well suited to define fixed boundaries of extraction results.


In this example, we will create a Signature Widget annotation for the signature line on the application form, using the "Signature" Data Field of our Data Model. The Signature Widget will insert an interactable signature element into the generated PDF.

Before Annotation

After Annotation

With a Signature Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the signature block.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "Signature" Data Field.
    • Be aware, this field must (1) have already been extracted by the Extract activity and (2) have drawn a zone defining the location and size of the signature block (most commonly, Detect Signature or Highlight Zone is used to do this).
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to interact with the element (submit a signature).
  3. Press "OK" when finished (or continue adding more Annotations).

BE AWARE: The Signature Widget overlays a signature block on a page's image. If present, any printed signature on the original page will persist (behind the widget), unless removed by the Image Processing activity.

For more information, see above.

Textbox Widget

The Textbox Widget inserts text-editable form fields into the generated PDF.

  • Form fields allow PDFs to collect and store data entered by a user.
  • Users can configure a Textbox Widget to create blank form fields or form fields with a value Grooper extracts already populated.
    • For blank form fields, the Data Field(s) this annotation references should use Highlight Zone to place a blank zone where the field should be inserted.
    • For populated form fields, the Data Field(s) this annotation references can use any extractor type that returns a single-instance value (most typically Labeled Value).
      • This allows Grooper to not only generate a PDF with form fields where they weren't present in the source document, but prefill them with data Grooper collects.
  • Be aware, a Textbox Widget differs from a Text Annotation. Where Textbox Widget will insert a text-editable form field, Text Annotation adds a text comment to the PDF.

Before Annotation

After Annotation

In this example, we will use the Textbox Widget to insert a form field for the "Signature Date" Data Field. This field uses Labeled Value to extract the date. PDF Data Mapping will overlay the form field on top of the extraction result.

  • FYI: We will also adjust the generated widget's size using the Padding property. This is common for Textbox Widgets when the font size you want to use for the form field is larger than the printed typeface on the document.


With a Textbox Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field(s) you wish to use to create text-editable form fields.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Field(s) you wish to select.
    • In our case, we are choosing the "Signature Date" Data Field.
    • Be aware, these fields must be extracted by the Extract activity or no textbox will be generated.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
    • Adjusting Padding for Textbox Widgets is common if the desired font size in the textbox differs from that printed on the source document. In this example, we increased the textbox's size by 0.1 in. on each side.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if so.
    • Use the defaults to prevent the users from being able to adjust the widget and prevent it from being visible when printed.
    • Please note: Allow Edit refers to a reader's ability to edit the widget as a PDF element, such as moving its location on the PDF or adjusting its size. It does not refer to a reader's ability to edit the value inside the textbox. To configure that, use the Read Only property.


  1. Adjust the textbox's other properties as desired.
    • These properties give you the ability to adjust the font and font size inside the textbox.
    • Please note: If you want to prevent a reader from editing the Grooper collected value inside the textbox, turn Read Only to True.
  2. Press "OK" when finished (or continue adding more Annotations).

Text Annotation

The Text Annotation inserts a text comment in the PDF.

  • This has two primary uses:
    • Insert comments into the PDF that are viewable when opening the PDF in a PDF viewer, but not printable.
    • Print a simple text note on a page.
      • Commonly, users will want to print a word like "PROCESSED" on the output PDF. This notes the document has been processed through Grooper.
  • The Data Field(s) may use any kind of extractor as long as it produces a result with (1) a location on the page to place the comment and (2) a text value to add to the comment.
  • Be aware, a Textbox Widget differs from a Text Annotation. A Textbox Widget inserts a text-editable form field, whereas a Text Annotation adds a text comment to the PDF.


In this example, we will use a Text Annotation to print the word "PROCESSED" on the first page of the PDF generated by PDF Data Mapping.

  • We will use the "IsProcessed" Data Field to do this. The extraction logic to make this happen requires a less-than-common technique. We will show you how we build this Data Field in the #Technique: "IsProcessed" Data Field section.

Before Annotation

After Annotation

With a Text Annotation added to the Annotations list:

  1. Use the Fields property to select the Data Field(s) you wish to use to insert the text comment.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Fields you wish to select.
    • In our case, we are choosing the "IsProcessed" Data Field.
    • Be aware, these fields must (1) be extracted by the Extract activity and (2) hold a location and value or no comment will be added.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine whether the annotation should be editable or printable. Adjust the Allow Edit or Print properties if needed.
    • Use the defaults to prevent the users from being able to adjust the annotation and prevent it from being visible when printed.
    • In our case, we do want this comment printed when the document is printed. So, we've changed Print to True.


  1. Adjust the comment's appearance, as desired, using the Appearance properties.
    • Users may change the comment's font and font size with the Font Name and Font Size properties.
    • Users may select a Fill Color and Text Color in one of two ways:
      • Using the dropdown to select from a list of system colors
      • Or, entering an RGB value using the format #, #, #
      • Be aware, there is no true "transparent" Fill Color option. The selectable Transparent option is a system color that equates to "white".
  2. Press "OK" when finished (or continue adding more Annotations).

Technique: "IsProcessed" Data Field

To print the word "PROCESSED" on the PDF, we used a specific technique. A Text Annotation just needs two things from a Data Field to insert the annotation: (1) a location on the page to place the comment and (2) a text value to add to the comment. The word "PROCESSED" did not exist on the source PDF. So, we had to figure out a way to use a Data Field to generate a result rather than extract it.

We did this in essentially two steps:

  1. Use the Highlight Zone extractor to define where the annotation should be printed.
  2. Use a Calculated Value to define the text we want to print (the word "PROCESSED").


This gives a Text Annotation everything it needs to insert the comment: (1) a location and (2) some text.

How To: Configure Bookmarks

Bookmarks in PDFs aid readers when navigating through multipage documents. PDF Data Mapping can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):

  1. Using a document folder's child folders (child Batch Folders).
  2. Using a document folder's extracted Data Fields.

In this tutorial we take an application packet separated into component child documents and use PDF Data Mapping's Bookmarking property to create bookmarks for each one.

The application packet as a whole consists of five separate and distinguishable documents.

  1. The application itself (and a coversheet)
  2. A proposal summary
  3. The student's resume
  4. A letter of recommendation
  5. An essay

Our goal is to create a bookmark in the generated PDF file for each of these component documents (child documents).

Rather than exporting five separate PDF files for each component document, we will export a single PDF for the whole packet with navigable bookmarks.


We will also demonstrate how to use Data Fields for bookmarking. This allows us to insert PDF bookmarks for locations of extracted data.

  • The "Signature" bookmark in this example would take the reader to the signature line of the PDF, using the location extracted by the "Signature" Data Field in our Data Model.

Bookmarking Option 1: Child document/folder bookmarks

There are two ways the Bookmarking feature can insert bookmarks into a PDF generated by PDF Data Mapping.

  1. It can insert a bookmark for each child document/folder.
  2. It can insert a bookmark for selected (single instance) Data Fields.

This section will detail how to insert bookmarks using child documents.

Option 1 Prereqs: Separated child documents

For PDF Data Mapping to work, Grooper needs to have data to map.

If enabled, Bookmarking will automatically add bookmarks to a PDF if a document has child documents in the Batch's folder hierarchy.

  • If a document at folder level 1 is exported and has two child documents, the generated PDF will have two bookmarks.
  • Clicking on the bookmark will take the reader to that child document's page in the PDF.


For this to work:

  • The parent document folder must have separable child pages.
    • Either from scanning pages in with a scanner or using the Split Pages activity to generate pages from an imported PDF.
  • These child pages must then be separated into child folders.
    • Either using a Separation Profile when scanning or using the Separate activity.


Technically speaking, that's all you need. PDF Data Mapping will add PDF bookmarks for every child document and name it using each child folder's name.

  • Be aware, without classifying the child documents these names will just be "Folder (1)" "Folder (2)" "Folder (3)" and so on.

  • Not separated: No child folders.
  • Separated: Has child folders.
  • Separated and classified: Has child document folders.

What about a loose page (such as "Page 1") that sits directly under the parent document? It is not in a folder, so it won't get a bookmark.
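
The bookmarking behavior described above can be pictured with a short Python sketch. It only illustrates the logic and is not Grooper's implementation. The `Page` and `Folder` types, the `child_bookmarks` function, and the sample names and page counts are all hypothetical. It also assumes a loose page still takes up one page in the generated PDF.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Page:
    """A loose page sitting directly under the parent document folder."""


@dataclass
class Folder:
    """A child folder; its name becomes the bookmark title."""
    name: str
    page_count: int


def child_bookmarks(children: List[Union[Page, Folder]]) -> List[Tuple[str, int]]:
    """Return (title, 1-based page number) for each child folder.

    Loose pages advance the page counter but do not get a bookmark.
    """
    bookmarks = []
    next_page = 1
    for child in children:
        if isinstance(child, Folder):
            bookmarks.append((child.name, next_page))
            next_page += child.page_count
        else:
            next_page += 1  # assumption: a loose page occupies one PDF page
    return bookmarks


# Separated but not classified: bookmarks use the default folder names.
unclassified = [Folder("Folder (1)", 2), Folder("Folder (2)", 1), Folder("Folder (3)", 3)]

# A loose first page followed by two classified child documents (hypothetical names).
with_loose_page = [Page(), Folder("Application (1)", 2), Folder("Essay (2)", 4)]
```

Calling `child_bookmarks(with_loose_page)` yields bookmarks only for the two folders, each pointing at that folder's first page in the combined PDF.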

Adding bookmarks for child documents/folders

PDF Data Mapping will create bookmarks for child documents/folders by default. There is no configuration required besides enabling the Bookmarking property.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Bookmarking property to Enabled.
  3. Press "OK" when finished.


That's it! It's that simple!

As long as the document folder that PDF Data Mapping applies to has child documents/folders, a bookmark will be created for each child document.


Bookmarking Option 2: Data Field bookmarks

There are two ways the Bookmarking feature can insert bookmarks into a PDF generated by PDF Data Mapping.

  1. It can insert a bookmark for each child document/folder.
  2. It can insert a bookmark for selected (single instance) Data Fields.

This section will detail how to insert bookmarks using Data Fields. This allows PDF Data Mapping to bookmark important field value locations extracted by Grooper in the output PDF.

Option 2 Prereqs: Data Fields and extracted data

For PDF Data Mapping to work, Grooper needs to have data to map.

Bookmarking can also insert PDF bookmarks using extracted data and their location. Data Fields collect results using extractors which return results from the source document. Bookmarking will use these results' locations to embed this kind of bookmark.

For this to work:

  • You must have these Data Fields defined in a Data Model and configured to return results.
  • Data must be saved for each Data Field prior to the PDF being generated.
    • The Extract activity must run before Merge or Export generates the PDF.
    • If performing user assisted data review, the Review activity must complete before Merge or Export generates the PDF.

Adding bookmarks for Data Fields

To have PDF Data Mapping insert bookmarks at extracted Data Field value locations, simply select which Data Field(s) you want to bookmark.

  • Please note: Only single-instance Data Fields may be bookmarked.


With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Bookmarking property to Enabled and expand its sub-properties.
  3. Select Data Elements and press the ellipsis button at the end.
  4. A "Data Elements" selection editor will appear.
  5. Select the Data Field whose location you wish to bookmark.
    • Please note: Only single-instance Data Fields may be bookmarked.
  6. Press "OK" when finished selecting Data Fields.
  7. Press "OK" when finished configuring PDF Data Mapping.


As long as the selected Data Field(s) have been extracted by an Extract activity for the document folder that PDF Data Mapping applies to, a bookmark will be created for each selected Data Field.

How To: Configure Metadata

The PDF Data Mapping behavior has the ability to create and insert additional metadata into the generated PDF as well, using information collected during Grooper's document processing. The metadata you are able to create falls into one of three categories:

  1. Editing the PDF's default metadata fields, including:
    • Title
    • Author
    • Subject
    • Created Date
    • Modified Date
    • Application
  2. Adding "Keywords" to the PDF metadata
    • This can be done using expression based or extraction based methods.
  3. Creating custom metadata fields and values
    • Custom metadata can be stored for any (single instance) Data Field values collected during the Extract activity.

Notice what's not included in this list is the exported document's filename (e.g. "Im_a_file.pdf"). Filename mappings are always configured using an Export Behavior.

Prereqs: Data extraction

For PDF Data Mapping to work, Grooper needs to have data to map.

For Metadata, data coming from Grooper can be mapped to the PDF in one of two ways:

  1. Using Data Field results
    • To embed custom PDF metadata, the custom fields are generated from Data Fields in the document's Data Model and their collected results.
    • This means the document must be processed by the Extract activity in order to create and populate these custom fields.
    • Or, if performing user assisted data review, the values must be previously recorded during the Review activity.
  2. Using code expressions
    • In the case of the default PDF metadata fields and keywords, expressions can be used to populate the metadata.
    • This gives you access to not only extracted Data Field results but also system data, classification information, and various functions to manipulate it.

Mapping default PDF metadata

PDF Data Mapping's Metadata settings can edit a PDF's default metadata values for its "Title", "Author", "Subject", "Application", "Created" and "Modified" properties.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Metadata property to Enabled and expand its sub-properties.
  3. Use a code expression to create custom values for the following default PDF metadata:
    • Title for the PDF's "Title" field
    • Author for the PDF's "Author" field
    • Creator for the PDF's "Application" field
    • Subject for the PDF's "Subject" field
    • Creation Date for the PDF's "Created" field
    • Modification Date for the PDF's "Modified" field
  4. Press "OK" when finished.


In our example, we made the following changes to the default PDF metadata:

  • Title:
    • This defaults to the expression CurrentDocument.ContentTypeName. This will make the title whatever the document's Document Type classification is.
    • We did not change Grooper's default.
  • Author:
    • This defaults to the expression LDAP.CurrentUserDisplayName. This will set the author to the Windows username for the Grooper user or service who created the PDF.
    • We changed this to evaluate to the applicant's first name, middle initial, and last name as collected by Grooper using the following expression:
      • $"{Applicant_Information.First_Name} {Applicant_Information.Middle_Initial} {Applicant_Information.Last_Name}"
  • Creator:
    • This will adjust the PDF's "Application" field. This field is left blank by default.
    • We changed this to the simple string "Grooper PDF Data Mapping".
  • Subject:
    • This field is left blank by default.
    • We changed this to use the value of the "Proposal Title" Data Field in the "Proposal Information" Data Section with the expression Proposal_Information.Proposal_Title
  • Creation Date:
    • This sets the PDF's "Created" date value and defaults to the expression DateTime.Now. This returns the current system time of your machine at the time the PDF is generated.
    • We did not change Grooper's default.
  • Modification Date:
    • This sets the PDF's "Modified" date value and defaults to the expression DateTime.Now. This returns the current system time of your machine at the time the PDF is generated.
    • We did not change Grooper's default.

Mapping Keywords

The Metadata settings can add terms to the PDF's "Keywords" field in one of two ways:

  1. Using a code expression
  2. Using an extractor (Data Type, Value Reader or Field Class)


With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Metadata property to Enabled and expand its sub-properties.
  3. To add keyword terms with a code expression, add the expression to the Keywords property.
    • This expression should evaluate to a string value. This string will be added to the PDF's "Keywords" field.
  4. To add keyword terms with an extractor, reference the extractor with the Keywords Extractor property.
    • This extractor should return a string value. This string will be added to the PDF's "Keywords" field.
  5. Press "OK" when finished.


In our example, we used an expression to insert a keyword based on the word count of the "Essay" document in the application packet.

  • "Short Essay" for essays under 400 words
  • "Long Essay" for essays over 600 words
  • "Normal Essay" for essays between 400 and 600 words
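
The thresholds above can be sketched in Python as follows. This is not the Grooper expression used in the example, only an illustration of the same logic. The function name is hypothetical, and treating exactly 400 and exactly 600 words as "Normal Essay" is an assumption based on the phrase "between 400 and 600".

```python
def essay_keyword(word_count: int) -> str:
    """Map an essay's word count to one of the three keywords."""
    if word_count < 400:
        return "Short Essay"
    if word_count > 600:
        return "Long Essay"
    # Assumption: the boundaries 400 and 600 themselves count as "normal".
    return "Normal Essay"
```

Whatever form the real expression takes, it should evaluate to a single string, since that string is what gets added to the PDF's "Keywords" field.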


We also used an extractor to add a "signed" keyword if the application was signed and "not signed" if the application was not signed.

Mapping custom Metadata

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

  • Consider these names reserved.
  • If you are attempting to export Data Field values as custom PDF metadata, they cannot share any reserved names. You will need to rename the Data Field in Grooper to a unique name.
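
As a quick pre-flight check, a sketch like the following could flag Data Field names that collide with the reserved names listed above. The function name is hypothetical, and the exact, case-sensitive comparison is an assumption. The article does not state how Grooper compares names.

```python
# Reserved PDF metadata field names, as listed above.
RESERVED_PDF_FIELDS = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


def reserved_name_conflicts(field_names):
    """Return the Data Field names that match a reserved PDF field name.

    Assumption: the comparison is exact and case-sensitive.
    """
    return [name for name in field_names if name in RESERVED_PDF_FIELDS]
```

Any name the function returns would need to be renamed in Grooper before its value could be exported as custom metadata.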

PDF Data Mapping's Metadata feature can store custom metadata as well, exporting Data Field values to custom PDF metadata fields. This is a way for Grooper to save Data Field values directly to the PDF.

  • BE AWARE: Only single-instance data can be exported to a PDF's custom metadata.
    • Data Fields at the root of a Data Model or in single instance Data Sections can be exported.
    • Data Fields in multi-instance Data Sections and Data Column values cannot be exported.


With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Change the Metadata property to Enabled and expand its sub-properties.
  3. Turn Export Data Fields to True.
  4. Use the Field Filter editor to select a specific set of Data Fields to export. Otherwise, all Data Fields will be exported to custom PDF metadata fields.
  5. Press "OK" when finished.


In our example, we exported all Data Fields to the generated PDF's custom fields. Custom metadata can be viewed using Adobe Acrobat. Go to "Document Properties...". Then select the "Custom" tab. All selected Data Fields will be exported to this "Custom Properties" list in the PDF.

  • FYI: Spaces and other special characters in a Data Field's name will be replaced with underscores (e.g. "Field_Name").
  • FYI: Data Fields in single instance Data Sections will be named using dot notation (e.g. "Section_Name.Field_Name").
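
The naming rules above can be approximated with a small Python sketch. The `metadata_name` function is hypothetical. Treating every character other than a letter or digit as a "special character" is an assumption, because the article does not list the exact set Grooper replaces.

```python
import re
from typing import Optional


def metadata_name(field_name: str, section_name: Optional[str] = None) -> str:
    """Approximate the custom metadata name for a Data Field."""

    def clean(name: str) -> str:
        # Assumption: anything that is not a letter or digit becomes "_".
        return re.sub(r"[^A-Za-z0-9]", "_", name)

    if section_name is None:
        return clean(field_name)
    # Fields inside a single instance Data Section use dot notation.
    return f"{clean(section_name)}.{clean(field_name)}"
```

For example, the "Proposal Title" Data Field in the "Proposal Information" Data Section would come out as "Proposal_Information.Proposal_Title" under these assumptions.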

How To: Configure Piece Info


BE AWARE: PIECE INFO IS STILL UNDER DEVELOPMENT

Please consider the Piece Info feature in "beta" at this time. This feature will be more fully documented once fully developed.

"PieceInfo" is a PDF dictionary of additional data stored by other applications. For example, when you save a PDF from Adobe Illustrator, PieceInfo will store the original Illustrator file (which allows the PDF to be edited in Illustrator as if it were the original). PieceInfo can be stored at the document level for the whole PDF or at the page level for one or more pages in the PDF.

PDF Data Mapping uses PieceInfo dictionaries to store extracted Data Field values as a PDF dictionary embedded in the document's structure by enabling and configuring the Piece Info settings.

  • Contrast this with the Metadata settings, which store Data Field values as custom metadata fields in the document properties.
  • Piece Info is unique in that it can export data from a Data Table in very specific scenarios. Using the Key Column property, it can build the dictionary from only two columns in a table, and only if one of those columns acts as a "key" with unique values for each extracted row.
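
The Key Column idea can be illustrated with a small Python sketch that turns two table columns into a key/value dictionary. This is not Grooper's implementation. The function name, the column names, and the sample rows are hypothetical. Raising an error on a repeated key is this sketch's own choice, as the article does not say how Grooper handles duplicates.

```python
def table_to_dictionary(rows, key_column, value_column):
    """Build a key/value dictionary from two columns of extracted table rows.

    Raises ValueError if the key column repeats a value, because each
    row is expected to have a unique key.
    """
    result = {}
    for row in rows:
        key = row[key_column]
        if key in result:
            raise ValueError(f"duplicate key {key!r} in column {key_column!r}")
        result[key] = row[value_column]
    return result


# Hypothetical extracted rows; column names and values are made up.
rows = [
    {"Item": "A-100", "Quantity": "2"},
    {"Item": "B-200", "Quantity": "5"},
]
```

Here `table_to_dictionary(rows, "Item", "Quantity")` pairs each unique "Item" value with its "Quantity", which is the two-column shape a dictionary requires.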

PieceInfo at document level vs PieceInfo at page level

With Piece Info enabled and configured, PDF Data Mapping will store the dictionary at either the document level or on a page, depending on the Batch's folder structure.


Imagine a Batch Folder that looks like this:


If PDF Data Mapping with Piece Info is configured for a parent document's Document Type, the PieceInfo dictionary is stored at the document level in the PDF.



If PDF Data Mapping with Piece Info is configured for a child document's Document Type, the PieceInfo dictionary is stored at the page level, on the first page of that child document in the PDF.

  • In this example PDF Data Mapping with Piece Info was configured for the "Green" Document Type.
    • With a PDF generated for the parent document folder, the output PDF will be 5 pages long total (because there are a total of five pages in the three child document folders).
    • Page 1 of the child document folder "Green (2)" will be page 2 in the output PDF.
    • The PieceInfo dictionary will therefore be stored in page 2 of the output PDF.
  • Be aware, it doesn't matter if the child document is a multipage document with extracted results on multiple pages. The PieceInfo dictionary is only stored once, on the first page only.
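
The page arithmetic in this example can be sketched in Python. The function name is hypothetical, as are the "Red" and "Blue" Document Types and the individual page counts. The article only states that there are three child folders with five pages in total and that "Green (2)" starts on page 2.

```python
def piece_info_pages(child_folders, configured_type):
    """Return the 1-based output page holding PieceInfo for each matching child.

    `child_folders` is an ordered list of (document_type, page_count) tuples.
    """
    pages = []
    next_page = 1
    for document_type, page_count in child_folders:
        if document_type == configured_type:
            # Stored once, on the child document's first page only.
            pages.append(next_page)
        next_page += page_count
    return pages


# Hypothetical Batch: three child folders, five pages in total.
batch = [("Red", 1), ("Green", 3), ("Blue", 1)]
```

With this layout, `piece_info_pages(batch, "Green")` points at page 2 of the output PDF, even though the "Green" child document spans several pages.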



FYI

You can inspect PieceInfo with Adobe Acrobat Pro.

For inspecting PieceInfo at the document level:

  • Open the Preflight tool (Go to "All tools" > "Use Print Production" > "Preflight"). Select "Options" > "Browse Internal PDF Structure...". Click the Lightbulb icon. Expand "The document root" and look for "PieceInfo". Expand "PieceInfo" and look for whatever you named your dictionary in the Piece Info configuration.

For inspecting PieceInfo at the page level:

  • Open the Preflight tool (Go to "All tools" > "Use Print Production" > "Preflight"). Select "Options" > "Browse Internal PDF Structure...". Click the Page icon. Expand a Page and look for "PieceInfo". Expand "PieceInfo" and look for whatever you named your dictionary in the Piece Info configuration.

Known Piece Info Issues

Issue #1: The Elements Property

The Elements property does nothing. Its original intent was to be a kind of filter that allowed for simpler configuration of the Fields property. However, it was never fully implemented. It has been deemed an unnecessary property and will be removed in future versions.

Issue #2: Page Level Classification

When separating and classifying documents using ESP Auto Separation, Grooper performs page-level classification. This can cause Piece Info to create a blank PDF PieceInfo dictionary for every page in certain PDF Data Mapping configurations.

How To: Generate the PDF using Merge or Export

A PDF Data Mapping configuration is applied when Grooper builds a PDF. This will happen when one of two activities is applied to a Batch Folder:

  • Either the Export activity
  • Or the Merge activity


In either case, three conditions must be met for Grooper to create a PDF with the additional PDF Data Mapping settings.

  1. The Batch Folder being processed must be assigned a Document Type that inherits the PDF Data Mapping behavior.
    • PDF Data Mapping will need to be configured for that Document Type, its parent Content Category or its parent Content Model.
  2. A PDF Format must be added.
    • For the Export activity: To the Export Formats configuration in the Export Behavior.
    • For the Merge activity: To the Merge Format configuration.
  3. The PDF Format's Always Build property should be set to True.
    • This will ensure a new output file will be generated in cases where an imported PDF is already attached to the Batch Folder in Grooper.