2023.1:PDF Data Mapping (Behavior)


This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements.

PDF Data Mapping builds a data rich "Smart PDF" from a document folder's content. Classification results, extracted data, and more can be used to insert native PDF elements into the generated PDF.

PDF elements that can be mapped from Grooper generated results include:

  • Bookmarks
  • Metadata
  • PDF Annotations (such as text highlighting, checkbox widgets and signature widgets)

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains a Project with resources used in examples throughout this article. The second contains one or more Batches of sample documents.

About

The PDF Data Mapping behavior allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format in Grooper will use the page image files and their text data to create a multipage PDF file for each document folder upon Export.

However, this is just the "display information" required to open and read the document. There's a lot more to what a PDF can be than just a multipage document with page images and machine readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!

PDF Data Mapping expands Grooper's standard PDF generation capabilities. It creates an exportable PDF file that includes additional content available to the PDF file type. PDF Data Mapping merges Grooper-collected data, such as classification results and extracted data, into the PDF by mapping these values to native PDF elements like bookmarks and annotations.

The expanded PDF Data Mapping functionality can be divided into three categories:

  • Annotations
  • Bookmarks
  • Metadata

Annotations

Annotations are additional objects you can add to PDF documents.

  • These annotations can increase readability, such as using a highlight annotation to call out important information.
  • These annotations can add components for the reader to interact with the document, such as checkboxes and signature widgets.


PDF Data Mapping can add the following kinds of annotations:

  1. Highlighting
  2. Radio group buttons
  3. Checkboxes
  4. Signature boxes
  5. Editable text boxes


Grooper uses information from Data Elements in a Data Model collected during the Extract activity to add these annotations.

  • For example, if Grooper extracts a "Name" field and you want that highlighted on the output PDF, you can use the "Highlight Annotation" to highlight the name Grooper extracted on the document.

FYI

The size of all these annotations can also be adjusted using a Padding property if the size of the extracted data instance is too small for your needs.

Bookmarks

Bookmarks provide easy navigation for multipage PDF documents. PDF Data Mapping can generate bookmarks in one of two ways:

  1. Bookmarks can be generated for extracted Data Field locations.
  2. When exporting a document folder that has child document folders, bookmarks can be generated for each "sub-document".
    • This is the default bookmarking behavior and requires no configuration. Bookmarks will be named however the child document folders are named.


In this example, this document is an application packet for a study abroad program. It has both kinds of bookmarks.

  • The "Signature" bookmark is from an extracted Data Field. It will take the reader to a signature location on the PDF.
  • The rest were generated for each child document in the document folder (Batch Folder) that was exported. PDF Data Mapping inserted a bookmark for each sub-document. The selected "Resume (4)" bookmark in the image took the reader to the resume page in the PDF.

FYI

Bookmarks generated for child document folders will be named whatever the documents are named.

  • A document folder's (Batch Folder) name defaults to its classified Document Type and document number. Here, "Application (2)", "Proposal Summary (3)", "Resume (3)", and so on.
  • A document folder's name can be changed if you edit the Document Type's Caption property. This will then change the bookmark's name.
    • Be aware, the document must be extracted for the Caption to be applied and its name changed.
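
The default naming pattern described above can be sketched in a few lines of Python. This is only an illustration of the "Document Type (number)" pattern: the function name, the sample Document Types, and the assumption that numbers run sequentially are hypothetical, not Grooper's actual implementation.

```python
def default_bookmark_names(document_types, first_number=1):
    """Name each child document "<Document Type> (<number>)".

    Assumes numbers are assigned sequentially, which the article does not confirm.
    """
    return [
        f"{doc_type} ({number})"
        for number, doc_type in enumerate(document_types, start=first_number)
    ]


# Hypothetical child documents; numbering starts at 2 only to echo the
# "Application (2)" example in the text.
names = default_bookmark_names(
    ["Application", "Proposal Summary", "Essay"], first_number=2
)
print(names)  # ['Application (2)', 'Proposal Summary (3)', 'Essay (4)']
```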

Metadata

Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Prior to implementing the PDF Data Mapping functionality, Grooper only had access to edit minimal PDF metadata upon export (notably the PDF's file name).

PDF Data Mapping allows Grooper to alter and store additional metadata, including:

  1. The PDF's default metadata fields, including its Title, Author, Subject values and more.
  2. Keywords
  3. Custom metadata fields
    • Custom metadata allows Grooper to embed any single instance Data Field's value directly to the PDF.


This gives Grooper a mechanism to create a viewable document with all extracted (single instance) data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system).

FYI

This metadata can be accessed in Adobe Acrobat by opening the "Document Properties" window from the File menu.

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

  • Consider these names reserved.
  • If you are attempting to export Data Field values as custom PDF metadata, they cannot share any reserved names. You will need to rename the Data Field in Grooper to a unique name.
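
Before configuring custom metadata, it can help to check Data Field names against the reserved list. The following Python sketch does that outside of Grooper. The function name and sample field names are hypothetical, and the comparison is assumed to be exact and case-sensitive.

```python
# Reserved PDF metadata field names listed in the text above.
RESERVED_PDF_KEYS = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


def find_reserved_conflicts(field_names):
    """Return the field names that collide with a reserved PDF key.

    Assumes an exact, case-sensitive match.
    """
    return [name for name in field_names if name in RESERVED_PDF_KEYS]


# Hypothetical Data Field names, for illustration only.
fields = ["Candidate", "Title", "Country of Travel", "Subject"]
print(find_reserved_conflicts(fields))  # ['Title', 'Subject']
```

Any name this returns would need to be renamed in the Data Model before it could be exported as custom metadata.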

How To: Add a PDF Data Mapping Behavior

Like all Behaviors, PDF Data Mapping is configured on a Content Type node, commonly a Content Model or a Document Type.


  1. Here, we have selected a Content Model in the Node Tree.
  2. To add a Behavior, select the Behaviors property and click the ellipsis button at the end.
  3. This will bring up a dialogue window to add various behaviors to the Content Model, including PDF Data Mapping.
  4. Add PDF Data Mapping to the list by clicking on the "+" button.
  5. Select PDF Data Mapping from the listed options.


  1. Once added, you will see a PDF Data Mapping item added to the Behaviors list.
  2. Selecting this Behavior, you will see property options to configure PDF creation.
  3. Press "OK" when finished configuring PDF Data Mapping.
  4. Don't forget to save changes to the Content Model.

About the documents used in these How To tutorials

The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to detail a more specific setup for PDF Data Mapping. This is a packet of documents from a single applicant containing a cover page and five different kinds of documents.

By the end of this tutorial we will have taken a source application packet, used Grooper to process it, and exported a single PDF with:

  • Metadata collected from Grooper
  • New annotations and widgets
  • Easily navigable bookmarks

Cover Page and Application

This is an application for a traveling abroad scholarship.

Primarily, the cover page and application document will allow us to demonstrate the annotations and widgets PDF Data Mapping can generate. We will use its Annotations settings to add the following annotations:

  • Text Annotation
  • Highlight Annotation
  • Checkbox Widget
  • Radio Group Widget
  • Signature Widget
  • Textbox Widget

Secondarily, data collected from this form will be used to generate and store default and custom metadata. We will use the Metadata settings to do this.

Lastly, we will embed a bookmark that will take the PDF's reader to the signature field on the document. We will use the Bookmarking settings to do this.

Essay

This application also includes an essay from the student.

This document will demonstrate how to add Keywords to the PDF's metadata. Using the Metadata settings we will configure a code expression to insert "long essay", "normal essay", or "short essay" depending on the essay's length.
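
The logic of such an expression can be sketched in Python as follows. This is not the Grooper code expression itself. The article does not say how length is measured or where the cutoffs fall, so the word-count measure and both thresholds are assumptions chosen for illustration.

```python
# Hypothetical word-count cutoffs; the article does not state the real ones.
SHORT_MAX_WORDS = 300
LONG_MIN_WORDS = 1000


def essay_keyword(essay_text):
    """Map an essay's word count to one of the three keywords."""
    word_count = len(essay_text.split())
    if word_count < SHORT_MAX_WORDS:
        return "short essay"
    if word_count >= LONG_MIN_WORDS:
        return "long essay"
    return "normal essay"


print(essay_keyword("word " * 50))    # short essay
print(essay_keyword("word " * 500))   # normal essay
print(essay_keyword("word " * 1200))  # long essay
```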

Other Documents

This packet contains three other kinds of documents as well:

  • a proposal summary
  • the applicant's resume
  • and a letter of recommendation.

For these documents (as well as the rest) we will insert bookmarks into the generated PDF, taking the reader to each document in the larger file. We will use Bookmarking settings to do this.

Notes on how this source file was separated in Grooper

The original document was imported as a single document into Grooper. We have separated it into child documents which will allow us to insert bookmarks for each separated document.

  1. The PDF Data Mapping behavior will be applied to the Batch Folders at folder-level one.
    • The attached file is the source application packet.
  2. The Split Pages activity was applied to split the packet into pages. Then, those pages were separated into classified document folders at folder-level two.
  3. PDF Data Mapping can create a bookmark in the generated PDF for each of these five sub documents by enabling the Bookmarking property.


By creating bookmarks for each child document, there is no need to export individual PDFs for each one. Instead, we will use PDF Data Mapping to generate one PDF for the whole application packet and use the bookmarks to navigate between each document.

How To: Configure Annotations

In this tutorial we will configure at least one example of each Annotation option.

  • Text Annotation
  • Highlight Annotation
  • Radio Group Widget
  • Checkbox Widget
  • Signature Widget
  • Textbox Widget

Prereqs - Data Fields and extracted data

For PDF Data Mapping to work, Grooper needs to have data to map.

  • For Annotations this means extracted Data Fields.
  • The Extract activity must run before Merge or Export generates the PDF.


Each of the Annotation Types references a Data Field in a Data Model as part of their configuration. If the Data Field does not collect data during the Extract activity, the PDF Data Mapping won't know where to place the annotation.

About the Data Model used for this tutorial

The Data Model we're working with has several Data Fields that will allow PDF Data Mapping to place annotations and widgets.

The "Last Name", "First Name" and "Middle Initial" Data Fields (in the "Applicant Information" Data Section) will demonstrate the Highlight Annotation.

  • These fields use Labeled Value to extract field values next to a label.
  • Be aware, nearly any extractor type can be used to insert a highlight annotation. Grooper just needs a location on the document to draw the highlight boundaries.

The "US Citizen" Data Field will demonstrate the Radio Group Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where only one may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the radio group widget as long as its Check Mode is set to CheckOne.

The "Checklist" Data Field will demonstrate the Checkbox Widget.

  • This field uses Labeled OMR to extract a group of checkboxes where one or more may be checked.
  • Be aware, any OMR extractor (Labeled OMR, Ordered OMR or Zonal OMR) would be able to insert the checkbox widget.

The "Signature" Data Field will demonstrate the Signature Widget.

  • This field uses Detect Signature to detect whether or not a signature is present on the document.
  • Be aware, any zonal extractor (Read Zone, Highlight Zone or Detect Signature) would be able to insert the signature widget.

The "Signature Date" Data Field will demonstrate the Textbox Widget.

  • Textbox Widget adds a text-editable form field to the PDF to store a field value.
    • Compare this to a Text Annotation which simply adds a text comment to the PDF.
  • This field uses Labeled Value to extract the date the application was signed.
  • Be aware, any extractor type can be used to insert the textbox widget. Grooper just needs a location on the document to place the textbox.

The "IsProcessed" Data Field will demonstrate the Text Annotation.

  • Text Annotation inserts a text comment in the PDF.
    • Compare this to a Textbox Widget which adds an actual form field to the PDF to store a field value.
    • We will use this field and annotation to print the word "PROCESSED" on the output PDF
  • This field uses Highlight Zone to draw an extraction zone for the field and the Data Field's Default Value to determine what's printed.
    • This is a technique common to Text Annotation use cases and will be explained in further depth below.

Adding Annotations

PDF Data Mapping inserts PDF annotations and widgets according to its Annotations property. Users can add one or more Annotation Types to the Annotations list. Adding a new Annotation to the list is simple.

With a PDF Data Mapping behavior added to a Content Type:

  1. Select the PDF Data Mapping behavior in the Behaviors editor.
  2. Select the Annotations property and press the ellipsis button at the end.
  3. This will bring up the Annotations editor.
  4. Press the "+" button.
  5. Select the Annotation Type you want to add from the dropdown list.


  1. Once added, you will see the Annotation Type added to the Annotations list.
  2. All Annotation Types will have a set of General properties to configure.
  3. Some Annotation Types have additional properties you can configure.
    • For example, the Highlight Annotation has Appearance properties you can configure to adjust the highlight's color and other appearance properties.
  4. Press "OK" when finished.

Notes on shared properties

All Annotation Types share a set of General properties.

  • Fields
    • The Fields property is required.
    • Use this property to select the Data Fields to map to the PDF annotation. If you don't select any Data Fields or the selected Data Fields are not extracted, PDF Data Mapping will not insert an annotation in the output PDF.
    • Be aware, all Data Fields are selected by default.
  • Padding
    • The Padding property can adjust the size of the annotation.
    • Grooper uses a Data Field's result instance to draw the annotation's boundaries.
      • The size of the Data Field's instance may be too small for what you want to appear on the output PDF. Use Padding to increase the annotation's size on the PDF generated by PDF Data Mapping.
  • Allow Edit
    • Self explanatory. Turn this property to True if you want the generated annotation/widget to be editable. Keep it False if you want it to be "read only".
  • Print
    • In a PDF viewing application, like Adobe Acrobat, all annotations and widgets PDF Data Mapping generates will be visible. The Print property determines whether or not the annotation is visible when the PDF is printed.
    • Be aware, the default is False.
      • Grooper presumes the "Smart PDF" output by PDF Data Mapping will be opened in a PDF viewer (where all annotations will be visible).
      • Grooper also presumes if you want to print the PDF, you want something more like the original document printed, not the one with additional PDF elements Grooper inserts. If you do want those annotations and widgets visible when the PDF is printed, turn Print to True.
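
For readers curious how "printable" and "read only" are represented inside a PDF, the PDF specification gives each annotation an integer of flag bits, where bit 3 means "print" and bit 7 means "read only". The Python sketch below shows how the Print and Allow Edit settings could translate to those bits. That Grooper maps the two properties this way is an assumption, and the function name is hypothetical.

```python
# Annotation flag bits from the PDF specification (the /F entry).
PRINT_FLAG = 1 << 2      # bit 3: print the annotation with the page
READ_ONLY_FLAG = 1 << 6  # bit 7: do not let the user interact with it


def annotation_flags(allow_edit=False, print_annotation=False):
    """Combine the two settings into one PDF annotation flags integer.

    Assumed mapping: Print=True sets the print bit and
    Allow Edit=False sets the read-only bit.
    """
    flags = 0
    if print_annotation:
        flags |= PRINT_FLAG
    if not allow_edit:
        flags |= READ_ONLY_FLAG
    return flags


print(annotation_flags())  # 64 (read only, not printed)
print(annotation_flags(allow_edit=True, print_annotation=True))  # 4
```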

Annotation Types

Highlight Annotation

The Highlight Annotation overlays a colored rectangle with adjustable transparency on a Data Field's extracted location. In other words, it can highlight extraction results.

  • Use this to highlight important values extracted from Grooper.
  • Like all Annotations, this highlight can be printable or not. When the Print property is False, the highlight will show up when viewed in a PDF viewer but not if the PDF is printed.


In this example, we will use the Highlight Annotation to highlight the extracted "Last Name", "First Name" and "Middle Initial" fields from the application form. To configure this Annotation we will:

  • Select the Data Fields we wish to highlight.
  • Adjust how we want the highlight to look.

Before Annotation

After Annotation

With a Highlight Annotation added to the Annotations list:

  1. Use the Fields property to select the Data Fields you wish to highlight.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkboxes next to the Data Fields you wish to highlight.
    • In our case, we are choosing the "Last Name", "First Name", and "Middle Initial" Data Fields.
    • Be aware, these fields must be extracted by the Extract activity or nothing will be highlighted.
  3. Press "OK" when finished.


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
    • Adjusting Padding for Highlight Annotations is common. In this example, we increased the highlight's size by 0.1 in on each side.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to keep the highlight "read only" and prevent it from being visible when printed.
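
To make the effect of padding concrete, the Python sketch below grows a rectangle by 0.1 in on every side. The `Rect` type, the coordinate convention, and the sample coordinates are all hypothetical. It only illustrates the arithmetic, not how Grooper stores extraction locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in inches, origin at the page's top-left."""
    left: float
    top: float
    width: float
    height: float


def pad(rect, amount):
    """Grow rect by `amount` inches on every side."""
    return Rect(
        left=rect.left - amount,
        top=rect.top - amount,
        width=rect.width + 2 * amount,
        height=rect.height + 2 * amount,
    )


# Hypothetical extracted location for a "Last Name" value.
extracted = Rect(left=1.0, top=2.0, width=1.5, height=0.25)
padded = pad(extracted, 0.1)
print(padded)
```

Padding on each side moves the top-left corner out by the padding amount and adds twice that amount to the width and height.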


  1. Adjust the highlight's appearance, as desired, using the Appearance properties.
  2. Most commonly, users will adjust the Fill Color.
    • Use the dropdown to select from a list of system colors.
    • Or, enter an RGB value using the format #, #, #
    • This property defaults to the "Grooper green" highlight seen in Review's Data View. In this example, we've changed it to Yellow.
  3. Press "OK" when finished (or continue adding more Annotations).
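
As an illustration of the "#, #, #" format, the Python sketch below parses such a string into three color channels. It is a hypothetical helper, not Grooper's parser, and the 0 to 255 range check reflects the usual RGB convention rather than documented Grooper behavior.

```python
def parse_rgb(text):
    """Parse "R, G, B" into a tuple of three ints in the range 0-255."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {len(parts)}")
    channels = tuple(int(part) for part in parts)
    for channel in channels:
        if not 0 <= channel <= 255:
            raise ValueError(f"component out of range: {channel}")
    return channels


print(parse_rgb("255, 255, 0"))  # (255, 255, 0), which is yellow
```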

Radio Group Widget

The Radio Group Widget overlays a group of radio button PDF elements into the document on top of where a Grooper extractor finds OMR checkboxes.

  • Radio buttons are common PDF elements used to indicate a single choice from multiple options in a list.
  • The Data Field(s) this annotation references must use an OMR extractor to return results: Labeled OMR, Ordered OMR or Zonal OMR
    • This extractor must also have its Mode set to CheckOne (only one box out of many may be checked/selected).
  • PDF Data Mapping will insert one radio button for each checkbox the extractor locates.


With a Radio Group Widget added to the Annotations list:

  1. Use the Fields property to select the Data Field you wish to use to insert the group of radio buttons.
    • Press the ellipsis button at the end of the Fields property.
  2. In the window that pops up, mark the checkbox next to the Data Field you wish to select.
    • In our case, we are choosing the "US Citizen" Data Field.
    • Be aware, this field must (1) use an OMR extractor to return results, (2) have its Mode set to CheckOne, (3) have already been extracted by the Extract activity, and (4) have located checkboxes during extraction, or no radio buttons will be placed.
  3. Press "OK" when finished.
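
The four conditions above work as a checklist, which the Python sketch below expresses as code. `FieldStatus` and its attributes are hypothetical stand-ins for information you would confirm in Grooper. They are not part of any Grooper API.

```python
from dataclasses import dataclass

OMR_EXTRACTORS = {"Labeled OMR", "Ordered OMR", "Zonal OMR"}


@dataclass
class FieldStatus:
    """Hypothetical summary of one Data Field after the Extract activity."""
    extractor: str       # extractor type configured on the field
    mode: str            # OMR mode, e.g. "CheckOne"
    was_extracted: bool  # the Extract activity has run for this document
    boxes_found: int     # number of checkboxes the extractor located


def radio_group_problems(field):
    """List the reasons a Radio Group Widget would have nothing to place."""
    problems = []
    if field.extractor not in OMR_EXTRACTORS:
        problems.append("extractor is not an OMR extractor")
    if field.mode != "CheckOne":
        problems.append("mode is not CheckOne")
    if not field.was_extracted:
        problems.append("field has not been extracted")
    if field.boxes_found == 0:
        problems.append("no checkboxes were located")
    return problems


us_citizen = FieldStatus("Labeled OMR", "CheckOne", True, 2)
print(radio_group_problems(us_citizen))  # []
```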


  1. Determine if you need to adjust the annotation's padding. Adjust the Padding property if you do.
  2. Determine if you need to adjust if the annotation is editable or printable. Adjust the Allow Edit or Print properties if you do.
    • Use the defaults to keep the radio buttons "read only" and prevent them from being visible when printed.
  3. Press "OK" when finished (or continue adding more Annotations).

Be Aware: Annotations are overlaid on a page's image

In the case of every Annotation Type, PDF Data Mapping inserts the annotation by overlaying it on top of a page's image.

  • Notice the original image for this document used checkboxes, not radio buttons. We see an "X" inside of a square box.

You can actually see the edges of the square box persist in the generated PDF (Here, highlighted in yellow for your viewing pleasure).

  • In this case, the boxes were detected by the "detection only" Box Detection IP command and not removed by the "detection and removal" Box Removal command.
  • Box Detection finds and stores the checkbox locations and check states but does not actually alter the image in any way.

Maybe you care about this, and maybe you don't. If you do, use Box Removal instead.

  • Box Removal will also find and store the checkbox locations and their check states, but it will also digitally remove the checkboxes from the document's image. This will allow Grooper to extract the checkboxes and allow PDF Data Mapping to overlay the radio buttons on a field of blank pixels.
  • Run Box Removal in an IP Profile using the Image Processing activity prior to running the Extract activity to do this.

Checkbox Widget



PDF Data Mapping can also insert form-fillable checkboxes, using the Checkbox Widget Annotation Type. This Annotation Type also uses OMR extraction techniques (such as Labeled OMR and Zonal OMR) to find existing checkboxes on the document. It works a lot like the Radio Group Widget annotation, except editable checkboxes, rather than radio buttons, are overlaid on the document.

For example, we will create a Checkbox Widget annotation for the checkboxes in the "Checklist" section of this document, the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields. These are Boolean OMR checkboxes, returning "true" if the box next to the corresponding label is checked, and "false" if unchecked. In either case, checked or not, the Checkbox Widget will insert an editable checkbox element into the generated PDF.

Before Annotation

After Annotation

  1. In the Annotations collection editor, click the "+" button to add the Checkbox Widget annotation.
    • Refer to the "How To: Add a PDF Data Mapping Behavior" section if you are unclear how we got to this window in Grooper Design Studio.
  2. Select Checkbox Widget from the list.

  1. This will add a Checkbox Widget to the Annotations list.
  2. The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the checkboxes. Click the ellipsis icon to the right of the Fields property to select these Data Fields.
    • Whatever result is returned by the selected Data Fields will be used to draw and insert the checkboxes.
    • You may use the Padding property to adjust the size of the checkboxes if you desire.
    • These Data Fields must use an OMR based extraction method (Labeled OMR, Ordered OMR, or Zonal OMR) to insert the checkboxes.
  3. In the window that pops up, check the boxes next to the Data Fields you wish to use to create the checkboxes.
    • In our case, we are choosing the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields. Once collected by the Extract activity, Grooper will know which results you want to use to create the checkboxes. This will include the checkbox locations and check states stored in the document's layout data. The Checkbox Widget annotation will then insert checkboxes into the generated PDF as seen in the "After Annotation" image above.

Signature Widget



Form-fillable signature boxes can be inserted using the Signature Widget annotation. This Annotation Type uses a zonal extraction type (such as Detect Signature or Highlight Zone) to draw the boundaries of the inserted signature widget. This allows you to create a document that can be digitally signed straight from Grooper upon exporting the generated PDF.

For example, we will create a Signature Widget annotation for the signature line on the application form, using the "Signature" Data Field of our Data Model. The Signature Widget will insert an interactable signature element into the generated PDF.

Before Annotation

After Annotation

  1. In the Annotations collection editor, click the "+" button to add the Signature Widget annotation.
    • Refer to the "How To: Add a PDF Data Mapping Behavior" section if you are unclear how we got to this window in Grooper Design Studio.
  2. Select Signature Widget from the list.

  1. This will add a Signature Widget to the Annotations list.
  2. The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the signature box. Click the ellipsis icon to the right of the Fields property to select these Data Fields.
    • Whatever result is returned by the selected Data Fields will be used to draw and insert the signature box widget.
    • You may use the Padding property to adjust the size of the signature box if you desire.
    • Zonal based extraction methods (such as Detect Signature and Highlight Zone) are typically used as the Data Field's extractor type.
  3. When the window pops up, check the boxes next to the Data Fields you wish to use to create the signature box.
    • In our case, we are choosing the "Signature" Data Field. Once collected by the Extract activity, Grooper will be supplied the size and location of the Data Field's extraction zone, which will form the size and location of the PDF signature widget. The Signature Widget annotation will then insert the form-fillable signature box into the generated PDF as seen in the "After Annotation" image above.

Just like any Annotation Type, the extraction result from the Data Field is critical for placing the signature annotation on the generated PDF. Let's look at the "Signature" Data Field's result to understand a little better how these results are used to create the signature widget.

In our case, we're using the Detect Signature extractor type to supply these results. The Detect Signature extractor is perfectly suited for the Signature Widget Annotation Type.

  • It actually combines both Zonal and OMR based extraction techniques to determine if a signature is present in the zone. It sets the boundaries of where you expect to find a signature using Zonal based methods and detects if the signature is present by counting the percentage of filled pixels in the zone, which is the basis of OMR based extraction methods. You can then output different values if the zone is filled above or below a certain percentage. In this case, the extractor returns "Not Signed" because there aren't enough pixels present in the extraction zone to count as filled. If there were a signature present, there'd be more pixels present, accounting for a higher filled percentage.
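
The filled-percentage idea can be sketched in Python as follows. This is a simplified illustration, not the Detect Signature extractor's implementation. The 5% threshold, the "Signed" output label, and the tiny pixel grids are assumptions. Only the "Not Signed" label comes from the example above.

```python
def filled_percentage(zone):
    """Percentage of dark pixels in a zone given as rows of 0/1 values."""
    total = sum(len(row) for row in zone)
    if total == 0:
        return 0.0
    filled = sum(sum(row) for row in zone)
    return 100.0 * filled / total


def signature_status(zone, threshold_percent=5.0):
    """Return "Signed" when the zone is filled at or above the threshold."""
    if filled_percentage(zone) >= threshold_percent:
        return "Signed"
    return "Not Signed"


# 1 = dark (ink) pixel, 0 = blank pixel. Tiny hypothetical zones.
blank_zone = [[0] * 10 for _ in range(4)]
signed_zone = [[0, 1, 1, 0, 0, 1, 0, 0, 1, 0] for _ in range(4)]

print(signature_status(blank_zone))   # Not Signed
print(signature_status(signed_zone))  # Signed
```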

This is great for our purposes because it gives us the exact information we need for the Signature Widget, which is an extraction zone. Grooper needs a data instance indicating the size and location for the generated signature widget.

  • But wait there's more! We also get some bonus information about whether or not there's a signature present. Does the Signature Widget Annotation Type need to know if there's a signature present? No. It does not. It will place the widget no matter what the result is. But might that information be otherwise useful to you? Probably.
  1. We have selected the "Signature" Data Field in our Data Model.
  2. This Data Field uses the Detect Signature extractor to draw the extraction zone used to insert the signature widget.
  3. This extractor uses the Text Region Location option.
  4. This gives us the ability to anchor the extraction zone to an extractable text anchor, using the Text Extractor property.
    • In this case we've anchored the zone to the word "Signature" outlined in blue in the document viewer. Where do we want to place the extraction zone (and ultimately the signature widget)? On the signature line. How do we know where that line is? It's above the text label "Signature".
  5. The extraction zone itself is drawn using the Translation and Adjustment properties.
    • This allows us to set the size (Adjustment) and location (Translation) of the extraction zone (and ultimately the signature widget) relative to the Text Extractor's result.
    • The extraction zone will be the green rectangle in the document viewer.
  6. Click over to the "Tester" tab and test the extraction.

  1. When the PDF Data Mapping behavior builds the PDF, using the Signature Widget annotation, the extraction zone's size and location forms the inserted signature widget.

Textbox Widget



The Textbox Widget Annotation Type will insert editable text boxes into the generated PDF. One simple way to use this functionality is to use the Highlight Zone extractor type to place a blank zone where you want to place an empty text box on the PDF. However, any extractor type can be used to define the textbox's location. Furthermore, if the Data Field used to create the annotation collects a value during the Extract activity, not only will a textbox be inserted into the generated PDF, but it will be prefilled with the Data Field's extracted value upon export.

For example, we will use the Textbox Widget functionality to fill out the blank coversheet on the first page of our application packet. We will use a Highlight Zone extractor to define the size and location of each text box. However, we're going to go one step further and populate the Data Fields used with information from other Data Fields in our Data Model. By the end of it, PDF Data Mapping will not only insert editable textboxes into the generated PDF, but also fill them in with information collected during the Extract activity.

Before Annotation

After Annotation

  1. In the Annotations collection editor, click the "+" button to add the Textbox Widget annotation.
    • Refer to the "How To: Add a PDF Data Mapping Behavior" section if you are unclear how we got to this window in Grooper Design Studio.
  2. Select Textbox Widget from the list.

  1. This will add a Textbox Widget to the Annotations list.
  2. The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the textboxes. Click the ellipsis icon to the right of the Fields property to select these Data Fields.
    • Whatever result is returned by the selected Data Fields will be used to draw and insert the textbox widget. If that Data Field collected a value during the Extract activity, it will also be filled with the returned value.
  3. In the window that pops up, check the boxes next to the Data Fields you wish to use to create the textboxes.
    • In our case, we are choosing the "Candidate", "Title of Proposal" and "Country of Travel" Data Fields. Once collected by the Extract activity, Grooper will be supplied the sizes and locations of the Data Fields' data instances for each result. These will form the size and location of each textbox widget. The Textbox Widget annotation will then insert the form-fillable textboxes into the generated PDF as seen in the "After Annotation" image above. These boxes will also be prefilled with the extraction results from each Data Field.

The Textbox Widget annotation has some additional configuration options as well.

  1. As with all Annotation Types, you can optionally adjust the size of the annotation using the Padding property.
  2. You can also change the font and font size of the editable text in the textbox using the Font Name and Font Size.

As far as looking behind the scenes, there are at least two things going on with how we've set up these Data Fields' extraction, ultimately supplying the result used to insert the Textbox Widget annotation.

First, we used the Highlight Zone extractor type to draw the textbox, defining the size and location of the annotation upon generating the PDF.

  1. We have selected the "Candidate" Data Field in our Data Model.
  2. Each Data Field's Value Extractor is set to Highlight Zone.
  3. We used the Relative Region Location option to anchor an extraction zone to the box next to the label "Candidate".
    • This will form the size and and location of the inserted textbox annotation.

Second, we used an expression to return a value, using the results of other Data Fields in our Data Model.

  1. We've used the Calculated Value property (in Calculate Mode Always Set) to return the full name of the candidate extracted by the "Last Name", "First Name", and "Middle Initial" Data Fields.
    • The full expression is as follows: Applicant_Information.First_Name + " " + Applicant_Information.Middle_Initial + " " + Applicant_Information.Last_Name
  2. This will take the extraction results of these three Data Fields and concatenate them with space characters in between.
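
As a rough illustration of what this expression does, the Python sketch below mimics the concatenation outside of Grooper. The function name `full_name` and the way the sample name is split into three parts are assumptions made for illustration; in Grooper, the Calculated Value expression itself does this work.

```python
def full_name(first_name: str, middle_initial: str, last_name: str) -> str:
    """Mimic the Calculated Value expression: join three values with single spaces."""
    # Mirrors: First_Name + " " + Middle_Initial + " " + Last_Name
    return first_name + " " + middle_initial + " " + last_name


# Hypothetical sample values standing in for the three Data Fields' results.
print(full_name("Dog", "O", "Doggerson"))
```

Because the spaces are added unconditionally, an empty middle initial would leave two consecutive spaces in the result. If that matters for your documents, the expression would need to account for it.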

  1. However, if we go to the "Tester" tab...
  2. ... and test extraction, we're going to get an error.
    • We're in the wrong scope! We need to go up to the Data Model's level and test extraction there. We need the full Data Model's results to do what we're trying to do here. When testing extraction on this "Candidate" Data Field alone, it can't "see" the "Last Name", "First Name" and "Middle Initial" Data Fields' results to combine them.

  1. Once we test extraction on the Data Model you'll see what results are actually collected by the Extract activity.
  2. Make sure you're on the "Tester" tab and test the extraction.
  3. The Calculated Value expression we configured forms one result for the "Candidate"...
  4. ...using the results of the "Last Name", "First Name" and "Middle Initial" Data Fields.
  5. With a result returned and zone drawn upon extract, the Textbox Widget annotation has all the information it needs to place the form-fillable textbox and fill it with the results.


FYI

This certainly isn't the only way to set up a Data Field for a Textbox Widget. This is just how we did it for the point of illustrating the Textbox Widget functionality. You are not required to use the Highlight Zone extractor type. You can use whatever extractor type best suits your document's needs. Often Grooper users will use the Reference extractor to point to a Data Type's results and adjust the size of the Textbox Widget using its Padding property.

Text Annotation

The Text Annotation inserts a text comment in the PDF. This has two primary uses:

  • Insert comments into the PDF that are viewable when opening the PDF in a PDF viewer, but not printable.
  • Print a simple text note on a page.
    • Commonly, users will want to print a word like "PROCESSED" on the output PDF. This notes the document has been processed through Grooper.

To be continued

How To: Configure Bookmarks

About

Bookmarks in PDFs aid readers when navigating through multipage documents. PDF Data Mapping can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):

  1. Using a Batch Folder's child document folders.
  2. Using the document's extracted Data Fields.

We will focus on the first bookmarking method, using child document folders (as it is more common). Often you will import a file into Grooper that contains multiple documents you want to separate and classify, but which otherwise all belong together in one way or another.

Such is the case with our study abroad application packet. The application packet as a whole consists of five separate and distinguishable documents.

  1. The application itself (and a coversheet)
  2. A proposal summary
  3. The student's resume
  4. A letter of recommendation
  5. An essay

Our goal is to create a bookmark in the generated PDF file for each of these component documents (or child documents as we will come to call them).

Rather than exporting five separate PDF files for each component document, we will export a single PDF for the whole packet with navigable bookmarks corresponding to each component document.

  1. Application - For the application itself (and its coversheet)
  2. Proposal Summary - For the proposal summary
  3. Resume - For the student's resume
  4. Rec Letter - For the letter of recommendation
  5. Essay - For the essay

Prereqs - Split Pages, Separation, and Classification

In order to accomplish this goal, we're going to have to do some things to this application packet before we configure PDF Data Mapping.

By the end of it, we're looking for a Batch whose documents have a structure like this. The documents in this batch consist of two Batch Folder levels.

  1. Folder Level 1: This is the parent document folder. It is the container for the full document. All seven pages of the application packet in this case.
  2. Folder Level 2: These are the child document folders for the parent document. They are the containers for each component document of the full application packet.

This is what we want to end up with. How did we get there? Long story short, we have some document separation and classification requirements before we can insert bookmarks in the generated PDF. The bookmarks are inserted for each child document folder and named after their classified Document Type's name. In order to do that, we need to split out the pages of the imported document, separate them into child document folders, and classify them first.

The full application document came into Grooper like this. A 7 page PDF file with each of these 5 component documents was imported into a new Batch. This is now the parent document folder at Folder Level 1.

But there's documents in them there document! How do we get them out?

First, we need to use the Split Pages activity to create child Batch Page objects.

This will split out the pages of the imported PDF file, creating one child Batch Page for each page in the PDF on the parent document folder. Now we have page objects we can manipulate in our Batch.

Now that we have Batch Page objects in our Batch, we can use the Separate activity to insert the second folder level. This is the first step in organizing these pages into child documents. We need to distinguish between one collection of pages as a document and another collection of pages as a document. Creating folders is the first part of that equation.

Now, we have child document folders for this parent document folder, but they are just blank folders. There is nothing to distinguish one folder from the next.

By default, the Separate activity runs on the Batch level scope, inserting folders at Folder Level 1. When separating child documents like this, you will need to change the Scope property of the Separate activity to run it at the Folder Level 1 scope. This will separate the loose pages of folders at Level 1, inserting child document folders at Level 2 below the parent folder at Level 1.

And, that's the second part of the organization equation, classification. Next, these folders will be assigned a Document Type from our Content Model using the Classify activity.

By default, the Classify activity runs on the Folder Level 1 scope, classifying document folders at the first folder level in the Batch hierarchy. We want to classify the child document folders at Folder Level 2. When classifying child document folders like this, you will need to change the Scope property of the Classify activity to run at the Folder Level 2 scope.

Furthermore, that parent document folder would need a Document Type assigned to it at some point as well. The Batch Process for this Batch might have two Classify activities. One running on Folder Level 1 to classify the parent document folder and another running on Folder Level 2 to classify the child document folders.

Now, we have everything we need to configure the bookmarking functionality of PDF Data Mapping. Bookmarks will be created every time a new child document is encountered and named after the Document Type assigned to that folder.

When the full PDF is generated, a bookmark named "Application" will be inserted at the first page of the PDF. That child document is two pages long. The third page of the full PDF will be the proposal summary. So a bookmark named "Proposal Summary" will be inserted at page three. A "Resume" bookmark will be inserted at page four. And so on.
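
The page arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Grooper implements it. The first two page counts come from the walkthrough; the counts for the last three child documents are assumed values chosen so the packet totals seven pages, and the function name `bookmark_pages` is hypothetical.

```python
from itertools import accumulate

# (Document Type name, page count) for each child document folder, in order.
child_documents = [
    ("Application", 2),
    ("Proposal Summary", 1),
    ("Resume", 1),      # assumed page count
    ("Rec Letter", 1),  # assumed page count
    ("Essay", 2),       # assumed page count
]


def bookmark_pages(children):
    """Return (bookmark name, 1-based start page) for each child document."""
    page_counts = [count for _, count in children]
    # Running total of the pages that precede each child document.
    preceding = [0] + list(accumulate(page_counts))[:-1]
    return [(name, before + 1) for (name, _), before in zip(children, preceding)]


for name, page in bookmark_pages(child_documents):
    print(f"{name}: page {page}")
```

Each bookmark lands on the first page of its child document, which is one more than the total page count of all the child documents before it.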

FYI

There are many ways to separate and classify documents, including ESP Auto Separation which both separates and classifies documents with a single activity (just Separate). But this is the general idea to get us where we need to go.

One way or another, create classified child document folders from a parent document folder. That way when we generate the PDF for the parent document folder upon export, bookmarks will be created for the classified child document folders.

Add the Behavior and Configure It for Bookmarking

Bookmarking is one of the configuration options for the PDF Data Mapping Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of PDF Data Mapping) how to use the Content Type to do something (in this case, how to use the Content Model's Document Types to insert bookmarks into the PDF upon export).

  1. All Behaviors are added to a Content Type object.
    • We will add the PDF Data Mapping behavior to this Content Model named "PDF Data Mapping - UNESCO Packet".
  2. All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add PDF Data Mapping.
  3. In the Behaviors editor window that pops up, click the "+" button to add a Behavior.
  4. Choose PDF Data Mapping from the list.

  1. Once added, you will see PDF Data Mapping added to the list on the left. Select it.
  2. To enable the bookmarking functionality, in the right panel, click the checkbox next to the Bookmarking property.
  3. Open up the subproperties and we see we have two Label properties. Here you can change the Label Style and Label Color to your preference.

For our purposes, this is all we need to configure at this point. However, be aware of the Bookmarking configuration options.

  1. Click the ellipsis icon to the right of the Data Elements property.
  2. In the new "Data Elements" window that pops up, click the check boxes next to the elements you want bookmarked.
    • You can add bookmarks to any of the data elements. You can expand the Data Sections to add individual Data Fields within those sections as well if you like. However, if you add a Data Field that is a child of a Data Section, then that Data Section must be added too.
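
The rule about parent Data Sections can be expressed as a small check. The Python sketch below is a hypothetical helper, not part of Grooper; it assumes a single level of Data Sections and uses a plain dictionary to describe which Data Field belongs to which section.

```python
def missing_parent_sections(selected, parent_of):
    """Return the Data Sections that still need to be selected, sorted by name.

    selected  -- set of selected data element names
    parent_of -- dict mapping a Data Field name to its parent Data Section name
                 (Data Fields at the root of the Data Model are simply absent)
    """
    required = {parent_of[name] for name in selected if name in parent_of}
    return sorted(required - set(selected))


# Illustrative layout using element names from this article.
parent_of = {
    "Essay Word Count": "Essay Information",
    "First Name": "Applicant Information",
}
selected = {"Candidate", "Essay Word Count"}
print(missing_parent_sections(selected, parent_of))
```

Here "Essay Word Count" is selected without its parent section, so the check reports "Essay Information" as still needing to be added.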

How To: Configure Metadata

About

The PDF Data Mapping behavior has the ability to create and insert additional metadata into the generated PDF as well, using information collected during Grooper's document processing. The metadata you are able to create falls into one of three categories:

  1. Editing the PDF's default metadata fields.
    • This includes the following metadata fields that are standard to every PDF file:
      • Title
      • Author
      • Subject
      • Created Date
      • Modified Date
      • Application (Used to establish the "creator" application which created the original file. This can be useful if the original file was created in a different application, like Microsoft Word, and converted to a PDF format by Grooper with a PDF Data Mapping behavior.)
  2. Creating custom metadata fields
    • This is done using extracted Data Field values collected during the Extract activity.
  3. Adding "Keywords" to the PDF metadata
    • This can be done using expression based or extraction based methods.

Notice what's not included in this list is the exported document's filename (e.g. "Im_a_file.pdf"). Filename mappings are always configured using an Export Behavior.


Prereqs - Data Extraction

If we're going to insert some metadata into these PDFs, that data has to come from somewhere. In broad terms, the metadata creation is done in one of two ways (or a combination of the two):

  1. Using expression based creation
    • In the case of the default PDF metadata fields and keywords, expressions can be used to populate the metadata. This gives you access to system data, classification information, extracted Data Field results, and various .NET functions to manipulate it.
  2. Using Data Field results
    • In the case of the custom PDF metadata, the custom fields are generated from Data Fields in the document's Data Model and their collected results from the Extract activity.
    • This means the document must be processed by the Extract activity in order to create and populate these custom fields.

Add the Behavior and Enable Metadata

Metadata is one of the configuration options for the PDF Data Mapping behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of PDF Data Mapping) how to use the Content Type to do something (how to use the Content Model's collected Data Fields and other information to edit the generated PDF's metadata, in this case).

  1. All Behaviors are added to a Content Type object.
    • We will add the PDF Data Mapping behavior to this Content Model named "PDF Data Mapping - UNESCO Packet".
  2. All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add the PDF Data Mapping behavior.
  3. In the Behaviors editor window that pops up, click the "+" button to add a Behavior.
  4. Choose PDF Data Mapping from the list.

  1. Once added, you will see PDF Data Mapping added to the list on the left. Select it.
  2. To enable the metadata functionality, in the right panel, click the checkbox next to the Metadata property.

Edit Default PDF Metadata

Once enabled, the first six Metadata sub-properties all pertain to the default PDF metadata fields Grooper can edit: Title, Author, Subject, Creation Date, Modified Date, and Creator.

These are edited with code expressions.

  1. The Title property corresponds to the PDF's "Title" field.
    • By default, this expression is set to CurrentDocument.ContentTypeName
      • This will make the title whatever the document's Document Type classification is.
      • In our case, these document folders are assigned the "UNESCO Application Packet" Document Type of our Content Model.
  2. The Author property corresponds to the PDF's "Author" field.
    • By default, this expression is set to LDAP.CurrentUserDisplayName
      • This will make the author the display name of whatever user is logged into the machine exporting the documents.
    • We've changed this to Candidate
      • This will make the author the result of the "Candidate" Data Field (which is "Dog O Doggerson" for our example document).
  3. The Creator property corresponds to the PDF's "Application" field.
    • This field is intended to be used when generating PDFs from different file types. For example, if the file was originally a Microsoft Word document, you might enter "Microsoft Word" to fill this field.
    • This field is blank by default, and we have left it so.
  4. The Subject property corresponds to the PDF's "Subject" field.
    • This field is blank by default.
    • We've decided to populate this field with the extracted proposal title, using the results of the "Title of Proposal" Data Field and the expression Title_of_Proposal
      • Note: Spaces in Data Field names must be replaced with underscores in expressions.
  5. The Creation Date and Modification Date properties correspond to the PDF's "Created" and "Modified" fields.
    • By default, these both use the expression DateTime.Now
      • This will return the current system time of your machine at the time of export.
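
To summarize the configuration above in one place, the Python sketch below assembles the same six values into a dictionary. It is only a model of the walkthrough's settings, not Grooper's API: the function name, the dictionary keys, and the sample proposal title are assumptions made for illustration.

```python
from datetime import datetime


def build_default_metadata(content_type_name, fields, now=None):
    """Assemble the six default metadata values as configured in this walkthrough."""
    now = now or datetime.now()
    return {
        "Title": content_type_name,              # CurrentDocument.ContentTypeName
        "Author": fields["Candidate"],           # expression: Candidate
        "Creator": "",                           # left blank
        "Subject": fields["Title_of_Proposal"],  # expression: Title_of_Proposal
        "Creation Date": now,                    # DateTime.Now
        "Modification Date": now,                # DateTime.Now
    }


# Hypothetical extraction results; the proposal title is a placeholder value.
fields = {"Candidate": "Dog O Doggerson", "Title_of_Proposal": "Example Proposal"}
metadata = build_default_metadata("UNESCO Application Packet", fields)
print(metadata["Title"], "|", metadata["Author"], "|", metadata["Subject"])
```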

  1. When we open the document in Adobe Acrobat and view these fields using the "Document Properties" window, you can see the metadata this configuration generated for the PDF.

Add Keywords

Grooper can add keywords into the PDF's "Keywords" field in one of two ways, either using an expression or a referenced extractor's results.

In our case, we're going to use an expression to determine if the word count of the "Essay" document in the application packet is "Long", "Short", or "Normal".

  1. We will use the results of the "Essay Word Count" Data Field of our Data Model to do this.
  2. This Data Field's extraction is configured to count the number of words in the essay.

If the word count is above 600 words, we'll call that a long essay. If it's below 400 words, we'll call that a short essay. And if it's anywhere in between, we'll call it a normal essay.

The expression below uses a series of nested conditional statements using the IIf() function to accomplish this.

IIf(Essay_Information.Essay_Word_Count > 600, "Long Essay", IIf(Essay_Information.Essay_Word_Count > 400, "Normal Essay", "Short Essay"))

If the word count is greater than 600, the keyword evaluates to "Long Essay". Otherwise, if it is greater than 400, the keyword evaluates to "Normal Essay". If neither condition is met, the keyword evaluates to "Short Essay".
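
For readers more comfortable with ordinary conditionals, the nested IIf() expression behaves like the Python sketch below. The function name `essay_keyword` is hypothetical; the thresholds and labels are taken directly from the expression.

```python
def essay_keyword(word_count: int) -> str:
    """Python equivalent of the nested IIf() expression."""
    if word_count > 600:
        return "Long Essay"
    if word_count > 400:
        return "Normal Essay"
    return "Short Essay"


for count in (350, 485, 720):
    print(count, "->", essay_keyword(count))
```

Because both comparisons are strict, a count of exactly 600 evaluates to "Normal Essay" and a count of exactly 400 evaluates to "Short Essay".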

To use this expression to add the keyword to the generated PDF's metadata, we will configure the Keywords property.

  1. In the Metadata sub-properties, select the Keywords property and click the ellipsis button at the end.
  2. In the expression editor that pops up, enter the expression you wish to use to create the keywords.
    • As is the case with any expression editor, Grooper's IntelliSense code completion will aid you when writing your code expressions.
  3. Click "OK" when finished.

  1. When we open the generated PDF in Adobe Acrobat and view the "Document Properties" window, you can see the metadata this configuration generated for the PDF.
    • The keyword "Normal Essay" has been added to the keywords list.
    • The extracted value for the "Essay Word Count" field was 485, which is less than 600 and greater than 400. Evaluated by our Keywords expression, that returns a value of "Normal Essay".

Add Custom Metadata

Last but not least, you can add custom metadata fields to the generated PDF using extraction results from the document's Data Model. A custom metadata field is generated for every Data Field you choose in the Content Type's Data Model.

  1. Remember, we add Behaviors to Content Types (typically a Content Model or a Document Type). In this case we're adding the PDF Data Mapping behavior to the Content Model.
  2. Content Models and Document Types can have their own Data Models as one of their children. Configuring PDF Data Mapping on the Content Model, we will utilize its Data Model to export this custom metadata.
  3. This Data Model is configured with several Data Fields. These Data Fields will collect information about the "UNESCO Application Packet" and its component documents, such as the applicant's name and information about the proposal.
    • This will be done during the Extract activity. Once collected, PDF Data Mapping can insert the results into the generated PDF, creating one custom metadata field and corresponding result for each Data Field and its extracted result.

To do this, we will use the Export Data Fields option of PDF Data Mapping's Metadata properties.

  1. In the Metadata sub-properties, click the check box next to the Export Data Fields property to change it from False to True
  2. By default, once you enable this property, Grooper will export all Data Fields available to the Content Type on which PDF Data Mapping is configured.
    • You can be more selective about what you want to include using the Field Filter property.
    • This will give you a drop down list of all the Data Field nodes available for custom PDF metadata creation. You can check the box next to which ones you wish to include, leaving those Data Fields you wish to exclude unchecked.

  1. When we open the generated PDF in Adobe Acrobat and view the "Document Properties" window, you can see the custom metadata generated in the "Custom" tab.
  2. The Data Fields' names show up in the "Names" column.
    • Note: Data Fields in Data Sections will have their names appended to the Data Section's name. For example, the "Proposal Title" Data Field in the "Proposal Information" Data Section translates to "Proposal_Information.Proposal_Title".
  3. The Data Fields' results, collected by the Extract activity, show up in the "Value" column.
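
The naming pattern in the note above can be sketched as a small Python function. This is inferred from the single example given (spaces become underscores, and the Data Section's name is joined to the Data Field's name with a period); the function name is hypothetical and Grooper's actual naming logic may cover cases not shown here.

```python
def metadata_field_name(field_name, section_name=None):
    """Build the custom metadata name for a Data Field.

    Spaces become underscores, and a Data Field inside a Data Section
    is prefixed with the section's name and a period.
    """
    parts = [section_name, field_name] if section_name else [field_name]
    return ".".join(part.replace(" ", "_") for part in parts)


print(metadata_field_name("Proposal Title", "Proposal Information"))
print(metadata_field_name("Candidate"))
```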

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

You may run into an issue upon export if you have Data Fields in your Data Model that share one of these names. If using the Metadata creation capabilities of PDF Data Mapping, consider these names "taken" and adjust the name of the Data Field to be something different. For example, in this case a Data Field returning the title of the proposal listed on the application was changed from "Title" to "Title of Proposal".
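
A quick way to spot such collisions before export is to compare your Data Field names against the reserved list. The Python sketch below is a hypothetical pre-flight check, not a Grooper feature. It compares names case-insensitively, which is a deliberately conservative assumption rather than documented behavior.

```python
RESERVED_PDF_NAMES = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}


def conflicting_field_names(field_names):
    """Return the Data Field names that collide with a reserved PDF name."""
    reserved = {name.lower() for name in RESERVED_PDF_NAMES}
    return [name for name in field_names if name.lower() in reserved]


print(conflicting_field_names(["Title", "Title of Proposal", "Candidate"]))
```

In this example only "Title" is flagged, while "Title of Proposal" passes the check.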

  1. You can also access this data using the "Additional Metadata..." button in the "Description" tab.
  2. Select the "Advanced" item.
  3. You'll see all the generated custom metadata listed under the "http://ns.adobe.com/pdfx/1.3/" node.

How To: Generate the PDF using Merge or Export