2023.1:PDF Data Mapping (Behavior): Difference between revisions

Revision as of 15:36, 4 March 2021

2021

This article is in development for the upcoming version of Grooper, Grooper 2021. PDF Generate is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release.

The PDF Generate Behavior is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements, using the classification and extraction content of a Batch Folder. This includes exporting extracted data as PDF metadata, inserting bookmarks, and creating PDF annotations such as highlighting, checkboxes, and signature widgets.

About

The PDF Generate Behavior (or PDF Generate for short) allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format in Grooper will use the page image files and their text data to create a multipage PDF file for each document folder upon Export. However, this is just the "display information" required to open and read the document. There's a lot more to what a PDF can be than just a multipage document with page images and machine readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!

The PDF Generate Behavior creates an exportable PDF file that includes some of this additional content available to the PDF format. This is part of Grooper's evolving "Smart PDF Architecture". This is a design philosophy striving to more fully utilize the capabilities of the PDF file type and merge them with Grooper's own document processing capabilities.

The expanded PDF Generate Behavior functionality can be divided into three categories:

Annotations
Bookmarks
Metadata


Annotations

Annotations are additional objects you can add to PDF documents. Grooper uses information from Data Elements in a Data Model collected during the Extract activity to add these annotations (also called "widgets"). These annotations can increase the readability and add components for the reader to interact with the document, such as checkboxes and signature boxes.

The kinds of annotations you can add are:

Highlighting
Radio group buttons
Checkboxes
Signature boxes
Editable text boxes

Grooper uses the data instance information from extracted Data Fields to insert these annotations. For example, here we set up a Content Model with a Data Field named "Last Name". After the document's data was collected during the Extract activity, Grooper has a data instance it can associate with the "Last Name" Data Field, including its size and location coordinates on the document. We then used the Highlight Annotation to highlight the extracted last name on the document in yellow.

The size of all these annotations can also be adjusted using a Padding property if the size of the extracted data instance is too small for your needs.

Bookmarks

Bookmarks allow easy navigation for multipage PDF documents. When exporting a single PDF comprised of multiple child sub-documents, you can create bookmarks for each child document. This way, you can keep all the documents together in a single PDF file, easily navigating from one section of the document to another.

For example, this document is an application packet for a study abroad program. Each document in the packet was separated and classified as a child document folder of one Document Type or another. The PDF Generate Behavior was used to export the packet as a single PDF, and a bookmark was inserted for each sub-document and named after its Document Type.

Grooper can create bookmarks from extracted Data Fields in the document as well.
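
To picture how one bookmark per child document lines up with the merged file, here is a minimal Python sketch. It is not Grooper code. The function name `bookmark_targets`, the (Document Type, page count) input shape, and the page counts for "Essay" and "Resume" are assumptions made only for illustration.

```python
from typing import List, Tuple


def bookmark_targets(children: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (bookmark title, 1-based first page) for each child document.

    `children` is an ordered list of (Document Type name, page count) pairs,
    a hypothetical stand-in for the child document folders in a packet.
    """
    targets = []
    next_page = 1  # the merged PDF starts at page 1
    for doc_type, page_count in children:
        if page_count < 1:
            raise ValueError(f"{doc_type!r} must have at least one page")
        targets.append((doc_type, next_page))
        next_page += page_count  # the next child starts where this one ends
    return targets


if __name__ == "__main__":
    # Hypothetical packet: only the two-page Application comes from the article.
    packet = [("Application", 2), ("Essay", 3), ("Resume", 1)]
    for title, page in bookmark_targets(packet):
        print(f"{title} -> page {page}")
```

Each bookmark simply points at the first page of its child document, so the only bookkeeping needed is a running page total.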

Metadata

Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Prior to implementing the PDF Generate Behavior functionality, Grooper only had access to edit minimal PDF metadata, notably the file's name upon export. The PDF Generate Behavior allows Grooper to alter and store additional collected metadata as well, including Data Field values collected during the Extract activity. This means Grooper can now create a viewable document with all the extracted data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system).

This metadata can be accessed by opening a PDF in a PDF viewer application, such as Adobe Acrobat, and opening the "Document Properties" window from the File menu.

There are several pieces of metadata Grooper has access to.

All of the fields highlighted here can be created from Grooper, using an expression based syntax to access data extracted from the document and system information.
Note this gives Grooper the capability to generate and insert keywords into the PDF's "Keywords" field.
- In this case, Grooper has created a keyword based on the word count length of the essay in this study abroad application packet.
Extracted Data Field values can also be exported as PDF metadata. This information can be viewed either using the "Custom" tab or the "Additional Metadata..." window.

In the "Custom" tab...
You can see all the Data Fields Grooper extracted and their values as custom metadata for this document.

⚠

Be aware the PDF file format has metadata fields already named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".

You may run into an issue upon export if you have Data Fields in your Data Model that share one of these names. If using the Metadata creation capabilities of the PDF Generate Behavior, consider these names "taken" and adjust the name of the Data Field to be something different. For example, in this case a Data Field returning the title of the proposal listed on the application was changed from "Title" to "Title of Proposal".
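
A quick way to spot such collisions before export is to compare your Data Field names against the reserved list. The following is a minimal Python sketch, not part of Grooper. The function name `find_reserved_collisions` is hypothetical, and the case-insensitive comparison is an assumption chosen to err on the side of caution.

```python
# Standard PDF metadata field names listed in the warning above.
RESERVED_PDF_KEYS = {
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
}

# Lowercased copy so the check ignores letter case (an assumption).
_RESERVED_LOWER = {key.lower() for key in RESERVED_PDF_KEYS}


def find_reserved_collisions(field_names):
    """Return the Data Field names that match a reserved PDF metadata name."""
    return [name for name in field_names if name.strip().lower() in _RESERVED_LOWER]


if __name__ == "__main__":
    fields = ["Last Name", "Title", "Country of Travel"]
    print(find_reserved_collisions(fields))
```

Any name the check returns is a candidate for renaming, as was done with "Title of Proposal".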

As a Behavior, PDF Generate is configured on a Content Type object, commonly a Content Model or a Document Type.

Here, we have selected a Content Model in the Node Tree.
To add a Behavior, select the Behaviors property and press the ellipsis button at the end.
This will bring up a dialogue window to add various behaviors to the Content Model, including the PDF Generate Behavior.
Add the PDF Generate Behavior to the list using the "Add" button.
Select PDF Generate Behavior from the listed options.

Once added, you will see a PDF Generate Behavior item added to the Behaviors list.
Selecting this Behavior, you will see property options to configure PDF creation.

The expanded PDF Generate Behavior functionality can be divided into three categories:

Metadata
Bookmarks
Annotations

Before we get into what these properties do, how to configure them, and how they affect the exported PDF, there's one key thing to keep in mind when using the PDF Generate Behavior.

Along with the PDF Generate Behavior, you will also need an Export Behavior configured to export a PDF formatted file. The PDF Generate Behavior does the job of configuring all the extra content (metadata, bookmarks and/or annotations) you want to add to the exported PDF. The Export Behavior does the job of actually creating the PDF (with the content configuration information supplied by the PDF Generate Behavior) and sending it off to an external storage platform.

Export Behaviors can be added to Content Types, such as the Content Model here.

To add an Export Behavior, press the "Add" button in a Behaviors list collector.
Select Export Behavior.

FYI

Export Behaviors can also be configured on the Export activity as local Export Behaviors to the activity configuration.

The benefit to adding it to a Content Model is you will often use information collected from a Content Model upon exporting your documents, such as a document folder's classified Document Type or collected data from a Data Model for field mapping purposes. You might as well do it now, adding it to the Content Model while you're adding the PDF Generate Behavior.

Once the Export Behavior is added, you will need to add an Export Definition. This will control how the file is exported, most notably where the file is exported. Whether exporting to a Windows file system, or an IMAP email mailbox, or a CMIS content management system, Grooper needs to know where to put the file. An Export Definition is how Grooper knows where the file goes.

Importantly for the PDF Generate Behavior, you will also use an Export Definition to define what type(s) of file you want to export. For whichever Export Definition you choose, you will need to ensure you've configured an Export Format for a PDF formatted file in order to export the generated PDF.

To add an Export Definition, select the property and press the ellipsis button at the end.
This will bring up an Export Definitions list collector window.
Here, we've added a CMIS Export definition, using a CMIS Connection to a local NTFS folder.
- The Export Definition is up to you and your needs. There are many different external storage platforms Grooper can export to.
Note, we've added a PDF Format configuration to the Export Formats property.

We will review some specifics of the PDF Format option's configuration later. For now, just be aware adding a PDF Export Format is a necessary step to export the PDF file generated by the PDF Generate Behavior.

How To

The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to detail more specific set up for a PDF Generate Behavior. This is a packet of documents from a single applicant containing five different kinds of documents.

Application
Essay
Other Documents

Application

This document consists of two pages. The first is a coversheet for the whole application packet. The second is the application form itself.

Primarily, this document will allow us to demonstrate the different kinds of annotations available when using a PDF Generate Behavior to generate a PDF file (using its Annotations property configuration). We will see how to set up one example of each of the following annotation types available in Grooper:

Highlight Annotation
Checkbox Widget
Radio Group Widget
Signature Widget
Textbox Widget

Importantly for any annotation type, a Data Field must be extracted in order to place the annotation. How does Grooper know what you want to highlight? It uses the extraction result of a Data Field, which includes information about where that value is located on the page. Even if the extraction result is just a blank zone without returning any actual information, Grooper needs some kind of coordinates to know where to place the annotation.

Since we're going to end up extracting some data in order to place these annotations, this will also give us the opportunity to see some of the collected data inserted as PDF metadata as well.

Essay

This application also includes an essay from the student. This document will demonstrate how to add keywords to the PDF's metadata.

We will use an extractor to count the number of words in the essay and configure the PDF Generate Behavior's Metadata properties to insert a keyword of "long essay", "medium essay", or "short essay" depending on the essay's length.
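
The keyword logic described here boils down to counting words and comparing the count against two cutoffs. Here is a minimal Python sketch of that idea. It is not the Grooper expression syntax, and the cutoff values (500 and 1000 words) and function names are assumptions for illustration only.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words in the essay text."""
    return len(text.split())


def essay_keyword(word_count: int, short_max: int = 500, medium_max: int = 1000) -> str:
    """Pick a keyword from the word count.

    The thresholds are hypothetical; choose values that suit your documents.
    """
    if word_count <= short_max:
        return "short essay"
    if word_count <= medium_max:
        return "medium essay"
    return "long essay"


if __name__ == "__main__":
    sample = "word " * 750  # stand-in for extracted essay text
    print(essay_keyword(count_words(sample)))
```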

Configure PDF Generation for Annotations


About

The PDF Generate Behavior has the capability of inserting various annotations and native PDF widgets into the generated PDF. This increases the document's readability and adds functionality for the reader to interact with the document through widgets such as radio group buttons, checkboxes and signature fields.

We will demonstrate how to configure one example for each of the Annotation Types.

Highlight Annotation
- We will use Grooper to highlight the extraction result for the applicant's name on the document.
Radio Group Widget
- Radio buttons are useful for documents when you have a collection of choices listed and can only select one option. Such is the case for the "US Citizen" field on this document. You either are or are not a US Citizen and can answer "Yes" or "No". We will insert a radio group widget into this document to allow the user to toggle between these choices.
Checkbox Widget
- It seems every standard form uses checkboxes for one thing or another. This annotation will allow us to insert checkable checkboxes into the PDF file if located using OMR based extraction techniques. For example, the checkboxes here next to each checklist item for the application packet.
Signature Widget
- With the Signature Widget we can create a form-fillable signature box for the generated PDF. Notice the document as imported is not signed. With the PDF Generate Behavior we can add a signature box to the processed file. This way you could send the application back to the applicant and have them sign the document digitally.

We will also use the Textbox Widget to insert editable text boxes into the document's coversheet. These text boxes will also be populated with some corresponding information from the rest of the document.

A textbox will be created for the "Candidate" on the coversheet and populated with the applicant's first name, middle initial and last name (Dog O Doggerson).
A textbox will be created for the "Title" on the coversheet and populated with the proposal title (Who's a Good Boy?).
A textbox will be created for the "Country of Travel" on the coversheet and populated with the proposed travel country for the study abroad program (Japan).

Prereqs - Data Fields & Extracted Data

Before a PDF annotation can be generated, a document's data must be extracted. Put another way, the Extract activity must run before the Export activity (when the PDF Generate Behavior ultimately builds the PDF and exports it).

Each of the Annotation Types point to a Data Field in a Data Model as part of their configuration. If the Data Field does not collect data during the Extract activity, the PDF Generate Behavior won't know where to place the annotation.

We will ultimately configure the PDF Generate Behavior using the Behaviors property of this Content Model which we've named "PDF Generate - UNESCO Packet"
- Before we do that, we will need to ensure we have Data Fields that correspond to the annotations we want to place.
We've added the necessary Data Fields to the Content Model's Data Model.
The "Candidate", "Title of Proposal", and "Country of Travel" Data Fields will be used to place the Textbox Widget annotations.
The "Last Name", "First Name", and "Middle Initial" Data Fields will be used to place the Highlight Annotation annotations.
The "US Citizen" Data Field will be used to place the Radio Group Widget annotation.
The "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields will be used to place the Checkbox Widget annotations.
The "Signature" Data Field will be used to place the Signature Widget annotation.

Add the Behavior

Annotations are one of the configuration options for the PDF Generate Behavior. A Content Type Behavior tells an activity how to use the Content Type to do something. In the case of the PDF Generate Behavior, it tells the Export activity how to use the Content Model's collected Data Fields to insert additional content when generating a PDF upon export.

All Behaviors are added to a Content Type object.
- We will add the PDF Generate Behavior to this Content Model named "PDF Generate - UNESCO Packet".
All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add the PDF Generate Behavior.
This will bring up the Behaviors editor window.
Press the "Add" button to add a Behavior.
Choose "PDF Generate Behavior" from the list.

Once added, you will see the PDF Generate Behavior added to the list on the left. Select it to add an Annotation.
In the right panel, select the Annotations property and press the ellipsis button at the end.
This will bring up an Annotations collection editor.

We will detail collection and configuration of the various Annotation Types in the next tabs of this tutorial.

Highlight Annotation

We will look at the Highlight Annotation first. This annotation is what it sounds like. You can use it to highlight portions of a PDF.

In this example, we will use the Highlight Annotation to highlight the extracted "Last Name", "First Name" and "Middle Initial" fields from the application form.

Before Annotation

After Annotation

In the Annotations collection editor, press the "Add" button to add the Highlight Annotation annotation.
- Refer to the previous tab if you are unclear how we got to this window in Grooper Design Studio.
Select Highlight Annotation from the list.

This will add a Highlight Annotation to the Annotations list.
The only configuration that is strictly required is to indicate which Data Fields you wish to highlight. Use the Fields property to select which Data Fields you wish to highlight.
- Whatever result is returned by the selected Data Fields will be used to create the highlighted annotation.
Using the dropdown list, select the Data Fields you wish to highlight.
- In our case, we are choosing the "Last Name", "First Name", and "Middle Initial" Data Fields. Once collected by the Extract activity, Grooper will know where these results are located on the document. The Highlight Annotation annotation will then highlight the document as seen in the "After Annotation" image above.

Optionally, you can control how the highlight looks: its color, size, opacity, and whether or not there's a stroke around the highlighted rectangle.

For instance, we set the Padding property to 0.1in.
- This will increase the size of the highlight rectangle by 0.1 inches on all sides.
- All annotations have the ability to be padded to increase their size, not just Highlight Annotation.
- You can also expand the Padding property's sub properties to adjust specific configurations for padding the Left, Top, Right, and Bottom edges.
While we did not choose to do so, you can add a colored border around the highlighted rectangle by choosing a Border Style (such as Solid for a solid border or Dashed for a dashed line border).
- The Border Color and Border Width properties will further help you configure the border produced.
- Note: While the Border Color and Border Width properties are configured to 64, 64, 64 and 1pt by default, the Border Style is set to None by default. With no border produced, these properties are ignored. They will not be used to create a border until you choose a Border Style.
We also set the Fill Color to Yellow.
- Grooper defaults to green. This is the same green you see extraction results highlighted when you're testing out extractors in Grooper Design Studio.
- You can select colors using a dropdown list or use comma-separated values in the RGB color space. For example, "yellow" is also 255, 255, 128 in the RGB color space.
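
The padding and color settings above amount to simple geometry and parsing. The Python sketch below illustrates both. It is not Grooper code: the `Rect` type, the function names, the inch units, and the top-left origin with y increasing downward are all assumptions for illustration.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """A rectangle in inches, origin at the top-left of the page (assumed)."""
    left: float
    top: float
    right: float
    bottom: float


def pad_rect(rect: Rect, left: float = 0.0, top: float = 0.0,
             right: float = 0.0, bottom: float = 0.0) -> Rect:
    """Grow the rectangle outward by the given amount on each edge."""
    return Rect(rect.left - left, rect.top - top,
                rect.right + right, rect.bottom + bottom)


def pad_all(rect: Rect, amount: float) -> Rect:
    """Apply the same padding to all four edges, like Padding = 0.1in."""
    return pad_rect(rect, amount, amount, amount, amount)


def parse_rgb(text: str) -> Tuple[int, int, int]:
    """Parse a comma-separated color such as '255, 255, 128'."""
    parts = [int(part.strip()) for part in text.split(",")]
    if len(parts) != 3 or any(not 0 <= value <= 255 for value in parts):
        raise ValueError(f"expected three values from 0 to 255, got {text!r}")
    return (parts[0], parts[1], parts[2])


if __name__ == "__main__":
    last_name = Rect(1.0, 2.0, 3.0, 2.5)  # hypothetical extracted location
    print(pad_all(last_name, 0.1))
    print(parse_rgb("255, 255, 128"))
```

Per-edge padding is kept as the general case so that uniform padding is just a convenience wrapper around it, mirroring the Left, Top, Right, and Bottom sub properties.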

Radio Group Widget

The Radio Group Widget annotation allows you to add radio buttons to the document. Radio buttons are common PDF elements used to indicate a single choice from multiple options in a list. This PDF Generate Annotation Type uses OMR extraction techniques (such as Labeled OMR and Zonal OMR) to find existing checkboxes on the document. A group of radio buttons is then overlaid on top of the checkboxes when the PDF Generate Behavior builds the PDF file.

For example, we will create a Radio Group Widget annotation from the "US Citizen" Data Field's result. We have two choices, either "Yes" or "No". Only one or the other can be chosen. So, this is well suited for a radio button group.

Before Annotation

After Annotation

In the Annotations collection editor, press the "Add" button to add the Radio Group Widget annotation.
- Refer to the "Add the Behavior" tab if you are unclear how we got to this window in Grooper Design Studio.
Select Radio Group Widget from the list.

This will add a Radio Group Widget to the Annotations list.
The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the radio buttons. Use the Fields property to select these Data Fields.
- Whatever result is returned by the selected Data Fields will be used to draw and insert the radio buttons.
- You may use the Padding property to adjust the size of the radio button if you desire.
- These Data Fields must use an OMR based extraction method (Labeled OMR, Ordered OMR, or Zonal OMR) to insert the radio buttons.
Using the dropdown list, select the Data Fields you wish to use to create the group of radio buttons.
- In our case, we are choosing the "US Citizen" Data Field. Once collected by the Extract activity, Grooper will know which results you want to use to create the radio buttons. This will include the checkbox locations and check states stored in the document's layout data. The Radio Group Widget annotation will then insert radio buttons into the generated PDF as seen in the "After Annotation" image above.

Let's briefly look at this "US Citizen" Data Field and see what's happening behind the scenes when the PDF Generate Behavior creates the radio buttons.

We have selected the "US Citizen" Data Field in the Grooper Node Tree.
This Data Field uses the Labeled OMR extractor to return its result, looking for checkboxes next to the labels "Yes" and "No" on the document.
The box next to "Yes" is checked. This is ultimately the result returned to the "US Citizen" Data Field.
- This is how the Radio Group Widget annotation knows where to place the radio button. The data instance used to insert the PDF radio button is drawn around the detected box (in this case highlighted in green in the Document Viewer).
- Since this is the detected checked result, the radio button is configured as "pressed" upon outputting the generated PDF.
The box next to "No" is not checked. The Radio Group Widget will also create radio buttons for the unchecked boxes next to labels on the document.
- The alternate candidate data instances are used to insert the other PDF radio buttons in the group (in this case highlighted in red in the Document Viewer).
- The unchecked boxes must be detected from a Box Detection or Box Removal IP Command in order to be inserted in the generated PDF. They must be present in the document's layout data file before the Extract activity runs.
- Since this is detected as an unchecked result, the radio button is not pressed upon outputting the generated PDF.
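
The mapping from detected boxes to radio button states can be pictured with a short Python sketch. It is not Grooper code. The `OmrBox` type and `radio_states` function are hypothetical, and rejecting more than one checked box is an assumption about how a radio group should behave, not a documented Grooper rule.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OmrBox:
    """A detected checkbox: its label and whether it was found checked."""
    label: str
    checked: bool


def radio_states(boxes: List[OmrBox]) -> Dict[str, bool]:
    """Map each label to True (pressed) or False (not pressed).

    A radio group allows a single choice, so more than one checked box
    is treated as an error in this sketch (an assumption).
    """
    checked_labels = [box.label for box in boxes if box.checked]
    if len(checked_labels) > 1:
        raise ValueError(f"more than one option checked: {checked_labels}")
    return {box.label: box.checked for box in boxes}


if __name__ == "__main__":
    us_citizen = [OmrBox("Yes", True), OmrBox("No", False)]
    print(radio_states(us_citizen))
```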

FYI

In the case of every Annotation Type, the PDF Generate Behavior inserts the annotation by overlaying it on top of the document. This can be important to keep in mind for all annotations but is often particularly relevant when inserting radio buttons using the Radio Group Widget.

Notice the original image for this document used checkboxes, not radio buttons. We see an "X" inside of a square box.

The radio button annotations are simply overlaid on the page's image. You can actually see the edges of the square box persist in the generated PDF (Here, highlighted in yellow for your viewing pleasure).

In this case, the boxes were stored in the layout data using the Box Detection IP Command. This will find and store the checkbox locations and check states, but not actually alter the image in any way.

Maybe you care about this, and maybe you don't. If you do, you may consider using the Box Removal IP Command instead. Box Removal will also find and store the checkbox locations and their check states, but it will also digitally remove the checkboxes from the document's image.

In this case, the boxes were stored in the layout data using the Box Removal IP Command. Since the boxes are removed before the Export activity, the edges of the boxes are not present on the final image. The radio button annotations are placed on blank pixels.

Checkbox Widget

WIP

The Checkbox Widget documentation needs to be finalized after getting some guidance from dev. If it seems incomplete or images don't match up with text, that is why.

The PDF Generate Behavior also has the capability to insert form-fillable checkboxes, using the Checkbox Widget Annotation Type. This Annotation Type also uses OMR extraction techniques (such as Labeled OMR and Zonal OMR) to find existing checkboxes on the document. It works a lot like the Radio Group Widget annotation, except that editable checkboxes, rather than radio buttons, are overlaid on the document.

For example, we will create a Checkbox Widget annotation for the checkboxes in the "Checklist" section of this document, the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields. These are Boolean OMR checkboxes, returning "true" if the box next to the corresponding label is checked, and "false" if unchecked. In either case, checked or not, the Checkbox Widget will insert an editable checkbox element into the generated PDF.

Before Annotation

After Annotation

In the Annotations collection editor, press the "Add" button to add the Checkbox Widget annotation.
- Refer to the "Add the Behavior" tab if you are unclear how we got to this window in Grooper Design Studio.
Select Checkbox Widget from the list.

This will add a Checkbox Widget to the Annotations list.
The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the checkboxes. Use the Fields property to select these Data Fields.
- Whatever result is returned by the selected Data Fields will be used to draw and insert the checkboxes.
- You may use the Padding property to adjust the size of the checkboxes if you desire.
- These Data Fields must use an OMR based extraction method (Labeled OMR, Ordered OMR, or Zonal OMR) to insert the checkboxes.
Using the dropdown list, select the Data Fields you wish to use to create the checkboxes.
- In our case, we are choosing the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields. Once collected by the Extract activity, Grooper will know which results you want to use to create the checkboxes. This will include the checkbox locations and check states stored in the document's layout data. The Checkbox Widget annotation will then insert checkboxes into the generated PDF as seen in the "After Annotation" image above.

Signature Widget

Form-fillable signature boxes can be inserted using the Signature Widget annotation. This Annotation Type uses a zonal extraction type (such as Detect Signature or Highlight Zone) to draw the boundaries of the inserted signature widget. This allows you to create a document that can be digitally signed straight from Grooper upon exporting the generated PDF.

For example, we will create a Signature Widget annotation for the signature line on the application form, using the "Signature" Data Field of our Data Model. The Signature Widget will insert an interactable signature element into the generated PDF.

Before Annotation

After Annotation

In the Annotations collection editor, press the "Add" button to add the Signature Widget annotation.
- Refer to the "Add the Behavior" tab if you are unclear how we got to this window in Grooper Design Studio.
Select Signature Widget from the list.

This will add a Signature Widget to the Annotations list.
The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the signature box. Use the Fields property to select these Data Fields.
- Whatever result is returned by the selected Data Fields will be used to draw and insert the signature box widget.
- You may use the Padding property to adjust the size of the signature box if you desire.
- Zonal based extraction methods (such as Detect Signature and Highlight Zone) are typically used as the Data Field's extractor type.
Using the dropdown list, select the Data Fields you wish to use to create the signature box.
- In our case, we are choosing the "Signature" Data Field. Once collected by the Extract activity, Grooper will be supplied the size and location of the Data Field's extraction zone, which will form the size and location of the PDF signature widget. The Signature Widget annotation will then insert the form-fillable signature box into the generated PDF as seen in the "After Annotation" image above.

Just like any Annotation Type, the extraction result from the Data Field is critical for placing the signature annotation on the generated PDF. Let's look at the "Signature" Data Field's result to understand a little better how these results are used to create the signature widget.

In our case, we're using the Detect Signature extractor type to supply these results. The Detect Signature extractor is perfectly suited for the Signature Widget Annotation Type.

It actually combines both Zonal and OMR based extraction techniques to determine if a signature is present in the zone. It sets the boundaries of where you expect to find a signature using Zonal based methods and detects if the signature is present by counting the percentage of filled pixels in the zone, which is the basis of OMR based extraction methods. You can then output different values if the zone is filled above or below a certain percentage. In this case, the extractor returns "Not Signed" because there aren't enough pixels present in the extraction zone to count as filled. If there were a signature present, there'd be more pixels present, accounting for a higher filled percentage.
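
The filled-pixel test described above can be sketched in a few lines of Python. This is an illustration of the general idea only, not the Detect Signature implementation. The zone is modeled as rows of 0 (blank) and 1 (filled) pixels, and the 5 percent threshold and the "Signed" output value are assumptions; only "Not Signed" comes from the example in this article.

```python
from typing import List


def fill_percentage(zone: List[List[int]]) -> float:
    """Return the percentage of filled (1) pixels in a binary zone."""
    total = sum(len(row) for row in zone)
    if total == 0:
        raise ValueError("zone contains no pixels")
    filled = sum(sum(row) for row in zone)
    return 100.0 * filled / total


def signature_result(zone: List[List[int]], threshold_percent: float = 5.0) -> str:
    """Return one value above the threshold and another below it."""
    if fill_percentage(zone) >= threshold_percent:
        return "Signed"  # hypothetical output value
    return "Not Signed"


if __name__ == "__main__":
    blank_zone = [[0] * 5 for _ in range(4)]  # 20 pixels, none filled
    print(signature_result(blank_zone))
```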

This is great for our purposes because it gives us the exact information we need for the Signature Widget, which is an extraction zone. Grooper needs a data instance indicating the size and location for the generated signature widget.

But wait, there's more! We also get some bonus information about whether or not there's a signature present. Does the Signature Widget Annotation Type need to know if there's a signature present? No. It does not. It will place the widget no matter what the result is. But might that information be otherwise useful to you? Probably.

We have selected the "Signature" Data Field in our Data Model.
This Data Field uses the Detect Signature extractor to draw the extraction zone used to insert the signature widget.
This extractor uses the Text Region Location option.
This gives us the ability to anchor the extraction zone to an extractable text anchor, using the Text Extractor property.
- In this case we've anchored the zone to the word "Signature" outlined in blue in the document viewer. Where do we want to place the extraction zone (and ultimately the signature widget)? On the signature line. How do we know where that line is? It's above the text label "Signature".
The extraction zone itself is drawn using the Translation and Adjustment properties.
- This allows us to set the size (Adjustment) and location (Translation) of the extraction zone (and ultimately the signature widget) relative to the Text Extractor's result.
- The extraction zone is the green rectangle in the document viewer.
When the PDF Generate Behavior builds the PDF, using the Signature Widget annotation, the extraction zone's size and location forms the inserted signature widget.
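
Conceptually, the zone is positioned by offsetting from the anchor text and then given its own size. The Python sketch below shows that geometry under loud assumptions: it treats Translation as an (x, y) offset from the anchor's top-left corner and Adjustment as an explicit width and height, in inches, with y increasing downward. The exact semantics of these Grooper properties may differ, and all names and numbers here are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A rectangle in inches: top-left corner plus size (assumed units)."""
    left: float
    top: float
    width: float
    height: float


def place_zone(anchor: Rect, dx: float, dy: float,
               width: float, height: float) -> Rect:
    """Position a zone relative to an anchor's top-left corner.

    dx, dy play the role of a translation; width and height the role of
    a size adjustment. This mapping is an assumption for illustration.
    """
    return Rect(anchor.left + dx, anchor.top + dy, width, height)


if __name__ == "__main__":
    # Hypothetical location of the word "Signature" on the page.
    label = Rect(left=1.0, top=9.0, width=0.8, height=0.2)
    # The signature line sits above the label, so dy is negative.
    zone = place_zone(label, dx=0.0, dy=-0.5, width=3.0, height=0.45)
    print(zone)
```

Anchoring to the label rather than to fixed page coordinates means the zone follows the label if the form shifts slightly from scan to scan.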

Textbox Widget

The Textbox Widget Annotation Type will insert editable text boxes into the generated PDF. One simple way to use this functionality is to use the Highlight Zone extractor type to place a blank zone where you want to place an empty text box on the PDF. However, any extractor type can be used to define the textbox's location. Furthermore, if the Data Field used to create the annotation collects a value during the Extract activity, not only will a textbox be inserted into the generated PDF, but it will be prefilled with the Data Field's extracted value upon export.

For example, we will use the Textbox Widget functionality to fill out the blank coversheet on the first page of our application packet. We will end up using a Highlight Zone extractor to define the size and location of each text box. However, we're going to go one step further and populate the Data Fields used with some information from other Data Fields in our Data Model. By the end of it, the PDF Generate Behavior will not only insert editable textboxes into the generated PDF, but also fill them in with text, leaving us with the blank coversheet automatically populated with information collected during the Extract activity.

Before Annotation

After Annotation

In the Annotations collection editor, press the "Add" button to add the Textbox Widget annotation.
- Refer to the "Add the Behavior" tab if you are unclear how we got to this window in Grooper Design Studio.
Select Textbox Widget from the list.

This will add a Textbox Widget to the Annotations list.
The only configuration that is strictly required is to indicate which Data Fields you wish to use to create the textboxes. Use the Fields property to select these Data Fields.
- Whatever result is returned by the selected Data Fields will be used to draw and insert the textbox widget. If that Data Field collected a value during the Extract activity, it will also be filled with the returned value.
Using the dropdown list, select the Data Fields you wish to use to create the textboxes.
- In our case, we are choosing the "Candidate", "Title of Proposal" and "Country of Travel" Data Fields. Once collected by the Extract activity, Grooper will be supplied the sizes and locations of the Data Fields' data instances for each result. This will form the size and location of the textbox widget. The Textbox Widget annotation will then insert the form-fillable textbox into the generated PDF as seen in the "After Annotation" image above. These boxes will also be prefilled with the extraction results from each Data Field.

The Textbox Widget annotation has some additional configuration options as well.

As with all Annotation Types, you can optionally adjust the size of the annotation using the Padding property.
You can also change the font and font size of the editable text in the textbox using the Font Name and Font Size properties.

Looking behind the scenes, there are at least two things going on with how we've set up these Data Fields' extraction, ultimately supplying the result used to insert the Textbox Widget annotation.

First, we used the Highlight Zone extractor type to draw the textbox, defining the size and location of the annotation upon generating the PDF.

We have selected the "Candidate" Data Field in our Data Model.
Each Data Field's Value Extractor is set to Highlight Zone.
We used the Relative Region Location option to anchor an extraction zone to the box next to the label "Candidate".
- This will form the size and location of the inserted textbox annotation.

Second, we used an expression to return a value, using the results of other Data Fields in our Data Model.

We've used the Calculated Value property (in Calculate Mode Always Set) to return the full name of the candidate extracted by the "Last Name", "First Name", and "Middle Initial" Data Fields.
- The full expression is as follows: Applicant_Information.First_Name + " " + Applicant_Information.Middle_Initial + " " + Applicant_Information.Last_Name
This will take the extraction results of these three Data Fields and jam them together with space characters in between them.
However, if we test extraction at this point, we're going to get an error.
- We're in the wrong scope! We need to go up to the Data Model's level and test extraction there. We need the full Data Model's results to do what we're trying to do here. When testing extraction on this "Candidate" Data Field alone, it can't "see" the "Last Name", "First Name" and "Middle Initial" Data Fields' results to combine them.

Once we test extraction on the Data Model you'll see what results are actually collected by the Extract activity.
The Calculated Value expression we configured forms one result for the "Candidate"...
...using the results of the "Last Name", "First Name" and "Middle Initial" Data Fields.
With a result returned and zone drawn upon extract, the Textbox Widget annotation has all the information it needs to place the form-fillable textbox and fill it with the results.
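To make the behavior of the Calculated Value expression concrete, here is a minimal Python sketch of the same concatenation. It is not Grooper's expression syntax. The function names and sample values are hypothetical, and the second function is an optional variation that is not part of the configuration described above.

```python
def candidate_full_name(first_name: str, middle_initial: str, last_name: str) -> str:
    """Mirror the expression: three values joined with single space characters."""
    return first_name + " " + middle_initial + " " + last_name


def candidate_full_name_skipping_blanks(first_name: str, middle_initial: str,
                                        last_name: str) -> str:
    """Hypothetical variant: drop empty parts so no double space is produced."""
    parts = [first_name, middle_initial, last_name]
    return " ".join(part.strip() for part in parts if part and part.strip())


# Hypothetical extraction results for the three Data Fields.
print(candidate_full_name("Jane", "Q", "Doe"))                 # Jane Q Doe
print(candidate_full_name_skipping_blanks("Jane", "", "Doe"))  # Jane Doe
```

Note that the plain concatenation leaves two consecutive spaces when the middle initial is empty, which may or may not matter for the prefilled textbox.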

FYI

This certainly isn't the only way to set up a Data Field for a Textbox Widget. This is just how we did it for the point of illustrating the Textbox Widget functionality. You are not required to use the Highlight Zone extractor type. You can use whatever extractor type best suits your document's needs. Often Grooper users will use the Reference extractor to point to a Data Type's results and adjust the size of the Textbox Widget using its Padding property.

Configure PDF Generation for Bookmarks


About

Bookmarks in PDFs aid readers when navigating through multipage documents. The PDF Generate Behavior can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):

Using a Batch Folder's child document folders.
Using the document's extracted Data Fields.

We will focus on the first method, bookmarking by child document folders (as it is more common). Often you will import a file into Grooper that contains multiple documents you want to separate and classify, but which otherwise all belong together in one way or another.

Such is the case with our study abroad application packet. The application packet as a whole consists of five separate and distinguishable documents.

The application itself (and a coversheet)
A proposal summary
The student's resume
A letter of recommendation
An essay

Our goal is to create a bookmark in the generated PDF file for each of these component documents (or child documents as we will come to call them).

Rather than exporting five separate PDF files for each component document, we will export a single PDF for the whole packet with navigable bookmarks corresponding to each component document.

Application - For the application itself (and its coversheet)
Proposal Summary - For the proposal summary
Resume - For the student's resume
Rec Letter - For the letter of recommendation
Essay - For the essay

Prereqs - Split Pages, Separation and Classification

In order to accomplish this goal, we're going to have to do some things to this application packet before we configure the PDF Generate Behavior.

By the end of it, we're looking for a Batch whose documents have a structure like this. The documents in this batch consist of two Batch Folder levels.

Folder Level 1: This is the parent document folder. It is the container for the full document. All seven pages of the application packet in this case.
Folder Level 2: These are the child document folders for the parent document. They are the containers for each component document of the full application packet.

This is what we want to end up with. How did we get there? Long story short, we have some document separation and classification requirements before we can insert bookmarks in the generated PDF. The bookmarks are inserted for each child document folder and named after their classified Document Type's name. In order to do that, we need to split out the pages of the imported document, separate them into child document folders, and classify them first.

The full application document came into Grooper like this. A 7-page PDF file containing all 5 of these component documents was imported into a new Batch. This is now the parent document folder at Folder Level 1.

But there's documents in them there document! How do we get them out?

First, we need to use the Split Pages activity to create child Batch Page objects.

This will split out the pages of the imported PDF file, creating one child Batch Page for each page of the PDF in the parent document folder. Now we have page objects we can manipulate in our Batch.

Now that we have Batch Page objects in our Batch, we can use the Separate activity to insert the second folder level. This is the first step in organizing these pages into child documents. We need to distinguish between one collection of pages as a document and another collection of pages as a document. Creating folders is the first part of that equation.

Now, we have child document folders for this parent document folder, but they are just blank folders. There is nothing to distinguish one folder from the next.

⚠

By default, the Separate activity runs on the Batch level scope, inserting folders at Folder Level 1. When separating child documents like this, you will need to change the Scope property of the Separate activity to run it at the Folder Level 1 scope. This will separate the loose pages of folders at Level 1, inserting child document folders at Level 2 below the parent folder at Level 1.

And, that's the second part of the organization equation, classification. Next, these folders will be assigned a Document Type from our Content Model using the Classify activity.

⚠

By default, the Classify activity runs on the Folder Level 1 scope, classifying document folders at the first folder level in the Batch hierarchy. We want to classify the child document folders at Folder Level 2. When classifying child document folders like this, you will need to change the Scope property of the Classify activity to run at the Folder Level 2 scope.

Furthermore, that parent document folder would need a Document Type assigned to it at some point as well. The Batch Process for this Batch might have two Classify activities. One running on Folder Level 1 to classify the parent document folder and another running on Folder Level 2 to classify the child document folders.
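The idea of an activity's scope can be pictured with a small Python model of the Batch hierarchy. This is only a sketch for intuition. The `Folder` class and `folders_at_level` function are hypothetical names, not Grooper objects or API calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Folder:
    """A hypothetical stand-in for a Batch Folder."""
    name: str
    document_type: Optional[str] = None
    children: List["Folder"] = field(default_factory=list)


def folders_at_level(top_level_folders: List[Folder], level: int) -> List[Folder]:
    """Return the folders an activity would visit when scoped to a folder level."""
    current = list(top_level_folders)
    for _ in range(level - 1):
        current = [child for folder in current for child in folder.children]
    return current


# One parent document folder (Level 1) holding five child folders (Level 2).
packet = Folder("Application Packet",
                children=[Folder(f"Folder {n}") for n in range(1, 6)])

level_1 = folders_at_level([packet], 1)  # the parent document folder only
level_2 = folders_at_level([packet], 2)  # the five child document folders
```

In this picture, a Classify activity scoped to Folder Level 1 only ever sees `level_1`, which is why a second Classify activity scoped to Folder Level 2 is needed for the child documents.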

Now, we have everything we need to configure the bookmarking functionality of the PDF Generate Behavior. Bookmarks will be created every time a new child document is encountered and named after the Document Type assigned to that folder.

When the full PDF is generated, a bookmark named "Application" will be inserted at the first page of the PDF. That child document is two pages long. The third page of the full PDF will be the proposal summary. So a bookmark named "Proposal Summary" will be inserted at page three. A "Resume" bookmark will be inserted at page four. And so on.
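The page arithmetic behind these bookmarks is a running total of the child documents' page counts. The sketch below is an illustration in Python, not Grooper's code. The page counts for "Application" and "Proposal Summary" follow from the text above, while the counts for "Resume", "Rec Letter" and "Essay" are assumed values chosen to add up to the 7-page packet.

```python
from typing import List, Tuple

# (Document Type name, page count). The last three counts are assumptions.
child_documents = [
    ("Application", 2),
    ("Proposal Summary", 1),
    ("Resume", 1),
    ("Rec Letter", 1),
    ("Essay", 2),
]


def bookmark_pages(children: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (bookmark label, 1-based first page) for each child document."""
    bookmarks = []
    next_page = 1
    for document_type, page_count in children:
        bookmarks.append((document_type, next_page))
        next_page += page_count
    return bookmarks


for label, page in bookmark_pages(child_documents):
    print(f"{label}: page {page}")
```

With these counts, "Application" lands on page 1, "Proposal Summary" on page 3 and "Resume" on page 4, matching the description above.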

FYI

There are many ways to separate and classify documents, including ESP Auto Separation, which both separates and classifies documents at the same time with a single activity (just Separate). But this is the general idea to get us where we need to go.

One way or another, create classified child document folders from a parent document folder. That way when we generate the PDF for the parent document folder upon export, bookmarks will be created for the classified child document folders.

Add the Behavior and Configure It for Bookmarking

Bookmarking is one of the configuration options for the PDF Generate Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of the PDF Generate Behavior) how to use the Content Type to do something (in this case, how to use the Content Model's Document Types to insert bookmarks into the PDF upon export).

All Behaviors are added to a Content Type object.
- We will add the PDF Generate Behavior to this Content Model named "PDF Generate - UNESCO Packet".
All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add the PDF Generate Behavior.
This will bring up the Behaviors editor window.
Press the "Add" button to add a Behavior.
Choose "PDF Generate Behavior" from the list.

Once added, you will see the PDF Generate Behavior added to the list on the left. Select it.
To enable the bookmarking functionality, in the right panel, select the Bookmarking property.
Change it from Disabled to Enabled.

For our purposes, this is all we need to configure at this point. However, be aware of the Bookmarking configuration options.

The Label property allows you to alter the generated bookmark names using an expression editor.
- Left blank, Grooper will use the Document Type's name for each child document encountered, which is exactly what we want to do. We will leave this property unconfigured.
The Enable Data Bookmarks property allows you to create bookmarks using the locations of extracted Data Field results.
- Set this property from False to True if you want to use this feature.
- Once set to True you may also choose to insert bookmarks for every Data Field in the Content Type's Data Model or manually select which ones you want to use to create bookmarks.

Configure PDF Generation for Metadata


About

The PDF Generate Behavior has the ability to create and insert additional metadata into the generated PDF as well, using information collected during Grooper's document processing. The metadata you are able to create falls into one of three categories:

Editing the PDF's default metadata fields. This includes the following metadata fields that are standard to every PDF file:
- Title
- Author
- Subject
- Creation Date
- Modification Date
- Application (Used to establish the "creator" application which created the original file. This can be useful if the original file was created in a different application, like Microsoft Word, and converted to a PDF format with the PDF Generate Behavior.)
Creating custom metadata fields
- This is done using extracted Data Field values collected during the Extract activity.
Adding "Keywords" to the PDF metadata
- This can be done using expression based or extraction based methods.

⚠	Notice what's not included in this list is the exported document's filename (i.e. "Im_a_file.pdf"). Filename mappings are always configured using an Export Behavior.
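For orientation, the three categories above map naturally onto a PDF's document information dictionary, whose standard keys include /Title, /Author, /Subject, /Keywords, /Creator, /CreationDate and /ModDate. The Python sketch below only builds such a dictionary as plain data. It does not write a PDF and does not show how Grooper stores these values. The function names, the sample field values, the subject text and the "/CountryOfTravel" custom key are all assumptions for illustration.

```python
from datetime import datetime, timezone
from typing import Dict, List


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string in UTC."""
    return moment.astimezone(timezone.utc).strftime("D:%Y%m%d%H%M%SZ")


def build_info(extracted: Dict[str, str], keywords: List[str],
               created: datetime) -> Dict[str, str]:
    """Assemble default fields, a custom field, and keywords."""
    info = {
        # 1. Default metadata fields.
        "/Title": extracted["Title of Proposal"],
        "/Author": extracted["Candidate"],
        "/Subject": "Study abroad application packet",  # assumed text
        "/Creator": "Microsoft Word",  # the "Application" field
        "/CreationDate": pdf_date(created),
        "/ModDate": pdf_date(created),
        # 3. Keywords, stored as one delimited string.
        "/Keywords": ", ".join(keywords),
    }
    # 2. A custom field from an extracted Data Field (key name is hypothetical).
    info["/CountryOfTravel"] = extracted["Country of Travel"]
    return info


# Hypothetical extraction results.
fields = {
    "Candidate": "Jane Q Doe",
    "Title of Proposal": "Coastal Heritage Survey",
    "Country of Travel": "Portugal",
}
info = build_info(fields, ["application", "study abroad"],
                  datetime(2021, 3, 4, 15, 36, 0, tzinfo=timezone.utc))
```

The filename is deliberately absent from this dictionary, consistent with the note above that filename mappings belong to an Export Behavior.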

Prereqs - Data Extraction

Add the Behavior and Enable Metadata

Metadata is one of the configuration options for the PDF Generate Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of the PDF Generate Behavior) how to use the Content Type to do something (how to use the Content Model's collected Data Fields and other information to edit the generated PDF's metadata, in this case).

All Behaviors are added to a Content Type object.
- We will add the PDF Generate Behavior to this Content Model named "PDF Generate - UNESCO Packet".
All Behaviors are added using the Behaviors property. Select the Behaviors property and press the ellipsis button at the end to add the PDF Generate Behavior.
This will bring up the Behaviors editor window.
Press the "Add" button to add a Behavior.
Choose "PDF Generate Behavior" from the list.

Once added, you will see the PDF Generate Behavior added to the list on the left. Select it.
To enable the metadata functionality, in the right panel, select the Metadata property.
Change it from Disabled to Enabled.

Export the Generated PDF

Version Differences

Behaviors are a new functionality in Grooper 2021. Much of the PDF Generate Behavior functionality was not available in previous versions. Prior to version 2021, only annotation creation was possible using the Generate PDF activity. In version 2021, this activity has been replaced by the PDF Generate Behavior, expanding its capabilities to generate bookmarks and document metadata as well.

@@ Line 282: / Line 282: @@
 {|cellpadding=10 cellspacing=5
 |style="width:40%" valign=top|
-Annotations are one of the configuration options for the ''PDF Generate Behavior''.  They are one way a '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity) how to use the '''Content Type''' to do something (specifically how to use the '''Content Model's''' collected '''Data Fields''' to insert additional content when generating a PDF upon export).
+Annotations are one of the configuration options for the ''PDF Generate Behavior''.  A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of the ''PDF Generate Behavior'') how to use the '''Content Type''' to do something (how to use the '''Content Model's''' collected '''Data Fields''' to insert additional content when generating a PDF upon export, in this case).
 # All '''''Behaviors''''' are added to a '''Content Type''' object.
@@ Line 672: / Line 672: @@
 <tab name="About" style="margin:20px">
 === About ===
-{|cellpadding=10 cellspacing=5
-|style="width:40%" valign=top|
 Bookmarks in PDFs aid readers when navigating through multipage documents.  The ''PDF Generate Behavior'' can insert bookmarks into the generated PDF to take advantage of this functionality.  This can be done in one of two ways (or both):
@@ Line 773: / Line 772: @@
 |}
 </tab>
-<tab name="Add the Behavior and Configure it For Bookmarking" style="margin:20px">
+<tab name="Add the Behavior and Configure It for Bookmarking" style="margin:20px">
-=== Add the Behavior ===
+=== Add the Behavior and Configure It for Bookmarking===
 {|cellpadding=10 cellspacing=5
 |style="width:40%" valign=top|
-Annotations are one of the configuration options for the ''PDF Generate Behavior''.  They are one way a '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity) how to use the '''Content Type''' to do something (in this case, how to use the '''Content Model's''' '''Document Types''' to insert bookmarks into the PDF upon export).
+Bookmarking is one of the configuration options for the ''PDF Generate Behavior''.  A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of the ''PDF Generate Behavior'') how to use the '''Content Type''' to do something (in this case, how to use the '''Content Model's''' '''Document Types''' to insert bookmarks into the PDF upon export).
 # All '''''Behaviors''''' are added to a '''Content Type''' object.
@@ Line 810: / Line 809: @@
 === Configure PDF Generation for Metadata ===
+<tabs style="margin:20px">
+<tab name="About" style="margin:20px">
+=== About ===
+{|cellpadding=10 cellspacing=5
+|style="width:50%" valign=top|
+The ''PDF Generate Behavior'' has the ability to create and insert additional metadata into the generated PDF as well, using information collected during Grooper's document processing.  The metadata you are able to create falls into one of three categories:
+# Editing the PDF's default metadata fields.  This includes the following metadata fields that are standard to every PDF file:
+#* Title
+#* Author
+#* Subject
+#* Creation Date
+#* Modification Date
+#* Application (Used to establish the "creator" application which created the original file.  This can be useful if the original file was created in a different application, like Microsoft Word, and converted to a PDF format with the '''PDF Generate Behavior''.)
+# Creating custom metadata fields
+#* This is done using extracted '''Data Field''' values collected during the '''Extract''' activity.
+# Adding "Keywords" to the PDF metadata
+#* This can be done using expression based or extraction based methods.
+{|cellpadding="10" cellspacing="5"
+|-style="background-color:#f89420; color:white"
+|style="font-size:22pt"|'''&#9888;'''
+|
+Notice what's not included in this list is the exported document's ''filename'' (i.e. "Im_a_file.pdf").  Filename mappings are always configured using an ''Export Behavior''.
+|}
+|
+[[File:Pdf-generate-howto-43.png]]
+|}
+</tab>
+<tab name="Prereqs - Data Extraction" style="margin:20px">
+=== Prereqs - Data Extraction ===
+</tab>
+<tab name="Add the Behavior and Enable Metadata" style="margin:20px">
+=== Add the Behavior and Enable Metadata ===
+{|cellpadding=10 cellspacing=5
+|style="width:40%" valign=top|
+Metadata is one of the configuration options for the ''PDF Generate Behavior''.  A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of the ''PDF Generate Behavior'') how to use the '''Content Type''' to do something (how to use the '''Content Model's''' collected '''Data Fields''' and other information to edit the generated PDF's metadata, in this case).
+# All '''''Behaviors''''' are added to a '''Content Type''' object.
+#* We will add the ''PDF Generate Behavior'' to this '''Content Model''' named "PDF Generate - UNESCO Packet".
+# All '''''Behaviors''''' are added using the '''''Behaviors''''' property.  Select the '''''Behaviors''''' property and press the ellipsis button at the end to add the ''PDF Generate Behavior''.
+# This will bring up the '''''Behaviors''''' editor window.
+# Press the "Add" button to add a '''''Behavior'''''.
+# Choose "PDF Generate Behavior" from the list.
+|
+[[File:Pdf-generate-howto-06.png]]
+|-
+|valign=top|
+# Once added, you will see the ''PDF Generate Behavior'' added to the list on the left.  Select it.
+# To enable the bookmarking functionality, in the right panel, select the '''''Metadata''''' property.
+# Change it from ''Disabled'' to ''Enabled''.
+|}
+</tab>
+</tabs>
+=== Export the Generated PDF ===
 == Version Differences ==
 '''''Behaviors''''' are a new functionality in '''Grooper 2021'''.  Much of the ''PDF Generate Behavior'' functionality was not available in previous versions.  Prior to version '''2021''', only annotation creation was possible using the '''[[Generate PDF]]''' activity.  In version '''2021''', this activity has been replaced by the ''PDF Generate Behavior'', expanding its capabilities to generate bookmarks and document metadata as well.