2023.1:PDF Data Mapping (Behavior): Difference between revisions
Dgreenwood (talk | contribs) No edit summary |
|style="font-size:14pt"|
'''2021'''
|This article is in development for the upcoming version of Grooper, '''Grooper 2021'''. ''PDF Data Mapping'' is a new '''Content Type''' '''''Behavior''''' option in 2021. This information is incomplete and/or may change by the time of release.
|}
<blockquote style="font-size:14pt">
''PDF Data Mapping'' is a '''Content Type''' '''''Behavior''''' designed to create an exportable PDF file with additional native PDF elements, using the classification and extraction content of a '''Batch Folder'''. This includes the ability to export extracted data as PDF metadata, insert bookmarks, and create PDF annotations, such as highlights and checkbox and signature widgets.
</blockquote>
|}
''PDF Data Mapping'' '''''Behavior''''' allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF '''''Export Format''''' in Grooper uses the page image files and their text data to create a multipage PDF file for each document folder upon '''Export'''. However, this is only the "display information" required to open and read the document. A PDF can be much more than a multipage document with page images and machine-readable text: PDF content can also include metadata, keywords, bookmarks, annotations, and more!
''PDF Data Mapping'' creates an exportable PDF file that includes some of this additional content available to the PDF format. This is part of Grooper's evolving "Smart PDF Architecture", a design philosophy that strives to more fully utilize the capabilities of the PDF file type and merge them with Grooper's own document processing capabilities.
The expanded ''PDF Data Mapping'' functionality can be divided into three categories:
* '''''Annotations'''''
* '''''Bookmarks'''''
Bookmarks allow easy navigation through multipage PDF documents. When exporting a single PDF made up of multiple child sub-documents, you can create a bookmark for each child document. This way, you can keep all the documents together in a single PDF file while easily navigating from one section of the document to another.
For example, this document is an application packet for a study abroad program. Each document in the packet was separated and classified as a child document folder of one '''Document Type''' or another. ''PDF Data Mapping'' was used to export the packet as a single PDF, and a bookmark was inserted for each sub-document and named after its '''Document Type'''.
Grooper can create bookmarks from extracted '''Data Fields''' in the document as well.
{|cellpadding=10 cellspacing=5
|valign=top style="width:50%"|
Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Before the ''PDF Data Mapping'' functionality was implemented, Grooper could edit only minimal PDF metadata, notably the file's name upon export. ''PDF Data Mapping'' allows Grooper to alter and store additional metadata as well, including '''Data Field''' values collected during the '''Extract''' activity. This means Grooper can now create a viewable document that carries all of its extracted data with it, independent of whether that data is also stored elsewhere (such as in a database table or content management system).
This metadata can be viewed by opening the PDF in a PDF viewer application, such as Adobe Acrobat, and opening the "Document Properties" window from the File menu.
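For readers curious about what a custom metadata entry looks like inside the file: if the values are written to the PDF's document information dictionary (an assumption; they could also be stored as XMP metadata), each '''Data Field''' becomes a key/value pair whose key is a PDF name object. The Python sketch below is not Grooper's implementation. It only illustrates the PDF specification's encoding rules, showing why a field name containing spaces, such as "Applicant Name", cannot be written verbatim as a key. The helper names and sample values are hypothetical.

```python
# Delimiter characters that may not appear literally in a PDF name.
PDF_NAME_DELIMITERS = set("()<>[]{}/%")


def pdf_name(key):
    """Encode a field name as a PDF name object (ISO 32000-1, section 7.3.5).

    Bytes outside the printable range '!'..'~', delimiters, and '#' itself
    are written as '#' followed by two hex digits.
    """
    parts = ["/"]
    for byte in key.encode("utf-8"):
        char = chr(byte)
        if 0x21 <= byte <= 0x7E and char not in PDF_NAME_DELIMITERS and char != "#":
            parts.append(char)
        else:
            parts.append(f"#{byte:02X}")
    return "".join(parts)


def pdf_literal_string(value):
    """Encode an ASCII value as a PDF literal string.

    Backslashes are escaped first so the escapes added for parentheses
    are not doubled. Non-ASCII text would need PDFDocEncoding or UTF-16BE,
    which this sketch does not handle.
    """
    if not value.isascii():
        raise ValueError("this sketch only handles ASCII values")
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def custom_info_entries(extracted):
    """Render one document information entry per extracted Data Field."""
    return "\n".join(
        f"{pdf_name(name)} {pdf_literal_string(value)}"
        for name, value in extracted.items()
    )


if __name__ == "__main__":
    print(custom_info_entries({
        "Applicant Name": "Jane Doe",
        "Title of Proposal": "Clean Water (Rural Schools)",
    }))
```

A PDF viewer decodes these names again, so "Applicant#20Name" is displayed as "Applicant Name" in the Document Properties window.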
Be aware that the PDF file format already has metadata fields named "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate" and "Trapped".
You may run into an issue upon export if your '''Data Model''' has '''Data Fields''' that share one of these names. If you are using the '''''Metadata''''' creation capabilities of ''PDF Data Mapping'', consider these names "taken" and rename the '''Data Field''' to something different. For example, in this case a '''Data Field''' returning the title of the proposal listed on the application was renamed from "Title" to "Title of Proposal".
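If a '''Data Model''' has many '''Data Fields''', it can help to check for these collisions before export. The following is a minimal Python sketch of such a check, not a Grooper feature; the function name is hypothetical. It compares names case-insensitively, which is a deliberately conservative assumption (PDF keys are case-sensitive, but a field named "title" next to the standard "Title" entry is still likely to confuse readers).

```python
# Standard keys of the PDF document information dictionary,
# as listed in the paragraph above.
RESERVED_PDF_INFO_KEYS = (
    "Title", "Author", "Subject", "Keywords", "Creator",
    "Producer", "CreationDate", "ModDate", "Trapped",
)


def find_reserved_name_conflicts(field_names):
    """Return the Data Field names that collide with a reserved PDF metadata key.

    Hypothetical helper: the comparison ignores case and surrounding
    whitespace on purpose, so near-duplicates like "title" are flagged too.
    """
    reserved = {key.lower() for key in RESERVED_PDF_INFO_KEYS}
    return [name for name in field_names if name.strip().lower() in reserved]


if __name__ == "__main__":
    fields = ["Applicant Name", "Title", "Title of Proposal", "keywords"]
    print(find_reserved_name_conflicts(fields))  # ['Title', 'keywords']
```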
|}
|
# To add a '''''Behavior''''', select the '''''Behaviors''''' property and press the ellipsis button at the end.
# This will bring up a dialog window to add various behaviors to the '''Content Model''', including ''PDF Data Mapping''.
# Use the "Add" button to add ''PDF Data Mapping'' to the list.
# Select ''PDF Data Mapping'' from the listed options.
|
[[File:Pdf-generate-about-01.png]]
|-
|valign=top|
# Once added, you will see a ''PDF Data Mapping'' item added to the '''''Behaviors''''' list.
# Select this '''''Behavior''''' to see the property options used to configure PDF creation.
Before we get into what these properties do, how to configure them, and how they affect the exported PDF, there's one key thing to keep in mind when using ''PDF Data Mapping''.
|
[[File:Pdf-generate-about-02.png]]
|-
|valign=top|
Along with the ''PDF Data Mapping'' '''''Behavior''''', you will also need an ''Export Behavior'' configured to export a PDF formatted file. The ''PDF Data Mapping'' '''''Behavior''''' configures all the extra content (metadata, bookmarks and/or annotations) you want to add to the exported PDF. The ''Export Behavior'' actually creates the PDF (with the content configuration supplied by ''PDF Data Mapping'') and sends it to an external storage platform.
''Export Behaviors'' can be added to '''Content Types''', such as the '''Content Model''' here.
Once the ''Export Behavior'' is added, you will need to add an '''''Export Definition'''''. This controls how the file is exported, most notably where it is exported. Whether you are exporting to a Windows file system, an IMAP email mailbox, or a CMIS content management system, Grooper needs to know where to put the file. The '''''Export Definition''''' is how Grooper knows where the file goes.
'''Importantly for ''PDF Data Mapping''''', you will also use an '''''Export Definition''''' to define what type(s) of file you want to export. Whichever '''''Export Definition''''' you choose, you must configure an '''''Export Format''''' for a PDF formatted file in order to export the generated PDF.
# To add an '''''Export Definition''''', select the property and press the ellipsis button at the end.
== How To ==
The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to walk through a more specific setup of ''PDF Data Mapping''. This is a packet of documents from a single applicant containing five different kinds of documents.
{|cellpadding="10" cellspacing="5"
This document consists of two pages. The first is a coversheet for the whole application packet. The second is the application form itself.
Primarily, this document will allow us to demonstrate the different kinds of annotations available when using ''PDF Data Mapping'' to generate a PDF file (using its '''''Annotations''''' property configuration). We will see how to set up one example of each of the following annotation types available in Grooper:
* Highlight Annotation
* Checkbox Widget
This application also includes an essay from the student. This document will demonstrate how to add keywords to the PDF's metadata.
We will use an extractor to count the number of words in the essay and configure the ''PDF Data Mapping's'' '''''Metadata''''' properties to insert a keyword of "long essay", "medium essay", or "short essay" depending on the essay's length.
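The keyword logic amounts to bucketing a word count. The Python sketch below illustrates that decision only; it is not how Grooper's extractor is configured. The article does not state where "short", "medium" and "long" begin, so the thresholds here are hypothetical placeholders, and splitting on whitespace is a rough stand-in for the word-counting extractor.

```python
# Hypothetical word-count thresholds: the article does not say where
# "short", "medium" and "long" begin, so these cutoffs are placeholders.
SHORT_MAX_WORDS = 300
MEDIUM_MAX_WORDS = 600


def count_words(text):
    """Count whitespace-separated words, a rough stand-in for the extractor."""
    return len(text.split())


def essay_length_keyword(word_count, short_max=SHORT_MAX_WORDS, medium_max=MEDIUM_MAX_WORDS):
    """Map a word count to the keyword inserted into the PDF metadata."""
    if word_count <= short_max:
        return "short essay"
    if word_count <= medium_max:
        return "medium essay"
    return "long essay"


if __name__ == "__main__":
    essay = "word " * 450
    print(essay_length_keyword(count_words(essay)))  # medium essay
```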
|
[[File:Pdf-generate-howto-docset-03.png|400px]]
* and a letter of recommendation.
These documents (as well as the rest) will allow us to see how to insert bookmarks into the generated PDF, using the ''PDF Data Mapping's'' '''''Bookmarking''''' property configuration.
|
[[File:Pdf-generate-howto-docset-04.png]]
# Here, this document folder in the '''Batch''' is classified as a "UNESCO Application Packet" '''Document Type'''. This '''Batch Folder''' was created upon importing the original application packet file, named "UNESCO Packet.pdf".
# The PDF document's pages were split out using the '''Split Pages''' activity to create child '''Batch Page''' objects. This allowed us to separate the pages into a child document folder for each of the documents inside the imported application packet.
# ''PDF Data Mapping'' can create a bookmark in the generated PDF for each of these five sub-documents using the '''''Bookmarking''''' property. Each bookmark will be named after its classified '''Document Type''' (e.g. "Application", "Proposal Summary", "Resume", etc.).
This means we can process the full imported application packet and export a single file with easily navigable bookmarks for its component documents. There's no need to export each component document individually and then figure out a way to index them, put them in their own folder, or otherwise relate them to each other in their final storage location. With ''PDF Data Mapping's'' bookmarking capabilities, you can export just one file with each child '''Document Type''' bookmarked.
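Conceptually, each bookmark needs a title (the child folder's '''Document Type''') and a target page in the merged PDF, and the target page follows from the page counts of the folders that come before it. The Python sketch below shows that bookkeeping under stated assumptions: the ''ChildDocument'' class is a hypothetical stand-in for a child document folder, and apart from the two-page Application described earlier, the page counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChildDocument:
    """Hypothetical stand-in for a child document folder in the Batch."""
    document_type: str  # classified Document Type, used as the bookmark title
    page_count: int     # number of Batch Pages in the folder


def build_bookmarks(children):
    """Return (title, first_page) pairs with 1-based page numbers in the merged PDF."""
    bookmarks = []
    next_page = 1
    for child in children:
        bookmarks.append((child.document_type, next_page))
        next_page += child.page_count
    return bookmarks


if __name__ == "__main__":
    packet = [
        ChildDocument("Application", 2),
        ChildDocument("Proposal Summary", 1),
        ChildDocument("Essay", 2),
    ]
    for title, page in build_bookmarks(packet):
        print(f"{title}: page {page}")
```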
|colspan=3|
[[File:Pdf-generate-howto-02.png]]
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
''PDF Data Mapping'' can insert various annotations and native PDF widgets into the generated PDF. This increases the document's readability and lets the reader interact with the document through widgets such as radio group buttons, checkboxes and signature fields.
We will demonstrate how to configure one example for each of the '''''Annotation Types'''''.
<tab name="Prereqs - Data Fields & Extracted Data" style="margin:20px">
=== Prereqs - Data Fields & Extracted Data ===
Before a PDF annotation can be generated, a document's data must be extracted. Put another way, the '''Extract''' activity must run ''before'' the '''Export''' activity (which is when ''PDF Data Mapping'' ultimately builds the PDF and exports it).
Each of the '''''Annotation Types''''' points to a '''Data Field''' in a '''Data Model''' as part of its configuration. If the '''Data Field''' does not collect data during the '''Extract''' activity, ''PDF Data Mapping'' won't know where to place the annotation.
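The dependency on '''Extract''' can be pictured as a simple filter: only fields whose results carry a page location can receive an annotation. The Python sketch below is a hypothetical model, not Grooper's object model; ''FieldResult'' and its box format (left, top, width, height) are assumptions. It checks for a location rather than a value, because a blank zone (such as one from a ''Highlight Zone'' extractor) can still mark where an annotation belongs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (left, top, width, height) on the page; a hypothetical representation.
Box = Tuple[float, float, float, float]


@dataclass
class FieldResult:
    """Hypothetical extraction result for one Data Field."""
    name: str
    value: Optional[str] = None
    page: Optional[int] = None
    box: Optional[Box] = None


def plan_annotations(results):
    """Split fields into those that can be annotated and those that cannot.

    A field can only be annotated if Extract produced a location for it;
    otherwise there is nowhere to overlay the annotation.
    """
    placeable, skipped = [], []
    for result in results:
        if result.page is not None and result.box is not None:
            placeable.append((result.name, result.page, result.box))
        else:
            skipped.append(result.name)
    return placeable, skipped


if __name__ == "__main__":
    results = [
        FieldResult("US Citizen", "Yes", page=2, box=(120.0, 340.0, 12.0, 12.0)),
        FieldResult("Signature"),  # nothing extracted, so no location
    ]
    print(plan_annotations(results))
```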
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
# We will ultimately configure ''PDF Data Mapping'' using the '''''Behaviors''''' property of this '''Content Model''', which we've named "PDF Generate - UNESCO Packet".
#* Before we do that, we will need to ensure we have '''Data Fields''' that correspond to the annotations we want to place.
# We've added the necessary '''Data Fields''' to the '''Content Model's''' '''Data Model'''.
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
Annotations are one of the configuration options for the ''PDF Data Mapping'' '''''Behavior'''''. A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of ''PDF Data Mapping'') how to use the '''Content Type''' to do something (in this case, how to use the '''Content Model's''' collected '''Data Fields''' to insert additional content when generating a PDF upon export).
# All '''''Behaviors''''' are added to a '''Content Type''' object.
#* We will add the ''PDF Data Mapping'' '''''Behavior''''' to this '''Content Model''' named "PDF Generate - UNESCO Packet".
# All '''''Behaviors''''' are added using the '''''Behaviors''''' property. Select the '''''Behaviors''''' property and press the ellipsis button at the end to add ''PDF Data Mapping''.
# This will bring up the '''''Behaviors''''' editor window.
# Press the "Add" button to add a '''''Behavior'''''.
# Choose "PDF Data Mapping" from the list.
|
[[File:Pdf-generate-howto-06.png]]
|-
|valign=top|
# Once added, you will see ''PDF Data Mapping'' added to the list on the left. Select it to add an '''''Annotation'''''.
# In the right panel, select the '''''Annotations''''' property and press the ellipsis button at the end.
# This will bring up an '''''Annotations''''' collection editor.
The ''Radio Group Widget'' annotation allows you to add radio buttons to the document. Radio buttons are common PDF elements used to indicate a single choice from multiple options in a list. This ''PDF Data Mapping'' '''''Annotation Type''''' uses OMR extraction techniques (such as ''Labeled OMR'' and ''Zonal OMR'') to find existing checkboxes on the document. A group of radio buttons is then overlaid on top of the checkboxes when the ''PDF Data Mapping'' '''''Behavior''''' builds the PDF file.
For example, we will create a ''Radio Group Widget'' annotation from the "US Citizen" '''Data Field's''' result. There are two choices, "Yes" or "No", and only one can be chosen, so this field is well suited to a radio button group.
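The "only one choice" rule is what distinguishes a radio group from a set of checkboxes. The Python sketch below models that rule for the "US Citizen" example; it is a hypothetical helper, not part of Grooper. It assumes case-insensitive matching of the extracted value and leaves every button unselected when nothing usable was extracted, rather than guessing.

```python
def radio_group_state(options, extracted_value):
    """Return {option: selected} with at most one option selected.

    Hypothetical helper: matching is case-insensitive, and an empty or
    unrecognized value leaves every button unselected rather than guessing.
    """
    normalized = (extracted_value or "").strip().lower()
    state = {option: False for option in options}
    for option in options:
        if option.lower() == normalized:
            state[option] = True
            break  # radio buttons allow a single choice
    return state


if __name__ == "__main__":
    print(radio_group_state(["Yes", "No"], "yes"))
```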
{|cellpadding=10 cellspacing=5
|valign=top style="width:50%"|
For every '''''Annotation Type''''', ''PDF Data Mapping'' inserts the annotation by overlaying it on top of the document. This is important to keep in mind for all annotations, but it is often particularly relevant when inserting radio buttons using the ''Radio Group Widget''.
Notice the original image for this document used checkboxes, not radio buttons. We see an "X" inside of a square box.
''PDF Data Mapping'' can also insert form-fillable checkboxes, using the ''Checkbox Widget'' '''''Annotation Type'''''. This '''''Annotation Type''''' also uses OMR extraction techniques (such as ''Labeled OMR'' and ''Zonal OMR'') to find existing checkboxes on the document. It works much like the ''Radio Group Widget'' annotation, except that editable checkboxes, rather than radio buttons, are overlaid on the document.
For example, we will create a ''Checkbox Widget'' annotation for the checkboxes in the "Checklist" section of this document: the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" '''Data Fields'''. These are Boolean OMR checkboxes, returning "true" if the box next to the corresponding label is checked and "false" if it is unchecked. In either case, checked or not, the ''Checkbox Widget'' will insert an editable checkbox element into the generated PDF.
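The mapping from Boolean OMR results to widget states is straightforward, and the key point is that unchecked boxes still produce a widget. The Python sketch below illustrates that mapping; the helper is hypothetical, and the checklist values in the usage example are made up rather than taken from the sample document.

```python
def checkbox_widgets(omr_results):
    """Turn Boolean OMR results into (field name, checked) pairs.

    Every field yields a widget, checked or not, matching the behavior
    described above. Hypothetical helper; Grooper's own output may differ.
    """
    widgets = []
    for name, raw in omr_results.items():
        checked = str(raw).strip().lower() == "true"
        widgets.append((name, checked))
    return widgets


if __name__ == "__main__":
    checklist = {
        "Application": "true",
        "Proposal Summary": "true",
        "Essay": "false",
        "Resume": "true",
        "Recommendation Letter": "false",
    }
    for name, checked in checkbox_widgets(checklist):
        print(f"[{'x' if checked else ' '}] {name}")
```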
#* This allows us to set the size ('''''Adjustment''''') and location ('''''Translation''''') of the extraction zone (and ultimately the signature widget) relative to the '''''Text Extractor's''''' result.
#* The extraction zone is the green rectangle in the document viewer.
# When the ''PDF Data Mapping'' '''''Behavior''''' builds the PDF using the ''Signature Widget'' annotation, the extraction zone's size and location form the inserted signature widget.
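The geometry can be thought of as rectangle arithmetic starting from the '''''Text Extractor's''''' result. The Python sketch below is only a model under assumed semantics, not Grooper's implementation: it assumes the '''''Translation''''' is an offset (dx, dy) applied to the result's position and the '''''Adjustment''''' is an amount (dw, dh) added to its size, in whatever units the page coordinates use. The ''Rect'' class and ''signature_zone'' function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float


def signature_zone(anchor, translation, adjustment):
    """Derive the signature widget rectangle from a Text Extractor result.

    Assumed semantics (not taken from Grooper): the translation (dx, dy)
    moves the zone relative to the anchor, and the adjustment (dw, dh)
    grows or shrinks it.
    """
    dx, dy = translation
    dw, dh = adjustment
    width = anchor.width + dw
    height = anchor.height + dh
    if width <= 0 or height <= 0:
        raise ValueError("adjustment leaves the zone with no area")
    return Rect(anchor.left + dx, anchor.top + dy, width, height)


if __name__ == "__main__":
    label = Rect(left=1.0, top=9.0, width=1.0, height=0.25)  # e.g. a "Signature" label
    print(signature_zone(label, translation=(2.0, 0.0), adjustment=(2.0, 0.25)))
```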
|valign=top|
[[File:Pdf-generate-howto-27.png]]
The ''Textbox Widget'' '''''Annotation Type''''' will insert editable text boxes into the generated PDF. One simple way to use this functionality is to use the ''Highlight Zone'' extractor type to place a blank zone where you want an empty text box on the PDF. However, any extractor type can be used to define the text box's location. Furthermore, if the '''Data Field''' used to create the annotation collects a value during the '''Extract''' activity, not only will a text box be inserted into the generated PDF, but it will be prefilled with the '''Data Field's''' extracted value upon export.
For example, we will use the ''Textbox Widget'' functionality to fill out the blank coversheet on the first page of our application packet. We will use a ''Highlight Zone'' extractor to define the size and location of each text box. However, we're going to go one step further and populate the '''Data Fields''' used with information from other '''Data Fields''' in our '''Data Model'''. By the end of it, ''PDF Data Mapping'' will not only insert editable text boxes into the generated PDF but also fill them in with text: the blank coversheet ends up automatically populated with information collected during the '''Extract''' activity.
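The prefill rule can be summarized as: use the extracted value if there is one, otherwise insert an empty but still editable text box. The Python sketch below models the coversheet example under stated assumptions; ''fill_coversheet'' is a hypothetical helper, and the field names in the test ("Coversheet Name", "Applicant Name", and so on) are illustrative, not the names used in the sample '''Data Model'''.

```python
def fill_coversheet(extracted, sources):
    """Return the default text for each coversheet text box.

    `sources` maps a coversheet Data Field to the Data Field it copies from.
    A missing or empty value still yields a text box, just an empty one.
    """
    return {box: (extracted.get(source) or "") for box, source in sources.items()}


if __name__ == "__main__":
    extracted = {"Applicant Name": "Jane Doe"}
    sources = {"Coversheet Name": "Applicant Name", "Coversheet Date": "Submission Date"}
    print(fill_coversheet(extracted, sources))
```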
|
{|
=== About ===
Bookmarks in PDFs aid readers when navigating through multipage documents. ''PDF Data Mapping'' can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):
# Using a '''Batch Folder's''' child document folders.
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
To accomplish this goal, we need to prepare this application packet before we configure ''PDF Data Mapping''.
By the end of it, we're looking for a '''Batch''' whose documents have a structure like this. The documents in this batch consist of two '''Batch Folder''' levels.
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
Bookmarking is one of the configuration options for the ''PDF Data Mapping'' '''''Behavior'''''. A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of ''PDF Data Mapping'') how to use the '''Content Type''' to do something (in this case, how to use the '''Content Model's''' '''Document Types''' to insert bookmarks into the PDF upon export).
# All '''''Behaviors''''' are added to a '''Content Type''' object.
#* We will add the ''PDF Data Mapping'' '''''Behavior''''' to this '''Content Model''' named "PDF Generate - UNESCO Packet".
# All '''''Behaviors''''' are added using the '''''Behaviors''''' property. Select the '''''Behaviors''''' property and press the ellipsis button at the end to add ''PDF Data Mapping''.
# This will bring up the '''''Behaviors''''' editor window.
# Press the "Add" button to add a '''''Behavior'''''.
# Choose "PDF Data Mapping" from the list.
|
[[File:Pdf-generate-howto-06.png]]
|-
|valign=top|
# Once added, you will see ''PDF Data Mapping'' added to the list on the left. Select it.
# To enable the bookmarking functionality, in the right panel, select the '''''Bookmarking''''' property.
# Change it from ''Disabled'' to ''Enabled''.
{|cellpadding=10 cellspacing=5
|style="width:50%" valign=top|
The ''PDF Data Mapping'' '''''Behavior''''' can also create and insert additional metadata into the generated PDF, using information collected during Grooper's document processing. The metadata you are able to create falls into one of three categories:
# Editing the PDF's default metadata fields.
{|cellpadding=10 cellspacing=5
|style="width:40%" valign=top|
Metadata is one of the configuration options for the ''PDF Data Mapping'' '''''Behavior'''''. A '''Content Type''' '''''Behavior''''' can tell an activity (specifically the '''Export''' activity, in the case of ''PDF Data Mapping'') how to use the '''Content Type''' to do something (in this case, how to use the '''Content Model's''' collected '''Data Fields''' and other information to edit the generated PDF's metadata).
# All '''''Behaviors''''' are added to a '''Content Type''' object.
#* We will add the ''PDF Data Mapping'' '''''Behavior''''' to this '''Content Model''' named "PDF Generate - UNESCO Packet".
# All '''''Behaviors''''' are added using the '''''Behaviors''''' property. Select the '''''Behaviors''''' property and press the ellipsis button at the end to add ''PDF Data Mapping''.
# This will bring up the '''''Behaviors''''' editor window.
# Press the "Add" button to add a '''''Behavior'''''.
# Choose "PDF Data Mapping" from the list.
|
[[File:Pdf-generate-howto-06.png]]
|-
|valign=top|
# Once added, you will see ''PDF Data Mapping'' added to the list on the left. Select it.
# To enable the metadata functionality, in the right panel, select the '''''Metadata''''' property.
# Change it from ''Disabled'' to ''Enabled''.
# '''Content Models''' and '''Document Types''' can have their own '''Data Models''' as one of their children. Because we are configuring ''PDF Data Mapping'' on the '''Content Model''', we will use ''its'' '''Data Model''' to export this custom metadata.
# This '''Data Model''' is configured with several '''Data Fields'''. These '''Data Fields''' will collect information about the "UNESCO Application Packet" and its component documents, such as the applicant's name and information about the proposal.
#* This will be done during the '''Extract''' activity. Once the data is collected, ''PDF Data Mapping'' can insert the results into the generated PDF, creating one custom metadata field for each '''Data Field''', populated with that field's extracted result.
|
[[File:Pdf-generate-howto-50.png]]
|-
|
To do this, we will use the '''''Export Data Fields''''' option of ''PDF Data Mapping's'' '''''Metadata''''' properties.
# In the '''''Metadata''''' sub-properties, change '''''Export Data Fields''''' from ''False'' to ''True''.
# By default, once you enable this property, Grooper will export every '''Data Field''' available to the '''Content Type''' on which ''PDF Data Mapping'' is configured.
#* You can be more selective about what you want to include using the '''''Field Filter''''' property.
#* This will give you a drop-down list of all the '''Data Field''' nodes available for custom PDF metadata creation. Check the box next to each one you wish to include, and leave the '''Data Fields''' you wish to exclude unchecked.
=== Export the Generated PDF ===
There's one last crucial step to using the ''PDF Data Mapping'' '''''Behavior''''': exporting the generated PDF.
There's no point in generating the PDF with all this additional metadata, bookmarks and annotations if you don't get it out of Grooper and into some kind of external storage platform. That's the job of the '''Export''' activity. To properly export the PDF generated by ''PDF Data Mapping'', there are some specific requirements to keep in mind.
<tabs style="margin:20px">
=== Add an Export Behavior ===
To export any document from Grooper, you need to configure an ''Export Behavior'' so the '''Export''' activity knows how to export the document folders in a '''Batch''' and which external storage platform to send them to. Naturally, this is also part of exporting the PDF generated by ''PDF Data Mapping''.
* To be really technical, it is the '''Export''' activity (using an ''Export Behavior'' configuration) that truly "generates" the PDF file, in terms of creating it. ''PDF Data Mapping'' gives the ''Export Behavior'' additional information about how to generate it, supplying the additional metadata, bookmarking, and annotation elements.
{|cellpadding=10 cellspacing=5
|-
|valign=top|
The last piece of the puzzle is telling the ''Export Behavior'' what file format you want to use for the exported documents. To take advantage of ''PDF Data Mapping'', we will tell it to export the documents as PDFs.
# Next, find the '''''Export Formats''''' property.
{|cellpadding=10 cellspacing=5
|valign=top style="width:40%"|
As far as the ''PDF Format'' property configuration goes, two properties are particularly relevant to how it interacts with ''PDF Data Mapping''.
# If you are using ''PDF Data Mapping'' to insert bookmarks, you will need to enable the '''''Bookmarks''''' property.
#* You will find this property under the '''''Build Options''''' set of properties.
#* Enable '''''Bookmarks''''' by changing this property from ''False'' to ''True''.
# Depending on how your documents were sourced on import, you may need to enable the '''''Always Build''''' mode as well.
#* This is the case in the workflow we've simulated in this tutorial. We processed these UNESCO study abroad application packets from an imported PDF, which we split into individual page objects so we could separate out the component '''Document Types''' that made up the full file. The ''original'' PDF file from import lives on the parent document's '''Batch Folder''' object. If you leave '''''Always Build''''' set to ''False'', that imported file on the parent document folder is what will get exported, ''not'' the PDF built by ''PDF Data Mapping'' with additional metadata, bookmarking and annotation elements.
#* If the output PDF does not reflect the ''PDF Data Mapping'' configuration you've set up, a good first troubleshooting step is changing '''''Always Build''''' to ''True'' to ensure the exported file is the PDF built by ''PDF Data Mapping'' and not the original (also called "native") pre-processed PDF.
The remaining ''PDF Format'' property configurations apply more generally to PDF file creation. While they may be important to your end goals, they are independent of ''PDF Data Mapping'' concerns.
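The '''''Always Build''''' rule above can be summarized as a choice between two files. The Python sketch below is a simplified, hypothetical model rather than Grooper's implementation: `file_to_export`, `native_pdf` and `build` are illustrative names, and `build` stands in for the step that creates a new PDF carrying the ''PDF Data Mapping'' content.

```python
from typing import Callable, Optional


def file_to_export(native_pdf: Optional[str], always_build: bool,
                   build: Callable[[], str]) -> str:
    """Pick which PDF the export step sends out (simplified, hypothetical model).

    native_pdf   -- the originally imported file stored on the document folder, or None
    always_build -- mirrors the Always Build property
    build        -- builds a new PDF; this is where PDF Data Mapping content would be applied
    """
    if always_build or native_pdf is None:
        return build()
    # Always Build is False and a native file exists: the imported file is exported as-is.
    return native_pdf


print(file_to_export("packet.pdf", False, lambda: "built.pdf"))  # packet.pdf
print(file_to_export("packet.pdf", True, lambda: "built.pdf"))   # built.pdf
```

With '''''Always Build''''' left at ''False'', the imported packet wins, which is exactly the symptom described in the troubleshooting note.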
|
[[File:Pdf-generate-howto-60.png]]
== Version Differences ==
'''''Behaviors''''' are new functionality in '''Grooper 2021'''. Much of the ''PDF Data Mapping'' '''''Behavior's''''' functionality was not available in previous versions. Prior to version '''2021''', only annotation creation was possible, using the '''[[Generate PDF]]''' activity. In version '''2021''', this activity has been replaced by the ''PDF Data Mapping'' '''''Behavior''''', which expands its capabilities to generate bookmarks and document metadata as well.
Revision as of 11:13, 30 August 2021
2021
This article is in development for the upcoming version of Grooper, Grooper 2021. PDF Data Mapping is a new Content Type Behavior option in 2021. This information is incomplete and/or may change by the time of release.
PDF Data Mapping is a Content Type Behavior designed to create an exportable PDF file with additional native PDF elements, using the classification and extraction content of a Batch Folder. This includes capabilities for exporting extracted data as PDF metadata, inserting bookmarks, and creating PDF annotations such as highlights, checkboxes and signature widgets.
About
You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with example document(s) discussed in this article and a Content Model configured according to its instructions.
PDF Data Mapping Behavior allows Grooper users to more fully leverage the capabilities of the PDF file type. The standard PDF Export Format in Grooper will use the page image files and their text data to create a multipage PDF file for each document folder upon Export. However, this is just the "display information" required to open and read the document. There's a lot more to what a PDF can be than just a multipage document with page images and machine readable text. PDF content can also include metadata, keywords, bookmarks, annotations, and more!
PDF Data Mapping creates an exportable PDF file that includes some of this additional content available to the PDF format. This is part of Grooper's evolving "Smart PDF Architecture". This is a design philosophy striving to more fully utilize the capabilities of the PDF file type and merge them with Grooper's own document processing capabilities.
The expanded PDF Data Mapping functionality can be divided into three categories:
- Annotations
- Bookmarks
- Metadata
Annotations
Annotations are additional objects you can add to PDF documents. Grooper uses information from Data Elements in a Data Model, collected during the Extract activity, to add these annotations (also called "widgets"). These annotations can improve a document's readability and add components the reader can interact with, such as checkboxes and signature boxes. The kinds of annotations you can add are:
Grooper uses the data instance information from extracted Data Fields to insert these annotations. For example, here we set up a Content Model with a Data Field named "Last Name". After the document's data was collected during the Extract activity, Grooper has a data instance it can associate with the "Last Name" Data Field, including its size and location coordinates on the document. We then used the Highlight Annotation to highlight the extracted last name on the document in yellow. If the extracted data instance is too small for your needs, the size of any of these annotations can also be adjusted using a Padding property.
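To make the Padding idea concrete, here is a minimal Python sketch that grows an extracted value's bounding box before an annotation is drawn. It assumes padding is applied equally to every side and that all coordinates share one unit (inches here); the `Rect` type and `padded` function are hypothetical illustrations, not Grooper's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A data instance's box on the page: top-left corner plus size (assumed inches)."""
    left: float
    top: float
    width: float
    height: float


def padded(rect: Rect, padding: float) -> Rect:
    """Grow the box by `padding` on every side (assumes uniform padding)."""
    return Rect(
        left=rect.left - padding,
        top=rect.top - padding,
        width=rect.width + 2 * padding,
        height=rect.height + 2 * padding,
    )


# Hypothetical location of an extracted "Last Name" value, padded by a quarter inch.
last_name = Rect(left=1.0, top=2.0, width=3.0, height=0.5)
highlight_box = padded(last_name, 0.25)
print(highlight_box)  # Rect(left=0.75, top=1.75, width=3.5, height=1.0)
```

The annotation is still anchored to the extracted instance; padding only changes how much of the page around it the annotation covers.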
Bookmarks
Bookmarks allow easy navigation of multipage PDF documents. When exporting a single PDF composed of multiple child sub-documents, you can create bookmarks for each child document. This way, you can keep all the documents together in a single PDF file, easily navigating from one section of the document to another. For example, this document is an application packet for a study abroad program. Each document in the packet was separated and classified as a child document folder of one Document Type or another. PDF Data Mapping was used to export the packet as a single PDF, and a bookmark, named after its Document Type, was inserted for each sub-document. Grooper can create bookmarks from extracted Data Fields in the document as well.
Metadata
Metadata refers to a PDF file's content beyond the information required to display the document (the page images and encoded text data). Before the PDF Data Mapping functionality was implemented, Grooper could only edit minimal PDF metadata, notably the file's name upon export. PDF Data Mapping allows Grooper to alter and store additional collected metadata as well, including Data Field values collected during the Extract activity. This means Grooper can now create a viewable document with all the extracted data associated with the document itself, independent of that data being stored elsewhere (such as a database table or content management system). This metadata can be viewed by opening the PDF in a PDF viewer application, such as Adobe Acrobat, and opening the "Document Properties" window from the File menu.
There are several pieces of metadata Grooper has access to.
As a Behavior, PDF Data Mapping is configured on a Content Type object, commonly a Content Model or a Document Type.
Along with the PDF Data Mapping Behavior, you will also need an Export Behavior configured to export a PDF formatted file. The PDF Data Mapping Behavior configures all the extra content (metadata, bookmarks and/or annotations) you want to add to the exported PDF. The Export Behavior actually creates the PDF (using the content configuration supplied by PDF Data Mapping) and sends it off to an external storage platform. Export Behaviors can be added to Content Types, such as the Content Model here.
Once the Export Behavior is added, you will need to add an Export Definition. This controls how the file is exported, most notably where it is exported. Whether you are exporting to a Windows file system, an IMAP email mailbox, or a CMIS content management system, Grooper needs to know where to put the file, and the Export Definition tells it. Importantly for PDF Data Mapping, you will also use an Export Definition to define what type(s) of file you want to export. Whichever Export Definition you choose, you will need to configure an Export Format for a PDF formatted file in order to export the generated PDF.
We will review some specifics of the PDF Format option's configuration later. For now, just be aware that adding a PDF Export Format is a necessary step to export the PDF file generated by the PDF Data Mapping Behavior.
How To
The following tutorials use a mock UNESCO Laura W. Bush Traveling Fellowship application to detail a more specific setup for PDF Data Mapping. This is a packet of documents from a single applicant containing five different kinds of documents.
You may download and import the file below into your own Grooper environment (version 2021). This contains a Batch with the example document(s) discussed in this tutorial and a Content Model configured according to its instructions.
Application
This document consists of two pages. The first is a coversheet for the whole application packet. The second is the application form itself. Primarily, this document will allow us to demonstrate the different kinds of annotations available when using PDF Data Mapping to generate a PDF file (using its Annotations property configuration). We will see how to set up one example of each of the following annotation types available in Grooper:
Importantly for any annotation type, a Data Field must be extracted in order to place the annotation. How does Grooper know what you want to highlight? It uses the extraction result of a Data Field, which includes information about where that value is located on the page. Even if the extraction result is just a blank zone that returns no actual text, Grooper needs some kind of coordinates to know where to place the annotation. Since we're going to extract some data in order to place these annotations, this also gives us the opportunity to see some of the collected data inserted as PDF metadata.
Essay
This application also includes an essay from the student. This document will demonstrate how to add keywords to the PDF's metadata. We will use an extractor to count the number of words in the essay and configure PDF Data Mapping's Metadata properties to insert a keyword of "long essay", "normal essay", or "short essay" depending on the essay's length.
Other Documents
This packet contains three other kinds of documents as well:
These documents (as well as the rest) will allow us to see how to insert bookmarks into the generated PDF, using the PDF Data Mapping's Bookmarking property configuration.
The original document, imported as a single multipage PDF file, has been processed a bit to facilitate this.
This means we can process the full imported application packet document, and export a single file with easily navigable bookmarks for its component documents. There's no need to export individual documents for each component document and figure out a way to index them, or put them in their own folder, or any other method you may come up with to relate them to each other in their final storage location. With the PDF Data Mapping's bookmarking capabilities, you can export just one file with each child Document Type bookmarked.
Configure PDF Generation for Annotations
About
PDF Data Mapping can insert various annotations and native PDF widgets into the generated PDF. This increases the document's readability and adds functionality for the reader to interact with the document through widgets such as radio group buttons, checkboxes and signature fields. We will demonstrate how to configure one example for each of the Annotation Types.
We will also use the Textbox Widget to insert editable text boxes into the document's coversheet. These text boxes will also be populated with some corresponding information from the rest of the document.
Prereqs - Data Fields & Extracted Data
Before a PDF annotation can be generated, a document's data must be extracted. Put another way, the Extract activity must run before the Export activity (when the PDF Data Mapping ultimately builds the PDF and exports it).
Each of the Annotation Types point to a Data Field in a Data Model as part of their configuration. If the Data Field does not collect data during the Extract activity, the PDF Data Mapping won't know where to place the annotation.
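The dependency described above (no collected data instance, no annotation) can be sketched as a small planning step. This is a hypothetical Python model, not Grooper code: `FieldResult`, `plan_annotations` and the coordinate tuple layout are assumptions made for illustration. Note that a blank zone still counts, because it carries coordinates even when its value is empty.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # left, top, width, height (assumed layout)


@dataclass
class FieldResult:
    value: str               # may be "" when the result is just a blank zone
    location: Optional[Box]  # None when no data instance was collected


def plan_annotations(configs: List[Tuple[str, str]],
                     results: Dict[str, FieldResult]):
    """Split annotation configs into those that can be placed and those that cannot.

    configs -- (annotation_type, data_field_name) pairs, one per configured annotation
    results -- extraction results keyed by Data Field name
    """
    placed, skipped = [], []
    for kind, field in configs:
        result = results.get(field)
        if result is None or result.location is None:
            skipped.append((kind, field))  # no coordinates, so nowhere to put it
        else:
            placed.append((kind, field, result.location))
    return placed, skipped


results = {
    "Last Name": FieldResult("Doe", (1.0, 2.0, 1.5, 0.25)),
    "Signature": FieldResult("", (1.0, 9.0, 3.0, 0.5)),  # blank zone, still placeable
}
placed, skipped = plan_annotations(
    [("Highlight", "Last Name"), ("Signature Widget", "Signature"), ("Textbox Widget", "Date")],
    results,
)
```

Here the hypothetical "Date" field was never extracted, so its Textbox Widget lands in `skipped`.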
Add the Behavior
Annotations are one of the configuration options for the PDF Data Mapping Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of PDF Data Mapping) how to use the Content Type to do something (how to use the Content Model's collected Data Fields to insert additional content when generating a PDF upon export, in this case).
We will detail collection and configuration of the various Annotation Types in the next tabs of this tutorial.
Highlight Annotation
In this example, we will use the Highlight Annotation to highlight the extracted "Last Name", "First Name" and "Middle Initial" fields from the application form.
Optionally, you can control how the highlight looks: its color, size, opacity, and whether or not there's a stroke around the highlighted rectangle.
Radio Group Widget
For example, we will create a Radio Group Widget annotation from the "US Citizen" Data Field's result. We have two choices, either "Yes" or "No". Only one or the other can be chosen. So, this is well suited for a radio button group.
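As a rough model of why a radio group fits this field, the sketch below maps one extracted value onto a set of mutually exclusive options. The function name and the rule that an unmatched or empty value leaves every option off are assumptions for illustration, not documented Grooper behavior.

```python
from typing import Dict, List


def radio_states(options: List[str], extracted: str) -> Dict[str, bool]:
    """Turn one extracted value into radio button states; at most one option is on."""
    chosen = extracted.strip().lower()
    # Case-insensitive match; an unmatched value leaves every option off (assumption).
    return {option: option.lower() == chosen for option in options}


print(radio_states(["Yes", "No"], "Yes"))  # {'Yes': True, 'No': False}
```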
Let's briefly look at this "US Citizen" Data Field and see what's happening behind the scenes when PDF Data Mapping creates the radio buttons.
Checkbox Widget
WIP: The Checkbox Widget documentation needs to be finalized after getting some guidance from dev. If it seems incomplete or images don't match up with text, that is why.
For example, we will create a Checkbox Widget annotation for the checkboxes in the "Checklist" section of this document, using the "Application", "Proposal Summary", "Essay", "Resume" and "Recommendation Letter" Data Fields. These are Boolean OMR checkboxes, returning "true" if the box next to the corresponding label is checked, and "false" if unchecked. In either case, checked or not, the Checkbox Widget will insert an editable checkbox element into the generated PDF.
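The sketch below shows one way a Boolean OMR result could translate into a checkbox's initial state. It is hypothetical: Grooper's internals are not described in this article, and the state names "Yes" and "Off" come from the PDF specification's convention for checkbox appearance states. The OMR results are made up for illustration.

```python
CHECKLIST = ["Application", "Proposal Summary", "Essay", "Resume", "Recommendation Letter"]


def checkbox_state(omr_value: str) -> str:
    """Map a Boolean OMR result ("true"/"false") to a PDF checkbox appearance state name.

    The PDF specification names the off state "Off" and recommends "Yes" for the on state.
    """
    return "Yes" if omr_value.strip().lower() == "true" else "Off"


# Hypothetical OMR results for the "Checklist" section.
omr_results = {"Application": "true", "Proposal Summary": "true", "Essay": "true",
               "Resume": "false", "Recommendation Letter": "true"}

# Every field gets a checkbox, checked or not; only its initial state differs.
states = {name: checkbox_state(omr_results[name]) for name in CHECKLIST}
```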
Signature Widget
For example, we will create a Signature Widget annotation for the signature line on the application form, using the "Signature" Data Field of our Data Model. The Signature Widget will insert an interactive signature element into the generated PDF.
Just like any Annotation Type, the extraction result from the Data Field is critical for placing the signature annotation on the generated PDF. Let's look at the "Signature" Data Field's result to understand a little better how these results are used to create the signature widget.
In our case, we're using the Detect Signature extractor type to supply these results. The Detect Signature extractor is perfectly suited for the Signature Widget Annotation Type.
- It actually combines both Zonal and OMR based extraction techniques to determine if a signature is present in the zone. It sets the boundaries of where you expect to find a signature using Zonal based methods and detects if the signature is present by counting the percentage of filled pixels in the zone, which is the basis of OMR based extraction methods. You can then output different values if the zone is filled above or below a certain percentage. In this case, the extractor returns "Not Signed" because there aren't enough pixels present in the extraction zone to count as filled. If there were a signature present, there'd be more pixels present, accounting for a higher filled percentage.
This is great for our purposes because it gives us the exact information we need for the Signature Widget, which is an extraction zone. Grooper needs a data instance indicating the size and location for the generated signature widget.
- But wait there's more! We also get some bonus information about whether or not there's a signature present. Does the Signature Widget Annotation Type need to know if there's a signature present? No. It does not. It will place the widget no matter what the result is. But might that information be otherwise useful to you? Probably.
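The OMR side of Detect Signature, as described above, amounts to measuring how much of a zone is filled and comparing that to a threshold. Here is a minimal Python sketch of that idea; the 5% threshold, the 0/1 pixel grid and the function names are assumptions, and the real extractor's settings may differ. Remember that the Signature Widget only needs the zone itself; the returned value is the bonus information.

```python
from typing import List


def fill_percentage(zone: List[List[int]]) -> float:
    """Percentage of filled pixels in a zone (1 = black/filled, 0 = white)."""
    total = sum(len(row) for row in zone)
    filled = sum(sum(row) for row in zone)
    return 100.0 * filled / total if total else 0.0


def detect_signature(zone: List[List[int]], threshold: float = 5.0,
                     signed: str = "Signed", unsigned: str = "Not Signed") -> str:
    """Return one value at or above the fill threshold and another below it."""
    return signed if fill_percentage(zone) >= threshold else unsigned


blank = [[0] * 40 for _ in range(10)]  # 400 white pixels
scribbled = [row[:] for row in blank]
scribbled[5] = [1] * 40                # 40 of 400 pixels filled = 10%

print(detect_signature(blank))      # Not Signed
print(detect_signature(scribbled))  # Signed
```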
Textbox Widget
The Textbox Widget Annotation Type will insert editable text boxes into the generated PDF. One simple way to use this functionality is to use the Highlight Zone extractor type to place a blank zone where you want an empty text box on the PDF. However, any extractor type can be used to define the text box's location. Furthermore, if the Data Field used to create the annotation collects a value during the Extract activity, not only will a text box be inserted into the generated PDF, but it will be prefilled with the Data Field's extracted value upon export. For example, we will use the Textbox Widget functionality to fill out the blank coversheet on the first page of our application packet. We will use a Highlight Zone extractor to define the size and location of each text box. However, we're going to go one step further and populate the Data Fields used with information from other Data Fields in our Data Model. By the end of it, PDF Data Mapping will not only insert editable text boxes into the generated PDF but also fill them in, leaving us with a blank coversheet automatically populated with information collected during the Extract activity.
The Textbox Widget annotation has some additional configuration options as well.
As far as looking behind the scenes, there are at least two things going on in how we've set up these Data Fields' extraction, which ultimately supplies the result used to insert the Textbox Widget annotation. First, we used the Highlight Zone extractor type to draw the text box, defining the size and location of the annotation when the PDF is generated.
Second, we used an expression to return a value, using the results of other Data Fields in our Data Model.
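Grooper expressions are written in its own .NET expression environment, and the actual expressions are not shown in this article. Purely as an illustration of the idea, the Python sketch below builds a hypothetical coversheet "Applicant Name" value from the "First Name", "Middle Initial" and "Last Name" fields; the function name, the field combination and the formatting rules are assumptions.

```python
def coversheet_name(first: str, middle_initial: str, last: str) -> str:
    """Hypothetical coversheet value assembled from three other Data Fields' results."""
    parts = [first.strip()]
    initial = middle_initial.strip().rstrip(".")
    if initial:
        parts.append(initial + ".")  # normalize "Q" and "Q." to "Q."
    parts.append(last.strip())
    # Drop empty pieces so a missing field doesn't leave double spaces.
    return " ".join(p for p in parts if p)


print(coversheet_name("Jane", "Q", "Doe"))  # Jane Q. Doe
```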
Configure PDF Generation for Bookmarks
About
Bookmarks in PDFs aid readers when navigating through multipage documents. PDF Data Mapping can insert bookmarks into the generated PDF to take advantage of this functionality. This can be done in one of two ways (or both):
- Using a Batch Folder's child document folders.
- Using the document's extracted Data Fields.
We will focus on the first bookmarking method, using child document folders, as it is more common. Often you will import a file into Grooper that contains multiple documents you want to separate and classify, but that otherwise all belong together in one way or another. Such is the case with our study abroad application packet. The application packet as a whole consists of five separate and distinguishable documents.
Our goal is to create a bookmark in the generated PDF file for each of these component documents (or child documents, as we will come to call them). Rather than exporting a separate PDF file for each of the five component documents, we will export a single PDF for the whole packet with navigable bookmarks corresponding to each component document.
Prereqs - Split Pages, Separation and Classification
In order to accomplish this goal, we're going to have to do some things to this application packet before we configure PDF Data Mapping. By the end of it, we're looking for a Batch whose documents have a structure like this. The documents in this batch consist of two Batch Folder levels.
This is what we want to end up with. How did we get there? Long story short, we have some document separation and classification requirements before we can insert bookmarks in the generated PDF. The bookmarks are inserted for each child document folder and named after their classified Document Type's name. In order to do that, we need to split out the pages of the imported document, separate them into child document folders, and classify them first.
The full application document came into Grooper like this: a 7-page PDF file containing all 5 component documents was imported into a new Batch. This is now the parent document folder at Folder Level 1. But there's documents in them there document! How do we get them out?
First, we need to use the Split Pages activity to create child Batch Page objects. This splits out the pages of the imported PDF file, creating one child Batch Page for each page of the PDF on the parent document folder. Now we have page objects we can manipulate in our Batch.
Now that we have Batch Page objects in our Batch, we can use the Separate activity to insert the second folder level. This is the first step in organizing these pages into child documents. We need to distinguish one collection of pages as a document from another collection of pages as a document. Creating folders is the first part of that equation. Now we have child document folders for this parent document folder, but they are just blank folders. There is nothing to distinguish one folder from the next.
And that's the second part of the organization equation: classification. Next, these folders will be assigned a Document Type from our Content Model using the Classify activity.
Now we have everything we need to configure the bookmarking functionality of the PDF Data Mapping Behavior. A bookmark will be created every time a new child document is encountered, named after the Document Type assigned to that folder. When the full PDF is generated, a bookmark named "Application" will be inserted at the first page of the PDF. That child document is two pages long. The third page of the full PDF will be the proposal summary, so a bookmark named "Proposal Summary" will be inserted at page three. A "Resume" bookmark will be inserted at page four. And so on.
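The page arithmetic behind these bookmarks is a running total of child document page counts. In the Python sketch below, the first three entries follow the text (Application at page 1, Proposal Summary at page 3, Resume at page 4); the position and page counts of the last two documents are assumed so the total matches the 7-page packet. The `bookmarks` function is a hypothetical illustration, and many PDF libraries index pages from 0, so subtract 1 if yours does.

```python
from typing import List, Tuple


def bookmarks(children: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (title, first_page) for each child document, using 1-based page numbers.

    children -- ordered (Document Type name, page count) pairs for the child folders
    """
    result, page = [], 1
    for doc_type, page_count in children:
        result.append((doc_type, page))
        page += page_count
    return result


packet = [
    ("Application", 2),
    ("Proposal Summary", 1),
    ("Resume", 1),
    ("Essay", 2),                  # assumed page count and position
    ("Recommendation Letter", 1),  # assumed page count and position
]

for title, page in bookmarks(packet):
    print(f"{title}: page {page}")
```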
FYI: There are many ways to separate and classify documents, including ESP Auto Separation, which both separates and classifies documents with a single activity (just Separate). But this is the general idea to get us where we need to go: one way or another, create classified child document folders from a parent document folder. That way, when we generate the PDF for the parent document folder upon export, bookmarks will be created for the classified child document folders.
Add the Behavior and Configure It for Bookmarking
Bookmarking is one of the configuration options for the PDF Data Mapping Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of PDF Data Mapping) how to use the Content Type to do something (in this case, how to use the Content Model's Document Types to insert bookmarks into the PDF upon export).
For our purposes, this is all we need to configure at this point. However, be aware of the Bookmarking configuration options.
Configure PDF Generation for Metadata
About
The PDF Data Mapping Behavior can also create and insert additional metadata into the generated PDF, using information collected during Grooper's document processing. The metadata you can create falls into one of three categories:
|
Prereqs - Data Extraction
If we're going to insert some metadata into these PDFs, that data has to come from somewhere. In broad terms, metadata is created in one of two ways (or a combination of the two):
- Using expression based creation
- In the case of the default PDF metadata fields and keywords, expressions can be used to populate the metadata. This gives you access to system data, classification information, extracted Data Field results, and various .NET functions to manipulate it.
- Using Data Field results
- In the case of the custom PDF metadata, the custom fields are generated from Data Fields in the document's Data Model and their collected results from the Extract activity.
- This means the document must be processed by the Extract activity in order to create and populate these custom fields.
Add the Behavior and Enable Metadata
Metadata is one of the configuration options for the PDF Data Mapping Behavior. A Content Type Behavior can tell an activity (specifically the Export activity, in the case of PDF Data Mapping) how to use the Content Type to do something (how to use the Content Model's collected Data Fields and other information to edit the generated PDF's metadata, in this case).
Edit Default PDF Metadata
Once Metadata is enabled, its first six sub-properties pertain to the default PDF metadata fields Grooper can edit: Title, Author, Subject, Creation Date, Modified Date, and Creator. These are edited with code expressions.
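In the PDF file format, these six fields correspond to standard keys of the document information dictionary (`/Title`, `/Author`, `/Subject`, `/CreationDate`, `/ModDate`, `/Creator`), and dates are written in the PDF date format `D:YYYYMMDDHHmmSSOHH'mm'`. This section doesn't say whether Grooper writes them to the information dictionary, to XMP metadata, or to both, so treat the Python sketch below as an illustration of what the expressions ultimately produce, under the assumption of an information dictionary. `pdf_date` and `default_metadata` are hypothetical helpers, and the sample values are made up.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format a datetime as a PDF date string, e.g. D:20210315093000+00'00'."""
    base = dt.strftime("D:%Y%m%d%H%M%S")
    offset = dt.utcoffset()
    if offset is None:
        return base  # naive datetime: omit the time zone part
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    minutes = abs(minutes)
    return f"{base}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


def default_metadata(title, author, subject, created, modified, creator):
    """Map the six default metadata values onto information dictionary keys."""
    return {
        "/Title": title,
        "/Author": author,
        "/Subject": subject,
        "/CreationDate": pdf_date(created),
        "/ModDate": pdf_date(modified),
        "/Creator": creator,
    }


created = datetime(2021, 3, 15, 9, 30, tzinfo=timezone.utc)
modified = datetime(2021, 3, 16, 14, 5, tzinfo=timezone(timedelta(hours=-6)))
info = default_metadata("Grant Application", "Applicant", "Application Packet",
                        created, modified, "Grooper")
print(info["/ModDate"])  # D:20210316140500-06'00'
```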
Add Keywords
Grooper can add keywords into the PDF's "Keywords" field in one of two ways, either using an expression or a referenced extractor's results.
In our case, we're going to use an expression to determine if the word count of the "Essay" document in the application packet is "Long", "Short", or "Normal".
If the word count is above 600 words, we'll call that a long essay. If it's below 400 words, we'll call that a short essay. And if it's anywhere in between, we'll call it a normal essay. The expression uses a series of nested conditional statements built with the IIf() function to accomplish this.
If the word count is greater than 600, the keyword evaluates to "Long Essay". Otherwise, if it is less than 400, the keyword evaluates to "Short Essay". If neither condition is met, the keyword evaluates to "Normal Essay".
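IIf() is the VB.NET conditional function: `IIf(condition, truePart, falsePart)` returns `truePart` when the condition holds and `falsePart` otherwise. The Python sketch below mirrors the same nested decision purely so the branching and the boundary cases are easy to check. It is not the Grooper expression itself, and `iif` and `essay_keyword` are hypothetical names.

```python
def iif(condition, true_part, false_part):
    """Mimic VB.NET IIf(): both parts are already evaluated before the call."""
    return true_part if condition else false_part


def essay_keyword(word_count):
    # Outer IIf checks for a long essay; the inner IIf splits short vs. normal.
    return iif(word_count > 600, "Long Essay",
               iif(word_count < 400, "Short Essay", "Normal Essay"))


for count in (750, 600, 450, 400, 250):
    print(count, essay_keyword(count))
```

Because both comparisons are strict, an essay of exactly 600 or exactly 400 words is tagged "Normal Essay". Like VB.NET's IIf, the helper receives both branch values already evaluated, so this pattern suits cheap values such as string literals.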
To use this expression to add the keyword to the generated PDF's metadata, we will configure the Keywords property.
Add Custom Metadata
Last but not least, you can add custom metadata fields to the generated PDF using extraction results from the document's Data Model. A custom metadata field is generated for every Data Field you choose in the Content Type's Data Model.
To do this, we will use the Export Data Fields option of PDF Data Mapping's Metadata properties.
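Conceptually, Export Data Fields selects a subset of the Data Model's Data Fields and writes one custom metadata entry per chosen field, using the value the Extract activity collected for it. The Python sketch below models that selection with plain dictionaries. The field names, the dictionary representation, and the empty-string value for a chosen field that has no result are all assumptions for illustration, not documented Grooper behavior.

```python
def custom_metadata(extracted, chosen_fields):
    """Build one custom metadata entry per chosen Data Field.

    extracted: Data Field name -> value collected by the Extract activity.
    chosen_fields: the Data Fields selected for export, in order.
    """
    # Assumption: a chosen field with no extracted value still gets an
    # entry, with an empty string as its value.
    return {name: extracted.get(name, "") for name in chosen_fields}


# Hypothetical Data Field results for one application packet.
results = {
    "Applicant Name": "Jane Doe",
    "Requested Amount": "25,000",
    "Essay Word Count": "512",
}
print(custom_metadata(results, ["Applicant Name", "Requested Amount", "Project Title"]))
# {'Applicant Name': 'Jane Doe', 'Requested Amount': '25,000', 'Project Title': ''}
```

Fields that were extracted but not chosen ("Essay Word Count" here) simply don't appear in the output, which matches the idea that only the Data Fields you select become custom metadata.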
Export the Generated PDF
There's one last crucial step to using the PDF Data Mapping Behavior: exporting the generated PDF.
There's no point in generating the PDF with all this additional metadata, bookmarks, and annotations if you don't get it out of Grooper and into some kind of external storage platform. That's the job of the Export activity. To properly export the PDF generated by the PDF Data Mapping Behavior, there are some specific requirements to keep in mind.
Add an Export Behavior
In order to export any document from Grooper, you need to configure an Export Behavior so the Export activity knows how you want to export the document folders in a Batch and what external storage platform you're exporting them to. So it makes sense that this is part of exporting the PDF generated by the PDF Data Mapping Behavior.
- And if you want to be really technical, the Export activity (using an Export Behavior configuration) will truly "generate" the PDF file, in terms of creating it. PDF Data Mapping gives the Export Behavior additional information about how to generate it, utilizing the additional metadata, bookmarking, and annotation elements.
There are two ways you can configure an Export Behavior: "locally" to the Export activity, or "shared" with a Behavior configuration on a Content Type object (typically a Content Model). Configured "locally", you will set the Export Behavior configuration using the Export activity's properties.
Configured as a "shared" Behavior, the Export Behavior is configured using a Content Type's Behaviors property.
The Export activity needs to know how to export your documents. That's what an Export Behavior is for. If one is not configured locally, you need to let the Export activity know you want to use an Export Behavior configured on the Content Type or Document Types of the documents in the Batch. Furthermore, if you configure both a local Export Behavior and a shared Export Behavior, Grooper needs to know which one takes priority. That's why there are four different Shared Behavior Mode options.
These options allow you to configure the order of execution for instances where you have both local and shared Export Behaviors configured. However, be aware that configuring both is less common. Most Grooper users will configure the Export Behavior on either the Content Type or the Export activity, but not both. But if you elect to configure the Export Behavior on a Content Type, you still must choose one of these four Shared Behavior Mode options.
Configure the Export Behavior to Export PDFs
Whether you elect to use a local or shared Export Behavior, the next step is to configure it to export the document folders in the Batch as PDF files.
The last piece of the puzzle is just telling the Export Behavior what file format you want to use for the exported documents. To take advantage of PDF Data Mapping, we will want to tell it to export the documents as PDFs.
Additional Formatting Considerations
As far as the PDF Format property configuration goes, there are two properties that are particularly relevant to how it interacts with PDF Data Mapping.
The remaining PDF Format property configurations apply more generally to PDF file creation. While they may be important to your end goals, they are independent of PDF Data Mapping concerns.
Version Differences
Behaviors are new functionality in Grooper 2021. Much of the PDF Data Mapping Behavior's functionality was not available in previous versions. Prior to version 2021, only annotation creation was possible, using the Generate PDF activity. In version 2021, this activity has been replaced by the PDF Data Mapping Behavior, which expands its capabilities to generate bookmarks and document metadata as well.