2.90:Content Action

From Grooper Wiki

Content Action is an Activity providing additional functionality for multi-page file formats (PDF and TIF files) through one of five actions: Split, Merge, ClearChildren, ClearContent, and RepairPDF.

Most commonly, this Activity is used to split multipage documents in a Batch, creating child Batch Page objects for each page in the file (using the Split action). This Activity is also used to merge child Batch Pages and Batch Folders into a multipage file, stored on the parent Batch Folder (using the Merge action). Though less common, there is additional functionality to delete child objects (using the ClearChildren action), remove a PDF file from a Batch Folder (using the ClearContent action), and repair a PDF file (using the RepairPDF action).

About

The Content Action activity manipulates the content files of a Batch Folder in a Batch: either a native file stored on the Batch Folder (typically a multipage PDF or TIF file) or the Batch Folder's child folder and page objects. What it does is determined by its Action property, which can be one of five choices:

  • Split
  • Merge
  • ClearChildren
  • ClearContent
  • RepairPDF

Split and Merge each have their own configuration properties. ClearChildren, ClearContent, and RepairPDF require no further configuration.


Split

The Split Action is the most commonly used functionality for the Content Action activity. It will split out the pages of an imported multipage PDF or TIF file.

When PDF files are imported into a new Batch, a Batch Folder object is created for each multipage PDF file imported. The PDF file is stored as the "native file version" of the Batch Folder. It lives in the file store location associated with the Batch Folder (or, in layman's terms, it lives "on the document folder").

However, what if Grooper needs to process each page instead of the full document? It needs a page-level object to do page-level processing. This is what the Split Action accomplishes. It takes that native file living on the Batch Folder and creates child Batch Page objects from it, one Batch Page object for each page in the native multipage file.

For this Batch, a single PDF was imported when the Batch was created.

  1. For each imported PDF, a new Batch Folder object is created. The PDF file is stored as the "native version" of this object.
    • It is the original content file for the document folder, the content foundation for all future document processing.
    • This is a three-page PDF file. But notice this Batch Folder is a single object in the Batch. There are no child Batch Page objects for the document folder.
  2. After the Content Action activity runs, with the Split Action selected, one child Batch Page object is created for each page in the native PDF.
    • The native PDF file was three pages long. So, we get three Batch Page child objects.
    • Each page is split from the PDF file in numerical order. The first child Batch Page object is the first page of the PDF, the second child is the second page, and so on.
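To make the page-ordering rule concrete, here is a minimal Python sketch of what the Split Action does to the Batch hierarchy. It is a toy model under stated assumptions, not Grooper's API: the `BatchFolder` and `BatchPage` classes and the `split` function are hypothetical, and a native multipage file is modeled as a plain list of page labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical toy classes; these are NOT Grooper's object model or API.
@dataclass
class BatchPage:
    image: str  # stand-in for the page image split out of the native file


@dataclass
class BatchFolder:
    # The native multipage PDF/TIF, modeled as an ordered list of page labels.
    native_pages: Optional[List[str]] = None
    children: List[BatchPage] = field(default_factory=list)


def split(folder: BatchFolder) -> int:
    """Create one child BatchPage per native page, in page order.

    Returns the number of pages created (0 if there is no native file).
    """
    if not folder.native_pages:
        return 0
    for page in folder.native_pages:
        folder.children.append(BatchPage(image=page))
    return len(folder.native_pages)


# A Batch Folder holding a three-page native PDF and no child pages yet.
doc = BatchFolder(native_pages=["page 1", "page 2", "page 3"])
created = split(doc)
print(created, [p.image for p in doc.children])
```

In this model, a three-page native file yields exactly three child pages, and child order matches page order, mirroring the walkthrough above.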

You may further configure the Split Action with the Split Options properties. These properties control how the child page objects are created, including their resolution, file format, and color depth settings.

Merge

The Merge Action creates a multipage PDF or TIF file from the Batch Folder's child objects and stores it on the Batch Folder.

Some people think of the Merge Action as Split in reverse. Whereas Split creates child Batch Page objects out of a PDF or TIF file on the parent document folder, Merge does the opposite. Merge creates a PDF or TIF file, stored on the parent document folder, out of child Batch Page objects.

Note: A PDF or TIF file will be merged from all child content of a Batch Folder. A single file is created even if the Batch Folder has its own subfolders with their own child pages. One common use of the Merge action is to create a single PDF from an email, merging the email message text file with an attachment document (often itself a PDF file).

In this example, the Batch Folder is just a generic folder with three child Batch Page objects.

  1. The Merge Action will use the child content of the Batch Folder to form the PDF (or TIF) file.
    • In this case these three Batch Page objects will be merged into a single PDF file.
  2. After the Content Action activity runs, with the Merge Action selected, a multipage PDF (or TIF) is created and stored on the Batch Folder.
    • There are three child Batch Pages. So, we get a PDF file three pages long.
    • Each child page is merged in numerical order. The first child Batch Page object becomes the first page of the PDF, the second child the second page, and so on.
    • Again, if you want to be technical about it, the PDF is stored in the file store location associated with the Batch Folder object (much like the image file for a Batch Page is stored in the file store location associated with that Batch Page object).
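The following minimal Python sketch models Merge as Split in reverse, including the nested case from the note above (an email folder with an attachment subfolder). The classes and functions are hypothetical, not Grooper's API. The article only states numerical order for a flat folder, so the depth-first ordering used here for subfolders is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


# Hypothetical toy classes; these are NOT Grooper's object model or API.
@dataclass
class BatchPage:
    image: str  # stand-in for the page image


@dataclass
class BatchFolder:
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)
    # Merged multipage PDF/TIF, modeled as an ordered list of page labels.
    native_pages: Optional[List[str]] = None


def collect_pages(folder: BatchFolder) -> List[str]:
    """Gather every page below `folder`, depth-first in child order (assumed)."""
    pages: List[str] = []
    for child in folder.children:
        if isinstance(child, BatchPage):
            pages.append(child.image)
        else:  # a subfolder: its pages are included in the same single file
            pages.extend(collect_pages(child))
    return pages


def merge(folder: BatchFolder) -> None:
    """Store one multipage file on the folder, built from all child content."""
    folder.native_pages = collect_pages(folder)


# Hypothetical email folder: the message page plus an attachment subfolder.
email = BatchFolder(children=[
    BatchPage("message text"),
    BatchFolder(children=[BatchPage("attachment p1"), BatchPage("attachment p2")]),
])
merge(email)
print(email.native_pages)
```

Even though the pages sit at two different levels, the result is one three-page file on the parent folder.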

The child content can be merged into a PDF or TIF file.

Depending on which format you choose, there are additional options for creating the merged file. For example, the PDF format includes an option to include the OCR text data for each page in the merged PDF via the Make Searchable property.

ClearChildren

The ClearChildren Action is a destructive action. Whereas the Split action creates objects, the ClearChildren action deletes them. When applied to a Batch Folder, ClearChildren will delete all child pages and folders below it.

Furthermore, it deletes all child objects. If you have a hierarchy of Batch Folders and Batch Pages with their own child Batch Folders and Batch Pages, all of them are deleted. Not just the Batch Pages. Not just the objects at the first child level. ClearChildren clears all children.

The ClearChildren Action has no further configuration options.
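To illustrate "clears all children", here is a minimal Python sketch that counts and removes every descendant of a folder in a nested hierarchy. The classes and functions are hypothetical stand-ins, not Grooper's API; the point is only that deletion reaches every level, not just the first.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical toy classes; these are NOT Grooper's object model or API.
@dataclass
class BatchPage:
    name: str


@dataclass
class BatchFolder:
    name: str
    children: List[Union["BatchFolder", BatchPage]] = field(default_factory=list)


def count_descendants(folder: BatchFolder) -> int:
    """Count every object below `folder`, at every level of the hierarchy."""
    total = 0
    for child in folder.children:
        total += 1
        if isinstance(child, BatchFolder):
            total += count_descendants(child)
    return total


def clear_children(folder: BatchFolder) -> int:
    """Delete all descendants of `folder`; return how many were removed.

    In this tree model, deeper objects are only reachable through the
    folder's direct children, so emptying that list removes every level.
    """
    removed = count_descendants(folder)
    folder.children.clear()
    return removed


sub_sub = BatchFolder("sub-subfolder", [BatchPage("p4")])
sub = BatchFolder("subfolder", [BatchPage("p2"), BatchPage("p3"), sub_sub])
root = BatchFolder("document", [BatchPage("p1"), sub])
print(clear_children(root), root.children)
```

Here six objects (four pages and two subfolders) are removed, leaving the document folder with no children.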

ClearContent

The ClearContent Action is a destructive action. Whereas the Merge action creates a file stored on a document folder, the ClearContent action deletes it. When applied to a Batch Folder, ClearContent will delete a native PDF (or TIF) version, if present.

The ClearContent Action has no further configuration options.

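RepairPDF

The RepairPDF Action repairs the native PDF file stored on a Batch Folder.

The RepairPDF Action has no further configuration options.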

Changes to Content Action in Version 2021

There is a big change to the Content Action activity in Grooper Version 2021. It doesn't exist anymore.

Don't fret! Its functionality is still accessible depending on the Action type.

  • For the Split action, a new activity named Split Pages replaces and supplements its functionality.
  • For the Merge action, a new activity named Merge replaces and supplements its functionality.
  • The ClearChildren, ClearContent and RepairPDF actions are replaced by different commands using the Execute activity.
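If you are reviewing a 2.90 Batch Process before upgrading, the replacements listed above (with the command names given later in this article) can be kept as a simple lookup. This Python sketch is purely illustrative: the dictionary and the `replacement_for` helper are hypothetical and are not part of Grooper.

```python
# Hypothetical lookup table for reviewing a 2.90 Batch Process before upgrading.
# Names come from this article; the dictionary itself is illustrative only.
CONTENT_ACTION_REPLACEMENTS = {
    "Split": "Split Pages activity",
    "Merge": "Merge activity",
    "ClearChildren": "Execute activity: Clear Children command",
    "ClearContent": "Execute activity: Remove PDF Version command",
    "RepairPDF": "Execute activity: Repair command (Object Type = PDF Document)",
}


def replacement_for(action: str) -> str:
    """Return the 2021 replacement for a 2.90 Content Action action type."""
    try:
        return CONTENT_ACTION_REPLACEMENTS[action]
    except KeyError:
        raise ValueError(f"Unknown Content Action action: {action!r}") from None


for action in ("Split", "Merge", "ClearChildren", "ClearContent", "RepairPDF"):
    print(f"{action:>13} -> {replacement_for(action)}")
```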

Why did we do this? It has to do with our evolving "Smart PDF Architecture". In version 2021, we started digging into ways we can more fully utilize the capabilities of the PDF file format. The big-ticket item for this is the PDF Generate Behavior. However, anything PDF-related ended up getting touched, including splitting and merging PDFs. As the split and merge capabilities grew in version 2021 with increased attention to the PDF file format, it made more sense to isolate these two document processing functions as whole activities (Split Pages and Merge respectively) than to keep them as two different property configurations of a single activity (the Split and Merge action types for the Content Action activity).

As for the remaining actions (ClearChildren, ClearContent, and RepairPDF), there was always a way of accomplishing the exact same thing in a Batch Process using the Execute activity. What's the difference between the Content Action activity set to ClearChildren and the Execute activity set to a Clear Children command for a Batch Folder? Nothing. There is no difference. They do the exact same thing. They delete the child objects of a Batch Folder in both cases.

Why have two activities that do the same thing? To simplify things, we just got rid of the Content Action activity entirely. In version 2021, you'll use the analogous Execute activity command for the ClearChildren, ClearContent, and RepairPDF actions.

See below for more information on how each Content Action action type changed in version 2021.

Split

The Content Action Split action is replaced by the Split Pages activity.


Changes

The Content Action Split action is now its own activity, named Split Pages.

Some properties have had their names changed and/or moved around in the property grid.

  1. Render Resolution is now under PDF Options > Rendering > Resolution.
  2. Target Image Format is now under PDF Options > Rendering > Color Format.
  3. Auto Color Depth Settings can now be enabled or disabled and has moved to PDF Options > Rendering > Color Depth Detection.
  4. Flag Conversion Issues has been renamed to Flag Issues.

New Properties

  1. Page Filter – Lets you specify which pages the activity will run on.
  2. Overwrite – Defines how documents which already have children will be handled.
  3. PDF Page Extraction – Specifies options that can assist with how PDF resources are handled.
    • Note: If you want to split the PDF pages as images only, set this property to Disabled.
  4. Image Bursting – Enables or disables extraction of images from image-based PDF pages.
  5. Replicate Bookmarks – Allows bookmarked PDFs to be organized into subfolders that replicate the bookmark hierarchy in the PDF file.

Merge

The Content Action Merge action is replaced by the Merge activity.


The Content Action Merge action is now its own activity, named Merge.

You still have two options for merging child content into a multipage file type, controlled by the Merge Format property.

  1. PDF Format
  2. TIF Format

Changes - PDF Format

The TIF Format option configuration is unchanged. However, there is a lot more you can do with the PDF Format with the Merge activity in 2021 than you could with the Merge Content Action in 2.90.

  1. The Make Searchable property has moved to Build Options > Searchable.
  2. The Linearized property has moved to Build Options > Linearized.
  3. The Jpeg Quality property has been replaced with the Boolean Compressed property.
  4. The PDF Page Source and Prefer Child Versions properties have been consolidated into a single Always Build property.
    • Setting this property to True will always generate the PDF from the document folder's child Batch Page and Batch Folder objects, even if a PDF version already exists on the document folder.


New Properties - PDF Format

  1. Additional Build Options
  2. Display Mode
  3. Viewer Preferences
  4. Generate Mode

New Properties - All Formats

  1. Clear on Completion – Lets you delete or keep the pages that were merged.
  2. Output Filename – Lets you specify a filename for the merged file.

ClearChildren

Changes

The only change is that ClearChildren is now a Command Entry in the Execute activity. There are no properties to set.

ClearContent

Changes

ClearContent is now Remove PDF Version as a Command Entry in the Execute activity. There are no properties to set.

RepairPDF

Changes

RepairPDF is now Repair as a Command Entry in the Execute activity when the Object Type is set to PDF Document.