2021:Split Pages (Activity)

From Grooper Wiki
Revision as of 11:02, 25 April 2022 by Dgreenwood

Split Pages is an activity that will split a multi-page PDF or TIF document into individual pages.

When applied to a Batch Folder with an attached PDF or TIF file, the Split Pages activity creates one Batch Page object for each page in the file. Each Batch Page is created as a child of the Batch Folder.

About

Split Pages is often a critical component of a Batch Process where documents are imported into new Batches from a digital source (as opposed to scanned paper documents). When a digital file is imported into Grooper, two things happen:

  1. A Batch Folder object is created in the Batch.
  2. The digital file is attached to the Batch Folder.


At this point, the document's content is accessible at the folder level only.

  • For example, we can select this folder and navigate through the pages of the attached multipage PDF using the page navigator in a Document Viewer.

We can also process the document at this point by applying Grooper activities to this Batch Folder at the folder level (by setting a Batch Process Step's Scope property to Folder). An activity running at the folder level can manipulate the content of the attached file. For example, if we ran the Recognize activity at the folder level, it would obtain text data from the attached PDF file.


The Split Pages activity allows us to process the document's content at the page level.

  1. When Split Pages is applied to a Batch Folder, it creates child Batch Page objects from the attached PDF or TIF file.
    • One Batch Page for each page in the multipage PDF or TIF.
  2. Now that we have individual objects in the Batch for each page in the PDF or TIF, we can then select and process each page individually.
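
The parent/child relationship described above can be pictured with a small Python model. This is only an illustrative sketch, not Grooper's API: the BatchFolder and BatchPage classes, their fields, and the split_pages function are hypothetical names invented for this example, and the page count is supplied by hand rather than read from a real PDF or TIF.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchPage:
    """Hypothetical stand-in for a Batch Page: one page of the source file."""
    page_number: int


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder with an attached file."""
    attached_file: str
    page_count: int  # number of pages in the attached PDF or TIF (assumed known)
    children: List[BatchPage] = field(default_factory=list)


def split_pages(folder: BatchFolder) -> List[BatchPage]:
    """Create one child BatchPage per page in the attached file."""
    folder.children = [BatchPage(n) for n in range(1, folder.page_count + 1)]
    return folder.children


folder = BatchFolder(attached_file="invoice.pdf", page_count=3)
assert folder.children == []  # before splitting, only the folder object exists
split_pages(folder)
print([page.page_number for page in folder.children])  # prints [1, 2, 3]
```

Before the split, the only object available to select or process is the folder. Afterwards, each page exists as its own child object.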

Why Split Pages?

There are two reasons to use the Split Pages activity to split out pages from a multipage document.

  1. To apply activities that require Batch Page objects to function.
    • Chiefly the Image Processing and Separate activities.
  2. To increase compute efficiency.
    • A Batch Folder is a single object, so it can be processed by only one processing thread at a time. If you split out the attached document's pages, each page becomes its own object in the Batch. Each page is still processed by a single thread, but because there are now multiple page objects, multiple threads can work on the document at once (one for each page).
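
The efficiency argument can be simulated with a generic thread pool. This is a rough sketch under stated assumptions, not a description of how Grooper schedules work: the functions process_folder, process_page, and timed are hypothetical, the per-page cost is an arbitrary sleep, and the pool size of 8 is chosen only so that every page task can run at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PAGE_COST = 0.05  # assumed seconds of work per page (illustrative only)


def process_folder(page_count: int) -> None:
    """One object, one task: a single thread works through every page in turn."""
    for _ in range(page_count):
        time.sleep(PAGE_COST)


def process_page(page_number: int) -> None:
    """One page object, one task: still a single thread, but only one page of work."""
    time.sleep(PAGE_COST)


def timed(tasks, max_workers: int = 8) -> float:
    """Run (function, argument) tasks on a thread pool and return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        for future in futures:
            future.result()  # wait for completion and surface any exception
    return time.perf_counter() - start


pages = 8
folder_level = timed([(process_folder, pages)])                       # 1 task
page_level = timed([(process_page, n) for n in range(1, pages + 1)])  # 8 tasks
print(f"folder level: {folder_level:.2f}s, page level: {page_level:.2f}s")
```

With one folder task, the extra threads in the pool sit idle. With eight page tasks, the same pool can work on all pages concurrently, so the simulated page-level run finishes sooner. Real gains depend on how many processing threads are actually available.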

Bursting and Rendering PDFs