2021:Split Pages (Activity)
Split Pages is an activity that will split a multi-page PDF or TIF document into individual pages.
When applied to a Batch Folder with an attached PDF or TIF file, the Split Pages activity creates a Batch Page object for each page in the file. These Batch Page objects are created as children of the Batch Folder.
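The parent/child relationship described above can be pictured with a small conceptual model. This is only an illustrative sketch, not Grooper's API: the `BatchFolder` and `BatchPage` classes, the `attached_pages` field, and the `split_pages` function are all hypothetical names invented for this example, and the attached file is modeled as a plain list of page contents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchPage:
    """Hypothetical stand-in for a Batch Page: one page of the source file."""
    page_number: int
    content: str


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder with an attached multi-page file."""
    name: str
    # Models the pages inside the attached PDF or TIF file (assumption).
    attached_pages: List[str]
    children: List[BatchPage] = field(default_factory=list)


def split_pages(folder: BatchFolder) -> List[BatchPage]:
    """Create one child BatchPage per page of the folder's attached file."""
    folder.children = [
        BatchPage(page_number=number, content=page)
        for number, page in enumerate(folder.attached_pages, start=1)
    ]
    return folder.children


folder = BatchFolder(name="invoice.pdf", attached_pages=["page A", "page B", "page C"])
pages = split_pages(folder)
print(len(pages), [p.page_number for p in pages])
```

In this model the folder keeps its attached file, and the new page objects sit beneath it as children, one per page.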
About
Split Pages is often a critical component of a Batch Process where documents are imported into new Batches from a digital source (as opposed to scanned paper documents). When a digital file is imported into Grooper, two things happen:
We can also process this document at this point. We can apply Grooper activities to this Batch Folder at the folder level (by setting a Batch Process Step's Scope property to Folder). An activity running at the folder level can manipulate the content in the attached file. For example, if we ran the Recognize activity at the folder level, it would obtain text data from the attached PDF file.
Why Split Pages?
There are two reasons to use the Split Pages activity to split out pages from a multipage document.
- To apply activities that require Batch Page objects to function.
  - Namely the Image Processing and Separate activities.
- To increase compute efficiency.
  - A Batch Folder is a single object, which can be processed by a single processing thread. If you split out the attached document's pages, each page becomes its own object in the Batch. Each page can also only be processed by a single thread, but with multiple page objects now present, multiple threads can now be used to process the document (one for each page).
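The efficiency argument above comes down to task granularity: one object means one unit of work, while many page objects mean many units that can be handed to separate threads. The sketch below illustrates that idea with Python's standard `concurrent.futures` thread pool. It is not how Grooper schedules work internally; the `recognize` and `run` functions and the worker count are hypothetical, and the example models only how work is divided, not actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def recognize(unit: str) -> str:
    """Hypothetical placeholder for per-object work such as OCR."""
    return unit.upper()


def run(units: List[str], max_workers: int = 4) -> List[str]:
    """Submit each unit as an independent task; each task runs on one thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(recognize, units))


# Before splitting: the whole document is one object, so only one task exists
# and only one thread has anything to do.
folder_only = ["whole document"]

# After splitting: each page is its own object, so up to max_workers tasks
# can be in flight at the same time.
pages = ["page 1", "page 2", "page 3", "page 4"]

folder_results = run(folder_only)
page_results = run(pages)
print(len(folder_results), len(page_results))
```

With a single folder object, the pool receives one task no matter how many workers are available. With four page objects, it receives four tasks that can be distributed across workers.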
Splitting Pages for Specific Activities
Certain Grooper activities require Batch Page objects by design.
- The Separate activity separates loose pages into folders. If there are no Batch Page objects, there's nothing to separate.
- The Image Processing activity applies an IP Profile to alter a page's image in order to clean it up before OCR processing during the Recognize activity. In all but the narrowest of use cases, the Image Processing activity must process Batch Page objects, not Batch Folders. If there are no Batch Page objects, there's nothing for the IP Profile to clean up.
So, first we need to run the Split Pages activity to add page objects we can manipulate. Then, we can run the Separate activity to separate those pages into folders.





