2023.1:Merge (Activity)

From Grooper Wiki
Revision as of 11:46, 28 August 2024 by Randallkinard (Randallkinard moved page Merge (Activity) to 2023.1:Merge (Activity))

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.


Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.

You may download the ZIP(s) below and upload them into your own Grooper environment (version 2023.1). The first contains one or more Batches of sample documents. The second contains one or more Projects with resources used in examples throughout this article.

About

So, you've gone through the first four phases of Grooper: Acquire, Condition, Organize, and Collect.

You have extracted data from your documents, and now you want to build a document. The Export activity can build the document and export it outside of Grooper in one step. However, you may want to build the document before the Export step runs in your Batch Process. You can do this with the Merge activity.

The Merge activity builds a document from the Pages in a Batch Folder and saves that document to the Batch Folder. You can choose to build your document in a specific format, such as a PDF or TIF file. You can also automatically rename the documents as they're built.

Three Ways to Name

There are three different ways to name the final Merged file: Use the Custom Filename property, use the Attachment Name property, or use the default naming settings.

The most important difference among these methods is whether Grooper creates a new file and attaches it to the Batch Folder, leaving the original content intact, or overwrites the main content of the Batch Folder. Custom Filename does the former; Attachment Name and the defaults do the latter.



Which method is best depends on your specific company needs, so choose the option that best applies to your use case. We will go over how to set all of this up in the How To section of this article.
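The following Python sketch models where the merged file ends up under each naming method described above. `BatchFolder` and `apply_merge_naming` are hypothetical stand-ins, not part of Grooper; the order in which the properties are checked and the extension appended to the default name are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchFolder:
    """Hypothetical stand-in for a Batch Folder; not Grooper's object model."""
    name: str
    main_content: str  # file name of the folder's main content
    attachments: List[str] = field(default_factory=list)


def apply_merge_naming(folder: BatchFolder, extension: str,
                       custom_filename: Optional[str] = None,
                       attachment_name: Optional[str] = None) -> None:
    """Record where the merged file lands under each naming method.

    The order of the checks below is an assumption for illustration only.
    """
    if custom_filename:
        # Custom Filename: used literally (extension included) and saved as a
        # separate attachment; the main content is left intact.
        folder.attachments.append(custom_filename)
    elif attachment_name:
        # Attachment Name: the evaluated expression names the merged file,
        # which overwrites the folder's main content.
        folder.main_content = attachment_name
    else:
        # Defaults: the merged file overwrites the main content and is named
        # after the Batch Folder (appending the extension is an assumption).
        folder.main_content = f"{folder.name}.{extension}"
```

For example, passing `custom_filename="Invoice.pdf"` leaves `main_content` untouched and adds `Invoice.pdf` to `attachments`, while the other two paths replace `main_content`.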

How To

The Merge activity is applied to a Batch via a Batch Process Step. The following tutorial will show you how to add the Batch Process Step and how to configure the Merge activity.

Creating the Batch Process Step

The first thing we need to do is add a Merge Batch Process Step to our Batch Process and set the Scope at which the step will run.

  1. Right-click on the Batch Process.
  2. Hover over "Add Activity", then hover over "Transform". Finally, click "Merge..."
  3. When the "Add Activity" window pops up, you can change the name of the step in the Step Name property. For this tutorial, we will use the default of "Merge".
  4. Click "EXECUTE" in the top-right corner of the pop-up window.


  1. Now you should have a Merge step in your Batch Process.
  2. Set the Scope to the Folder Level that contains the pages to be combined into a document. In our case that is Folder Level 1, so we have set the Scope property to Folder and the Folder Level property to 1.
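To picture which Batch Folders a Scope of Folder with a Folder Level of 1 selects, here is a minimal Python model of a Batch tree. It assumes Folder Level counts folders down from the Batch root, so Level 1 means the Batch's direct child folders; `Node` and `folders_at_level` are hypothetical and not part of Grooper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical Batch tree node: a Batch, a Batch Folder, or a Batch Page."""
    name: str
    is_page: bool = False
    children: List["Node"] = field(default_factory=list)


def folders_at_level(batch: Node, level: int) -> List[Node]:
    """Return the Batch Folders `level` steps below the Batch root."""
    current = [batch]
    for _ in range(level):
        # Descend one level, keeping folders and skipping pages.
        current = [child for node in current for child in node.children
                   if not child.is_page]
    return current
```

In this model, a step scoped to Folder Level 1 would run once per folder returned by `folders_at_level(batch, 1)`, each of which holds the pages to be merged.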


Configuring the Merge Activity

Now that the Batch Process Step is in our Batch Process, we need to configure it to tell Grooper how to merge the pages.

  1. Click the hamburger icon to the right of the Merge Format property in the rightmost property grid and select the desired file type from the drop-down list. For this tutorial, we will select PDF.


  1. You can set a name for the resulting merged document using either the Custom Filename or the Attachment Name property. Ignore the Save As property, as it is broken in versions 2023 and 2023.1.

The Custom Filename Property

The first of the two naming properties is the Custom Filename property. When you use it to name your merged file, the file is named exactly what you type into the property, including the file extension. Grooper then saves the merged file to the Batch Folder as a separate attachment. The original file remains intact and unchanged on the Batch Folder.

  1. Enter the name you want for the merged PDFs in the Custom Filename property in the rightmost property grid. Include the file extension in the file name.
  2. Click the save icon at the top right of the middle property grid.

You MUST include the file extension in the file name. If you do not, the merged file will not be usable.
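Because the Custom Filename is used literally, a pre-check like the following Python sketch can catch a missing extension before the step runs. `has_expected_extension` and `FORMAT_EXTENSIONS` are hypothetical, and the extension spellings (for example `.tif` for TIF output) are assumptions.

```python
import os

# Assumed mapping from the Merge Format setting to the expected extension.
FORMAT_EXTENSIONS = {"PDF": ".pdf", "TIF": ".tif", "XML": ".xml", "ZIP": ".zip"}


def has_expected_extension(custom_filename: str, merge_format: str) -> bool:
    """True if the Custom Filename ends in the extension for the Merge Format."""
    # splitext returns ("Invoice", "") for a name with no extension.
    _, ext = os.path.splitext(custom_filename)
    return ext.lower() == FORMAT_EXTENSIONS[merge_format]
```

With this check, `Invoice.pdf` passes for PDF while `Invoice` fails, which is the case that produces an unusable attachment.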


  1. Click over to the "Activity Tester" tab.
  2. Select the Batch Folder containing the document you want to Merge.
  3. Click the Test icon at the top right of the Batch Viewer.


At first glance, it might appear that the Merge activity didn't do anything. This is because it has not altered the main content of the Batch Folder. To view the newly created file, follow these additional steps:

  1. Navigate through your Test Batches in the node tree to the Batch Folder you tested the Merge activity on.
  2. Click on the "Advanced" tab.
  3. Under "FILES" on the right-hand panel, you should see the new merged PDF file.


Why You Need to Include the File Extension

  1. If you do not include the file extension in your Custom Filename...


  1. ... you will still get an attached file, but it will not be a PDF or any other usable file type.


The Attachment Name Property

The Attachment Name property is a little more complicated than the Custom Filename property. The Attachment Name property uses expressions to name your file. In this tutorial, we will be using String Interpolation expressions to name our files.

Also, the merged file overwrites the main content attached to the Batch Folder rather than being added as a new file. So, in our examples you will see the file name on the Batch Folder itself change.

  1. Click the ellipsis icon to the right of the Attachment Name property.
  2. When the "Attachment Name" window pops up, insert an expression for how you want to name the attachments. In our example we have used a String Interpolation expression to name the document with a random GUID:
    $"{Guid.NewGuid}.pdf"
  3. Click "OK" located in the top right of the pop-up window.
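Grooper evaluates the expression above itself; the Python sketch below only mirrors what it produces (a new random GUID, as returned by .NET's `Guid.NewGuid`, followed by `.pdf`) so the resulting name format is easy to see. `guid_file_name` is a hypothetical helper, not part of Grooper.

```python
import uuid


def guid_file_name(extension: str = "pdf") -> str:
    """Python analogue of $"{Guid.NewGuid}.pdf": a random GUID plus an extension."""
    # str(uuid.uuid4()) is the 36-character hyphenated GUID form.
    return f"{uuid.uuid4()}.{extension}"
```

Because a new GUID is generated on every call, each merged file gets a different name, which avoids collisions but says nothing about the document's content.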


  1. Click the save icon at the top right of the middle property grid to save your changes.


  1. Click over to the "Activity Tester" tab.
  2. Select the Batch Folder containing the document to be merged in the Batch Viewer.
  3. Click the Test icon at the top right of the Batch Viewer.


  1. The file has now been merged and named, and it has replaced the existing file in the Batch Folder. You can see the new file name under the Batch Folder name.


Selecting the Content Type

If you want to use Data Elements from your Content Model to name your files, you can do this by selecting your Content Model in the Content Type property. This will provide the scope from which you can call Data Elements in your String Interpolation code.

  1. Under our Content Model, we have a Data Model with two Data Fields: Company Name and Contact Name. Let's say we want to use these two Data Fields to name the file.
  2. Click on the hamburger icon to the right of the Content Type property to access the drop down.
  3. Navigate to and select the Content Type that contains the information you want to include in your naming expression. In this case, we're selecting the Content Model containing the Data Fields we want to use to name the file.


  1. Click the ellipsis to the right of the Attachment Name property.
  2. Start your String Interpolation expression with $", then type { to bring up IntelliSense and choose the first element to reference.
  3. IntelliSense shows what you can use in your expression within the scope of your specified Content Type. Here, we have access to both of the Data Fields we want to use for the naming convention.


  1. Finish your expression. In our example, we used:
    $"{Company_Name}.{Contact_Name}.pdf"
  2. Click "OK" located in the top right of the "Attachment Name" pop-up window.
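As a rough illustration of what the expression produces, the Python sketch below substitutes extracted field values into the same pattern. It assumes each Data Field is referenced by its name with spaces replaced by underscores, as in the example (Company Name becomes Company_Name); `expression_identifier` and `name_from_fields` are hypothetical helpers, not Grooper functionality.

```python
from typing import Dict


def expression_identifier(field_name: str) -> str:
    # The example refers to the "Company Name" field as Company_Name; this
    # sketch assumes spaces simply become underscores.
    return field_name.replace(" ", "_")


def name_from_fields(field_values: Dict[str, str], extension: str = "pdf") -> str:
    """Mirror $"{Company_Name}.{Contact_Name}.pdf" from extracted field values."""
    values = {expression_identifier(k): v for k, v in field_values.items()}
    return f"{values['Company_Name']}.{values['Contact_Name']}.{extension}"
```

With Company Name extracted as `Acme` and Contact Name as `Jane Doe`, `name_from_fields` returns `Acme.Jane Doe.pdf`.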


  1. Click the save icon at the top right of the middle property grid.


  1. Click over to the "Activity Tester" tab.
  2. Select the Batch Folder containing the document you want to merge in the Batch Viewer.
  3. Click the Test icon in the top right of the Batch Viewer.


  1. The file has been merged and named, and it has replaced the original file. You can see the new name under the Batch Folder name in the Batch Viewer.


Using the Defaults

If you do not enter a Custom Filename or Attachment Name, Grooper will still run the Merge activity. However, the merged file will overwrite the main content of the folder and be named the same as the Batch Folder.

  1. If you leave the naming properties as their defaults, the Step will still Merge the document.
  2. Click the save icon at the top of the middle property grid if needed.


  1. Click over to the "Activity Tester" tab.
  2. Select the Batch Folder containing the document you want to Merge in the Batch Viewer.
  3. Click the Test icon in the top right of the Batch Viewer.


  1. The merged PDF file is named the same as the Batch Folder.


The Clear On Completion Property

Enabling the Clear on Completion property will delete the Page objects from your Batch Folder once the Merge activity has been completed.

  1. Set the Clear on Completion property to True.
  2. Click the save icon located at the top of the middle property grid.


  1. Click over to the "Activity Tester" tab.
  2. We have expanded out the Batch Folder in the Batch Viewer to show that this document has one Page inside.
  3. Make sure the Batch Folder is selected.
  4. Click the Test icon located in the top right of the Batch Viewer.


  1. After testing, you can see that the file has been built and named (notice the change in the file name underneath the name of the Batch Folder).
  2. However, there are no more Page objects in the Batch Folder. They were cleared when the Merge activity completed.
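The effect of Clear on Completion can be modeled in a few lines of Python. `MergeFolder` and `merge` are hypothetical stand-ins for a Batch Folder and the Merge step; building the merged file itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MergeFolder:
    """Hypothetical Batch Folder with child pages; not Grooper's object model."""
    name: str
    pages: List[str] = field(default_factory=list)
    main_content: str = ""


def merge(folder: MergeFolder, file_name: str, clear_on_completion: bool = False) -> None:
    """Model the merge result and, optionally, Clear on Completion."""
    # Stand-in for building the merged file from the folder's pages.
    folder.main_content = file_name
    if clear_on_completion:
        # Page objects are removed only after the merged file exists.
        folder.pages.clear()
```

With `clear_on_completion=False` the folder keeps its pages alongside the merged file; with `True` only the merged file remains, matching the test above.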



Glossary

Activity: Grooper Activities define specific document processing operations performed on a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. Each Activity is either a "Code Activity" or a "Review" activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps, which are added as child nodes.
  • A Batch Process is often referred to as simply a "process".

Content Model: Content Model nodes define a classification taxonomy for document sets in Grooper. This taxonomy is defined by the Content Categories and Document Types they contain. Content Models serve as the root of a Content Type hierarchy, which defines Data Element inheritance and Behavior inheritance. Content Models are crucial for organizing documents for data extraction and more.

Content Type: Content Types are a class of node types used to classify Batch Folders. They represent categories of documents (Content Models and Content Categories) or distinct types of documents (Document Types). Content Types serve an important role in defining Data Elements and Behaviors that apply to a document.

Data Element: Data Elements are a class of node types used to collect data from a document. These include: Data Models, Data Sections, Data Fields, Data Tables, and Data Columns.

Data Field: Data Fields represent a single value targeted for data extraction on a document. Data Fields are created as child nodes of a Data Model and/or Data Sections.

  • Data Fields are frequently referred to simply as "fields".

Data Model: Data Models are leveraged during the Extract activity to collect data from documents (Batch Folders). Data Models are the root of a Data Element hierarchy. The Data Model and its child Data Elements define a schema for data present on a document. The Data Model's configuration (and its child Data Elements' configuration) define data extraction logic and settings for how data is reviewed in a Data Viewer.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Merge: Merge is an Activity that creates a PDF, TIF, XML or ZIP file from the page and data content of a Batch Folder and saves it to that Batch Folder.

Scope: The Scope property of a Batch Process Step, as it relates to an Activity, determines at which level in a Batch hierarchy the Activity runs.

Test Batch: "Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.
