2024:Render (Activity)

From Grooper Wiki

Revision as of 18:10, 19 November 2024

This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

The Render activity's property panel

Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

You may download and import the file(s) below into your own Grooper environment (version 2024). There is a Batch with the example document(s) discussed in this tutorial.
Please upload the Project to your Grooper environment before uploading the Batch. This will allow the documents within the Batch to maintain their classification status.

About

Render effectively "prints" the document as a PDF file, outputting a PDF document containing a scanned image of that document and any native text data. This is done using the Grooper Render Printer. In order to do this, four conditions must be met:

  1. The Grooper Render Printer must be installed on the machine running the Render activity. For information on how to install the Grooper Render Printer, visit the How To section of this article.
  2. The Grooper Render Printer must be set as the default printer for the user account running the Render activity.
  3. The native application for the file type must be installed on the machine running the Render activity. For example, Word must be installed in order for Render to render .docx files as PDFs.
  4. The native application must provide a shell print verb. In other words, when you right click the file in Windows File Explorer, a "Print" option is present.
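
Conditions 2 and 4 can be spot-checked from a script before running Render. The following is a minimal Python sketch, not part of Grooper. It assumes the printer appears in Windows under the name "Grooper Render Printer", that the per-user default printer is stored in the "Device" registry value under HKEY_CURRENT_USER, and that a "Print" verb is registered under the file type's ProgID in HKEY_CLASSES_ROOT. The function names are hypothetical, and the registry lookup is a heuristic that may miss verbs registered elsewhere.

```python
import sys

# Assumed display name of the printer in "Printers & scanners".
RENDER_PRINTER = "Grooper Render Printer"


def parse_default_printer(device_value):
    """Extract the printer name from a 'Name,driver,port' registry string."""
    return device_value.split(",", 1)[0].strip()


def default_printer_name():
    """Return the current user's default printer, or None if unavailable."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    key_path = r"Software\Microsoft\Windows NT\CurrentVersion\Windows"
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "Device")
    except OSError:
        return None
    return parse_default_printer(value)


def has_print_verb(extension):
    """Heuristic: does the ProgID for this extension register a 'print' verb?"""
    if sys.platform != "win32":
        return False
    import winreg

    try:
        prog_id = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT, extension)
        if not prog_id:
            return False
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, prog_id + r"\shell\print"):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    name = default_printer_name()
    print("Default printer:", name)
    print("Render printer is default:", name == RENDER_PRINTER)
    print(".docx has a Print verb:", has_print_verb(".docx"))
```

Run the script while logged in as the account that executes Render tasks, since the default printer is a per-user setting.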

Once the document is processed with the Render activity, you can work with it in Grooper as you would any PDF document. This includes other PDF-related activities, such as Split Pages to split out the individual pages of multi-page PDFs, and the native PDF text extraction functionality of the Recognize activity.


Before rendering to PDF using Render


After rendering to PDF using Render

Render and Activity Processing

For Batch processing, the Render activity can only run on one individual machine at a time, using a single thread of processing resources. This means a Grooper Activity Processing service must be created using a Processing Queue with its Concurrency Mode set to PerMachine.

Processing Queues control how much of your system's resources are allocated to processing Activities. Most activities can take full advantage of your system's parallel processing resources and grab multiple threads at a time. Furthermore, multiple instances of an activity can generally run on multiple machines.

  • For example, four workstations working in the same Grooper Repository could each run two instances of the Recognize activity as long as there are the threads available to process each task.

However, the Render activity is different. It can only run on one machine at a time, using a single thread. In other words, only one Render task can run at any given time, on a single machine.

  • Each document rendered by the Render activity must be converted to a PDF one at a time.
  • Furthermore, if multiple machines are attempting to execute the Render activity, they have to wait their turn until the machine ahead of them is finished.
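
The "wait your turn" behavior described above can be pictured with a small Python simulation. This is only an analogy and not how Grooper schedules tasks internally: each thread stands in for a machine, and a shared lock stands in for the rule that only one Render task runs at a time. The names `simulate_render_turns` and `render_turn` are hypothetical.

```python
import threading


def simulate_render_turns(machines, docs_per_machine):
    """Simulate several machines rendering documents strictly one at a time."""
    render_turn = threading.Lock()   # stands in for the one-task-at-a-time rule
    counter_lock = threading.Lock()  # protects the bookkeeping below
    state = {"active": 0, "peak": 0}
    order = []                       # (machine, document index) in render order

    def machine(name):
        for doc_index in range(docs_per_machine):
            with render_turn:        # wait until no other task is rendering
                with counter_lock:
                    state["active"] += 1
                    state["peak"] = max(state["peak"], state["active"])
                order.append((name, doc_index))  # the "render" happens here
                with counter_lock:
                    state["active"] -= 1

    threads = [threading.Thread(target=machine, args=(m,)) for m in machines]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"], order


if __name__ == "__main__":
    peak, order = simulate_render_turns(["ServerA", "ServerB", "ServerC"], 4)
    print("Documents rendered:", len(order))
    print("Most tasks rendering at once:", peak)
```

However many simulated machines are added, the peak number of simultaneous renders stays at one, so total rendering time grows with the number of documents rather than shrinking with the number of machines.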

For more information on how to set this up, visit the How To section of this article.

How To

There are several components to consider for configuration of the Render activity.

Install the Grooper Render Printer

1. Download the Grooper Render Printer installer

First, download the Grooper Render Printer Installer. Click this link to download the installer files.

2. Extract the installer files and run the installer

Locate the downloaded zip file in Windows File Explorer. Right click the zip file and select "Extract All..." In the next window, choose the location to extract the installer files and press the "Extract" button.


Navigate through the extracted folders until you locate the "Setup.exe" file, seen below. Right click the "Setup.exe" file and select "Run as administrator".

3. Step through the installer prompts

Upon opening the "Setup.exe" file, the Grooper Render Printer Installer will run. Click the "Next" button to continue through the installer's prompts.


Press the "Finish" button to finish installing the Grooper Render Printer.

4. Set the Grooper Render Printer as the default printer

Open the Devices panel in your machine's Windows Settings. You will see the Grooper Render Printer listed in the "Printers & scanners" tab.


To make the Grooper Render Printer your default printer, select the Grooper Render Printer from the list of printers and scanners and press the "Manage" button.


On the following screen, press the "Set as default" button. Ensure that the "Printer status:" is listed as "Default".

Set up Activity Processing for Render

1. Add a new Processing Queue

In order to use the Render activity in a Batch Process, you will need to create a new Processing Queue.


  1. Processing Queues are created in the Queues folder.
  2. Right click the Queues folder and select "Add" then "Processing Queue..."
  3. Name the Processing Queue whatever you'd like. Press "Execute" to finish naming the Processing Queue and create it.
    • Since we are configuring an Activity Processing Service to run the Render activity, it makes sense to name this Processing Queue "Render".


  1. This creates a new Processing Queue object with the given name in the Node Tree.
  2. Change the Concurrency Mode property from Multiple to PerMachine.
    • You may see a popup window saying "You must run Grooper as an Administrator to modify services." This is letting you know you will need to run Grooper Command Console as an administrator to create and edit the Activity Processing service that will use this Processing Queue. However, you must still create the Processing Queue from the Design page. Just press the "OK" button to get rid of this window.
  3. Press the "Save" button when finished.

Render must run using a PerMachine Concurrency Mode


2. Set up the Activity Processing service

Next, you must install an Activity Processing service that uses the Processing Queue created in the previous step. Grooper services are installed through the Grooper Command Console application.

You must run Grooper Command Console as an administrator to install or edit a Service.

  1. In Grooper Command Console, enter the following command:
services install <connectionNo> <typeName> <userName> <password> [threadCount] [queueName]
  • <connectionNo> (required):
    • Replace this with the integer representing the appropriate connection. Use the connections list command to get a list of your connections.
  • <typeName> (required):
    • Since we are installing an Activity Processing service, replace this with ActivityProcessing.
  • <userName> and <password> (required):
    • Replace these with the appropriate Active Directory credentials of your Grooper Service User.
  • [threadCount] and [queueName] (optional):
    • To specify a thread count, replace [threadCount] with an integer. For Render to function properly, this must be set to 1.
    • To specify a queue name, replace [queueName] with the name of a Processing Queue object. For Render, enter the name of the Processing Queue that was created in the Queues folder on the "Design" page.
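
To see how the placeholders fit together, the following Python sketch assembles the command text from its parts. It is an illustration only and not a Grooper tool. The function name and the sample values (connection 1, the account "DOMAIN\svc_grooper", and a queue named "Render") are hypothetical, and the sketch assumes none of the values contain spaces.

```python
def build_install_command(connection_no, user_name, password, queue_name,
                          thread_count=1, type_name="ActivityProcessing"):
    """Assemble the 'services install' command text for a Render service.

    The argument order follows the syntax shown above:
    services install <connectionNo> <typeName> <userName> <password>
                     [threadCount] [queueName]
    """
    if thread_count != 1:
        # The article states Render requires a thread count of 1.
        raise ValueError("Render requires threadCount to be 1")
    parts = [
        "services", "install",
        str(connection_no),
        type_name,
        user_name,
        password,
        str(thread_count),
        queue_name,
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_install_command(1, r"DOMAIN\svc_grooper", "<password>", "Render"))
```

With the sample values, the printed line has the form `services install 1 ActivityProcessing DOMAIN\svc_grooper <password> 1 Render`. Replace each value with the details of your own environment before entering it in Grooper Command Console.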

3. Configure the Activity Processing service

Please refer to the command from the previous section. The configuration requirements for this service are set when the service is installed in Grooper Command Console. The key considerations for Render are making sure [threadCount] is set to 1 and [queueName] is set to the name of the Processing Queue that was created in the Queues folder on the "Design" page.

4. Set the Processing Queue in your Batch Process Step

Last, the Render step of your Batch Process must be told which Processing Queue to use.


  1. Select the Render step of your Batch Process.
  2. In the "Step Properties" panel, select the Queue Name property. Using the dropdown menu, select the Processing Queue created in Step 1 (Here, named "Render").


Set up Render Printer for multiple users and service accounts

Grooper Render Printer installs under the context of a single user: The account/user that is logged in when installing it for the first time.

  • In typical scenarios, this is the service account running the Activity Processing service processing Render tasks.

If only one account/user is ever going to process Render tasks, this is no problem. However, if multiple users need to process Render tasks, additional steps need to be taken to ensure multiple users can run the Grooper Render Printer.

To ensure a user/account who did not install the Grooper Render Printer can use it to execute Render tasks, perform the following configuration tasks:

  1. Log in as the user/service account that will execute Render activity tasks.
  2. Ensure the Grooper Render Printer is the default printer.
    • Open the "Printers & Scanners" settings in Windows. Select the Grooper Render Printer. If it is not already listed as "Default", select "Manage". Then, select "Set as default".
  3. Go to the local user/account's "Documents" folder in Windows File Explorer.
    • C:\Users\<loggedInUser>\Documents\
  4. Create a folder named GrooperRender
  5. Next, you will need to edit the environment variables for the logged in user (not the system variables).
    • An easy way to find these is to hit the Windows Start button and search for "Edit environment variables for your account".
  6. In the "Environment Variables" window, click "New..." under the "User variables for <loggedInUser>".
    • A "New User Variable" window will appear.
  7. For the "Variable Name:" enter GROOPER_RENDERPATH
  8. For the "Variable Value:" enter the path for the folder created in Step 4.
    • C:\Users\<loggedInUser>\Documents\GrooperRender
  9. Click "OK" in the "New User Variable" window.
  10. You should see the new variable added to the "User variables for <loggedInUser>" list. Click "OK" to finish.
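
After completing these steps, the folder and the variable can be double-checked with a short script. The following Python sketch is a convenience check and not part of Grooper. The function names are hypothetical, and the script assumes it is run by the same account that executes Render tasks, in a session opened after the variable was saved.

```python
import os
from pathlib import Path

VARIABLE = "GROOPER_RENDERPATH"
FOLDER_NAME = "GrooperRender"


def ensure_render_folder(documents_dir):
    """Create the GrooperRender folder under the given Documents folder."""
    folder = Path(documents_dir) / FOLDER_NAME
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def variable_matches(folder, environ=os.environ):
    """Return True if GROOPER_RENDERPATH is set and points at the folder."""
    value = environ.get(VARIABLE)
    return value is not None and Path(value) == Path(folder)


if __name__ == "__main__":
    documents = Path.home() / "Documents"
    folder = ensure_render_folder(documents)
    print("Render folder:", folder)
    print(VARIABLE, "is set correctly:", variable_matches(folder))
```

If the check reports that the variable is not set correctly, reopen the "Environment Variables" window and confirm the entry appears under the user variables, not the system variables.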

Glossary

Activity Processing Concept: Activity Processing is the execution of a sequence of configured tasks which are performed within a Batch Process to transform raw data from documents into structured and actionable information. Tasks are defined by Grooper Activities, configured to perform document classification, extraction, or data enhancement.

Activity Processing Service: Activity Processing is a Grooper Service that executes Activities assigned to Batch Process Steps in a Batch Process. This allows Grooper to automate Batch Process Steps that do not require a human operator.

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Process Step: Batch Process Steps are specific actions within a Batch Process sequence. Each Batch Process Step performs an "Activity" specific to some document processing task. These Activities will be either "Code Activities" or "Review" activities. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Process Steps are frequently referred to as simply "steps".
  • Because a single Batch Process Step executes a single Activity configuration, they are often referred to by their referenced Activity as well. For example, a "Recognize step".

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the step-by-step processing instructions given to a Batch. Each step is comprised of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps which are added as children nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Extract: Extract is an Activity that retrieves information from Batch Folder documents, as defined by Data Elements in a Data Model. This is how Grooper locates unstructured data on your documents and collects it in a structured, usable format.

Grooper Repository: A Grooper Repository is the environment used to create, configure and execute objects in Grooper. It provides the framework to "do work" in Grooper. Fundamentally, a Grooper Repository is a connection to a database and file store location, which store the node configurations and their associated file content. The Grooper application interacts with the Grooper Repository to automate tasks and provide the Grooper user interface.

Grooper Service:

Machine: Machine nodes represent servers that have connected to the Grooper Repository. They are essential for distributing task processing loads across multiple servers. Grooper creates Machine nodes automatically whenever a server makes a new connection to a Grooper Repository's database. Once added, Machine nodes can be used to view server information and to manage Grooper Service instances.

Node Tree: The Node Tree is the hierarchical list of Grooper node objects found in the left panel in the Design Page. It is the basis for navigation and creation in the Design Page.

Processing Queue: Processing Queues help automate "machine performed tasks" (that is, Code Activity tasks performed by Machines and their Activity Processing services). Processing Queues are assigned to Batch Process Steps to distribute tasks, control the maximum processing rate, and set the "concurrency mode" (specifying if and how parallelism can occur across one or more servers).

  • Processing Queues are used to dedicate Activity Processing services with a capped number of processing threads to resource intensive activities, such as Recognize. That way, these compute hungry tasks won't gobble up all available system resources.
  • Processing Queues are also used to manage activities, such as Render, which can only have one activity instance running per machine (This is done by changing the queue's Concurrency Mode from "Maximum" to "Per Machine").
  • Processing Queues are also used to throttle Export tasks in scenarios where the export destination can only accept one document at a time.

Project: Projects are the primary containers for configuration nodes within Grooper. The Project is where various processing objects, such as Content Models, Batch Processes, and profile objects, are stored. This makes resources easier to manage, easier to save, and simplifies how node references are made in a Grooper Repository.

Recognize: Recognize is an Activity that obtains machine-readable text from Batch Pages and Batch Folders. When properly configured with an OCR Profile, Recognize will selectively perform OCR for images and native-text extraction for digital text in PDFs. Recognize can also reference an IP Profile to collect "layout data" like lines, checkboxes, and barcodes. Other Activities then use this machine-readable text and layout data for document analysis and data extraction.

Render: Render is an Activity that converts files of various formats to PDF. It does this by digitally printing the file to PDF using the Grooper Render Printer. This normalizes electronic document content from file formats Grooper cannot read natively to PDF (which it can read natively), allowing Grooper to extract the text via the Recognize Activity.

Service: Grooper Services are various executable programs that run as a Windows Service to facilitate Grooper processing. Service instances are installed, configured, started and stopped using Grooper Command Console (or in older Grooper versions, Grooper Config).

Split Pages: Multi-page PDF and TIF files come into Grooper as files attached to single Batch Folders. Split Pages is an Activity that creates child Batch Pages for each page in the PDF or TIF. This allows Grooper to process and handle these pages as individual objects.

Split: Split is a Collation Provider option for Data Type extractors. Split separates a data instance at each match returned by the Data Type. The results are used as anchor points to "split" text into one or more smaller parts.

Thread: A Thread is the smallest unit of processing that can be performed within an operating system. In Grooper, threads are allocated for processing by Activity Processing services.