2.72:XML Transform (Activity)


This article is about an older version of Grooper.

Information may be out of date and UI elements may have changed.

The XML Transform activity's property panel.

XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.


Glossary

Activity: Grooper Activities define specific document processing operations done to a Batch, Batch Folder, or Batch Page. In a Batch Process, each Batch Process Step executes a single Activity (determined by the step's "Activity" property).

  • Batch Process Steps are frequently referred to by the name of their configured Activity followed by the word "step". For example: "Classify step".

Batch Folder: The Batch Folder is an organizational unit within a Batch, allowing for a structured approach to managing and processing a collection of documents. Batch Folder nodes serve two purposes in a Batch. (1) Primarily, they represent "documents" in Grooper. (2) They can also serve more generally as folders, holding other Batch Folders and/or Batch Page nodes as children.

  • Batch Folders are frequently referred to simply as "documents" or "folders" depending on how they are used in the Batch.

Batch Process: Batch Process nodes are crucial components in Grooper's architecture. A Batch Process is the set of step-by-step processing instructions given to a Batch. Each step consists of a "Code Activity" or a Review activity. Code Activities are automated by Activity Processing services. Review activities are executed by human operators in the Grooper user interface.

  • Batch Processes by themselves do nothing. Instead, they execute Batch Process Steps which are added as children nodes.
  • A Batch Process is often referred to as simply a "process".

Batch: Batch nodes are fundamental in Grooper's architecture. They are containers of documents that are moved through workflow mechanisms called Batch Processes. Documents and their pages are represented in Batches by a hierarchy of Batch Folders and Batch Pages.

Execute: Execute is an Activity that runs one or more specified object commands. This gives access to a variety of Grooper commands in a Batch Process for which there is no Activity, such as the "Sort Children" command for Batch Folders or the "Expand Attachments" command for email attachments.

Export: Export is an Activity that transfers documents and extracted information to external file systems and content management systems, completing the data processing workflow.

Test Batch: "Test Batch" is a specialized Import Provider designed to facilitate the import of content from an existing Batch in the test environment. This provider is most commonly used for testing, development, and validation scenarios, and is not intended for production use.

  • Looking for information on "production" vs "test" Batches in Grooper? See here.

XML Transform: XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

Version Differences

As of 2.72, Grooper uses XSLT 1.0 to apply XML transformations. XSLT (eXtensible Stylesheet Language Transformations) is a language for transforming XML documents into other formats. Using XSLT, Grooper can output XML in virtually any layout.

A good example of how XSLT transforms an XML document into HTML can be found following this link.
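As a rough sketch of what such a transform looks like, the stylesheet below turns a document's XML into a simple HTML table. It assumes the Document/Field input structure shown under Examples and sticks to XSLT 1.0. Whether an HTML output file is useful depends on your downstream process, so treat this purely as an illustration of XSLT itself, not as a Grooper-specific recipe.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Produce HTML instead of XML -->
  <xsl:output method="html" indent="yes" />
  <!-- Assumes the input root is a Document element with Field children -->
  <xsl:template match="/Document">
    <html>
      <body>
        <!-- Use the document's Name attribute as a heading -->
        <h1><xsl:value-of select="@Name" /></h1>
        <table border="1">
          <tr><th>Field</th><th>Value</th></tr>
          <!-- One table row per extracted field -->
          <xsl:for-each select="Field">
            <tr>
              <td><xsl:value-of select="@Name" /></td>
              <td><xsl:value-of select="." /></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```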

Examples

For example, this is the XML data from an extremely simple Batch Process that extracts the invoice number from various documents.

<Document Id="94b9a646-f926-4a7e-9df3-3f70d88fcd80" Name="Generic Invoice (1)" TypeId="22684b3b-e266-4350-bfdb-96bf90bae207" TypeName="Acme">
  <Field Name="Invoice Number" Confidence="1.00" Page="1" Valid="True" Location="5.103, 7.030, 1.927, 0.093">74449788</Field>
</Document>

XML is designed to be both machine-readable and human-readable, so you can read through this data directly: to find the invoice number, locate Field Name="Invoice Number" and follow the line to the value at its end. But maybe we don't need or want all that extra metadata (confidence, page, validity, and location). What if it's just junk to our end process?

An XSL stylesheet can strip it out. Under the "General" heading, selecting the XML Transform property brings up an editor where you can write the stylesheet to suit your needs.

Configure

Select the XML Transform property and press the ellipsis button.



Customize

This is the boilerplate XSL stylesheet. From here you can edit the stylesheet to output whatever format you desire.
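The boilerplate Grooper shows may differ from the following. As a point of reference, this is the standard XSLT 1.0 identity transform, which copies its input unchanged, plus one overriding template (a hypothetical edit, not a Grooper default) that drops the Location attribute from every Field. Starting from an identity transform and adding narrow overrides is a common way to make small changes without rewriting the whole document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Identity template: copies every attribute and node unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <!-- Hypothetical override: an empty template drops Location from every Field -->
  <xsl:template match="Field/@Location" />
</xsl:stylesheet>
```

Because the override's pattern (Field/@Location) is more specific than @*|node(), XSLT's default template priorities give it precedence, while every other node is still copied by the identity template.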



Set File Name

Give the transformation a filename under "Output Filename".



Observing Results

After the XML Transform step runs, select a document's Batch Folder within its Batch; the transformed XML file appears in the "Files" sub-tab under the "Advanced" tab.



Given our example, if we were to type the following XSL stylesheet into the editor...

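<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Match the Document element and replace it with a bare Invoice element -->
  <xsl:template match="/Document">
    <Invoice>
      <!-- Emit one element per Field, named after the field (spaces become underscores) -->
      <xsl:for-each select="Field">
        <xsl:element name="{translate(@Name, ' ', '_')}">
          <!-- Copy only the field's text value; its attributes are dropped -->
          <xsl:value-of select="." />
        </xsl:element>
      </xsl:for-each>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>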

...this is how our original XML data would be transformed.

<Invoice>
  <Invoice_Number>74449788</Invoice_Number>
</Invoice>

All that extra metadata we didn't want is gone. Because the stylesheet loops over every Field, a document with forty extracted fields would be handled just as easily, with only the field values in the output. XML Transform using XSLT is a way to accomplish exactly that.
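Another way to handle many fields is to let template rules do the work instead of a for-each loop. The sketch below, again assuming the Document/Field structure above and XSLT 1.0, produces the same per-field elements but also keeps the one piece of metadata you might still want: the document's name, copied into a Document attribute (an attribute name chosen here purely for illustration).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Rename the root and keep only the document's Name (hypothetical "Document" attribute) -->
  <xsl:template match="/Document">
    <Invoice Document="{@Name}">
      <!-- Hand each Field to the template rule below -->
      <xsl:apply-templates select="Field" />
    </Invoice>
  </xsl:template>
  <!-- Runs once per Field, however many fields there are -->
  <xsl:template match="Field">
    <xsl:element name="{translate(@Name, ' ', '_')}">
      <xsl:value-of select="." />
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document, this should yield an Invoice element with Document="Generic Invoice (1)" wrapping the same Invoice_Number element. Note that translate() only replaces spaces; a field name containing other characters not allowed in XML names, such as parentheses, would still not yield a valid element name.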

Here's a slightly more complex example, with the stylesheet on the left and the input/output files on the right.


XML Transform Tester

Once you add an "XML Transform" step to your Batch Process, you can use the "XML Transform Tester", an exceptionally handy tool for applying XSL stylesheets to XML data. It lets you check whether your transform works and what it outputs, all within Grooper.

Location

Navigate to the "XML Transform" step in your "Batch Process".



Testing Tab

Switch to the "XML Transform Tester" tab.



Select a Batch

Select a Test Batch from the Batch dropdown.



Write a Customized Transform

Write your transformation under the "XSL Transform" panel.
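While experimenting in the Tester, it can help to first confirm which fields the input actually contains. The throwaway stylesheet below is a sketch that assumes the tester input has the Document/Field shape from the Examples section; it writes one "name: value" line per field using plain-text output instead of XML.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Plain-text output: no XML declaration or tags -->
  <xsl:output method="text" />
  <xsl:template match="/Document">
    <!-- One "name: value" line per field -->
    <xsl:for-each select="Field">
      <xsl:value-of select="@Name" />
      <xsl:text>: </xsl:text>
      <xsl:value-of select="." />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document it should print a single line, Invoice Number: 74449788.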



Testing

Press the "Execute" button to see the fruits of your labor in the "Output File" panel (or, if the transform fails, the anti-fruits of your labor).



Save

Press the "Save" button.



Exporting Transformed XML

Once you run the "XML Transform" activity, the transformed metadata is saved to a file on the document at the Batch Folder level of the Batch. You can export the metadata using "Document Export". To do this, in the Document Export Settings, change the "Metadata Format" property to "Custom" and type the filename you gave in the XML Transform step under "Custom File". This will export the transformed version of the metadata rather than the original metadata.