XML Transform

From Grooper Wiki
The XML Transform activity's property panel.

XML Transform is an activity that transforms XML data from a Grooper document into another format, such as HTML or text.

Version Differences

As of 2.72, Grooper uses XSLT 1.0 to apply XML transformations. XSLT (or eXtensible Stylesheet Language Transformations) is a language for transforming XML documents into other formats. Using XSLT, Grooper can output XML in virtually any layout.

A good example of how XSLT transforms an XML document into HTML can be found following this link.
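
As a rough illustration of HTML output, here is a minimal XSLT 1.0 sketch. It assumes the input has a root Document element with Field children, like the sample in the Examples section below; the table layout and heading are illustrative choices, not something Grooper requires.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- method="html" tells the processor to serialize the result as HTML -->
  <xsl:output method="html" indent="yes" />
  <!-- Assumption: Document is the root element of the input XML -->
  <xsl:template match="/Document">
    <html>
      <body>
        <!-- Use the document's Name attribute as the page heading -->
        <h1><xsl:value-of select="@Name" /></h1>
        <table border="1">
          <tr><th>Field</th><th>Value</th></tr>
          <!-- One table row per Field element -->
          <xsl:for-each select="Field">
            <tr>
              <td><xsl:value-of select="@Name" /></td>
              <td><xsl:value-of select="." /></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```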

Examples

For example, here is the XML data produced by a very simple batch process that extracts the invoice number from each document.

<Document Id="94b9a646-f926-4a7e-9df3-3f70d88fcd80" Name="Generic Invoice (1)" TypeId="22684b3b-e266-4350-bfdb-96bf90bae207" TypeName="Acme">
  <Field Name="Invoice Number" Confidence="1.00" Page="1" Valid="True" Location="5.103, 7.030, 1.927, 0.093">74449788</Field>
</Document>

XML is designed to be both machine-readable and human-readable. You can parse the information here yourself: locate Field Name="Invoice Number" and follow the line until you reach the actual number. But maybe we don't need or want all that extra metadata. What if it's just junk to our end process?

We can edit our XSL stylesheet to strip that metadata out. Under the "General" heading, selecting the XML Transform property brings up an editor where we can write the stylesheet to suit our needs.

Configure

Select the XML Transform property and press the ellipsis button.


Customize

This is the boilerplate XSL stylesheet. From here you can edit the stylesheet to output whatever format you desire.


Set File Name

Give the transformation a filename under "Output Filename".


Observing Results

If you select a document's Batch Folder within its Batch, a transformed XML file will appear in the "Files" sub-tab under the "Advanced" tab.


Given our example, if we were to type the following XSL stylesheet into the editor...

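<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Matches the Document element, the first node processed below the root -->
  <xsl:template match="@*|node()">
    <Invoice>
      <!-- Loop over each Field child of the matched element -->
      <xsl:for-each select="Field">
        <!-- Element names cannot contain spaces, so swap them for underscores -->
        <xsl:element name="{translate(@Name, ' ', '_')}">
          <!-- "." is the current Field; value-of emits its text content -->
          <xsl:value-of select="." />
        </xsl:element>
      </xsl:for-each>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>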

...this is how our original XML data would be transformed.

<Invoice>
  <Invoice_Number>74449788</Invoice_Number>
</Invoice>

All that extra junk we didn't want is gone. Now imagine you had forty data elements to extract and wanted just the field information and nothing else. XML Transform using XSLT is a way to accomplish exactly that.
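
The same idea carries over to plain-text output. The sketch below is one possible way to list every field as a "Name: value" line, however many fields the document has. It assumes Document is the root element, as in the sample above; the line format is an arbitrary choice for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- method="text" writes plain text: no tags and no XML declaration -->
  <xsl:output method="text" />
  <!-- Assumption: Document is the root element, as in the sample above -->
  <xsl:template match="/Document">
    <xsl:for-each select="Field">
      <xsl:value-of select="@Name" />
      <xsl:text>: </xsl:text>
      <xsl:value-of select="." />
      <!-- The character reference in the next element is a line feed,
           so each field lands on its own line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document, this should produce the single line "Invoice Number: 74449788".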

Here's a slightly more complex example, with the stylesheet on the left and the input/output files on the right.


[Screenshots: XSL stylesheet (left); input and output files (right)]
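
As a separate illustration of a slightly more involved stylesheet (not the one pictured), the sketch below filters fields before writing them out. It assumes the Valid, Confidence and Page attributes look like those in the sample document earlier on this page; the 0.9 confidence threshold and the Source and Type attribute names are hypothetical choices.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Assumption: Document is the root element of the input XML -->
  <xsl:template match="/Document">
    <!-- Curly braces copy attribute values from the input Document -->
    <Invoice Source="{@Name}" Type="{@TypeName}">
      <!-- Keep only fields marked valid with confidence of at least 0.9
           (hypothetical threshold) -->
      <xsl:for-each select="Field[@Valid='True' and number(@Confidence) &gt;= 0.9]">
        <xsl:element name="{translate(@Name, ' ', '_')}">
          <!-- Carry the page number along as an attribute -->
          <xsl:attribute name="Page"><xsl:value-of select="@Page" /></xsl:attribute>
          <xsl:value-of select="." />
        </xsl:element>
      </xsl:for-each>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should yield an Invoice element carrying Source and Type attributes, with one Invoice_Number child that has Page="1".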

XML Transform Tester

Once you add an "XML Transform" step to your batch process, you can use the "XML Transform Tester". This is an exceptionally handy tool to transform XML data using XSL stylesheets. This way you can see if your transform works and how it will output, all within Grooper.

Location

Navigate to the "XML Transform" step in your "Batch Process".


Testing Tab

Switch to the "XML Transform Tester" tab.


Select a Batch

Select a Test Batch from the Batch dropdown.


Write a Customized Transform

Write your transformation under the "XSL Transform" panel.
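
If you are unsure what the input XML looks like for your own documents, one option is to start with the standard XSLT identity transform. This is a generic XSLT 1.0 pattern, not a Grooper-specific feature: it copies every element, attribute and text node through unchanged, so the output shows the raw data your stylesheet has to work with.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes" />
  <!-- Identity template: matches every attribute and node -->
  <xsl:template match="@*|node()">
    <!-- Copy the current node, then recurse into its attributes and children -->
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

From there, you can add more specific templates to rename or drop elements one at a time and re-run the test after each change.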


Testing

Press the "Execute" button to see the fruits of your labor (or, if the transform fails, the anti-fruits) in the "Output File" panel.


Save

Press the "Save" button.


Exporting Transformed XML

Once you run the "XML Transform" activity, the transformed metadata is saved to a file on the document at the Batch Folder level of the Batch. You can export the metadata using "Document Export". To do this, in the Document Export Settings, change the "Metadata Format" property to "Custom" and type the filename you gave in the XML Transform step under "Custom File". This will export the transformed version of the metadata rather than the original metadata.