XML Transform (Activity)

From Grooper Wiki

This article is about the current version of Grooper.

Note that some content may still need to be updated.


XML Transform is an Activity that applies XSLT stylesheets to XML data to modify or reformat the output structure for various purposes.

About

XML Transform is an Activity that transforms XML data into a new file using an XSLT (Extensible Stylesheet Language Transformations) stylesheet. The source XML is used to generate a different text-based file format (XML, TXT, HTML, etc.) according to the transformation rules in the XSLT code.

The general setup process is:

  1. Configure the "Source" setting. The XML source can be:
    • Data - A document's extracted Data Model (serialized as XML)
    • Attachment - A document's primary attachment file (if it is an XML file)
    • File - An XML file attached to the document by a custom Activity or Command.
  2. Configure the "XML Transform" setting. This is where your XSLT goes. Only XSLT 1.0 syntax is supported.
    • You can paste the XSLT directly in this property's editor.
    • Or, you can use an XML Transform step's "XSLT Editor" tab to edit and test the XSLT. The XSLT you save in the XSLT Editor will be saved to the XML Transform setting.
  3. Configure the "Target" setting. This determines how the output file generated by XML Transform is saved. This can be:
    • File - The output XML will be saved to a named file.
    • Attachment - The output XML will be saved as the document's attachment.
    • Data - The output XML will be saved into the document's index data.
    • LastChild - The output XML will be appended as the last child document.
    • FirstChild - The output XML will be prepended as the first child document.
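
As a starting point for the "XML Transform" setting in step 2, the following is a minimal sketch of an XSLT 1.0 stylesheet. It makes no assumptions about the structure of the source XML: it only counts the elements it finds, which can be a quick way to confirm that the configured Source is reaching the stylesheet. The output element names "Result" and "ElementCount" are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal XSLT 1.0 skeleton. "Result" and "ElementCount" are hypothetical names. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The method attribute selects the output format: "xml", "html" or "text" -->
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Start at the root of the source XML -->
  <xsl:template match="/">
    <Result>
      <!-- Count every element in the source, whatever its name -->
      <ElementCount><xsl:value-of select="count(//*)"/></ElementCount>
    </Result>
  </xsl:template>
</xsl:stylesheet>
```

Changing the method attribute of xsl:output to "html" or "text" is what switches the kind of file the transformation produces.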


What are some XML Transform use cases?

XML Transform can be used to:

  • Convert XML data into different XML schemas for interoperability.
  • Generate human-readable formats (such as HTML) from XML for reporting or review.
  • Prepare XML for downstream integration, export, or ingestion by other systems (including AI knowledge bases).
  • Normalize or restructure XML from various sources to a standard schema.
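
To illustrate the first use case, here is a rough sketch of a stylesheet that maps extracted data onto a different XML schema. It assumes the source XML has a Document root with Field, Table, TableRow and TableCell elements identified by a Name attribute, as in the examples later in this article. The target names ("Invoice", "Number", "Supplier", "Total", "Lines", "Line") are hypothetical and stand in for whatever schema the receiving system expects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- All target element and attribute names here are hypothetical -->
    <Invoice>
      <Number><xsl:value-of select="Document/Field[@Name='Invoice Number']"/></Number>
      <Supplier><xsl:value-of select="Document/Field[@Name='Supplier Name']"/></Supplier>
      <Total><xsl:value-of select="Document/Field[@Name='Total Amount Due']"/></Total>
      <Lines>
        <!-- Emit one Line element per table row -->
        <xsl:for-each select="Document/Table[@Name='Invoice Line Items']/TableRow">
          <!-- The braces are attribute value templates: each XPath result becomes the attribute value -->
          <Line quantity="{TableCell[@Name='Quantity']}" unitPrice="{TableCell[@Name='Unit Price']}">
            <xsl:value-of select="TableCell[@Name='Description']"/>
          </Line>
        </xsl:for-each>
      </Lines>
    </Invoice>
  </xsl:template>
</xsl:stylesheet>
```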

Example XSLT stylesheets

Create an HTML report from a document's extracted data

This XSLT uses the Data Model below as its starting point.

Data Model
  • Invoice Number
  • Invoice Date
  • Due Date
  • Supplier Name
  • Supplier Address
  • Customer Name
  • Customer Address
  • Invoice Line Items (table)
    • Description
    • Quantity
    • Unit Price
    • Total Price
  • Subtotal
  • Tax Amount
  • Discount Amount
  • Total Amount Due
  • Payment Status
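
For orientation, a serialized document built from this Data Model might look roughly like the fragment below. This is a hypothetical sample: the element names (Document, Field, Table, TableRow, TableCell) are inferred from the XPath expressions in the stylesheet that follows, the values are invented for illustration, and the XML Grooper actually produces carries additional attributes on each element (such as Confidence and Page).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample input: structure inferred from the stylesheet's XPath, values invented -->
<Document>
  <Field Name="Invoice Number">INV-0001</Field>
  <Field Name="Invoice Date">2025-01-15</Field>
  <Field Name="Supplier Name">Example Supplier</Field>
  <Table Name="Invoice Line Items">
    <TableRow>
      <TableCell Name="Description">Example item</TableCell>
      <TableCell Name="Quantity">2</TableCell>
      <TableCell Name="Unit Price">10.00</TableCell>
      <TableCell Name="Total Price">20.00</TableCell>
    </TableRow>
  </Table>
  <Field Name="Total Amount Due">20.00</Field>
</Document>
```

An expression such as Document/Field[@Name='Invoice Number'] selects the Field element whose Name attribute matches, which is why the field names in the stylesheet must match the Data Model exactly, including spaces.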


The stylesheet below takes data extracted by Grooper that has been serialized as XML and uses it to build an HTML document. The end result is a more human-readable report of the extracted data.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" />
  <xsl:template match="/">
    <html>
      <head>
        <title>Invoice Report</title>
        <style>
          body { font-family: Arial, sans-serif; }
          h1 { background-color: #f2f2f2; padding: 10px; }
          .section { margin-bottom: 20px; }
          table { border-collapse: collapse; width: 100%; }
          th, td { border: 1px solid #ccc; padding: 8px; text-align: left; }
          th { background-color: #e2e2e2; }
        </style>
      </head>
      <body>
        <h1>Invoice Report</h1>
        <div class="section">
          <h2>Invoice Details</h2>
          <table>
            <tr><th>Invoice Number</th><td><xsl:value-of select="Document/Field[@Name='Invoice Number']"/></td></tr>
            <tr><th>Invoice Date</th><td><xsl:value-of select="Document/Field[@Name='Invoice Date']"/></td></tr>
            <tr><th>Due Date</th><td><xsl:value-of select="Document/Field[@Name='Due Date']"/></td></tr>
            <tr><th>Payment Status</th><td><xsl:value-of select="Document/Field[@Name='Payment Status']"/></td></tr>
          </table>
        </div>
        <div class="section">
          <h2>Supplier Information</h2>
          <table>
            <tr><th>Supplier Name</th><td><xsl:value-of select="Document/Field[@Name='Supplier Name']"/></td></tr>
            <tr><th>Supplier Address</th><td><xsl:value-of select="Document/Field[@Name='Supplier Address']"/></td></tr>
          </table>
        </div>
        <div class="section">
          <h2>Customer Information</h2>
          <table>
            <tr><th>Customer Name</th><td><xsl:value-of select="Document/Field[@Name='Customer Name']"/></td></tr>
            <tr><th>Customer Address</th><td><xsl:value-of select="Document/Field[@Name='Customer Address']"/></td></tr>
          </table>
        </div>
        <div class="section">
          <h2>Invoice Line Items</h2>
          <table>
            <tr>
              <th>Description</th>
              <th>Quantity</th>
              <th>Unit Price</th>
              <th>Total Price</th>
            </tr>
            <xsl:for-each select="Document/Table[@Name='Invoice Line Items']/TableRow">
              <tr>
                <td><xsl:value-of select="TableCell[@Name='Description']"/></td>
                <td><xsl:value-of select="TableCell[@Name='Quantity']"/></td>
                <td><xsl:value-of select="TableCell[@Name='Unit Price']"/></td>
                <td><xsl:value-of select="TableCell[@Name='Total Price']"/></td>
              </tr>
            </xsl:for-each>
          </table>
        </div>
        <div class="section">
          <h2>Summary</h2>
          <table>
            <tr><th>Subtotal</th><td><xsl:value-of select="Document/Field[@Name='Subtotal']"/></td></tr>
            <tr><th>Tax Amount</th><td><xsl:value-of select="Document/Field[@Name='Tax Amount']"/></td></tr>
            <tr><th>Discount Amount</th><td><xsl:value-of select="Document/Field[@Name='Discount Amount']"/></td></tr>
            <tr><th>Total Amount Due</th><td><xsl:value-of select="Document/Field[@Name='Total Amount Due']"/></td></tr>
          </table>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

The result of XML Transform using this XSLT is described by the image below:

Strip out unwanted XML attributes from Grooper's XML Metadata format

When Grooper saves a document's extracted Data Model in an XML format, it saves a lot of information about each field's data instance as XML attributes (Name, Confidence, Page, etc.). If you've ever used the XML Metadata format during Merge or Export, you're probably familiar with this.


The following XSLT simplifies the XML by stripping out all attributes except "Name" from each XML element.

  • The result of this transformation would be something like this.
    • Before transformation: <Field Name="fieldName" Confidence="1" Page="1" Valid="True" Location="1.1111, 2.22, .3333, .4444">fieldValue</Field>
    • After transformation: <Field Name="fieldName">fieldValue</Field>


<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Rebuild each element, keeping only its Name attribute (if it has one) -->
  <xsl:template match="*">
    <xsl:element name="{name()}">
      <xsl:if test="@Name">
        <xsl:attribute name="Name">
          <xsl:value-of select="@Name"/>
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>
  <!-- Copy text nodes, comments and processing instructions unchanged -->
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
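
If more than one attribute needs to survive, the same approach can be adapted into a whitelist. The variation below is a sketch rather than a definitive implementation: it uses xsl:copy and xsl:copy-of in place of xsl:element and xsl:attribute, and the choice to keep "Page" alongside "Name" is an assumption made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Copy each element without its attributes, then add back the whitelisted ones -->
  <xsl:template match="*">
    <xsl:copy>
      <!-- Assumption: "Name" and "Page" are the attributes worth keeping; extend the union as needed -->
      <xsl:copy-of select="@Name | @Page"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy text nodes, comments and processing instructions unchanged -->
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the "before" example above, this would keep both the Name and Page attributes on the Field element and drop Confidence, Valid and Location.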