XML Lookup (Lookup Specification)

From Grooper Wiki
Revision as of 09:21, 23 June 2025

ARTICLE UNDER CONSTRUCTION


This article is about the current version of Grooper.

Note that some content may still need to be updated.

2025

XML Lookup is a Lookup Specification that performs a lookup against an XML file stored as a Resource File in a Grooper Project. XML Lookups use XPath expressions to select XML nodes and map XML attributes or an XML element's text to Grooper fields.

About

XML Lookup was designed as a middle ground between Grooper's Lexicon Lookup and Database Lookup, each of which has its own pros and cons.

Lexicon Lookup

  • Pro: Portability - Lexicon Lookups use a Grooper Lexicon to perform a lookup operation. Lexicons are easily shared between Grooper users, and because a Lexicon is essentially a text-based list, its contents can simply be copied and pasted.
  • Pro: Handles simple data relationships well - Lexicon Lookups are the simplest lookup type. They are essentially key-value lists where the lookup field is the key and the target fields are parsed from comma-separated values.
  • Con: Handles complex data relationships poorly - Because they are so simple, Lexicons cannot express complex data structures the way a relational database can.

Database Lookup

  • Pro: Handles both simple and complex data relationships well - Databases are effective at describing simple data relationships, such as key-value pairs, and excel at organizing complex data structures.
  • Con: Portability - Databases require at least some hardware and software infrastructure to support them. Sharing them from one environment to another is not always easy (certainly not as easy as passing a file around).


Thus, the XML Lookup was born. Through its node hierarchy and attributes, XML can describe more complex data relationships than a Grooper Lexicon, yet an XML file is just as portable as a Lexicon (if not more so). If you have a fairly complex (but also static) data structure you want to use to perform a lookup, consider using XML Lookup.

The general setup

  1. Import the XML file into a Grooper Project by dragging it onto the Project (or a folder in the Project). This will create a Resource File for the XML file.
  2. Add an XML Lookup to the Data Model (or to a Data Section or Data Table, if appropriate).
  3. Point to the XML file by configuring the XML Lookup's "Source" property.
  4. Configure the XML Lookup's "Record Selector".
    • This uses an XPath expression to select the XML nodes (records) you want to retrieve.
    • Use the % character to insert field variables in your XPath expression (e.g. %GrooperFieldName).
    • These variables are the "lookup fields" for XML Lookup. At runtime, each one is replaced with the value of the named Grooper field.
    • Example: /Root/Record[xmlElement='%GrooperFieldName']
  5. Configure the XML Lookup's "Value Selectors".
    • This is how you map data in the XML file to Grooper fields (XML Lookup's "target fields").
    • One or more Value Selector may be added.
    • Each Value Selector specifies an XPath expression and maps the result to a Grooper field.
      • The Value Selector XPath is relative to the record node returned by the Record Selector. The path should start at the root of the record, not the root of the XML itself.
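
The variable substitution described in step 4 can be pictured with a short Python sketch. This is only an illustration of the idea, not Grooper's implementation: the function name fill_lookup_variables is hypothetical, and the sketch assumes field names contain only letters, digits and underscores. How Grooper itself parses variable names or escapes quotes inside values is not covered here.

```python
import re


def fill_lookup_variables(xpath_template, field_values):
    """Replace each %FieldName token in the template with that field's value.

    Hypothetical helper for illustration only.
    """
    def replace(match):
        name = match.group(1)
        if name not in field_values:
            raise KeyError(f"No value available for lookup field {name!r}")
        return field_values[name]

    # Assumption: a variable is '%' followed by letters, digits or underscores.
    return re.sub(r"%(\w+)", replace, xpath_template)


record_selector = "/Root/Record[xmlElement='%GrooperFieldName']"
filled = fill_lookup_variables(record_selector, {"GrooperFieldName": "ABC-123"})
print(filled)  # /Root/Record[xmlElement='ABC-123']
```

The filled-in expression is what would then be evaluated against the XML file.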

The general execution

XML Lookup (like all lookups) executes when a document's data is collected by the Extract activity. The process goes like this:

  1. The Data Model executes its Data Field/Data Section/Data Table extractors.
  2. XML Lookup loads the "Source" XML file.
  3. The Record Selector XPath is evaluated (with lookup variables replaced by field values). The lookup will either:
    • Hit - If the XPath selects exactly one record node, this results in a successful lookup.
    • Miss - If your XPath does not return any nodes, no data will be populated and the "Miss Disposition" will determine what happens next.
    • Conflict - If multiple record nodes are returned, the "Conflict Disposition" will determine how multiple results are handled.
  4. For successful lookups, the Value Selectors evaluate their XPath relative to the record node and populate the corresponding Grooper fields.
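
The hit, miss and conflict outcomes in step 3 come down to counting the nodes the Record Selector returns. The sketch below shows that logic using Python's standard xml.etree.ElementTree module as a stand-in XPath engine. The sample XML, the classify_lookup function and the returned labels are all assumptions made for illustration. The Miss Disposition and Conflict Disposition settings are not modeled. Note that ElementTree only accepts paths relative to the element being searched, so /Root/Record is written as ./Record when searching from the Root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two records share the key "B".
SAMPLE_XML = """<Root>
  <Record><xmlElement>A</xmlElement><value>first</value></Record>
  <Record><xmlElement>B</xmlElement><value>second</value></Record>
  <Record><xmlElement>B</xmlElement><value>third</value></Record>
</Root>"""


def classify_lookup(root, record_xpath):
    """Return ('hit' | 'miss' | 'conflict', matching record nodes)."""
    records = root.findall(record_xpath)
    if len(records) == 1:
        return "hit", records
    if not records:
        return "miss", records
    return "conflict", records


root = ET.fromstring(SAMPLE_XML)
for key in ("A", "Z", "B"):
    outcome, records = classify_lookup(root, f"./Record[xmlElement='{key}']")
    print(key, outcome, len(records))
```

Only the "hit" case would go on to step 4, where the Value Selectors run against the single record node.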

Example Source, Record Selector and Value Selector

FYI

New to XPath? Check out w3schools XPath Tutorial for a primer on XPath.

Need to test an XPath expression? There are several XPath testers online. Just copy the XML and paste it into the tester and enter the XPath expression you want to test. These are some popular XPath testers:

Example Source

The XML data below describes data you might find in a bookstore.

  • There are three <book> nodes in this XML. Each node, its attributes and its child XML elements describe a book sold by the bookstore.
  • Each <book> node has a collection of child XML elements. Each XML element contains field data related to the book:
    • <title> and its lang attribute
    • <author>
    • <listPrice>
  • The <book> nodes also have additional data stored as attributes (notably the isbn13 which will be used to lookup each book in this example).
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="Fiction" isbn13="9780307796844">
    <title lang="en">The Complete Stories of Robert Louis Stevenson</title>
    <author>Robert Louis Stevenson</author>
    <listPrice>13.99</listPrice>
  </book>
  <book category="Fiction" isbn13="9781409086284">
    <title lang="en">A Study In Scarlet</title>
    <author>Arthur Conan Doyle</author>
    <listPrice>6.99</listPrice>
  </book>
  <book category="Fiction" isbn13="9789363185579">
    <title lang="en">The Murder on the Links</title>
    <author>Agatha Christie</author>
    <listPrice>0.23</listPrice>
  </book>
</bookstore>

Example Record Selector

This Record Selector would select a <book> node based on an isbn13 value using a Grooper Data Field named "ISBN": //book[@isbn13='%ISBN']

%ISBN is the lookup variable for our lookup field. If Grooper extracted "9780307796844" for the "ISBN" Data Field, the following node would be selected:

<book category="Fiction" isbn13="9780307796844">
  <title lang="en">The Complete Stories of Robert Louis Stevenson</title>
  <author>Robert Louis Stevenson</author>
  <listPrice>13.99</listPrice>
</book>
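
To try this selection outside Grooper, the Python sketch below evaluates the same Record Selector against the example source using the standard xml.etree.ElementTree module. This is an approximation for experimenting, not how Grooper runs the lookup. The variable isbn stands in for the value of the "ISBN" Data Field, and the path is written .//book rather than //book because ElementTree only accepts paths relative to the element being searched.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="Fiction" isbn13="9780307796844">
    <title lang="en">The Complete Stories of Robert Louis Stevenson</title>
    <author>Robert Louis Stevenson</author>
    <listPrice>13.99</listPrice>
  </book>
  <book category="Fiction" isbn13="9781409086284">
    <title lang="en">A Study In Scarlet</title>
    <author>Arthur Conan Doyle</author>
    <listPrice>6.99</listPrice>
  </book>
  <book category="Fiction" isbn13="9789363185579">
    <title lang="en">The Murder on the Links</title>
    <author>Agatha Christie</author>
    <listPrice>0.23</listPrice>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)

isbn = "9780307796844"  # stand-in for the extracted "ISBN" field value
matches = root.findall(f".//book[@isbn13='{isbn}']")

print(len(matches))                  # 1
print(matches[0].findtext("title"))  # The Complete Stories of Robert Louis Stevenson
```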

Example Value Selectors

Now that we have a record node selected, we can map data to Grooper fields using Value Selectors.


To configure a Value Selector, set a Path (an XPath expression that points to the desired data in the XML file) and a Target Field (the Data Field you want to map that data to).

Imagine we have five target Data Fields. We would add one Value Selector for each:

  • Title
  • Author
  • List Price
  • Category
  • Language

Each Value Selector's XPath expression should select an attribute (e.g. @attributeName) or a child element (e.g. childElementName). If an attribute is selected, its value is returned. If a child element is selected, the element's inner text is returned (e.g. "This is inner text" for <childElementName>This is inner text</childElementName>). These values are mapped to each Value Selector's Target Field.

The corresponding Value Selector paths for the Data Fields in this example would be:

  • Title: title
  • Author: author
  • List Price: listPrice
  • Category: @category
    • category is an attribute of the <book> element itself. Attributes are selected using the @ symbol.
  • Language: title/@lang
    • Notice the <title> element's lang attribute was selected by first selecting title, then pathing to its attribute with the @ symbol.
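
The mapping from record node to target fields can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Grooper's implementation. Python's standard xml.etree.ElementTree module cannot select an attribute as the final step of a path, so the hypothetical select_value helper splits off a trailing @attribute step by hand. It handles only the simple path shapes used in this example.

```python
import xml.etree.ElementTree as ET

# The record node selected by the Record Selector in this example.
RECORD_XML = """<book category="Fiction" isbn13="9780307796844">
  <title lang="en">The Complete Stories of Robert Louis Stevenson</title>
  <author>Robert Louis Stevenson</author>
  <listPrice>13.99</listPrice>
</book>"""


def select_value(record, path):
    """Return an attribute value or an element's text for a simple relative path.

    Hypothetical helper: supports 'child', '@attr' and 'child/@attr' only.
    """
    element_path, separator, attribute = path.rpartition("@")
    if separator:  # the path ends in an attribute step
        element_path = element_path.rstrip("/")
        element = record.find(element_path) if element_path else record
        return None if element is None else element.get(attribute)
    element = record.find(path)
    return None if element is None else element.text


# Target Field name -> Value Selector path, as listed above.
value_selectors = {
    "Title": "title",
    "Author": "author",
    "List Price": "listPrice",
    "Category": "@category",
    "Language": "title/@lang",
}

record = ET.fromstring(RECORD_XML)
fields = {name: select_value(record, path) for name, path in value_selectors.items()}
for name, value in fields.items():
    print(f"{name}: {value}")
```

Every path is evaluated from the <book> record node, never from the <bookstore> root, which is why none of the paths begin with bookstore or book.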

FYI

If the Record Selector itself selects the XML element you want to map to a target field, there will be no “child element” to select. Instead, enter the dot expression (.) to select it.
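
As a small illustration of the dot expression, the Python sketch below uses a Record Selector that ends at the <title> element itself. It relies on the standard xml.etree.ElementTree module as a stand-in and on a hypothetical select_text helper, so treat it as an approximation of the behavior rather than Grooper's own logic.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book category="Fiction" isbn13="9781409086284">
    <title lang="en">A Study In Scarlet</title>
    <author>Arthur Conan Doyle</author>
  </book>
</bookstore>"""

root = ET.fromstring(XML)

# This Record Selector returns the <title> element, not the <book> element.
record = root.find(".//book[@isbn13='9781409086284']/title")


def select_text(record, path):
    """Return the text of the record itself for '.', else of a child element."""
    element = record if path == "." else record.find(path)
    return None if element is None else element.text


print(select_text(record, "."))      # A Study In Scarlet
print(select_text(record, "title"))  # None: <title> has no child named title
```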

Examples

Use XML Lookup to populate target fields

Using a Lookup Specification to populate fields is the most common reason people add lookups to a Data Model.

In this example, Grooper will extract an ISBN 13 barcode and use this to look up information in an XML file using XML Lookup.

The process follows this order:

  1. Grooper executes the "lookup field's" extractor. In this case, that is the "ISBN Lookup" Data Field in this Data Model.
  2. XML Lookup runs its "Record Selector" to locate one or more XML nodes. In this case, the selector selects a <book> node whose isbn13 attribute matches the value Grooper extracted.
  3. XML Lookup runs its "Value Selectors" against the record and maps the selected attribute or element's text to a Grooper Data Field. Data from the <book> node is mapped to corresponding Data Fields, like "Book Title" and "Author".

Use XML Lookup to validate lookup fields

Use XML Lookup to populate List Values