XSLT processing of XML import files
The EMu XSLT processor uses the Microsoft XML libraries (MSXML). To use the XSLT processor, MSXML 3.0 or later must be installed (Windows 2000 SP4, Internet Explorer 6 or later, Windows XP, Windows Vista or Windows Server 2003).
The EMu Import Wizard provides XSLT processing for XML-based import files. XSLT processing is only available for files with a .xml file suffix. XML files with a .txt suffix must be renamed to .xml before the XSLT processor can be used.
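If many files need the .xml suffix, the renaming can be scripted. The following is a minimal Python sketch, not part of EMu; the function name rename_txt_to_xml is chosen for this illustration, and the sketch assumes that every .txt file in the folder really contains XML.

```python
from pathlib import Path


def rename_txt_to_xml(folder):
    """Rename every *.txt file in folder to *.xml and return the new paths.

    Assumption: all .txt files in the folder hold XML import data.
    """
    renamed = []
    for path in sorted(Path(folder).glob("*.txt")):
        target = path.with_suffix(".xml")
        if target.exists():
            # Never overwrite an existing .xml file of the same name.
            continue
        path.rename(target)
        renamed.append(target)
    return renamed
```

Calling rename_txt_to_xml on a folder of import files returns the list of files that were renamed; files whose .xml counterpart already exists are left untouched.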
Note: Details about how to perform an import with XSLT processing are available here.
The advent of XML (eXtensible Markup Language) has provided a standards-based mechanism for exchanging data between computer systems. XML, as the name implies, is extensible; that is, the format in which the data is stored can be adapted to suit the data source. While this is one of the strengths of XML, it also causes problems when importing data from one system into another in which the data formats do not match exactly. For example, consider this XML snippet describing a work of art in an imaginary Catalogue:
<table name="ecatalogue">
  <tuple>
    <atom column="TitMainTitle">An imaginary work of Art</atom>
    <atom column="CreDateCreated">1995-07-02</atom>
    <table column="CreCreatorRef_tab">
      <tuple>
        <atom column="NamLast">Citizen</atom>
        <atom column="NamFirst">John</atom>
      </tuple>
    </table>
  </tuple>
</table>
You receive this data from another institution using EMu and want to import it into your system, but there is a mismatch between some of the column names in your system and those in the originating institution. For example, in your Catalogue the Title column may be called SumTitle and the Date Created column may be called SumDateCreated. Before you can load the XML into your system, it must be transformed so that it looks like this:
<table name="ecatalogue">
  <tuple>
    <atom column="SumTitle">An imaginary work of Art</atom>
    <atom column="SumDateCreated">1995-07-02</atom>
    <table column="CreCreatorRef_tab">
      <tuple>
        <atom column="NamLast">Citizen</atom>
        <atom column="NamFirst">John</atom>
      </tuple>
    </table>
  </tuple>
</table>
One way to make the change is to use a text editor to replace all instances of TitMainTitle with SumTitle and CreDateCreated with SumDateCreated. If the amount of data is small, or if the import is to occur only once, this solution is feasible. If, however, a number of imports will occur in which the data is supplied in the same format, it makes sense to use XSLT (eXtensible Stylesheet Language Transformations) to apply the changes before the data is loaded. XSLT is an XML-based scripting language used to manipulate XML.
For example, the following script can be used to perform the required column renaming outlined above:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:map="urn:map" version="1.0">
  <!-- Output in XML format -->
  <xsl:output method="xml" encoding="utf-8"/>
  <!-- Mapping table of old names to new names -->
  <map:entries>
    <map:entry oldname="TitMainTitle" newname="SumTitle"/>
    <map:entry oldname="CreDateCreated" newname="SumDateCreated"/>
  </map:entries>
  <xsl:variable name="map" select="document('')/*/map:entries/*"/>
  <!-- For every node we copy it over. Note that attributes
       are handled by the next template. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Special handling of attributes. -->
  <xsl:template match="@*">
    <xsl:variable name="entry" select="$map[@oldname = current()]"/>
    <xsl:choose>
      <xsl:when test="name() = 'column' and $entry">
        <xsl:attribute name="column">
          <xsl:value-of select="$entry/@newname"/>
        </xsl:attribute>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
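To make the mapping logic of the stylesheet easier to follow, the same renaming can be sketched in Python using only the standard library. This is an illustration of what the transformation is expected to produce, not the way EMu or an XSL engine performs it; COLUMN_MAP, rename_columns and SAMPLE are names chosen for this sketch.

```python
import xml.etree.ElementTree as ET

# Old column name -> new column name, mirroring the map:entries table
# in the stylesheet.
COLUMN_MAP = {
    "TitMainTitle": "SumTitle",
    "CreDateCreated": "SumDateCreated",
}


def rename_columns(xml_text, column_map=COLUMN_MAP):
    """Return xml_text with mapped values of the 'column' attribute renamed."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        old = element.get("column")  # None when the attribute is absent
        if old in column_map:
            element.set("column", column_map[old])
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<table name="ecatalogue">
  <tuple>
    <atom column="TitMainTitle">An imaginary work of Art</atom>
    <atom column="CreDateCreated">1995-07-02</atom>
    <table column="CreCreatorRef_tab">
      <tuple>
        <atom column="NamLast">Citizen</atom>
        <atom column="NamFirst">John</atom>
      </tuple>
    </table>
  </tuple>
</table>"""

print(rename_columns(SAMPLE))
```

As in the stylesheet, only attributes named column whose value appears in the mapping are changed; every other element, attribute and text value is carried over as it is.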
To execute the XSLT script an XSL engine is required. A number of products provide XSL engines that can be used to transform the XML for loading into EMu. When a file is received from an institution, it is only necessary to perform the transformation before importing the XML into EMu.
To streamline this process, XSLT processing has been added as part of the Import tool for XML files: it is possible to import an XML file and have it transformed as part of the Import process. The XSLT file used to transform the XML can be stored on your local machine (local file) or on the EMu server (pre-configured file). Files stored on the EMu server are available to all users. In general, the pre-configured files are "standard" transformations used to manipulate data from known sources. A known source can be:
- a standard format (e.g. Darwin Core or Dublin Core)
- a repeatable format (e.g. EMu export format, BRAHMS)
Using repeatable formats it is possible to define XSLT files that allow for easy import of data from other EMu clients for customized modules such as the Catalogue, Taxonomy and Collection Events.
Pre-configured XSLT files stored on the EMu server are listed in the drop list below the Pre-configured XSLT file option (see Custom import for details). The files are stored in a per-table directory in one of two locations:
etc/import/table
Location of client-independent XSLT scripts. These scripts typically load into the core EMu modules that do not vary from client to client (e.g. Parties). Clients should not add scripts to this location, as these scripts are added by EMu's developers.
local/etc/import/table
Location of client-specific XSLT scripts. Any scripts that transform data for institution-specific modules should be kept in this location. All client scripts should be added to this location.
When installing a script on the EMu server the local/etc/import/table directory may not exist, in which case it will be necessary to create it. For example, if you have a script called "BRAHMS.xslt" that transforms Brahms XML for loading into your EMu Catalogue module, you would store it under:
local/etc/import/ecatalogue/BRAHMS.xslt
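The installation step can be sketched in Python as follows. This is a hedged illustration rather than an EMu tool: install_xslt is a name chosen here, and emu_home is a hypothetical parameter standing for whatever directory on the EMu server the relative path local/etc/import is resolved against, which this page does not specify.

```python
import shutil
from pathlib import Path


def install_xslt(script, emu_home, table):
    """Copy script into <emu_home>/local/etc/import/<table>/.

    The per-table directory is created first if it does not exist.
    emu_home is an assumption: the directory that local/etc/import
    is relative to on the EMu server.
    """
    target_dir = Path(emu_home) / "local" / "etc" / "import" / table
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created file.
    return Path(shutil.copy2(script, target_dir))
```

For the example above, the call would take the form install_xslt("BRAHMS.xslt", emu_home, "ecatalogue"), run by a user with write permission on the server directories.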
The entry that appears in the drop list in the Import Wizard is the name of the file without its file suffix (e.g. BRAHMS for BRAHMS.xslt). The file name may contain spaces. XSLT scripts do not need to have an .xslt suffix, although this is the extension usually used.
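To preview the names that a set of installed scripts would be expected to show, the suffix-stripping rule can be sketched in Python. The function drop_list_names and the emu_home parameter are hypothetical, and merging and sorting the two locations is an assumption of this sketch; how EMu itself orders entries or handles a name present in both locations is not described here.

```python
from pathlib import Path


def drop_list_names(emu_home, table):
    """Return file names without suffix for a table's XSLT directories.

    Looks in etc/import/<table> and local/etc/import/<table> below
    emu_home (an assumed root directory). Missing directories are skipped.
    """
    names = set()
    for parts in (("etc", "import"), ("local", "etc", "import")):
        directory = Path(emu_home).joinpath(*parts, table)
        if directory.is_dir():
            # Path.stem drops only the final suffix, so spaces are kept.
            names.update(p.stem for p in directory.iterdir() if p.is_file())
    return sorted(names)
```

A file called "Darwin Core.xsl" would be reported as Darwin Core, and BRAHMS.xslt as BRAHMS.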