Generate .NET Code With XSLT
Autogenerate classes that provide strongly typed access based on your database's XML Schema Definition.
by Kathleen Dollard

May 2003 Issue

Technology Toolbox: VB.NET, XML, XSLT

Extensible Stylesheet Language Transformations (XSLT) is a declarative language that defines a series of rules for how XML is processed. Most XSLT transformations convert XML to HTML, but you can also use XSLT to create any type of text output, including VB.NET and C# files. I'll show you how XSLT works to generate VB.NET code, and I'll offer some hints on performing XSLT processing in .NET. This approach offers you a way to generate code to your own specifications and update the generated code easily. Portions of your application then become resilient to changes in database structure or evolving requirements.

As an example, I'll show you the XSLT that re-creates the ADO.NET strongly typed DataSets (download the XSLT files and the code to run XSLT transforms in .NET). You can change the XSLT template (more on templates later) that generates the strongly typed DataSets to alter the code it generates. Your changes might be relatively simple ones, such as modifying the visibility of the columns in each DataTable. You can also make complex changes, such as altering the instantiation of DataTables so you can make effective use of derived classes that contain manual code such as validation. These techniques are also effective for generating code other than DataSets, such as strongly typed arrays or non–data-related code.

You need two things to start creating an XSLT transformation: an XML document for input and a clear idea of what you want the output to look like. The XML Schema Definition (XSD) file that's created when you add a new DataSet to a Visual Studio .NET project is an XML document. You can see it in XML format by selecting the XML tab from the bottom of the DataSet designer. This XML is perfectly valid input, but the XSLT to process it is difficult to read and understand because everything in an XSD is described in terms of types and elements. Your work with XSLT is much easier if you use an XML document that's designed to be friendly to that particular transform.

Figure 1. Transform XML in Steps.

To take this approach, use a preliminary XSLT that transforms the XSD into XML that's then easier to transform into generated code. The second XSLT file (the one that generates code) is far more readable, because you can use familiar terms such as DataSet and DataTable; you hide some of the ugliness of restructuring the data in the preliminary XSLT transformation. This results in a two-step process: Use the XSD to create the XML file containing metadata, then perform the code-generating transformation (see Figure 1).
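
As a rough illustration of the first step, a preliminary stylesheet might look something like the following sketch. It assumes the nesting Visual Studio .NET typically writes for a DataSet schema: a top-level xs:element for the DataSet, a choice of table elements inside it, and a sequence of column elements inside each table. The output element names DataSet and DataTable follow the terms used in this article; the DataColumn shape and its Type attribute are assumptions for illustration, not the downloadable code.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   exclude-result-prefixes="xs">
<xsl:output method="xml" indent="yes" />

<!-- Assumption: the first top-level element in the schema describes the DataSet -->
<xsl:template match="/xs:schema">
   <xsl:apply-templates select="xs:element[1]" mode="DataSet" />
</xsl:template>

<xsl:template match="xs:element" mode="DataSet">
   <DataSet name="{@name}">
      <!-- Assumed nesting: each element under complexType/choice is one table -->
      <xsl:apply-templates
         select="xs:complexType/xs:choice/xs:element" mode="DataTable" />
   </DataSet>
</xsl:template>

<xsl:template match="xs:element" mode="DataTable">
   <DataTable name="{@name}">
      <!-- Assumed nesting: each element under complexType/sequence is one column -->
      <xsl:apply-templates
         select="xs:complexType/xs:sequence/xs:element" mode="DataColumn" />
   </DataTable>
</xsl:template>

<xsl:template match="xs:element" mode="DataColumn">
   <!-- Type is a hypothetical attribute carrying the XSD type name through -->
   <DataColumn name="{@name}" Type="{@type}" />
</xsl:template>
</xsl:stylesheet>
```

All the xs: prefixes and schema nesting stay in this one file, so the code-generating stylesheet can match on plain names such as DataTable.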

A quick XSLT primer and a look at a simple transformation will help you understand these steps (see Listing 1). XSLT is a species of XML, and each XSLT instruction is technically an XML element that usually contains attributes necessary for processing. An XSLT stylesheet contains the instructions. Each time a stylesheet is processed, a single output is created. This output is usually a file or display, but you can also use streams in .NET to redirect the output.

Stylesheets Include Template Instructions
Most XSLT stylesheets contain template instructions. Each template contains the processing for a particular set of elements in the XML source document. You use XPath syntax to determine the scope of the elements a template processes. XPath is a separate standard that allows you to select elements within an XML document. XPath's extreme flexibility means it can sometimes be quite complex. (You can find several complex examples in the online code.) However, XPath is simple to use for simple things. The first template instruction states that the template should be processed for any DataSet element under the root in the source XML document:

<xsl:template match="/DataSet">
   <xsl:value-of select="@name" /> contains the tables
   <xsl:apply-templates select="DataTable" mode="ListTables"/>
   <xsl:call-template name="DataSetEnd" />
</xsl:template>

The xsl:value-of instruction inserts data from the XML source document, and the @ prefix indicates that an attribute's value, rather than an element's value, is desired. Literal text in the template is output directly.

XSLT relies heavily on the concept of context. You can think of context as a position within the XML input. The XML file includes many name attributes; the @name in the xsl:value-of specifies insertion of the name attribute of the current node, which is the DataSet node.

The xsl:apply-templates instruction can be confusing, but it begins to unlock XSLT's strength. This instruction has the effect of telling the processor: Before you continue the current processing, go process the specified nodes following whatever instructions those elements match. In this case, XSLT processes the template for all DataTable nodes that are children of the current context (the current DataSet node). The optional mode attribute limits processing to specific templates:

<xsl:template match="DataTable" 
   mode="ListTables"> 
   <xsl:text>&#xa;&#x9;</xsl:text>
   <xsl:value-of select="@name" /> 
</xsl:template>

This template is processed for each DataTable node, and its context is the DataTable node. The name attribute it specifies is the name of the DataTable, not the name of the DataSet.

The xsl:text instruction specifies that its contents should be output exactly as written. One of its uses is managing whitespace, which is a bit of a pain in XSLT. To insert whitespace characters explicitly, use their hexadecimal character references, such as "&#xa;" for a new line and "&#x9;" for a tab.

Figure 2. Choose a Template-Calling Mechanism.

The final template in Listing 1 is a named template. You call named templates with the xsl:call-template instruction. Named templates are processed exactly once when they're encountered, and they don't alter the context. They're similar to subroutines in .NET languages (see Figure 2).

This output results if you drop three Northwind tables onto the XSD design surface by dragging the tables from the Server Explorer onto a component, then run the transforms on the resulting XSD:

Northwind contains the tables
   Customers
   Orders
   Order_Details
End Northwind DataSet

Note the position of the list of tables and the "End Northwind DataSet" text to see how the xsl:apply-templates and xsl:call-template instructions define processing. The context- and node-processing concepts in XSLT can be difficult to grasp, but once you see how this works, several XSLT idiosyncrasies begin to make sense.
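
The three kinds of template discussed above can be assembled into one small stylesheet. The following is a minimal sketch rather than a copy of Listing 1: the body of the DataSetEnd template and the sample input in the opening comment are assumptions chosen to fit the output shown earlier.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Assumed input (names follow the templates in this article):
     <DataSet name="Northwind">
        <DataTable name="Customers" />
        <DataTable name="Orders" />
     </DataSet>
-->
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<!-- Match template: runs for the DataSet element under the root -->
<xsl:template match="/DataSet">
   <xsl:value-of select="@name" />
   <xsl:text> contains the tables</xsl:text>
   <!-- Process every child DataTable before continuing here -->
   <xsl:apply-templates select="DataTable" mode="ListTables" />
   <!-- Named template: called once, context stays on the DataSet node -->
   <xsl:call-template name="DataSetEnd" />
</xsl:template>

<!-- Context inside this template is one DataTable node -->
<xsl:template match="DataTable" mode="ListTables">
   <xsl:text>&#xa;&#x9;</xsl:text>
   <xsl:value-of select="@name" />
</xsl:template>

<!-- @name still refers to the DataSet, because call-template keeps the context -->
<xsl:template name="DataSetEnd">
   <xsl:text>&#xa;End </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text> DataSet&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```

Wrapping the literal text in xsl:text, as this sketch does, keeps the stylesheet's own indentation out of the output.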

Work With XSLT in .NET
You use two tools to work with XSLT in .NET: an editor and an XSLT processor to perform the transformation. Several full-featured XSLT editors are available. You can also work in the VS.NET IDE. However, IntelliSense doesn't work with XSLT unless you do some tweaking. Declarative-language IntelliSense is based on XSD schema files, and the XSLT.xsd file is missing. You can find one at GotDotNet by searching for XSLT Schema, then place it in [Microsoft Visual Studio .NET]\Common7\Packages\schemas\xml. This enables context-based IntelliSense (see Additional Resources).

You can process XSLT by sending it to a capable browser or by using .NET. You must create an XslTransform object based on the XSLT file in order to process XSLT in .NET. The XslTransform class is located in the System.Xml.Xsl namespace. You also need a writer object for output and an XML document for input. Open the XML document as an XPathDocument to allow faster processing. You pass the input and output to the Transform method (see Listing 2):

XslTransform.Transform(docXML, _
   xslArgs, writer)

Output can go to an XmlTextWriter or a StreamWriter; which one you choose determines important elements of the processing. For example, the processing ignores the xsl:output instruction when the output goes to an XmlTextWriter. Remember to match the writer with the output you intend or, if you're familiar with XSLT, with the xsl:output method attribute you're using. The class in Listing 2 also supports XSLT arguments, and the downloadable version includes methods for both XmlTextWriter and StreamWriter output. Remember to flush and close the writer, as you do when you work with other streams.
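
On the stylesheet side, an XSLT argument arrives in a top-level xsl:param whose name matches the name the caller used (in .NET, typically the name given to XsltArgumentList.AddParam). The sketch below shows only that receiving end; the parameter name Namespace and its default value are hypothetical, not taken from Listing 2.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Text output suits generated source code -->
<xsl:output method="text" />

<!-- Hypothetical argument; the select value is the default
     used when the caller passes nothing -->
<xsl:param name="Namespace" select="'GeneratedCode'" />

<xsl:template match="/DataSet">
   <xsl:text>Namespace </xsl:text>
   <xsl:value-of select="$Namespace" />
   <xsl:text>&#xa;   ' Classes for </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text> go here&#xa;End Namespace&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```

Giving the parameter a default means the stylesheet still produces usable output when it is run without arguments, for instance from a browser or a command-line tool.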

Using XSLT can cause plenty of frustration. It's case-sensitive. You must use difficult-to-read escape sequences to output special characters, including some printable characters (for example, &lt; and &gt; for less than and greater than). The .NET Framework help index is missing many XSLT elements, although if you type "xslt, reference" in the index, you can access a sufficiently complete and concise reference. Error handling is a mixed bag. When XSLT errors occur, the exceptions the parser throws are extremely good, often giving the exact position of the error within even complex XSLT. However, situations such as missing parameters and misspelled template match clauses aren't errors in XSLT; you just don't get any output. These sorts of issues make it a good idea to compare your output with projected output, such as the strongly typed DataSet you create by selecting Generate DataSet from the context menu of VS.NET's XSD designer, which opens when you add a DataSet to a project. You can use WinDiff or Word to perform the comparison.
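
To make the escaping issue concrete, here is a small sketch that emits VB.NET containing angle brackets and an ampersand. The generated class and its m_Label field are invented for illustration; only the escaping technique is the point.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*" />

<xsl:template match="DataTable">
   <!-- The escaped brackets come out as a VB.NET attribute in text output -->
   <xsl:text>&lt;Serializable()&gt; _&#xa;</xsl:text>
   <xsl:text>Public Class </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text>Row&#xa;</xsl:text>
   <!-- The escaped ampersand becomes the VB.NET concatenation operator -->
   <xsl:text>   Private m_Label As String = "</xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text>" &amp; " row"&#xa;</xsl:text>
   <xsl:text>End Class&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```

Note that the match value is case-sensitive: if it were misspelled as "Datatable", the processor would report no error, and the class would simply be missing from the output.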

Now you're prepared to do some effective code generation with XSLT in .NET. Tackling the strongly typed DataSet is a challenge—it might be the most difficult file you'll create using XSLT code generation. However, taking control of the code generation provides the best tactic for modifying strongly typed DataSets.

Download the resulting XSLT to see the full code, which is about 600 lines long and rather repetitive. It's split into six XSLT files for easier handling. The main STD.xslt is the entry point, similar to Sub Main in your .NET programs. This XSLT contains xsl:include statements that specify the inclusion of the five other files:

<xsl:include href="DataSet.xslt"/>
<xsl:include href="Table.xslt"/>
<xsl:include href="Row.xslt"/>
<xsl:include href="Event.xslt"/>
<xsl:include href="Support.xslt"/>

The xsl:include instruction combines the files so that they process as one unit.

Generate the DataRow Class
I'll focus on strongly typed DataSets' internal structure and skip the basics of how you use them. Each strongly typed DataSet is a single file containing multiple classes: a class for the DataSet itself, and three nested classes for each contained DataTable. The nested classes are the tablenameRow class, which contains the actual data; the tablenameDataTable class, which is a collection of tablenameRow objects; and a class used in raising change events. The first two classes, which are wrappers for the corresponding untyped classes in ADO.NET, are used to access data. Strongly typed DataSets have that name because they use strong typing to access the data. For example, if a property is a Boolean, you receive a compile-time error if you pass any other data type (assuming Option Strict is on). Strongly typed DataSets are great, but the inability to fine-tune them is a serious flaw. This XSLT gets you around that problem and lets you make any changes you want.

Take a close look at Row.xslt in the download code to see these techniques in action. The opening lines are XSLT goo that specifies the namespaces and some overall processing rules:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:strip-space elements="*"/>
<xsl:output method="text" />

The DataTable template in DataSet.xslt calls templates such as the DerivedRow template in Row.xslt:

<xsl:template match="DataTable" 
   mode="DerivedTableRowEvent">
<xsl:call-template name="DerivedTable"/>
<xsl:call-template name="DerivedRow"/>
<xsl:call-template name="DerivedEvent"/>
</xsl:template>

The preceding template is processed once for each DataTable node, and it processes the DerivedRow template once each time. The context of the DerivedRow template is the current DataTable node.

The DerivedRow template introduces variables:

<xsl:template name="DerivedRow">
<xsl:variable name="Table" select="@name" />
&lt;Diagnostics.DebuggerStepThrough()&gt; _
   Public Class <xsl:value-of select="$Table"/>Row

The behavior of XSLT variables is unusual. You can set an XSLT variable's value only once, when you declare the variable. A new value is set each time the template is processed, but you can't change the value within the template once it's set. You access a variable by prefixing its name with the dollar sign ($).
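
One practical consequence of the set-once rule is that a conditional value has to be computed inside the variable declaration itself. The following sketch shows that pattern; the Hidden attribute is hypothetical and stands in for whatever metadata drives the decision.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*" />

<xsl:template match="DataTable">
   <!-- Set once from the current node; a fresh value each time the template runs -->
   <xsl:variable name="Table" select="@name" />
   <!-- The choice lives inside the declaration, because the
        variable cannot be reassigned afterward -->
   <xsl:variable name="Visibility">
      <xsl:choose>
         <xsl:when test="@Hidden='true'">Friend</xsl:when>
         <xsl:otherwise>Public</xsl:otherwise>
      </xsl:choose>
   </xsl:variable>
   <xsl:value-of select="$Visibility" />
   <xsl:text> Class </xsl:text>
   <xsl:value-of select="$Table" />
   <xsl:text>Row&#xa;End Class&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```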

You call named templates using xsl:call-template, and match templates with xsl:apply-templates. Both can use xsl:with-param to specify parameters:

<xsl:apply-templates select="DataColumn" 
   mode="RowColumnProperties">
   <xsl:with-param name="Table" 
   select="$Table"/>
</xsl:apply-templates>

You pass the Table variable ($Table) to the DataColumn template as a parameter that's also named Table.
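
The receiving template has to declare the parameter with xsl:param before it can use the value. A minimal sketch of both ends follows; the comment line it generates is a placeholder for the real property code in the download.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*" />

<xsl:template match="DataTable">
   <xsl:variable name="Table" select="@name" />
   <xsl:apply-templates select="DataColumn" mode="RowColumnProperties">
      <xsl:with-param name="Table" select="$Table" />
   </xsl:apply-templates>
</xsl:template>

<xsl:template match="DataColumn" mode="RowColumnProperties">
   <!-- xsl:param comes first in the template and receives the with-param value -->
   <xsl:param name="Table" />
   <xsl:text>' Property </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text> of </xsl:text>
   <xsl:value-of select="$Table" />
   <xsl:text>Row&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```

If the names in xsl:with-param and xsl:param differ, the processor raises no error; the parameter is simply empty, which is one of the silent failures worth checking for when output goes missing.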

You often need to include output based on a condition. The xsl:if and xsl:choose instructions allow this. The DataColumn/RowColumnProperties template uses xsl:choose to emit different .NET code depending on whether the AllowDBNulls attribute is present on the element in the input XML document:

Get<xsl:choose>
   <xsl:when test="not(@AllowDBNulls)">
   ' The download code tries the cast 
   ' and throws exception if it fails
   </xsl:when>
   <xsl:otherwise>
   ' Simple cast without Try block
   </xsl:otherwise>
</xsl:choose>
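
For a single condition with no alternative branch, xsl:if is the lighter choice. The sketch below uses it to put a comma between column names but not after the last one; the comment line it generates is purely illustrative.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:strip-space elements="*" />

<xsl:template match="DataTable">
   <xsl:text>' Columns: </xsl:text>
   <xsl:for-each select="DataColumn">
      <xsl:value-of select="@name" />
      <!-- Emit a separator after every column except the last -->
      <xsl:if test="position() != last()">
         <xsl:text>, </xsl:text>
      </xsl:if>
   </xsl:for-each>
   <xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```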

The template for outputting parent rows uses somewhat more complex XPath filters. XPath filters can include three items: an axis, a node test, and a predicate. The axis indicates the direction from the current node that XPath should explore in looking for matches. Some axes have shortcuts you can include in the expression itself. For example, // indicates that XPath can find the match in any descendant position, meaning anywhere in the document if it appears at the start of the expression. Other important shortcuts include . to indicate the current node and .. to indicate the parent node.

Filter an Element's or Attribute's Value
You identify the predicate with square brackets. It allows further filtering, typically on the value of a contained element or an attribute. You must remember that the match indicates the context for further processing, and the axes and predicate help to filter the matches based on either document position or contained information. This instruction indicates that the template should be processed for Relation nodes anywhere within the input XML document that contain a ChildTable element whose contents match the $Table variable:

<xsl:apply-templates 
select="//Relation[ChildTable=$Table]"
mode="RowGetParentRows" />

The preceding code finds the parent relations of the current DataTable, because it matches all the Relation nodes with a ChildTable element equal to the current DataTable name. The context within the RowGetParentRows template is the Relation node.
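
The shift in context is easier to see with the receiving template in view. This sketch assumes each Relation element has a name attribute and a ParentTable child alongside the ChildTable child mentioned above; those two names are guesses at the metadata format, and the generated comment is a stand-in for the real code.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<!-- Process only the tables, so Relation text is not copied by built-in rules -->
<xsl:template match="/DataSet">
   <xsl:apply-templates select="DataTable" />
</xsl:template>

<xsl:template match="DataTable">
   <xsl:variable name="Table" select="@name" />
   <!-- // searches the whole document; the predicate keeps only
        relations whose ChildTable text equals the table name -->
   <xsl:apply-templates select="//Relation[ChildTable=$Table]"
      mode="RowGetParentRows" />
</xsl:template>

<!-- The context node here is a Relation, not the DataTable -->
<xsl:template match="Relation" mode="RowGetParentRows">
   <xsl:text>' </xsl:text>
   <xsl:value-of select="ChildTable" />
   <xsl:text> has parent </xsl:text>
   <xsl:value-of select="ParentTable" />
   <xsl:text> via </xsl:text>
   <!-- @name now reads the Relation's own (assumed) name attribute -->
   <xsl:value-of select="@name" />
   <xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```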

You'll notice that some of the download code uses the xsl:for-each instruction. Its role is to process shorter fragments of code; xsl:apply-templates is a much better solution for deeply nested processing. Shifting gears to the xsl:apply-templates approach is well worth the effort.
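
The difference between the two styles shows up as soon as a second level of nesting appears. The following sketch lists tables both ways; the Comment mode and the generated comment lines are made up for the comparison.

```xslt
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<xsl:template match="/DataSet">
   <!-- Inline loop: fine for a short, flat fragment -->
   <xsl:for-each select="DataTable">
      <xsl:text>' Table </xsl:text>
      <xsl:value-of select="@name" />
      <xsl:text>&#xa;</xsl:text>
   </xsl:for-each>
   <!-- Delegated version: each level gets its own template -->
   <xsl:apply-templates select="DataTable" mode="Comment" />
</xsl:template>

<xsl:template match="DataTable" mode="Comment">
   <xsl:text>' Table </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text>&#xa;</xsl:text>
   <!-- Columns are handled by another template instead of a nested loop -->
   <xsl:apply-templates select="DataColumn" mode="Comment" />
</xsl:template>

<xsl:template match="DataColumn" mode="Comment">
   <xsl:text>'    Column </xsl:text>
   <xsl:value-of select="@name" />
   <xsl:text>&#xa;</xsl:text>
</xsl:template>
</xsl:stylesheet>
```

With templates, adding a third level means adding one more small template rather than indenting an existing loop further.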

You should work carefully and do extensive testing while you work with code generation, because you'll autogenerate any errors you introduce. Complete isolation between autogenerated and manually written code is critical. .NET provides isolation through a couple of mechanisms. The first is Regions; the code the Windows Forms designer generates from the information you enter in the Property Grid demonstrates their use in isolating generated code. The other mechanism supporting code isolation is inheritance. Inheritance lets an autogenerated class and a manually written one combine into a single coherent class. Either can be the base class in theory, but in practice, you usually have your base class autogenerated, while your derived class contains the manual code, such as validation.

You can integrate XSLT code generation in your projects in two different ways, depending on your goal. If you're eager to modify the strongly typed DataSet, you can download and modify the XSLT to provide those changes. Start by running your XSD through these transformations and use WinDiff or Word to compare the output file with the one created using Visual Studio's Generate DataSet. This is important because creating a strongly typed DataSet involves many details, and I haven't addressed uncommon ones such as XSD annotations. You can modify the XSLT once you know the transformation works for you. The other approach you can take is to create entirely new templates. For example, you could create business objects that are based on your data structure. I've included a variety of XSLT techniques in the downloadable files to help you do this. Code generation is a relatively novel approach for XSLT, so you might find it useful to review the full code, even if you plan to create a type of output different from the strongly typed DataSet.

About the Author
Kathleen Dollard is an independent consultant doing real-world development in .NET technologies. She's currently using XSLT techniques to generate 450 classes in a 300+ KLOC project. She's active in the Denver Visual Studio User Group and is a regular contributor to Visual Studio Magazine, a Microsoft MVP, and a VBITS/VSLive! speaker. Reach Kathleen at kathleen@mvps.org.