Generate .NET Code With XSLT
Autogenerate classes that provide strongly typed access based on your database's XML Schema Definition.
by Kathleen Dollard
May 2003 Issue
Technology Toolbox: VB.NET, XML, XSLT
Extensible Stylesheet Language
Transformations (XSLT) is a declarative language that defines a
series of rules for how XML is processed. Most XSLT transformations
convert XML to HTML, but you can also use XSLT to create any type of
text output, including VB.NET and C# files. I'll show you how XSLT
works to generate VB.NET code, and I'll offer some hints on
performing XSLT processing in .NET. This approach offers you a way
to generate code to your own specifications and update the generated
code easily. Portions of your application then become resilient to
changes in database structure or evolving requirements.
As an example, I'll show you the XSLT that re-creates the ADO.NET
strongly typed DataSets (download
the XSLT files and the code to run XSLT transforms in .NET). You can
change the XSLT template (more on templates later) that generates
the strongly typed DataSets to alter the code it generates. Your
changes might be relatively simple ones, such as modifying the
visibility of the columns in each DataTable. You can also make
complex changes, such as altering the instantiation of DataTables so
you can make effective use of derived classes that contain manual
code such as validation. These techniques are also effective for
generating code other than DataSets, such as strongly typed arrays
or non–data-related code.
You need two things to start creating an XSLT transformation: an
XML document for input and a clear idea of what you want the output
to look like. The XML Schema Definition (XSD) file that's created
when you add a new DataSet to a Visual Studio .NET project is an XML
document. You can see it in XML format by selecting the XML tab from
the bottom of the DataSet designer. This XML is perfectly valid
input, but the XSLT to process it is difficult to read and
understand because everything in an XSD is described in terms of
types and elements. Your work with XSLT is much easier if you use an
XML document that's designed to be friendly to that particular
transform.
To take this approach, use a preliminary XSLT that transforms the
XSD into XML that's then easier to transform into generated code.
The second XSLT file (the one that generates code) is far more
readable, because you can use familiar terms such as DataSet and
DataTable; you hide some of the ugliness of restructuring the data
in the preliminary XSLT transformation. This results in a two-step
process: Use the XSD to create the XML file containing metadata,
then perform the code-generating transformation (see Figure 1).
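A minimal sketch of what the preliminary transformation might look like follows. It is not the download's version. It assumes the XSD layout that Visual Studio .NET typically produces (a root element flagged with msdata:IsDataSet, tables under xs:choice, and columns under xs:sequence), and the type attribute on the output DataColumn element is a name chosen here for illustration. Relations and constraints are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
    exclude-result-prefixes="xs msdata">
  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the schema root and pick the element that represents the DataSet -->
  <xsl:template match="/xs:schema">
    <xsl:apply-templates select="xs:element[@msdata:IsDataSet='true']"/>
  </xsl:template>

  <!-- DataSet element: each child element under xs:choice is assumed to be a table -->
  <xsl:template match="xs:element[@msdata:IsDataSet='true']">
    <DataSet name="{@name}">
      <xsl:apply-templates select="xs:complexType/xs:choice/xs:element" mode="Table"/>
    </DataSet>
  </xsl:template>

  <!-- Table element: each child element under xs:sequence is assumed to be a column -->
  <xsl:template match="xs:element" mode="Table">
    <DataTable name="{@name}">
      <xsl:apply-templates select="xs:complexType/xs:sequence/xs:element" mode="Column"/>
    </DataTable>
  </xsl:template>

  <!-- Column element: type is empty when the XSD declares the type inline -->
  <xsl:template match="xs:element" mode="Column">
    <DataColumn name="{@name}" type="{@type}"/>
  </xsl:template>
</xsl:stylesheet>
```

The result is a small document of DataSet, DataTable, and DataColumn elements, which is the vocabulary the code-generating templates in this article match against.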
A quick XSLT primer and a look at a simple transformation will
help you understand these steps (see Listing 1). XSLT is a species
of XML, and each XSLT instruction is technically an XML element that
usually contains attributes necessary for processing. An XSLT
stylesheet contains the instructions. Each time a stylesheet is
processed, a single output is created. This output is usually a file
or display, but you can also use streams in .NET to redirect the
output.
Stylesheets Include Template Instructions
Most XSLT stylesheets contain template
instructions. Each template contains the processing for a particular
set of elements in the XML source document. You use XPath syntax to
determine the scope of the elements a template processes. XPath is a
separate standard that allows you to select elements within an XML
document. XPath's extreme flexibility means it can sometimes be
quite complex. (You can find several complex examples in the online
code.) However, XPath is simple to use for simple things. The first
template instruction states that the template should be processed
for any DataSet element under the root in the source XML document:
    <xsl:template match="/DataSet">
        <xsl:value-of select="@name" /> contains the tables
        <xsl:apply-templates select="DataTable" mode="ListTables"/>
        <xsl:call-template name="DataSetEnd" />
    </xsl:template>
The xsl:value-of instruction inserts data from the XML source
document, and the @ prefix indicates that an attribute's value,
rather than an element's value, is desired. Literal text in the
template is output directly.
XSLT relies heavily on the concept of context. You can think of
context as a position within the XML input. The XML file includes
many name attributes; the @name in the xsl:value-of specifies
insertion of the name attribute of the current node, which is the
DataSet node.
The xsl:apply-templates instruction can be confusing, but it
begins to unlock XSLT's strength. This instruction has the effect of
telling the processor: Before you continue the current
processing, go process the specified nodes following whatever
instructions those elements match. In this case, XSLT processes
the template for all DataTable nodes that are children of the
current context (the current DataSet node). The optional mode
attribute limits processing to specific templates:
    <xsl:template match="DataTable" mode="ListTables">
        <xsl:text>&#x0A;&#x09;</xsl:text>
        <xsl:value-of select="@name" />
    </xsl:template>
This template is processed for each DataTable node, and its
context is the DataTable node. The name attribute it specifies is
the name of the DataTable, not the name of the DataSet.
The xsl:text instruction specifies that what follows should be
output directly. One of its uses is managing whitespace, which is a
bit of a pain in XSLT. Specify whitespace characters by their hexadecimal
character references, such as "&#x0A;" for new line and "&#x09;" for tab,
to insert them explicitly.
The final template in Listing
1 is a named template. You call named templates with the
xsl:call-template instruction. Named templates are processed exactly
once when they're encountered, and they don't alter the context.
They're similar to subroutines in .NET languages (see Figure 2).
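The named template itself appears only in Listing 1, so here is a hedged reconstruction of what DataSetEnd could look like, inferred from the output it produces. Because xsl:call-template leaves the context unchanged, @name still refers to the DataSet node that was current in the calling template.

```xml
<xsl:template name="DataSetEnd">
  <!-- &#x0A; is the character reference for a new line -->
  <xsl:text>&#x0A;End </xsl:text>
  <!-- Context is still the DataSet node of the caller -->
  <xsl:value-of select="@name"/>
  <xsl:text> DataSet</xsl:text>
</xsl:template>
```

Calling the same template from a template whose context is a DataTable node would insert the table's name instead, which is one reason to keep track of context when you reuse named templates.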
This output results if you drop three Northwind tables onto the
XSD design surface by dragging the tables from the Server Explorer
onto a component, then run the transform on the resulting XSD:
    Northwind contains the tables
        Customers
        Orders
        Order_Details
    End Northwind DataSet
Note the position of the list of tables and the "End Northwind
DataSet" text to see how the xsl:apply-templates and
xsl:call-template instructions define processing. The context- and
node-processing concepts in XSLT can be difficult to grasp, but once
you see how this works, several XSLT idiosyncrasies begin to make
sense.
Work With XSLT in .NET
You use two
tools to work with XSLT in .NET: an editor and an XSLT processor to
perform the transformation. Several full-featured XSLT editors are
available. You can also work in the VS.NET IDE. However,
IntelliSense doesn't work with XSLT unless you do some tweaking.
Declarative-language IntelliSense is based on XSD schema files, and
the XSLT.xsd file is missing. You can find one at GotDotNet by
searching for XSLT Schema, then place it in [Microsoft Visual Studio
.NET]\Common7\Packages\schemas\xml. This enables context-based
IntelliSense (see Additional
Resources).
You can process XSLT by sending it to a capable browser or by
using .NET. You must create an XslTransform object based on the XSLT
file in order to process XSLT in .NET. The XslTransform class is
located in the System.Xml.Xsl namespace. You also need a writer
object for output and an XML document for input. Open the XML
document as an XPathDocument to allow faster processing. You pass
the input and output to the Transform method (see Listing 2):
    XslTransform.Transform(docXML, _
        xslArgs, writer)
Output can go to an XmlTextWriter or a StreamWriter; which one
you choose determines important elements of the processing. For
example, the processing ignores the xsl:output instruction when the
output stream is a StreamWriter. Remember to match the writer with
the output you intend, or—if you're familiar with XSLT—match the
xsl:output method attribute you're using. The class in Listing 2 also supports XSLT
arguments, and the downloadable version includes methods for both
XmlTextWriter and StreamWriter output. Remember to flush and close
as you do when you work with other streams.
Using XSLT can cause plenty of frustration. It's case-sensitive.
You must use difficult-to-read escape sequences to output special
characters, including some printable characters (for example, &lt;
and &gt; for less than and greater than). The .NET Framework help
index is missing many XSLT elements, although if you type "xslt,
reference" in the index, you can access a sufficiently complete and
concise reference. Error handling is another problem. When XSLT
errors occur, the exceptions the parser throws are extremely good,
often giving the exact position of the exception within even complex
XSLT. However, situations such as missing parameters and misspelled
template match clauses aren't errors in XSLT—you just don't
get any output. These sorts of issues make it a good idea to compare
your output with projected output, such as the strongly typed
DataSet you create by selecting Generate DataSet from the context
menu of VS.NET's XSD designer, which opens when you add a DataSet to
a project. You can use WinDiff or Word to perform the comparison.
Now you're prepared to do some effective code generation with
XSLT in .NET. Tackling the strongly typed DataSet is a challenge—it
might be the most difficult file you'll create using XSLT code
generation. However, taking control of the code generation provides
the best tactic for modifying strongly typed DataSets.
Download the resulting XSLT to see the full code, which is about
600 lines long and rather repetitive. It's split into six XSLT files
for easier handling. The main STD.xslt is the entry point, similar
to Sub Main in your .NET programs. This XSLT contains xsl:include
statements that specify the inclusion of the five other files:
    <xsl:include href="DataSet.xslt"/>
    <xsl:include href="Table.xslt"/>
    <xsl:include href="Row.xslt"/>
    <xsl:include href="Event.xslt"/>
    <xsl:include href="Support.xslt"/>
The xsl:include instruction combines the files so that they
process as one unit.
Generate the DataRow Class
I'll
focus on strongly typed DataSets' internal structure and skip the
basics of how you use them. Each strongly typed DataSet is a single
file containing multiple classes—a class for the DataSet itself, and
three nested classes for each contained DataTable. The nested
classes are the tablenameRow class (where tablename stands for the
table's name), which contains the
actual data; the tablenameDataTable class, which is a
collection of tablenameRow objects; and a class used in
raising change events. The first two classes, which are wrappers for
the corresponding untyped classes in ADO.NET, are used to access
data. Strongly typed DataSets have that name because they use strong
typing to access the data. For example, if a property is a Boolean,
you receive a compile-time error if you pass any other data type
(assuming Option Strict is on). Strongly typed DataSets are great,
but the inability to fine-tune them is a serious flaw. This XSLT
gets you around that problem and lets you make any changes you want.
Take a close look at Row.xslt in the download code to see these
techniques in action. The opening lines are XSLT goo that specifies
the namespaces and some overall processing rules:
    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:strip-space elements="*"/>
    <xsl:output method="text" />
The DataTable template in DataSet.xslt calls templates such as
the DerivedRow template in Row.xslt:
    <xsl:template match="DataTable" mode="DerivedTableRowEvent">
        <xsl:call-template name="DerivedTable"/>
        <xsl:call-template name="DerivedRow"/>
        <xsl:call-template name="DerivedEvent"/>
    </xsl:template>
The preceding template is processed once for each DataTable node,
and it processes the DerivedRow template once each time. The context
of the DerivedRow template is the current DataTable node.
The DerivedRow template introduces variables:
    <xsl:template name="DerivedRow">
    <xsl:variable name="Table" select="@name" />
    &lt;Diagnostics.DebuggerStepThrough()&gt; _
    Public Class <xsl:value-of select="$Table"/>Row
The XSLT variables' behavior is unusual. You can set an XSLT
variable's value only once, when you declare the variable. A new
value is set each time the template is processed, but you can't
change the value within the template once it's set. You access a
variable by prefixing its name with the dollar sign ($).
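As a small illustration of the bind-once rule, the sketch below derives a second value from $Table by declaring another variable rather than reassigning the first. The template name DerivedRowSketch and the variable RowClass are hypothetical names used only for this example; they do not come from the download code.

```xml
<xsl:template name="DerivedRowSketch">
  <!-- Bound once per invocation; the context node is the current DataTable -->
  <xsl:variable name="Table" select="@name"/>
  <!-- $Table cannot be changed, so a derived value gets its own variable -->
  <xsl:variable name="RowClass" select="concat($Table, 'Row')"/>
  <xsl:text>Public Class </xsl:text>
  <xsl:value-of select="$RowClass"/>
  <xsl:text>&#x0A;End Class&#x0A;</xsl:text>
</xsl:template>
```

Declaring a second xsl:variable named Table in the same template would be an error, which is why XSLT code tends to accumulate many small, single-purpose variables.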
You call named templates using xsl:call-template, and match
templates with xsl:apply-templates. Both can use xsl:with-param to
specify parameters:
    <xsl:apply-templates select="DataColumn" mode="RowColumnProperties">
        <xsl:with-param name="Table" select="$Table"/>
    </xsl:apply-templates>
You pass the Table variable ($Table) to the DataColumn template
as a parameter that's also named Table.
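The receiving side is not shown in the article, so the following is only a sketch of how a template might accept the parameter. The body is a placeholder comment line rather than the property code the download generates.

```xml
<xsl:template match="DataColumn" mode="RowColumnProperties">
  <!-- xsl:param declarations must come before any other content in the template -->
  <xsl:param name="Table"/>
  <!-- Placeholder output: a VB.NET comment naming the table and column -->
  <xsl:text>&#x0A;    ' </xsl:text>
  <xsl:value-of select="$Table"/>
  <xsl:text>.</xsl:text>
  <xsl:value-of select="@name"/>
</xsl:template>
```

If the caller omits xsl:with-param, or misspells the parameter name, $Table is simply an empty string in XSLT 1.0 and no error is raised, so a mismatch shows up only as missing text in the output.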
You often need to include output based on a condition. The xsl:if
and xsl:choose instructions allow this. The
DataColumn/RowColumnProperties template uses xsl:choose to select which
.NET code to emit, depending on whether the AllowDBNulls attribute is
present on the element in the input XML document:
    Get<xsl:choose>
        <xsl:when test="not(@AllowDBNulls)">
            ' The download code tries the cast
            ' and throws exception if it fails
        </xsl:when>
        <xsl:otherwise>
            ' Simple cast without Try block
        </xsl:otherwise>
    </xsl:choose>
The template for outputting parent rows uses somewhat more
complex XPath filters. XPath filters can include three items: an
axis, a node test, and a predicate. The axis indicates the direction
from the current node that XPath should explore in looking for
matches. Some axes have shortcuts you can include in the expression
itself. For example, // indicates that XPath can find the match in
any descendant position, meaning anywhere in the
document if it appears at the start of the expression. Other important
shortcuts include . to indicate the current node and .. to indicate
the parent node.
Filter an Element's or Attribute's Value
You identify the predicate with square brackets. It
allows further filtering, typically on the value of a contained
element or an attribute. You must remember that the match indicates
the context for further processing, and the axes and predicate help
to filter the matches based on either document position or contained
information. This instruction indicates that the template should be
processed for Relation nodes anywhere within the input XML document
that contain a ChildTable element whose contents match the $Table
variable:
    <xsl:apply-templates select="//Relation[ChildTable=$Table]"
        mode="RowGetParentRows" />
The preceding code finds the parent relations of the current
DataTable, because it matches all the Relation nodes with a
ChildTable element equal to the current DataTable name. The context
within the RowGetParentRows template is the Relation node.
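To make the change of context concrete, here is a sketch of a template that the instruction above could invoke. It assumes each Relation element carries a name attribute and a ParentTable child alongside ChildTable. Both are guesses about the metadata format, and the output is a placeholder comment rather than the download's generated method.

```xml
<xsl:template match="Relation" mode="RowGetParentRows">
  <!-- Context is now the Relation node, so paths are relative to it -->
  <xsl:text>&#x0A;    ' Relation </xsl:text>
  <xsl:value-of select="@name"/>
  <xsl:text>: </xsl:text>
  <xsl:value-of select="ChildTable"/>
  <xsl:text> rows have a parent in </xsl:text>
  <xsl:value-of select="ParentTable"/>
</xsl:template>
```

The caller's $Table variable is not visible inside this template. If the generated code needs it, pass it along with xsl:with-param and declare a matching xsl:param.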
You'll notice that some of the download code uses the
xsl:for-each instruction. Its role is to process shorter fragments
of code; xsl:apply-templates is a much better solution for deeply
nested processing. Shifting gears to the xsl:apply-templates
approach is well worth the effort.
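The two fragments below show the same comma-separated column list written both ways, as a rough comparison. The mode name ColumnNameList is hypothetical, and both fragments assume the context is a DataTable node with DataColumn children.

```xml
<!-- Inline loop: the body lives where it is used -->
<xsl:for-each select="DataColumn">
  <xsl:value-of select="@name"/>
  <xsl:if test="position() != last()">, </xsl:if>
</xsl:for-each>

<!-- Template form: the call site shrinks to one line... -->
<xsl:apply-templates select="DataColumn" mode="ColumnNameList"/>

<!-- ...and the body moves to a top-level template that other callers can reuse -->
<xsl:template match="DataColumn" mode="ColumnNameList">
  <xsl:value-of select="@name"/>
  <xsl:if test="position() != last()">, </xsl:if>
</xsl:template>
```

For a two-line body the loop is arguably easier to read. Once the body itself needs to loop over child nodes, the template form keeps each level flat instead of nesting one xsl:for-each inside another.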
You should work carefully and do extensive testing while you work
with code generation, because you'll autogenerate any errors you
introduce. Complete isolation between autogenerated and manually
generated code is critical. .NET provides isolation through a couple
of mechanisms. The first is Regions; the code that the Windows Forms
designer generates from the information you enter in the Property Grid
demonstrates their use in isolating generated code. The other new
mechanism supporting code isolation is inheritance. Inheritance
allows a single coherent class that comprises an autogenerated class
and a manually created one internally. Either can be the base class
in theory, but in practice, you usually have your base class
autogenerated, while your derived class contains the manual code,
such as validation.
You can integrate XSLT code generation in your projects in two
different ways, depending on your goal. If you're eager to modify
the strongly typed DataSet, you can download and modify the XSLT to
provide those changes. Start by running your XSD through these
transformations and use WinDiff or Word to compare the output file
with the one created using Visual Studio's Generate DataSet. This is
important because creating a strongly typed DataSet involves many
details, and I haven't addressed uncommon ones such as XSD
annotations. You can modify the XSLT once you know the
transformation works for you. The other approach you can take is to
create entirely new templates. For example, you could create
business objects that are based on your data structure. I've
included a variety of XSLT techniques in the downloadable files to
help you do this. Code generation is a relatively novel approach for
XSLT, so you might find it useful to review the full code, even if
you plan to create a type of output different from the strongly
typed DataSet.
About the Author
Kathleen Dollard is an independent consultant doing real-world development in
.NET technologies. She's currently using XSLT techniques to generate
450 classes in a 300+ KLOC project. She's active in the Denver
Visual Studio User Group and is a regular contributor to Visual
Studio Magazine, a Microsoft MVP, and a VBITS/VSLive! speaker.
Reach Kathleen at kathleen@mvps.org.