Type-safe XML programming in .NET

Posted by Freek Paans on 20 October 2010

Tag(s): Infi, Open Source

In most of our projects, we store our application data in an RDBMS. For a recent desktop application, however, we decided to store the data in an XML file. The reason is that there were no ACID requirements, and using files to store data is more or less the standard in desktop applications. XML was chosen because of the wide range of tools available to manipulate that format.

One of the problems with XML, however, is that it is entirely text-based. For manipulating the file manually, this is great. In our programs, however, we would like to work in a type-safe manner. That is, if a certain element or attribute is supposed to be an integer, we’d like our program to treat it as an integer: it should be exposed as something (variable, property, function, etc.) of type int. We can extend this notion to complex (compound) elements: in our program we would like to work with data types that resemble the structure of those complex elements, which means we need to define corresponding classes.
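As a small illustration of this gap, the sketch below reads an integer attribute with LINQ to XML. The raw value is just text, and it is up to the caller to know that it should be converted to an int. The class and method names are made up for this example and are not part of XSDMapper.

```csharp
using System;
using System.Xml.Linq;

public static class UntypedAccessSketch
{
    // Returns the attribute as raw text; nothing here says it should be a number.
    public static string ReadRaw(XElement element, string attributeName)
    {
        XAttribute attribute = element.Attribute(attributeName);
        return attribute == null ? null : attribute.Value;
    }

    // Hand-written typed access: the caller must know the attribute holds an integer.
    // The explicit XAttribute-to-int? conversion yields null when the attribute is
    // absent and throws a FormatException when the text is not a valid integer.
    public static int? ReadOptionalInt(XElement element, string attributeName)
    {
        return (int?)element.Attribute(attributeName);
    }
}
```

Writing helpers like these by hand for every element and attribute is exactly the repetitive work that a generated mapper is meant to take over.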

To make this possible we need a mapping between the XML and the classes as they are defined in our program. This mapping is similar to the mapping between an RDBMS and objects (ORM). For XML mapping, however, we are not concerned with difficulties such as identity management and concurrency. In this blog post we’ll describe a purpose-built tool, called XSDMapper, which can automatically generate such mapper code (as C# code) from an XML Schema Definition (XSD) file.

Note: Microsoft used to offer a similar tool, called “LINQ to XSD”. While development by Microsoft was discontinued, it is now continued as an open source project on http://linqtoxsd.codeplex.com/. For several reasons (for one, its object logic is tied too closely to XML, that is, the classes are not Persistence Ignorant), we decided to brew our own tool. Our tool, in turn, does not handle as much of XSD as “LINQ to XSD” does.

XSD Primer

As the name implies, an XSD describes the way an XML file is structured. That is, it describes the elements, attributes, and relationships between elements that are valid in an XML file which adheres to the XSD. XSD files are XML files themselves (whose structure itself is also described by an XSD).

While it is beyond the scope of this article to completely describe the XSD syntax and semantics, we will explore some of its features by looking at an example. In this example, we will use the complete subset of XSD that we’ve implemented in XSDMapper. This subset is listed in Table 1. Further information on XSD can be found at W3C or w3schools.org.

Table 1. XSD subset currently supported by XSDMapper. For the type attribute, we currently support xs:int, xs:string, xs:double and xs:decimal.

Element          Attributes                             Subelements
xs:schema        xmlns                                  xs:element, xs:complexType
xs:element       type, name, minOccurs, maxOccurs       (none)
xs:complexType   name                                   xs:sequence, xs:attribute
xs:sequence      (none)                                 xs:element
xs:attribute     type, name, use (required|optional)    (none)

XSD Example

For this example we want to use XML to store data for a GUI application that stores ‘user information’, consisting of a username, a password and optionally the age of the user. In addition to this user data, the application also uses the XML to store application preferences about the location of the window on the screen. The XSD is listed in Code listing 1.

Code listing 1. Sample XSD to describe a configuration XML for a GUI application which manages user accounts

<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="test"
           targetNamespace="http://infi.nl/appconfig.xsd"
           elementFormDefault="qualified"
           xmlns="http://infi.nl/appconfig.xsd"
           xmlns:mstns="http://infi.nl/appconfig.xsd"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ApplicationData" type="data" />
  <xs:complexType name="data">
    <xs:sequence>
      <xs:element type="UIConfig" name="UI" maxOccurs="1" minOccurs="1" />
      <xs:element type="AppData" name="App" maxOccurs="1" minOccurs="1" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="AppData">
    <xs:sequence>
      <xs:element name="Users" type="User" minOccurs="0" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="User">
    <xs:attribute name="Username" type="xs:string" use="required" />
    <xs:attribute name="Password" type="xs:string" use="required" />
    <xs:attribute name="Age" type="xs:int" />
  </xs:complexType>
  <xs:complexType name="UIConfig">
    <xs:sequence>
      <xs:element minOccurs="0" maxOccurs="1" type="Rect" name="WindowPosition" />
    </xs:sequence>
    <xs:attribute type="xs:string" name="title" use="required" />
  </xs:complexType>
  <xs:complexType name="Rect">
    <xs:attribute type="xs:int" name="left" use="required" />
    <xs:attribute type="xs:int" name="top" use="required" />
    <xs:attribute type="xs:int" name="width" use="required" />
    <xs:attribute type="xs:int" name="height" use="required" />
  </xs:complexType>
</xs:schema>

The root element is xs:schema; it uses the xmlns attribute to denote the namespace in which the defined elements will live. The first child of xs:schema is going to be the root element of our XML. For each element, we need to specify at least a name and a type. Names can be anything we like. The type, however, must be either an XSD internal type (prefixed with “xs:”) or a custom type that we define ourselves. We’ll call our root element ApplicationData, and it will be of type data.

Since data is obviously not an XSD internal type, we need to define it ourselves. This is done using the xs:complexType element, whose name attribute specifies the value by which the type attribute of an xs:element can refer to it. An xs:complexType can specify possible child elements and/or attributes. To specify child elements, we need to enclose their definitions in an xs:sequence element. For our data example, we specify two child elements, named UI and App, of types UIConfig and AppData respectively. For both elements, we also specify that they are required and cannot appear multiple times, using the minOccurs and maxOccurs attributes. The minOccurs and maxOccurs attributes are not required, and both default to 1. To specify that a certain element can appear an unlimited number of times, you can set maxOccurs to “unbounded”.

To define an attribute, the xs:attribute element is used. A collection of these attribute definitions can be used either instead of or after the xs:sequence element in a complex type. They have a name, a type and optionally a use attribute. The name can once again be anything you like, while the type should be an XSD primitive (such as xs:int, xs:double, xs:string, etc.). The use attribute specifies whether the attribute is required or optional, by setting it to required or optional, respectively. The default is optional.

The User type is an example of a type with attributes only, while the UIConfig type specifies both child elements and attributes.

Code listing 2 shows valid XML for this XSD.

Code listing 2. Example XML for the XSD specified in code listing 1.

<?xml version="1.0" encoding="utf-8"?>
<ApplicationData xmlns="http://infi.nl/appconfig.xsd">
  <UI title="Test Window">
    <WindowPosition left="10" top="100" width="100" height="10" />
  </UI>
  <App>
    <Users Username="freek" Password="freekpassword" />
    <Users Username="bram" Password="brampass" Age="23" />
  </App>
</ApplicationData>
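To check that an XML document really adheres to an XSD, .NET can validate it for you. The sketch below uses XmlSchemaSet together with the Validate extension method for XDocument from System.Xml.Schema. The class and method names are made up for this example; the article does not say whether XSDMapper performs any validation itself, so treat this as a separate, optional step.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class ValidationSketch
{
    // Validates xmlText against xsdText and returns the validation messages.
    // An empty list means no validation problems were reported.
    public static List<string> Validate(string xsdText, string xmlText)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(xsdText)))
        {
            // Passing null uses the targetNamespace declared in the schema itself.
            schemas.Add(null, reader);
        }

        var messages = new List<string>();
        XDocument document = XDocument.Parse(xmlText);

        // The handler is called once per validation problem instead of throwing.
        document.Validate(schemas, (sender, e) => messages.Add(e.Message));
        return messages;
    }
}
```

Passing the contents of the XSD and XML files as strings keeps the sketch independent of any file locations; reading them with File.ReadAllText would be one way to feed it.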

The desired programming model

Even though I am a big fan of Domain Driven Design, in the application XSDMapper was targeted for, the domain logic was not that complex and basically a one-to-one mapping between the XML and our domain classes sufficed. Therefore, we wanted our programming model to resemble the XSD schema as closely as possible. Luckily, .NET has the programming constructs to handle most (if not all) of these features in a very natural way. Specifically, this means we want:

  1. A one-to-one mapping between the XML and our .NET classes. This means that for every complexType in the XSD, we want a corresponding class in .NET.
  2. XSD primitive types should be mapped to corresponding .NET primitive types.
  3. Non-repeatable, non-optional elements should be exposed as a property of corresponding .NET type.
  4. Depending on the corresponding .NET type of the element, non-repeatable, optional elements should be of type
    • Nullable<T>, with T the .NET type if the .NET type is a Value Type.
    • T, with T the .NET type if the .NET type is a Class.
  5. Repeatable elements should be exposed as a property of type IList<T>, T again being the .NET type.
  6. Attributes should be exposed analogously to elements in point 3 and 4.

For more advanced applications, it might be argued that we should provide for another layer of indirection between the classes and the XML. But as said, for our needs a one-to-one mapping was sufficient. Also, all the properties will be defined public since this eases the persistence process by not having to rely on reflection while still maintaining a sense of Persistence Ignorance, as discussed in the next section.
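To give an idea of what these rules mean for the example schema, here is a hand-written sketch of the classes one would expect. This is an illustration of the rules above, not actual XSDMapper output; the exact shape of the generated code may differ. The properties are marked virtual on the assumption that a subclass may need to override them.

```csharp
using System.Collections.Generic;

namespace DataAccessLayer
{
    // Rule 1: one class per complexType.
    public partial class Rect
    {
        // Rules 2 and 6: required xs:int attributes become plain int properties.
        public virtual int left { get; set; }
        public virtual int top { get; set; }
        public virtual int width { get; set; }
        public virtual int height { get; set; }
    }

    public partial class UIConfig
    {
        public virtual string title { get; set; }

        // Rule 4: optional element of a class type, so null means "not present".
        public virtual Rect WindowPosition { get; set; }
    }

    public partial class User
    {
        public virtual string Username { get; set; }
        public virtual string Password { get; set; }

        // Rules 4 and 6: optional value type becomes Nullable<int>.
        public virtual int? Age { get; set; }
    }

    public partial class AppData
    {
        // Rule 5: repeatable element becomes IList<T>.
        public virtual IList<User> Users { get; set; }
    }

    public partial class data
    {
        // Rule 3: required, non-repeatable elements become plain properties.
        public virtual UIConfig UI { get; set; }
        public virtual AppData App { get; set; }
    }
}
```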

Maintaining Persistence Ignorance

The holy grail of DDD is Persistence Ignorance (PI) of domain classes. PI basically means that the classes you use to handle your domain logic should not be concerned with how they are persisted to non-volatile storage. This is important since the choice of a persistence method should not influence the design of your domain logic.

Unfortunately, true PI is unattainable in current mainstream programming languages such as C#, C++, PHP and Java. The problem is manifold, but some of the main issues (which are applicable to XSDMapper) are:

  • Non-public property access rules; since our objects are obviously going to be persisted by other classes, these other classes somehow need to access the non-public properties of the objects that are to be persisted. In current languages this is not possible. A workaround many ORM layers employ is the use of reflection to access these properties. Reflection is, however, a bit of a hairy technique since you sacrifice a lot of protection the language usually offers you.
  • Lazy loading; to lazy load means we load the data only when it is actually accessed. To implement this, a property needs to know where to get its data from, which in turn means that it needs to know about the datasource which contains the data. This directly violates the principle of PI. A common pattern to circumvent this restriction is for persistence layers to subclass the base class and override the properties that need lazy loading, creating a so-called proxy class. This, however, requires that the base class declares each such property as virtual. Then, whenever we request an object of a certain type from the datasource, we actually retrieve an object of the proxy type.

XSDMapper currently does not handle non-public properties for persistence. It does support lazy loading by employing the proxy class pattern. Currently, the tool generates both the base class (also called the Plain Old CLR Object, POCO, since it is PI) and the proxy class. You are, however, free to substitute the POCO with your own implementation. In general, this won’t be necessary though, since the POCO is defined as partial, which allows you to extend the functionality of the class in a different file (this is a really nice feature of C#, which is employed by a lot of .NET tools that do code generation).
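The proxy class pattern can be sketched as follows. This is a minimal hand-written illustration, assuming a POCO with virtual properties and an XElement as the data source; the names User and UserProxy and the caching strategy are assumptions made for this example, not the code XSDMapper generates. The small User class is repeated here so that the sample stands alone.

```csharp
using System;
using System.Xml.Linq;

// Minimal stand-in for a POCO; it knows nothing about XML.
public partial class User
{
    public virtual string Username { get; set; }
    public virtual int? Age { get; set; }
}

// Hypothetical proxy: reads each value from the XElement on first access only.
public class UserProxy : User
{
    private readonly XElement _source;
    private bool _usernameLoaded;
    private bool _ageLoaded;

    public UserProxy(XElement source)
    {
        if (source == null) throw new ArgumentNullException("source");
        _source = source;
    }

    public override string Username
    {
        get
        {
            if (!_usernameLoaded)
            {
                // Parse on first access and cache the result in the base property.
                base.Username = (string)_source.Attribute("Username");
                _usernameLoaded = true;
            }
            return base.Username;
        }
        set
        {
            // An explicit assignment wins over whatever is stored in the XML.
            base.Username = value;
            _usernameLoaded = true;
        }
    }

    public override int? Age
    {
        get
        {
            if (!_ageLoaded)
            {
                // A missing optional attribute converts to null.
                base.Age = (int?)_source.Attribute("Age");
                _ageLoaded = true;
            }
            return base.Age;
        }
        set
        {
            base.Age = value;
            _ageLoaded = true;
        }
    }
}
```

Code that works with User never sees the XElement, so the domain logic stays unaware of how the data is stored.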

In XSDMapper, the XML can be loaded using the XSDXMLLoader class, while persistence-enabled classes can be written using the XSDXMLWriter class.

Writing XSDMapper

Now that we know the desired input and output, we can start writing XSDMapper. The overall structure and ideas of the program are depicted in Figure 1.

Figure 1. Schematic overview of all the parts involved in XSDMapper

The first step is to read the XSD file. Since an XSD is just an XML file, we can leverage any .NET technique for reading XML. XML support is quite mature in .NET, and we can choose from a variety of options. For XSDMapper, we chose LINQ to XML. After parsing the XSD with LINQ to XML, we have the XSD available in a generic, non-type-safe tree structure of elements and attributes.

At this point it directly becomes apparent why there is a need for XSDMapper; although the XML is correctly parsed in terms of elements and attributes, the resulting tree exposes none of the semantics specified in the XSD. Therefore, our next step is to extract the semantics from the tree structure. This means we are mapping the data in the elements and attributes to specific, self-defined classes such as XSDComplexType, XSDElement and XSDAttribute. Since XSDMapper itself is not yet available at this stage, we need to do this manually. This is done in the file xsdparser.cs.
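A rough sketch of this extraction step is shown below. It walks the LINQ to XML tree and collects the complex types and their attributes into small classes. The classes ComplexTypeDefinition and AttributeDefinition are simplified, hypothetical stand-ins invented for this example; they are not the XSDComplexType and XSDAttribute classes from xsdparser.cs, and child elements inside xs:sequence are left out for brevity.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, simplified model of an xs:attribute definition.
public class AttributeDefinition
{
    public string Name { get; set; }
    public string Type { get; set; }
    public bool Required { get; set; }
}

// Hypothetical, simplified model of an xs:complexType definition.
public class ComplexTypeDefinition
{
    public string Name { get; set; }
    public List<AttributeDefinition> Attributes { get; set; }
}

public static class XsdParserSketch
{
    private static readonly XNamespace Xs = "http://www.w3.org/2001/XMLSchema";

    // Extracts the top-level complex types and their attributes from a parsed XSD.
    public static List<ComplexTypeDefinition> ParseComplexTypes(XDocument xsd)
    {
        return xsd.Root
            .Elements(Xs + "complexType")
            .Select(complexType => new ComplexTypeDefinition
            {
                Name = (string)complexType.Attribute("name"),
                Attributes = complexType
                    .Elements(Xs + "attribute")
                    .Select(attribute => new AttributeDefinition
                    {
                        Name = (string)attribute.Attribute("name"),
                        Type = (string)attribute.Attribute("type"),
                        // "use" defaults to optional when it is absent.
                        Required = (string)attribute.Attribute("use") == "required"
                    })
                    .ToList()
            })
            .ToList();
    }
}
```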

Now code generation can begin; this happens in the file SchemaWriter.cs. For each complexType defined in the XSD, we generate a new POCO class definition. For every attribute or sub-element in the complexType definition, we generate a property according to the rules in the “Desired Programming Model” section. The actual routines are a bit tedious but straightforward.
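To give a feel for these routines, the sketch below generates the property declaration for a single attribute. It is a simplified illustration and not the code in SchemaWriter.cs: the names are invented, the output format is an assumption, and the “xs:” prefix is hard-coded for brevity.

```csharp
using System;

public static class PropertyGeneratorSketch
{
    // Maps the supported XSD primitive types to C# type names.
    public static string MapPrimitive(string xsdType)
    {
        switch (xsdType)
        {
            case "xs:int": return "int";
            case "xs:string": return "string";
            case "xs:double": return "double";
            case "xs:decimal": return "decimal";
            default: throw new NotSupportedException("Unsupported XSD type: " + xsdType);
        }
    }

    // Produces the C# source text for one attribute of a complexType.
    public static string GenerateAttributeProperty(string name, string xsdType, bool required)
    {
        string clrType = MapPrimitive(xsdType);

        // Of the supported primitives, only string is a reference type.
        bool isValueType = clrType != "string";

        // Optional value types become Nullable<T>; optional strings can simply be null.
        if (!required && isValueType)
        {
            clrType += "?";
        }

        return "public virtual " + clrType + " " + name + " { get; set; }";
    }
}
```

For example, the optional Age attribute of the User type would come out as a property of type int?, while the required Username attribute would be a plain string.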

After the generation of the POCO class definitions, we generate the proxy class definition. The proxy classes also use LINQ to XML to access the underlying XML. The proxy classes offer lazy loading on all the properties. They define a constructor with an argument of type XElement which is the basic type for XML elements in LINQ to XML. Whenever a property is accessed for the first time, the class loads the property from the specified XElement, parses it into the desired C# data type and assigns it to the property.

For each complexType a static Write method is defined on the XSDXMLWriter class. This Write method serializes an instance of a .NET class corresponding to the complexType to an XElement, using the element name specified as the first argument.
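A Write method for the User type could look roughly like the sketch below. The signature, with the element name as the first argument, follows the description above, but the class name UserWriterSketch and the method body are assumptions for illustration rather than the generated XSDXMLWriter code. The small User class is repeated so that the sample stands alone.

```csharp
using System;
using System.Xml.Linq;

// Minimal stand-in for the POCO that corresponds to the User complexType.
public class User
{
    public string Username { get; set; }
    public string Password { get; set; }
    public int? Age { get; set; }
}

public static class UserWriterSketch
{
    // Serializes a User to an XElement with the given element name.
    public static XElement Write(XName elementName, User user)
    {
        if (user == null) throw new ArgumentNullException("user");

        // Required attributes are always written; the XAttribute constructor
        // throws an ArgumentNullException if a required value is null.
        var element = new XElement(elementName,
            new XAttribute("Username", user.Username),
            new XAttribute("Password", user.Password));

        // Optional attributes are only written when they have a value.
        if (user.Age.HasValue)
        {
            element.Add(new XAttribute("Age", user.Age.Value));
        }

        return element;
    }
}
```

Because the element name is passed in, the same method can serve wherever the type is used; for the example schema the caller would pass the namespace-qualified name Users.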

Finally, the XSDXMLLoader class is generated. This class provides methods to load an XML conforming to the XSD from either a file system path or a string.

All code is generated using templates which are specified in the file SchemawriterTemplates.cs.

All generated classes are by default defined in the DataAccessLayer namespace, though this can be changed using a command line parameter.

Using XSDMapper

Basic usage:

xsdmapper <filename>

This loads the XSD <filename> and generates a C# code file with the same name as the XSD.

XSDMapper offers the following (optional) command line arguments:

  • /out:<filename>: specify the output filename
  • /noPOCO: don’t generate the base types
  • /ns:<namespace>: use <namespace> as the namespace for the mapper and classes (default is DataAccessLayer)

Limitations and room for future improvements

XSDMapper is far from complete. Among others, the following limitations spring to mind:

  • Far from complete XSD support
  • No support for non-public properties
  • No error handling whatsoever
  • No clue whether the current lazy loading support is actually efficient (it all depends on how the XML is parsed by LINQ to XML)
  • Hardly tested.

Aside from these limitations, the tool is actually very useful; even with the limited subset of XSD available, you can still define pretty complex data structures. With XSDMapper, you can write the XSD and then manipulate it in a type-safe manner from within your application in just a matter of minutes.

Note on the example

The example program ‘testXSDMapper’ is a Windows Forms 2.0 example that uses the DataGridView control to edit users. On quitting the application, the list and UI settings will be persisted to config.xml.

To compile the example you first need to generate the mapper code by running XSDMapper on appconfig.xsd.
 


Comments:

Posted by Patrick on 20 October 2010, 16:10:
I thought .NET came with a tool like this called xsd, http://www.xml.com/lpt/a/1003.

Also, LINQ should work against most arbitrary types, http://www.claassen.net/geek/blog/2007/11/searching-tree-of-objects-with-linq-2.html

Posted by Freek Paans on 20 October 2010, 16:10:
@Patrick
I didn't know about that tool before writing that and it seems that you can indeed use that to generate classes that are serializable to xml, which is of course pretty useful. However, a major goal of our tool is to make sure that we can work with POCOs. The code generated by xsd.exe (from what I've seen from your link) has a lot of metadata purely for the purpose of serialization. This is not what we want, since it makes switching to another persistence framework (like Entity Framework or NHibernate) later on very hard.
Posted by Sean Hederman on 22 October 2010, 14:10:
What about the XML attributes and the XmlSerializer? XmlElementAttribute, XmlAttributeAttribute and suchlike. Allows fully POCO classes.

Oh, and BTW, EF will NOT work with POCO classes, MS have lied through their noses about that one.
