Process XML in C#.net

By George Zheng

XML is widely used for transferring data because it is platform independent and has a simple, text-based, self-describing format. Preparing and consuming XML data is therefore a common requirement. Microsoft provides several convenient tools to help developers work with XML on the .NET platform:

  1. System.Xml.XmlTextReader/System.Xml.XmlTextWriter
  2. System.Xml.Serialization.XmlSerializer
  3. System.Xml.Linq

XmlTextReader/XmlTextWriter
XmlDocument objects are easy to use for navigating the DOM and for copying, modifying, or inserting nodes. However, they keep the entire DOM in memory, which can be expensive for a large XML document. For this reason, the XmlTextReader and XmlTextWriter classes can be used for stream-based processing of XML instead.
Let’s say you have the following XML, saved as user.xml:
<?xml version="1.0" encoding="utf-8" ?>
<user id="1">
  <firstname>George</firstname>
  <lastname>Zheng</lastname>
</user>

The following code reads it into a User object.

// Requires: using System; using System.Xml;
User user = new User();

// The using block closes the file when reading is done.
using (XmlTextReader reader = new XmlTextReader("user.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            if (reader.Name == "user")
            {
                user.Id = Convert.ToInt32(reader.GetAttribute("id"));
            }
            else if (reader.Name == "firstname")
            {
                user.Firstname = reader.ReadElementContentAsString();
            }
            else if (reader.Name == "lastname")
            {
                user.Lastname = reader.ReadElementContentAsString();
            }
        }
    }
}
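
One thing to keep in mind with a loop like this: ReadElementContentAsString leaves the reader on the node that follows the closing tag, so a subsequent reader.Read() can step over the next element when the XML has no whitespace between elements. The sketch below shows one way to avoid that, assuming the same user XML is passed in as a string. The UserReaderSketch class and its ParseUser method are hypothetical names used only for illustration, and the User class is a minimal stand-in.

```csharp
using System;
using System.IO;
using System.Xml;

// Minimal stand-in for the User class used in this article.
public class User
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

// Hypothetical helper, not part of the original source code.
public static class UserReaderSketch
{
    public static User ParseUser(string xml)
    {
        User user = new User();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            // Skip the declaration and any whitespace; lands on <user>.
            reader.MoveToContent();
            user.Id = Convert.ToInt32(reader.GetAttribute("id"));
            reader.Read(); // step inside <user>

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "firstname")
                {
                    // This call already moves past </firstname>, so no Read() here.
                    user.Firstname = reader.ReadElementContentAsString();
                }
                else if (reader.NodeType == XmlNodeType.Element && reader.Name == "lastname")
                {
                    user.Lastname = reader.ReadElementContentAsString();
                }
                else
                {
                    // Only advance manually when nothing was consumed.
                    reader.Read();
                }
            }
        }
        return user;
    }
}
```

Because the reader only advances explicitly when no element content was consumed, the same code handles both indented and compact XML.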
With XmlTextWriter, the following code generates an XML string:

// Requires: using System.IO; using System.Text; using System.Xml;
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
XmlTextWriter writer = new XmlTextWriter(sw);

writer.WriteStartElement("user");
writer.WriteAttributeString("id", user.Id.ToString());
writer.WriteElementString("firstname", user.Firstname);
writer.WriteElementString("lastname", user.Lastname);
writer.WriteEndElement();
writer.Flush(); // make sure everything has reached the StringBuilder

return sb.ToString();
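
For reference, here is a self-contained sketch of the same idea wrapped in a method. It uses XmlWriter.Create with XmlWriterSettings rather than XmlTextWriter, which is my substitution and not what the original code uses. The UserWriterSketch class and its ToXml method are hypothetical names, and the User class is a minimal stand-in.

```csharp
using System.Text;
using System.Xml;

// Minimal stand-in for the User class used in this article.
public class User
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

// Hypothetical helper, not part of the original source code.
public static class UserWriterSketch
{
    public static string ToXml(User user)
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings
        {
            Indent = true,             // one element per line
            OmitXmlDeclaration = true  // emit only the <user> element
        };

        // Disposing the writer flushes it into the StringBuilder.
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("user");
            writer.WriteAttributeString("id", user.Id.ToString());
            writer.WriteElementString("firstname", user.Firstname);
            writer.WriteElementString("lastname", user.Lastname);
            writer.WriteEndElement();
        }

        return sb.ToString();
    }
}
```

The writer escapes special characters for you, so a first name such as "A&B" ends up as A&amp;B in the output.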

XmlSerializer

If you don’t want to deal with the XML element by element, .NET provides an alternative: XML serialization, which converts an object instance to XML and reads the data back. This is more convenient for a complex object. The following code deserializes an XML string into a User object:

TextReader tr = new StringReader(xml);
XmlSerializer s = new XmlSerializer(typeof(User));
User user = (User)s.Deserialize(tr);

The following code can be used to serialize the object:

XmlSerializer s = new XmlSerializer(typeof(User));
TextWriter writer = new StringWriter();
s.Serialize(writer, user);
writer.Flush();
string xml = writer.ToString();
writer.Close();
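
If you serialize more than one type, the two fragments above can be folded into a pair of generic helpers. This is only a sketch: XmlSerializationSketch, Serialize, Deserialize and SampleUser are hypothetical names I made up for illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type; XmlSerializer needs a public class
// with a public parameterless constructor.
public class SampleUser
{
    public int Id { get; set; }
    public string Firstname { get; set; }
    public string Lastname { get; set; }
}

// Hypothetical helpers, not part of the original source code.
public static class XmlSerializationSketch
{
    public static string Serialize<T>(T value)
    {
        XmlSerializer s = new XmlSerializer(typeof(T));
        using (StringWriter writer = new StringWriter())
        {
            s.Serialize(writer, value);
            return writer.ToString();
        }
    }

    public static T Deserialize<T>(string xml)
    {
        XmlSerializer s = new XmlSerializer(typeof(T));
        using (StringReader reader = new StringReader(xml))
        {
            return (T)s.Deserialize(reader);
        }
    }
}
```

A round trip then looks like XmlSerializationSketch.Deserialize<SampleUser>(XmlSerializationSketch.Serialize(user)). Note that serializing through a StringWriter produces an XML declaration that says encoding="utf-16", because .NET strings are UTF-16.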

The object can be complex, for example:

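// Requires: using System.Collections.Generic; using System.Xml.Serialization;

// Without XmlRoot the serializer expects a root element named "User",
// which would not match the lowercase <user> element shown earlier.
[XmlRoot("user")]
public class User
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("firstname")]
    public string Firstname { get; set; }

    [XmlElement("lastname")]
    public string Lastname { get; set; }

    [XmlArray("projects")]
    [XmlArrayItem("project", typeof(Project))]
    public List<Project> Projects { get; set; }
}

public class Project
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}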

As you can see, serialization attributes specify how the object’s properties map to XML elements and attributes. I’m not going to discuss these attributes here; details can be found in the Useful Links below.
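
To make the mapping a little more concrete, here is a small sketch that deserializes a user with one project. The XML shape in the comment is what I would expect these attributes to accept; the project data ("Alpha", id 10), the XmlRoot attribute, and the UserXmlSketch class are my own assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("user")] // assumed, so the root element is <user> rather than <User>
public class User
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("firstname")]
    public string Firstname { get; set; }

    [XmlElement("lastname")]
    public string Lastname { get; set; }

    [XmlArray("projects")]
    [XmlArrayItem("project", typeof(Project))]
    public List<Project> Projects { get; set; }
}

public class Project
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

// Hypothetical helper, not part of the original source code.
public static class UserXmlSketch
{
    // Expected input shape (sample data is made up):
    // <user id="1">
    //   <firstname>George</firstname>
    //   <lastname>Zheng</lastname>
    //   <projects>
    //     <project id="10"><name>Alpha</name></project>
    //   </projects>
    // </user>
    public static User Load(string xml)
    {
        XmlSerializer s = new XmlSerializer(typeof(User));
        using (StringReader reader = new StringReader(xml))
        {
            return (User)s.Deserialize(reader);
        }
    }
}
```

XmlAttribute maps a property to an attribute on the element, XmlElement maps it to a child element, and XmlArray with XmlArrayItem wraps a list in a container element with one child per item.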

LINQ

In some cases, converting the whole XML document into objects is not a good idea. LINQ to XML provides a convenient way to query a complex XML document.

Here is an example account feed response from the Google Analytics API:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/" xmlns:dxp="http://schemas.google.com/analytics/2009">
  <id>http://www.google.com/analytics/feeds/accounts/abc@test.com</id>
  <updated>2009-06-25T03:55:22.000-07:00</updated>
  <title type="text">Profile list for abc@test.com</title>
  <link rel="self" type="application/atom+xml" href="http://www.google.com/analytics/feeds/accounts/default"/>
  <author>
    <name>Google Analytics</name>
  </author>
  <generator version="1.0">Google Analytics</generator>
  <openSearch:totalResults>12</openSearch:totalResults>
  <openSearch:startIndex>1</openSearch:startIndex>
  <openSearch:itemsPerPage>12</openSearch:itemsPerPage>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:1174</id>
    <updated>2009-06-25T03:55:22.000-07:00</updated>
    <title type="text">www.googlestore.com</title>
    <link rel="alternate" type="text/html" href="http://www.google.com/analytics"/>
    <dxp:tableId>ga:1174</dxp:tableId>
    <dxp:property name="ga:accountId" value="30481"/>
    <dxp:property name="ga:accountName" value="Google Store"/>
    <dxp:property name="ga:profileId" value="1174"/>
    <dxp:property name="ga:webPropertyId" value="UA-30481-1"/>
    <dxp:property name="ga:currency" value="USD"/>
    <dxp:property name="ga:timezone" value="America/Los_Angeles"/>
  </entry>
  <entry>
    <id>http://www.google.com/analytics/feeds/accounts/ga:6284812</id>
    <updated>2009-01-06T17:39:33.000-08:00</updated>
    <title type="text">www.googlestore.com (Test Team)</title>
    <link rel="alternate" type="text/html" href="http://www.google.com/analytics"/>
    <dxp:tableId>ga:6284812</dxp:tableId>
    <dxp:property name="ga:accountId" value="30481"/>
    <dxp:property name="ga:accountName" value="Google Store"/>
    <dxp:property name="ga:profileId" value="6284812"/>
    <dxp:property name="ga:webPropertyId" value="UA-30481-1"/>
    <dxp:property name="ga:currency" value="USD"/>
    <dxp:property name="ga:timezone" value="America/Los_Angeles"/>
  </entry>
</feed>

With the following code, you get the ID, Title and AccountName of each entry.

// Requires: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
XDocument doc = XDocument.Parse(xml);
XNamespace dxpSpace = doc.Root.GetNamespaceOfPrefix("dxp");
XNamespace defaultSpace = doc.Root.GetDefaultNamespace();

IEnumerable<Account> entries =
    from en in doc.Root.Descendants(defaultSpace + "entry")
    select new Account
    {
        ID = en.Element(defaultSpace + "id").Value,
        Title = en.Element(defaultSpace + "title").Value,
        // Pick the dxp:property whose name attribute is ga:accountName.
        AccountName =
            en.Elements(dxpSpace + "property")
              .Where(xe => xe.Attribute("name").Value == "ga:accountName")
              .First()
              .Attribute("value").Value
    };
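
The query above throws if an entry is missing an element or has no ga:accountName property, because .Value and First() do not tolerate missing nodes. A more forgiving variant is sketched below, assuming the feed still declares the dxp prefix on its root element. The Account class is a minimal stand-in, and AccountFeedSketch and ParseAccounts are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Minimal stand-in for the Account class used in this article.
public class Account
{
    public string ID { get; set; }
    public string Title { get; set; }
    public string AccountName { get; set; }
}

// Hypothetical helper, not part of the original source code.
public static class AccountFeedSketch
{
    public static List<Account> ParseAccounts(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        XNamespace atom = doc.Root.GetDefaultNamespace();
        XNamespace dxp = doc.Root.GetNamespaceOfPrefix("dxp");

        return doc.Root.Elements(atom + "entry")
            .Select(en => new Account
            {
                // Casting an XElement or XAttribute to string yields null
                // when the node is missing, instead of throwing.
                ID = (string)en.Element(atom + "id"),
                Title = (string)en.Element(atom + "title"),
                AccountName = (string)en.Elements(dxp + "property")
                    .Where(p => (string)p.Attribute("name") == "ga:accountName")
                    .Select(p => p.Attribute("value"))
                    .FirstOrDefault()
            })
            .ToList();
    }
}
```

Whether a missing value should become null or raise an error depends on how much you trust the feed; the strict version is the better choice when a missing field means the response is unusable.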

Useful Links

Insert XML Nodes Using XmlTextReader and XmlTextWriter
Using the XmlSerializer Attributes
Added Google Analytics Reader for .NET


Download source code

This entry was posted on Tuesday, July 14th, 2009 at 8:57 am and is filed under .NET Development, C#.

One Response to “Process XML in C#.net”

  1. Bill Conniff Says:

    You might be interested in CAX, a caching xml parser, I developed. It sits on top of the XmlReader and buffers and caches all parsed xml so you can go back and look at what has been parsed. It is intended for transforming large xml that is too big to fit in memory for classes that do xsl transformation.
