Recently I came across a problem with whitespace in XML that I was serializing from a custom entity class. The situation is this: I create my custom object, apply the XmlSerializer to it, generate the XML and put it in a memory stream without a hitch. I then convert it to a string and save it to a database. While verifying the data being saved, I found that about 30% of the XML was whitespace, which adds up quickly across hundreds of thousands of transactions. My first thought was to use good old regular expressions, but I was concerned about unintentionally removing whitespace that I may want to keep, such as inside an element or attribute value. What I finally came up with was to load the XML string into an XmlDocument with PreserveWhitespace set to false. See below for a simplified example.
Create and populate a custom object:
Car c = new Car();
c.Make = "Jeep";
c.Model = "Wrangler";
c.Year = "1981";
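The Car class itself is not shown in this post. A minimal sketch of what it might look like, assuming three public string properties (the exact shape is my assumption), is below. XmlSerializer needs a public type with a public parameterless constructor, and it only serializes public read/write members.

```csharp
// Hypothetical entity class assumed by the snippets in this post.
// XmlSerializer requires the type to be public and to have a
// public parameterless constructor (the implicit one is enough here).
public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Year { get; set; }   // kept as a string to match c.Year = "1981"
}
```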
Use the XmlSerializer class to serialize the object as xml to a System.IO.MemoryStream:
System.IO.MemoryStream ms = new System.IO.MemoryStream();
System.Xml.Serialization.XmlSerializer xs = new System.Xml.Serialization.XmlSerializer(c.GetType());
xs.Serialize(ms, c);
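Convert to a string for storage in a database or other use. The XmlSerializer writes UTF-8 to the stream, so decode the bytes as UTF-8 rather than ASCII, which would turn any non-ASCII character into a question mark:
string str = System.Text.Encoding.UTF8.GetString(ms.ToArray());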
You will now have the following XML in your string variable (the child elements are also indented in the real output; the indentation is just not shown here). So this is nice, right?
<?xml version="1.0"?>
<Car xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Make>Jeep</Make>
<Model>Wrangler</Model>
<Year>1981</Year>
</Car>
Yes and no. The XmlSerializer class is very useful in that it is easy to implement, but when it writes to a stream it indents its output by default. That is good for presentation, but bad for data storage and transmission. The easiest way I have found to strip the whitespace is to do the following:
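Create an XmlDocument and make sure its PreserveWhitespace property is false. This is the default, but it has to be in effect before the XML is loaded, because the property controls which whitespace nodes the load keeps. Changing it afterwards does not remove whitespace nodes that have already been loaded:
System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
xmlDoc.PreserveWhitespace = false;
Then load the string into it:
xmlDoc.LoadXml(str);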
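To see what the property actually does, here is a small sketch that loads the same string with PreserveWhitespace on and off. The WhitespaceDemo class and Reload method are names I made up for illustration. Note that only whitespace between elements is dropped; text such as " Jeep " inside an element is left alone.

```csharp
using System.Xml;

public static class WhitespaceDemo
{
    // Loads the XML with the given PreserveWhitespace setting and
    // returns the document's OuterXml.
    public static string Reload(string xml, bool preserveWhitespace)
    {
        XmlDocument doc = new XmlDocument();
        // Set before LoadXml: the property decides which whitespace
        // nodes are kept while the document is being loaded.
        doc.PreserveWhitespace = preserveWhitespace;
        doc.LoadXml(xml);
        return doc.OuterXml;
    }
}
```

For example, with the input "<Car>\n  <Make>Jeep</Make>\n</Car>", calling WhitespaceDemo.Reload(input, false) gives <Car><Make>Jeep</Make></Car>, while passing true keeps the line breaks and indentation.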
Now the .OuterXml property of the XmlDocument will have this:
<?xml version="1.0"?><Car xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><Make>Jeep</Make><Model>Wrangler</Model><Year>1981</Year></Car>
All done! Xml without the whitespace!
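As an alternative to stripping the whitespace afterwards, it should also be possible to avoid writing it in the first place by handing the XmlSerializer an XmlWriter that does not indent. This is not what I did above, just a sketch of another route; the CompactXml class and its Serialize method are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class CompactXml
{
    // Serializes an object to an XML string without indentation.
    public static string Serialize(object value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = false;                      // the default, stated explicitly
        settings.Encoding = new UTF8Encoding(false);  // UTF-8 without a byte order mark

        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                XmlSerializer xs = new XmlSerializer(value.GetType());
                xs.Serialize(writer, value);
            }
            // The writer has been flushed by Dispose, so the bytes are complete.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }
}
```

With the Car object from earlier this would be called as string str = CompactXml.Serialize(c); and skips the round trip through XmlDocument. The XML declaration written this way includes an encoding attribute, so the result is not byte-for-byte identical to the output shown above.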