

Recipe 15.7. Handling Invalid Characters in an XML String

Problem

You are creating an XML string. Before adding an element that contains text, you want to check whether that text contains any of the following characters, which XML reserves for markup:

 < > " ' & 

If any of these characters are encountered, you want them to be replaced with their escaped form:

 &lt; &gt; &quot; &apos; &amp; 
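If you were to perform these replacements by hand, the one subtlety is that an & introduced by one replacement must not be escaped again by a later one. The following is a minimal sketch of such a helper. The class and method names (XmlEscapeSketch, EscapeXmlChars) are hypothetical, and the framework methods described in the Solution are the better choice in practice.

```csharp
using System;
using System.Text;

// Hypothetical helper: a hand-rolled escaper for the five reserved characters.
// Prefer the XmlWriter/XmlElement APIs shown in this recipe in real code.
public static class XmlEscapeSketch
{
    public static string EscapeXmlChars(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        // A single pass over the input means the "&amp;" appended for one
        // character is never seen, and re-escaped, by a later replacement.
        foreach (char c in text)
        {
            switch (c)
            {
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"': sb.Append("&quot;"); break;
                case '\'': sb.Append("&apos;"); break;
                case '&': sb.Append("&amp;"); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }
}
```

A chain of string.Replace calls would have to replace & first to get the same result. The System.Security.SecurityElement.Escape method in the .NET Framework performs the same five substitutions.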

Solution

There are different ways to accomplish this, depending on which XML-creation approach you are using. If you are using XmlWriter, the WriteCData, WriteString, WriteAttributeString, WriteValue, and WriteElementString methods take care of this for you: WriteCData wraps the text in a CDATA section, and the others escape it. If you are using XmlDocument and XmlElement, the XmlElement.InnerText property escapes these characters for you.
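The escaping XmlWriter applies depends on where the text goes. The following sketch, whose class and method names are hypothetical, writes the same string once as an attribute value with WriteAttributeString and once as element content with WriteString, so you can compare the two results.

```csharp
using System.IO;
using System.Xml;

// Hypothetical sketch: the same value written as an attribute and as element text.
public static class WriterEscapingSketch
{
    public static string Build(string value)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                writer.WriteStartElement("Item");
                // Attribute values are delimited by double quotes, so '"' is escaped here.
                writer.WriteAttributeString("note", value);
                // In element content, WriteString escapes the markup characters.
                writer.WriteString(value);
                writer.WriteEndElement();
            }
            // The inner using has flushed and closed the writer by this point.
            return sw.ToString();
        }
    }
}
```

Because the writer delimits attribute values with double quotes, it must escape " inside the attribute; in element content, quote characters need no escaping.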

There are two ways to handle this with an XmlWriter. The WriteCData method wraps the text in a CDATA section, as shown when the InvalidChars1 element is created in the following example. Alternatively, the WriteElementString method escapes the text automatically, as shown when the InvalidChars2 element is created.

    // Set up a string with our invalid chars.
    string invalidChars = "<>\"&'";
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
    {
        writer.WriteStartElement("Root");
        // Wrap the text in a CDATA section.
        writer.WriteStartElement("InvalidChars1");
        writer.WriteCData(invalidChars);
        writer.WriteEndElement();
        // Let the writer escape the text.
        writer.WriteElementString("InvalidChars2", invalidChars);
        writer.WriteEndElement();
    }

The output from this is shown next. The encoding in the XML declaration comes from Console.Out, so it may differ on your system:

    <?xml version="1.0" encoding="IBM437"?>
    <Root>
      <InvalidChars1><![CDATA[<>"&']]></InvalidChars1>
      <InvalidChars2>&lt;&gt;"&amp;'</InvalidChars2>
    </Root>
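One way to confirm that both XmlWriter approaches preserve the original text is to parse the generated XML and read the element values back. This sketch assumes you write to a StringWriter instead of the console; RoundTripSketch and its methods are hypothetical names, and the XML declaration is omitted only to keep the string short.

```csharp
using System.IO;
using System.Xml;

// Hypothetical sketch: build the recipe's document in memory, then read it back.
public static class RoundTripSketch
{
    public static string Build(string invalidChars)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (var sw = new StringWriter())
        {
            using (XmlWriter writer = XmlWriter.Create(sw, settings))
            {
                writer.WriteStartElement("Root");
                writer.WriteStartElement("InvalidChars1");
                writer.WriteCData(invalidChars);       // CDATA section
                writer.WriteEndElement();
                writer.WriteElementString("InvalidChars2", invalidChars); // escaped text
                writer.WriteEndElement();
            }
            return sw.ToString();
        }
    }

    // Parses the markup and returns the text value of each child element.
    public static (string cdata, string escaped) ReadBack(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return (doc.DocumentElement["InvalidChars1"].InnerText,
                doc.DocumentElement["InvalidChars2"].InnerText);
    }
}
```

Both values should come back equal to the input: the parser turns the CDATA section and the entity references back into the same characters.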

There are also two ways to handle this problem with XmlDocument and XmlElement. The first is to wrap the text in a CDATA section. You can append a CDATA node created by XmlDocument.CreateCDataSection, as shown here, or assign the CDATA markup to the element's InnerXml property, as the complete method later in this recipe does:

    // Set up a string with our invalid chars.
    string invalidChars = "<>\"&'";
    // xmlDoc is the XmlDocument being built.
    XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
    invalidElement1.AppendChild(xmlDoc.CreateCDataSection(invalidChars));

The second way is to let the XmlElement class escape the data for you by assigning the text directly to the InnerText property like this:

    // Set up a string with our invalid chars.
    string invalidChars = "<>\"&'";
    // xmlDoc is the XmlDocument being built.
    XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
    invalidElement2.InnerText = invalidChars;

The following method builds the complete XmlDocument from these elements:

    public static void HandlingInvalidChars( )
    {
        // Set up a string with our invalid chars.
        string invalidChars = "<>\"&'";
        XmlDocument xmlDoc = new XmlDocument( );
        // Create a root node for the document.
        XmlElement root = xmlDoc.CreateElement("Root");
        xmlDoc.AppendChild(root);
        // Create the first invalid character node.
        XmlElement invalidElement1 = xmlDoc.CreateElement("InvalidChars1");
        // Wrap the invalid chars in a CDATA section and assign the result
        // through the InnerXml property, which parses the string as markup
        // instead of escaping it.
        invalidElement1.InnerXml = "<![CDATA[" + invalidChars + "]]>";
        // Append the element to the root node.
        root.AppendChild(invalidElement1);
        // Create the second invalid character node.
        XmlElement invalidElement2 = xmlDoc.CreateElement("InvalidChars2");
        // Assign the invalid chars directly through the InnerText
        // property, which escapes them automatically.
        invalidElement2.InnerText = invalidChars;
        // Append the element to the root node.
        root.AppendChild(invalidElement2);
        Console.WriteLine("Generated XML with Invalid Chars:\r\n{0}", xmlDoc.OuterXml);
        Console.WriteLine( );
    }

The XML created by this method (and written to the console) looks like this:

    Generated XML with Invalid Chars:
    <Root><InvalidChars1><![CDATA[<>"&']]></InvalidChars1><InvalidChars2>&lt;&gt;"&amp;'</InvalidChars2></Root>

Discussion

A CDATA section lets you include text as literal character data rather than as escaped markup. Normally these characters must be written in escaped form (&lt; for <, and so on), but inside a CDATA section they can appear as-is. The one sequence a CDATA section cannot contain is ]]>, because it marks the end of the section.
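The string-concatenation approach used with InnerXml has one limitation: if the text itself contains ]]>, that sequence ends the CDATA section early, so the text cannot survive unchanged. The following sketch (hypothetical class and method names) checks whether a string round-trips through that approach and through InnerText. Whether the parser throws or silently produces different text, the CDATA version does not preserve the input.

```csharp
using System.Xml;

// Hypothetical sketch comparing the CDATA/InnerXml approach with InnerText.
public static class CDataPitfallSketch
{
    // True if the text survives the "<![CDATA[" + text + "]]>" approach unchanged.
    public static bool SurvivesCDataConcatenation(string text)
    {
        var doc = new XmlDocument();
        XmlElement elem = doc.CreateElement("Data");
        doc.AppendChild(elem);
        try
        {
            elem.InnerXml = "<![CDATA[" + text + "]]>";
            return elem.InnerText == text;
        }
        catch (XmlException)
        {
            // A "]]>" inside the text closed the section early and
            // left markup the parser rejected.
            return false;
        }
    }

    // True if the text survives assignment through InnerText and re-parsing.
    public static bool SurvivesInnerText(string text)
    {
        var doc = new XmlDocument();
        XmlElement elem = doc.CreateElement("Data");
        elem.InnerText = text;
        // Re-parse the serialized element to check that the escaping is reversible.
        var check = new XmlDocument();
        check.LoadXml(elem.OuterXml);
        return check.DocumentElement.InnerText == text;
    }
}
```

XmlDocument.CreateCDataSection avoids the string building, but InnerText is the option that works for any text.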

Because the InnerXml property parses the string it is given as markup, wrapping the text in a CDATA section lets you assign characters that would otherwise have to be escaped first. The XmlElement class also has an InnerText property that automatically escapes any markup characters in the string assigned to it, so you can add these characters without worrying about them. Note that in element content, both XmlWriter and XmlElement escape <, >, and & but write quote characters as-is; quotes only need escaping inside attribute values.
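The difference between InnerXml and InnerText is easy to check directly. In this sketch (hypothetical class and method names), assigning raw special characters to InnerXml makes the parser throw an XmlException, while assigning the same string to InnerText stores it as text; reading InnerXml afterward returns the escaped markup.

```csharp
using System.Xml;

// Hypothetical sketch contrasting InnerText and InnerXml assignment.
public static class InnerTextVsInnerXmlSketch
{
    // Assigns the text through InnerText and returns the markup the
    // element now holds, as reported by InnerXml.
    public static string EscapedMarkup(string text)
    {
        var doc = new XmlDocument();
        XmlElement elem = doc.CreateElement("Data");
        elem.InnerText = text;
        return elem.InnerXml;
    }

    // True if assigning the raw text to InnerXml is rejected by the parser,
    // which happens when '<' or '&' in the text does not form valid markup.
    public static bool InnerXmlRejects(string text)
    {
        var doc = new XmlDocument();
        XmlElement elem = doc.CreateElement("Data");
        try
        {
            elem.InnerXml = text;
            return false;
        }
        catch (XmlException)
        {
            return true;
        }
    }
}
```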

See Also

See the "XmlDocument Class," "XmlWriter Class," "XmlElement Class," and "CDATA Sections" topics in the MSDN documentation.



C# Cookbook