Avoiding Canonical Representation Issues | .NET Development Security Solutions

Data is in a canonical representation when it conforms to the requirements of a particular standard, whether published or not. The precise representation depends on the standard. For example, the HyperText Markup Language (HTML) is a loose standard that leaves a lot of room for interpretation. (Technologies such as HTML and XML can rely on a document type definition, or DTD, to define the tags and attributes, and the order in which those elements can appear.) XML defines the format of the data, but not its presentation or content. Adding a schema to XML, which is standard practice, further defines the data ordering and data types. The Simple Object Access Protocol (SOAP) imposes somewhat stricter requirements on data transfer. In short, the representation varies by standard, and the room for error varies with the strictness of the standard.

Tip

You can find many documents and standards that discuss canonical representation issues, but one of the more revealing is the W3C document entitled “Exclusive XML Canonicalization.” This document discusses issues such as how to canonicalize a subdocument so that its digital signature remains valid when you remove the subdocument from the main XML document. You can learn more about this standard at http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/.
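To make the idea of a canonical form concrete, the following Python sketch serializes the same data in two different ways and shows that both reduce to the same canonical text. It uses the `canonicalize` function from Python's standard `xml.etree.ElementTree` module, which implements the later C14N 2.0 specification rather than the Exclusive XML Canonicalization document mentioned in the tip, so treat it as an illustration of the concept only. The `order` and `item` elements are made up for the example.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Two serializations of the same (hypothetical) data. Attribute order,
# quote style, and the empty-element shorthand differ, so the raw text differs.
doc_a = '<order id="42" currency="USD"><item sku="A1"/></order>'
doc_b = "<order currency='USD' id='42'><item sku='A1'></item></order>"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(doc_a == doc_b)      # False: the raw strings differ
print(canon_a == canon_b)  # True: the canonical forms match
```

A byte-for-byte comparison or a digital signature only behaves predictably when both parties work from the canonical form rather than from whatever text the sender happened to produce.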

When you create a data transfer methodology for an application, you create a standard for that data that affects the canonical representation of the information. For example, creating a Web service means defining the information that SOAP presents to the user of the Web service and the expectations your Web service has for data exchange. Just about every kind of application relies on some form of canonical representation. Of course, databases are the most organized of these applications because a database relies on a specific schema. The following list presents ideas on how you can avoid canonical representation issues.

Precise Data Format: The format of the information is important. One reason browsers have problems with HTML is that developers often define the data format poorly, which causes problems when the browser interprets the data. XML-based standards, such as the XML/HTML hybrid called eXtensible HyperText Markup Language (XHTML), seek to correct this problem by enforcing specific data formatting rules. For example, every opening tag must have a closing tag. The need for precise formatting isn’t limited to markup, however. It applies to other application types as well. It’s essential to define a specific data format so that every application requiring the data can understand the form it receives.
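As a rough illustration of an enforced formatting rule, the following Python sketch uses the standard library XML parser to reject input in which an opening tag lacks its matching closing tag. The helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True only if text parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<p>Closed <b>properly</b></p>"))  # True
print(is_well_formed("<p>Missing <b>close tag</p>"))    # False
```

A strict parser of this kind rejects the second input outright, which is the behavior XHTML asks for and the opposite of a browser guessing what loosely formatted HTML was meant to say.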

Data Type Definition: A standard that doesn’t precisely define data types is going to cause problems. For example, it’s possible to create a standard where every data element is a string, but this leads to validation and verification problems. Including a numeric data type is a step in the right direction, but even a simple numeric type causes problems because one sender could provide an integer (a 32-bit number), while another provides a long (a 64-bit number). The standard should differentiate between integers and real numbers, and it should also specify how to handle dates, times, Boolean values, and currency.
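The following Python sketch shows one way a receiver might enforce precise data types on values that arrive as strings. The field names, the two-decimal currency rule, and the lowercase `true`/`false` Boolean spelling are assumptions chosen for illustration. A real standard would define its own rules.

```python
import re
from datetime import date
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_int32(text):
    """Accept only a base-10 integer that fits in 32 bits."""
    if not re.fullmatch(r"-?[0-9]+", text):
        raise ValueError(f"not an integer: {text!r}")
    value = int(text)
    if not INT32_MIN <= value <= INT32_MAX:
        raise ValueError(f"out of 32-bit range: {text!r}")
    return value

def parse_bool(text):
    """Accept exactly 'true' or 'false' (an assumed spelling)."""
    if text not in ("true", "false"):
        raise ValueError(f"not a Boolean: {text!r}")
    return text == "true"

def parse_currency(text):
    """Accept a decimal amount with exactly two fractional digits."""
    if not re.fullmatch(r"-?[0-9]+\.[0-9]{2}", text):
        raise ValueError(f"not a currency amount: {text!r}")
    return Decimal(text)

# Hypothetical record definition: field name -> parser for its type.
FIELD_TYPES = {
    "quantity": parse_int32,
    "shipped": parse_bool,
    "total": parse_currency,
    "order_date": date.fromisoformat,  # raises ValueError on a bad date
}

def parse_record(raw):
    """Convert a dict of strings into typed values, or raise an error."""
    return {name: parser(raw[name]) for name, parser in FIELD_TYPES.items()}

record = parse_record({"quantity": "3", "shipped": "true",
                       "total": "19.99", "order_date": "2024-02-29"})
print(record["quantity"] + 1)  # 4
```

A value such as `"2147483648"` fits in a 64-bit long but not in a 32-bit integer, so `parse_int32` rejects it instead of silently accepting a wider type than the standard allows.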

Parsing Requirements: One of the more interesting issues that developers have to consider is data parsing, and in particular which party performs this task. In general, developers agree that it’s the responsibility of the party receiving the data to parse it and check it for correctness. In other words, you should never rely on the party sending the data to get the information correct.

Data Ordering: Unordered data causes security problems because the application must accept data in whatever order the sender chooses. The resulting complexity often leaves holes in the implementation that are hard to find and fix. Using a precise data order doesn’t increase the complexity of sending the data, but it does reduce the complexity of receiving the data. If the order of the data is incorrect, the recipient can assume some type of error has occurred.
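A fixed field order keeps the receiving code short. The Python sketch below assumes a hypothetical three-field record and treats any deviation from the agreed order as an error, instead of trying to cope with every possible arrangement.

```python
# Hypothetical field order agreed on by sender and receiver.
EXPECTED_ORDER = ("customer_id", "order_date", "total")

def check_order(fields):
    """Validate a sequence of (name, value) pairs given in arrival order."""
    names = tuple(name for name, _ in fields)
    if names != EXPECTED_ORDER:
        raise ValueError(f"expected fields {EXPECTED_ORDER}, got {names}")
    return dict(fields)

good = [("customer_id", "17"), ("order_date", "2024-02-29"), ("total", "19.99")]
print(check_order(good)["total"])  # 19.99

swapped = [good[1], good[0], good[2]]
try:
    check_order(swapped)
except ValueError as err:
    print("rejected:", err)
```

The same single comparison also catches missing, duplicated, and unexpected fields, because any of those changes the sequence of names.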

Error Reporting and Handling: It’s essential to provide some type of error reporting and handling mechanism. A .NET application can accomplish this task by raising an error locally. However, when working with a remote data source or sink (the recipient of the data), the application must provide some type of error reporting mechanism other than raising an error. Most technologies today rely on a specially formatted message. For example, SOAP relies on this approach. You can also return an error value, which is the technique used by the Win32 API.
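The following Python sketch illustrates the specially formatted message approach by building a minimal SOAP 1.1 style fault instead of letting a local error escape to the remote caller. The `handle_request` function, its `quantity` payload, and the `ok` success element are hypothetical, and a production Web service would normally let its SOAP toolkit generate the fault.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_NS)

def build_fault(code, message):
    """Return a minimal SOAP 1.1 fault envelope as a string."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    fault = ET.SubElement(body, f"{{{SOAP_NS}}}Fault")
    ET.SubElement(fault, "faultcode").text = f"soap:{code}"
    ET.SubElement(fault, "faultstring").text = message
    return ET.tostring(envelope, encoding="unicode")

def handle_request(payload):
    """Hypothetical handler: report bad input as a message, not an exception."""
    try:
        quantity = int(payload)
    except ValueError:
        # The local error is caught here and translated for the remote party.
        return build_fault("Client", "quantity must be an integer")
    return f"<ok>{quantity}</ok>"

print(handle_request("3"))    # <ok>3</ok>
print(handle_request("abc"))  # a soap:Envelope containing a soap:Fault
```

The remote caller never sees the local `ValueError`. It receives a message it can parse and act on, which is the role that a returned error value plays in the Win32 API.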