Abstract Syntax Notation One (ASN.1) is an abstract notation designed to represent simple and complex objects in a machine-independent format (http://asn1.elibel.tm.fr/standards/). It's an underlying building block used for data transmission in several major protocols, including (but not limited to) the following:
ASN.1 is used by quite a few popular protocols on the Internet, so vulnerabilities in major ASN.1 implementations could result in myriad exploitable attack vectors. As always, when encountering a protocol for the first time, you should start by analyzing the blocks of data that are going to be interpreted by remote nodes, to get a basic understanding of how things work and discover some hints about what's likely to go wrong. ASN.1 is not a protocol as such, but a notational standard for expressing some arbitrary protocol without having to define an exact binary representation (an abstract representation, hence the name). Therefore, to transmit data for a protocol that uses ASN.1, some encoding rules need to be applied to the protocol definitions. These rules must allow both sides participating in data exchange to accurately interpret information. There are three standardized methods for encoding ASN.1 data: Basic Encoding Rules (BER), Packed Encoding Rules (PER), and XML Encoding Rules (XER).
Auditing applications that use ASN.1 means you're auditing code that implements one of these encoding standards. So you need to be familiar with how these encoding rules work, and then you can apply the lessons learned earlier in Part II of this book. Before you jump into the encoding schemes, take a look at the data types defined by the ASN.1 notational standard, so you know what kind of data elements you are actually going to be encoding. Types for ASN.1 are divided into four classes: universal, application, context-specific, and private.
Of these classes, only universal types, summarized in Table 16-1, are defined by the ASN.1 standard; the other three are for private implementation use. ASN.1 also distinguishes between primitive and constructed types. Primitive types are those that can be expressed as a simple value (such as an integer, a Boolean, or an octet string). Constructed types are composed of one or more simple types and other constructed types. Constructed types can be sequences (SEQUENCE), lists (SEQUENCE-OF, SET, and SET-OF), or choices.

Note: There's no tag value for choices because they are used when several different types can be supplied in the data stream, so choice values are untagged.

Basic Encoding Rules

Basic Encoding Rules (BER) defines a method for encoding ASN.1 data suitable for transmission across the network. It's a deliberately ambiguous standard; that is, it allows objects to be encoded in several different ways. The rules were invented with this flexibility in mind so they can deal with different situations where ASN.1 might be used. Some encodings are more useful when objects are small and need to be easy to traverse; other encodings are more suited to applications that transmit large objects. The BER specification describes BER-encoded data as consisting of four components, described in the following sections: an identifier, a length, some content data, and an end-of-contents (EOC) sequence.

Identifier

The identifier field represents the tag of the data type being processed. The first byte comprises several fields, as shown in Figure 16-14.

Figure 16-14. BER identifier fields
The fields in this byte are as follows.
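As a rough illustration of how the identifier breaks down, the following C sketch splits the first byte into its class, constructed flag, and tag number, and follows the multibyte form that is signaled by a tag number of 31. The structure and function names (ber_identifier, ber_read_identifier) are hypothetical, and the decision to reject tag numbers wider than 32 bits is an assumption made for this sketch rather than something the standard requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded identifier fields. */
struct ber_identifier {
    unsigned cls;          /* top two bits: 0 universal, 1 application,
                              2 context-specific, 3 private */
    unsigned constructed;  /* next bit: 1 if constructed, 0 if primitive */
    uint32_t tag;          /* tag number */
    size_t   consumed;     /* identifier bytes read from the input */
};

/* Returns 0 on success, or -1 if the input is truncated or the tag
 * number does not fit in 32 bits (a limit chosen for this sketch). */
int ber_read_identifier(const uint8_t *buf, size_t len,
                        struct ber_identifier *out)
{
    size_t pos = 0;
    uint8_t c;

    if (len == 0)
        return -1;

    c = buf[pos++];
    out->cls = (c & 0xC0) >> 6;
    out->constructed = (c & 0x20) ? 1 : 0;
    out->tag = c & 0x1F;

    if (out->tag == 0x1F) {              /* multibyte tag number follows */
        out->tag = 0;
        do {
            if (pos >= len)              /* never read past the input */
                return -1;
            if (out->tag > (UINT32_MAX >> 7))
                return -1;               /* the shift below would overflow */
            c = buf[pos++];
            out->tag = (out->tag << 7) | (c & 0x7F);
        } while (c & 0x80);              /* top bit set: more bytes follow */
    }

    out->consumed = pos;
    return 0;
}
```

Given the single byte 0x30, this sketch reports a universal, constructed type with tag number 16 (SEQUENCE). The caller is expected to advance by `consumed` bytes only after a successful return.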
Length

The length field, as the name suggests, indicates how many bytes are in the current object. It can indicate a definite or an indefinite length for the object. An indefinite length means the object length is unknown and is terminated with a special EOC sequence. According to the specification (X.690-0207), an indefinite length field should be used only for a constructed sequence (see the explanation of primitive and constructed types after Table 16-1). An indefinite length is indicated simply by a single length byte with the top bit set and all other bits clear (so the value of the byte is 0x80). The rules for indicating a definite length are as follows:
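As a minimal sketch of these length rules, the following C function decodes a BER length field using the short form (one byte with the top bit clear, holding a value from 0 to 127) and the long form (a first byte with the top bit set whose low 7 bits give the number of big-endian length bytes that follow). The function name and return codes are hypothetical, and capping the length at what fits in a size_t is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch. */
#define BER_LEN_DEFINITE    0
#define BER_LEN_INDEFINITE  1
#define BER_LEN_ERROR     (-1)

int ber_read_length(const uint8_t *buf, size_t avail,
                    size_t *value, size_t *consumed)
{
    size_t n, i, v = 0;

    if (avail == 0)
        return BER_LEN_ERROR;

    if (buf[0] == 0x80) {                /* indefinite: ended by an EOC */
        *value = 0;
        *consumed = 1;
        return BER_LEN_INDEFINITE;
    }

    if ((buf[0] & 0x80) == 0) {          /* short form: 0 to 127 */
        *value = buf[0];
        *consumed = 1;
        return BER_LEN_DEFINITE;
    }

    n = buf[0] & 0x7F;                   /* long form: count of length bytes */
    if (n > sizeof(size_t))              /* value would not fit in size_t */
        return BER_LEN_ERROR;
    if (n > avail - 1)                   /* length bytes run past the input */
        return BER_LEN_ERROR;

    for (i = 0; i < n; i++)
        v = (v << 8) | buf[1 + i];       /* big-endian accumulation */

    *value = v;
    *consumed = 1 + n;
    return BER_LEN_DEFINITE;
}
```

Even when this function succeeds, the caller still has to compare `value` against the number of bytes actually remaining after the header before using it for a read, copy, or allocation.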
Contents

The contents depend on the tag type indicating what type of data the object contains.

End of Contents

The EOC field is required only if this object has an indefinite length. The EOC sequence is two consecutive bytes that are both zero (0x00 0x00).

Canonical Encoding and Distinguished Encoding

Distinguished Encoding Rules (DER) and Canonical Encoding Rules (CER) are subsets of BER. As mentioned, BER is ambiguous in some ways. For example, you could encode a length of 100 in a few different ways, as shown in the following list:
CER and DER limit the options BER specifies for various purposes, as explained in the following sections.

Canonical Encoding Rules

Canonical Encoding Rules (CER) are intended to be used when large objects are being transmitted, when all the object data isn't available, or when object sizes aren't known at transmission time. CER uses the same encoding rules as BER, with the following provisions:
Restrictions are also imposed on string and set encodings, but they aren't covered here. For more information, see Clause 9 of the X.690-0207 standard.

Distinguished Encoding Rules

Distinguished Encoding Rules (DER) are intended to be used for smaller objects in which all bytes for objects are available and the lengths of objects are known at transmission time. DER imposes the following provisions on the basic BER encoding rules:
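One DER provision that is straightforward to check in code is the length rule from X.690: lengths must use the definite form and the fewest possible bytes. The following sketch tests a length field against that rule only. The function name der_length_ok is hypothetical, and the check says nothing about the other DER restrictions.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the length field at buf is a valid DER length
 * (definite form, minimal number of bytes), 0 otherwise. */
int der_length_ok(const uint8_t *buf, size_t avail)
{
    size_t n;

    if (avail == 0)
        return 0;

    if ((buf[0] & 0x80) == 0)            /* short form is always minimal */
        return 1;

    n = buf[0] & 0x7F;
    if (n == 0)                          /* 0x80: indefinite form */
        return 0;
    if (n == 0x7F)                       /* 0xFF: reserved first byte */
        return 0;
    if (n > avail - 1)                   /* truncated length field */
        return 0;
    if (buf[1] == 0x00)                  /* leading zero: not minimal */
        return 0;
    if (n == 1 && buf[1] < 0x80)         /* fits in the short form */
        return 0;

    return 1;
}
```

Under this check, a length of 100 is accepted only as the single byte 0x64; the BER-legal alternatives 0x81 0x64 and 0x82 0x00 0x64 are rejected. A parser that accepts those alternatives while a filtering device in front of it rejects them (or the reverse) is the kind of mismatch worth looking for during an audit.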
Vulnerabilities in BER, CER, and DER Implementations

Now that you know how objects are encoded in BER, you might have an idea of possible vulnerabilities in typical implementations. As you can see, BER implementations can be complex, and there are many small pitfalls that are easy to fall into. The following sections explain a few of the most common.

Tag Encodings

Tags contain multiple fields, some combinations of which are illegal in certain incarnations of BER. For example, in CER, an octet string of less than or equal to 1,000 bytes must be encoded using a primitive form rather than a constructed form. Is this rule really enforced? Depending on what code you're examining, this rule could be important. For example, an IDS decoding ASN.1 data might apply CER rules strictly, decide this data is erroneous input, and not continue to analyze object data; the end implementation, on the other hand, might be more relaxed and accept the input. Apart from these situations, failure to adhere to the specification strictly might not cause security-relevant consequences. Another potential issue with tag encodings is that you might trick an implementation into reading more bytes than are available in the data stream being read, as shown in this example:

    int decode_tag(unsigned char *ptr, int *length,
                   int *constructed, int *class)
    {
        int c, tagnum;

        *length = 1;

        c = *ptr++;

        *class = (c & 0xC0) >> 6;            /* top two bits: tag class */
        *constructed = (c & 0x20) ? 1 : 0;   /* next bit: constructed flag */
        tagnum = c & 0x1F;                   /* low five bits: tag number */

        if(tagnum != 31)
            return tagnum;

        /* multibyte tag: nothing stops ptr from running past the input */
        for(tagnum = 0, (*length)++; (c = *ptr) & 0x80;
            ptr++, (*length)++){
            tagnum <<= 7;
            tagnum |= (c & 0x7F);
        }

        /* the final byte (top bit clear) carries the low 7 bits */
        tagnum = (tagnum << 7) | (c & 0x7F);

        return tagnum;
    }

    int decode_asn1_object(unsigned char *buffer, size_t length)
    {
        int constructed, header_length, class, tag;

        tag = decode_tag(buffer, &header_length, &constructed, &class);

        length -= header_length;   /* wraps if header_length > length */
        buffer += header_length;

        ... do more stuff ...
    }

This code has a simple error; the header_length can be made longer than length in decode_asn1_object(), which leads to an integer underflow on length. This error results in processing random data from the process heap or possibly memory corruption.

Length Encodings

Many ASN.1 vulnerabilities have been uncovered in length encoding in the past. A few things might go wrong in this process. First, in multibyte length encodings, the first byte indicates how many length bytes follow. You might run into vulnerabilities if the length field is made to be more bytes than are left in the data stream (similar to the tag encoding vulnerability examined previously). Second, when using the extended length-encoding value, you can specify 32-bit integers; as you already know, doing so can lead to all sorts of problems, usually integer overflows or signed issues. Integer overflows are common when the length value is rounded before an allocation is made. For example, eEye discovered this overflow in the Microsoft ASN.1 implementation. Some annotated assembly code taken from the eEye advisory (www.eeye.com/html/research/advisories/AD20040210-2.html) is shown:

    76195338  mov  eax, [ebp-18h]       ; = length of simple bit string
    7619533B  cmp  eax, ebx             ; (EBX = 0)
    7619533D  jz   short 7619539A       ; skip this bit string if empty
    7619533F  cmp  [ebp+14h], ebx       ; = no-copy flag
    76195342  jnz  short 761953AF       ; don't concatenate if no-copy
    76195344  mov  ecx, [esi]           ; = count of accumulated bits
    76195346  lea  eax, [ecx+eax+7]     ; *** INTEGER OVERFLOW ***
    7619534A  shr  eax, 3               ; div by 8 to get size in bytes
    7619534D  push eax
    7619534E  push dword ptr [esi+4]
    76195351  push dword ptr [ebp-4]
    76195354  call DecMemReAlloc        ; allocates a zero-byte block

In this code, the 32-bit length taken from the ASN.1 header (stored in eax in this code) is added to the count of accumulated (already read) bits plus 7.
The data is a bit string, so you need to add 7 and then divide by 8 to find the number of bytes required (because lengths are specified in bits for a bit string). Triggering an integer overflow causes DecMemReAlloc() to allocate a 0-byte block, which isn't adequate to hold the amount of data subsequently copied into it. Signed issues are also likely in ASN.1 length interpreting. OpenSSL used to contain a number of vulnerabilities of this type, as discussed in Chapter 6 in the section on signed integer vulnerabilities.

Packed Encoding Rules (PER)

Packed Encoding Rules (PER) is quite different from the BER encoding scheme you've already seen. It's designed as a more compact alternative to BER. PER can represent data objects by using bit fields rather than bytes as the basic data unit. PER can be used only to encode values of a single ASN.1 type. ASN.1 objects encoded with PER consist of three fields described in the following sections: preamble, length, and contents.

Preamble

A preamble is a bit map used when dealing with sequence, set, and set-of data types. It indicates which optional fields of a complex structure are present.

Length

The length encoding for data elements in PER is a little more complex than in BER because you're dealing with bit fields, and a few more rules are involved in PER's length-decoding specification. The length field can represent a size in bytes, bits, or a count of data elements, depending on the type of data being encoded. There are two types of encoding: aligned variants (those aligned on octet boundaries) and unaligned variants (those not necessarily aligned on octet boundaries). Lengths for data fields can also be constrained (by enforcing a maximum and minimum length), semiconstrained (enforcing only a maximum or minimum length), or unconstrained (allowing any length of data to be specified).
An important note: The program decoding a PER bit stream must already know the structure of an incoming ASN.1 stream so that it knows how to decode the length. The program must know whether the length data represents a constrained or unconstrained length and what the boundaries are for constrained lengths; otherwise, it's impossible to know the true value the length represents.

Unconstrained Lengths

For an unconstrained length, the following encoding is used:
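Seen from the sender's side, the unconstrained length forms can be sketched as follows: a single byte with the top bit clear for lengths up to 127, two bytes beginning with the bits 10 for lengths below 16K, and a single byte beginning with the bits 11 whose low 6 bits hold a multiplier of 1 to 4 for a fragment of that many 16K blocks. The function name per_put_length and its interface are hypothetical, and the sketch assumes the byte-oriented layout of the aligned variant.

```c
#include <stddef.h>
#include <stdint.h>

#define PER_16K 16384u

/* Writes one length determinant describing up to n items into out[].
 * Returns the number of bytes written (1 or 2). *covered receives the
 * number of items this determinant describes; when it is less than n,
 * the caller sends that many items and calls again for the remainder. */
size_t per_put_length(size_t n, uint8_t out[2], size_t *covered)
{
    size_t m;

    if (n < 128) {                       /* 0xxxxxxx */
        out[0] = (uint8_t)n;
        *covered = n;
        return 1;
    }

    if (n < PER_16K) {                   /* 10xxxxxx xxxxxxxx */
        out[0] = (uint8_t)(0x80 | (n >> 8));
        out[1] = (uint8_t)(n & 0xFF);
        *covered = n;
        return 2;
    }

    m = n / PER_16K;                     /* 11xxxxxx: 1 to 4 blocks of 16K */
    if (m > 4)
        m = 4;
    out[0] = (uint8_t)(0xC0 | m);
    *covered = m * PER_16K;
    return 1;
}
```

For example, a length of 1,000 comes out as the two bytes 0x83 0xE8, and a length of 100,000 starts with 0xC4 (a 64KB fragment), leaving 34,464 items for later determinants. As far as the PER specification goes, a fragmented value is closed by a final non-fragment determinant, which is zero when nothing remains; this sketch leaves that loop to the caller.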
Constrained and Semiconstrained Lengths

A constrained length is encoded as a bit field; its size varies depending on the range of lengths that can be supplied. There are several different ways to encode constrained lengths, depending on the range. The length is encoded as "length minus lower bound," which conserves space and prevents users from being able to specify illegal length values for constrained numbers. In general, a constrained length is encoded by determining the range of values (per the ASN.1 specification for the data being transmitted), and then using a bit field that's the exact size required to represent that range. For example, say a field can be between 1,000 and 1,007 bytes. There are 8 possible lengths in that range, so the bit field would be 3 bits, and a length of 1,005 would be sent as the value 5.

Note: This discussion is a slight oversimplification of how constrained lengths are encoded, but it's fine for the purposes of this chapter. Interested readers can refer to Clause 10.5 of the PER specification (X.691-0207) for full details.

Vulnerabilities in PER

PER implementations can have a variety of integer-related issues, as in BER. The problems in PER are a little more restricted, however, especially for constrained values. Even for unconstrained lengths, you're limited to sending sequences of 64KB chunks, which can prevent integer overflows from occurring. Implementations that make extensive use of 16-bit integers are definitely at high risk, however, as they can be made to wrap, particularly because the length attribute might represent a count of elements (so an allocation would multiply the count by the size of each element). Errors in decoding lengths could also result in integer overflows of 16-bit integers. Specifically, unconstrained lengths allow you to specify large blocks of data in 64KB chunks, and each chunk has a size determined by getting the bottom 6 bits of the octet and multiplying it by 16KB.
You're supposed to encode only a value of 1 to 4, but the implementation might not enforce this rule, as in the following example:

    #define LENGTH_16K (1024 * 16)

    unsigned short decode_length(PER_BUFFER *buffer)
    {
        if(GetBits(buffer, 1) == 0)
            return GetBits(buffer, 7);

        if(GetBits(buffer, 1) == 0)
            return GetBits(buffer, 14);

        /* multiplier is never checked against the 1-to-4 range */
        return GetBits(buffer, 6) * LENGTH_16K;
    }

    unsigned char *decode_octetstring(PER_BUFFER *buffer)
    {
        unsigned char *bytes;
        unsigned long length;

        length = decode_length(buffer);

        bytes = (unsigned char *)calloc(length+1, sizeof(unsigned char));

        if(!bytes)
            return NULL;

        decode_bytes(bytes, buffer, length);

        return bytes;
    }

In this example, no verification is done to ensure that the low 6 bits of a large object encode only a value between 1 and 4 (inclusive). By specifying a larger value, the multiplication by 16KB causes truncation of the high 16 bits of the result (because decode_length() returns a 16-bit integer). Another thing to be wary of is checking return values incorrectly. Take a look at the previous example modified slightly:

    #define LENGTH_16K (1024 * 16)

    int decode_length(PER_BUFFER *buffer)
    {
        if(bytes_left(buffer) <= 0)
            return -1;

        if(GetBits(buffer, 1) == 0)
            return GetBits(buffer, 7);

        if(GetBits(buffer, 1) == 0){
            if(bytes_left(buffer) < 2)
                return -1;

            return GetBits(buffer, 14);
        }

        return GetBits(buffer, 6) * LENGTH_16K;
    }

    unsigned char *decode_octetstring(PER_BUFFER *buffer)
    {
        unsigned char *bytes;
        unsigned long length;

        length = decode_length(buffer);   /* a -1 error is not checked */

        bytes = (unsigned char *)calloc(length+1, sizeof(unsigned char));

        if(!bytes)
            return NULL;

        decode_bytes(bytes, buffer, length);

        return bytes;
    }

In this example, you can't trigger a 16-bit integer wrap because decode_length() returns an integer; however, the function now returns -1 on error, which isn't checked for. In fact, if an error is returned, the -1 is used as the length in the call to calloc(). It's then added to 1, resulting in 0 bytes allocated, followed by a large copy in decode_bytes().
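For contrast, here is a minimal sketch of how both problems might be avoided: the length is returned through a size_t out-parameter so the error code can't be mistaken for a length, the fragment multiplier is restricted to 1 through 4, and the decoded length is compared against the data remaining before anything is allocated. The bitbuf structure and get_bits() helper are hypothetical stand-ins for PER_BUFFER and GetBits(), whose definitions aren't shown. The sketch also ignores alignment padding and reads only a single length-prefixed run of bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LENGTH_16K (1024 * 16)

/* Hypothetical stand-in for PER_BUFFER. */
struct bitbuf {
    const uint8_t *data;
    size_t nbits;                        /* total bits available */
    size_t pos;                          /* next bit to read */
};

/* Reads n (at most 32) bits, most significant bit first.
 * Returns -1 rather than reading past the end of the buffer. */
static int get_bits(struct bitbuf *b, unsigned n, uint32_t *out)
{
    uint32_t v = 0;

    if (n > 32 || n > b->nbits - b->pos)
        return -1;
    while (n--) {
        v = (v << 1) | ((b->data[b->pos >> 3] >> (7 - (b->pos & 7))) & 1);
        b->pos++;
    }
    *out = v;
    return 0;
}

/* Decodes one unconstrained length. Returns 0 on success, -1 on error. */
int decode_length_checked(struct bitbuf *b, size_t *length)
{
    uint32_t bit, v;

    if (get_bits(b, 1, &bit) < 0)
        return -1;
    if (bit == 0) {                      /* 0xxxxxxx */
        if (get_bits(b, 7, &v) < 0)
            return -1;
        *length = v;
        return 0;
    }
    if (get_bits(b, 1, &bit) < 0)
        return -1;
    if (bit == 0) {                      /* 10xxxxxx xxxxxxxx */
        if (get_bits(b, 14, &v) < 0)
            return -1;
        *length = v;
        return 0;
    }
    if (get_bits(b, 6, &v) < 0)          /* 11xxxxxx */
        return -1;
    if (v < 1 || v > 4)                  /* only 1 to 4 blocks are legal */
        return -1;
    *length = (size_t)v * LENGTH_16K;
    return 0;
}

/* Reads one length-prefixed run of bytes; the caller frees the result. */
uint8_t *decode_octets_checked(struct bitbuf *b, size_t *out_len)
{
    size_t length, i;
    uint8_t *bytes;
    uint32_t v;

    if (decode_length_checked(b, &length) < 0)
        return NULL;
    if (length > (b->nbits - b->pos) / 8)    /* more than the data left */
        return NULL;

    bytes = calloc(length + 1, 1);
    if (!bytes)
        return NULL;
    for (i = 0; i < length; i++) {
        if (get_bits(b, 8, &v) < 0) {
            free(bytes);
            return NULL;
        }
        bytes[i] = (uint8_t)v;
    }
    *out_len = length;
    return bytes;
}
```

Because the length is bounded by the bytes actually left in the buffer, length + 1 can't wrap in the allocation, and a failed length decode never reaches calloc().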
XML Encoding Rules

XML Encoding Rules (XER) provides a standard for encoding ASN.1 in XML documents. XML is a complex markup language, and basic XML rules aren't covered in this section. XER is quite different from the other encoding formats; it's a textual representation of ASN.1 objects, as opposed to the other encoding formats, which are binary. Therefore, the problems you run into with XER are likely to be far different.

Note: Should you be confronted with the task of auditing an XER implementation, you'll probably need to analyze the XML implementation to ensure that the code is secure. After all, if the XML parser is broken, it doesn't matter what XER bugs you might fix because the underlying XML parser can be attacked directly.

An XER-encoded object consists of two parts: an XML prolog and an XML document element that describes a single ASN.1 object. The XML document element contains the actual ASN.1 object data. It's simply encoded by using standard document element conventions in XML. The XML prolog doesn't have to be used. If it is, it's a standard XML header tag, which might look like this:

    <?xml version="1.0" encoding="UTF-8"?>

XER Vulnerabilities

The most likely vulnerabilities in XER are text-based errors: simple buffer overflows or pointer arithmetic bugs. When auditing XER implementations, remember that programs that exchange data by using XER are often exposing a huge codebase to untrusted data. This applies not just to XER but to the XML implementation and encoding schemes for transmitting and storing XML data. In particular, check the UTF encoding schemes for encoding Unicode codepoints, which are discussed in depth in Chapter 8.