Abstract Syntax Notation (ASN.1)

Abstract Syntax Notation (ASN.1) is an abstract notational format designed to represent simple and complex objects in a machine-independent format (http://asn1.elibel.tm.fr/standards/). It's an underlying building block used for data transmission in several major protocols, including (but not limited to) the following:

Certificate and key encoding Primarily used in SSL and ISAKMP, but also used in other places, such as PGP-encoded keys.
Authentication information encoding Microsoft-based operating systems use ASN.1 extensively for transmitting authentication information, particularly when NTLM authentication is used.
Simple Network Management Protocol (SNMP) Objects are encoded with ASN.1 in SNMP requests and replies.
Identity encoding Used in ISAKMP implementations to encode identity information.
Lightweight Directory Access Protocol (LDAP) Objects communicated over LDAP also use ASN.1 as a primary encoding scheme.

ASN.1 is used by quite a few popular protocols on the Internet, so vulnerabilities in major ASN.1 implementations could result in myriad exploitable attack vectors.

As always, when encountering a protocol for the first time, you should analyze the blocks of data that are going to be interpreted by remote nodes first to get a basic understanding of how things work and discover some hints about what's likely to go wrong.

ASN.1 is not a protocol as such, but a notational standard for expressing some arbitrary protocol without having to define an exact binary representation (an abstract representationhence the name). Therefore, to transmit data for a protocol that uses ASN.1, some encoding rules need to be applied to the protocol definitions. These rules must allow both sides participating in data exchange to accurately interpret information . There are three standardized methods for encoding ASN.1 data:

Basic Encoding Rules (BER)
Packed Encoding Rules (PER)
XML Encoding Rules (XER)

Auditing applications that use ASN.1 means you're auditing code that implements one of these encoding standards. So you need to be familiar with how these encoding rules work, and then you can apply the lessons learned earlier in Part II of this book.

Before you jump into the encoding schemes, take a look at the data types defined by the ASN.1 notational standard, so you know what kind of data elements you are actually going to be encoding. Types for ASN.1 are divided into four classes:

Universal Universal tags are for data types defined by the ASN.1 standard (listed in Table 16-1).

Table 16-1. ASN.1 Universal Data Types
Universal Identifier	Data Type
0	Reserved
1	Boolean
2	Integer
3	Bit string
4	Octet string
5	Null
6	Object identifier
7	Object descriptor
8	Extended and instance-of
9	Real
10	Enumerated type
11	Embedded PDV
12	UTF-8 string
13	Relative object identifier
14	Reserved
15	Reserved
16	Sequence and sequence-of
17	Set and set-of
18	Numeric character string
19	Printable character string
20	Teletex character string
21	Videotex character string
22	International alphabet 5 (IA5) character string
23	UTC time
24	Generalized time
25	Graphic character string
26	Visible character string
27	General character string
28	Character string
29	Character string
30	Character string

Application Tags that are unique to an application.
Context-specific These tags are used to identify a member in a constructed type (such as a set).
Private Tags that are unique in an organization

Of these classes, only universal types, summarized in Table 16-1, are defined by the ASN.1 standard; the other three are for private implementation use.

ASN.1 also distinguishes between primitive and constructed types. Primitive types are those that can be expressed as a simple value (such as an integer, a Boolean, or an octet string). Constructed types are composed of one or more simple types and other constructed types. Constructed types can be sequences (SEQUENCE), lists (SEQUENCE-OF, SET, and SET-OF), or choices.

Note

There's no tag value for choices because they are used when several different types can be supplied in the data stream, so choice values are untagged.

Basic Encoding Rules

Basic Encoding Rules (BER) defines a method for encoding ASN.1 data suitable for transmission across the network. It's a deliberately ambiguous standardthat is, it allows objects to be encoded in several different ways. The rules were invented with this flexibility in mind so they can deal with different situations where ASN.1 might be used. Some encodings are more useful when objects are small and need to be easy to traverse; other encodings are more suited to applications that transmit large objects. The BER specification describes BER-encoded data as consisting of four components, described in the following sections: an identifier, a length, some content data, and an end-of-contents (EOC) sequence.

Identifier

The identifier field represents the tag of the data type being processed. The first byte comprises several fields, as shown in Figure 16-14.

Figure 16-14. BER identifier fields

The fields in this byte are as follows.

Class (2 bits) The class of the data type, which can be universal (0), application (1), context-specific (2), or private (3).
P/C (1 bit) Indicates whether the field is primitive (value of 0) or constructed (value of 1).
Tag number (5 bits) The actual tag value. If the tag value is less than or equal to 30, it's encoded as a normal byte value in the lower 5 bits. If the tag value is larger than 30, all tag bits are set to 1, and the tag value is specified by a series of bytes following the tag byte. Each byte uses the lower 7 bits to represent part of the tag value and the top bit to indicate whether any more bytes follow. So if all tag bits are set to 1, an indefinite number of tag bytes follow, and processing stops when a byte with a clear top bit is encountered. To encode the value 0x3333, for example, the 0xFF 0xD6 0x33 byte sequence would be used. The lead byte can vary, depending on whether the value is universal or private, constructed, or primitive.

Length

The length field, as the name suggests, indicates how many bytes are in the current object. It can indicate a definite or an indefinite length for the object. An indefinite length means the object length is unknown and is terminated with a special EOC sequence. According to the specification (X.690-0207), an indefinite length field should be used only for a constructed sequence (see the explanation of primitive and constructed types after Table 16-1). An indefinite length is indicated simply by a single-length byte with the top bit set and all other bits clear (so the value of the byte is 0x80). The rules for indicating a definite length are as follows:

For a length value of 127 or less, a single octet is supplied, in which the length value is supplied in the lower 7 bits and the top bit is clear. For example, to express a length of 100, the byte 0x64 would be supplied.
For a length value larger than 127, the top bit is set and the low 7 bits are used to indicate how many length octets follow. For example, to indicate a length of 65,535, you would supply the following bytes: 0x82 0xFF 0xFF.

The contents depend on the tag type indicating what type of data the object contains.

End of Character

The EOC field is required only if this object has an indefinite length. The EOC sequence is two consecutive bytes that are both zero (0x00 0x00).

Canonical Encoding and Distinguished Encoding

Distinguished Encoding Rules (DER) and Canonical Encoding Rules (CER) are subsets of BER. As mentioned, BER is ambiguous in some ways. For example, you could encode a length of 100 in a few different ways, as shown in the following list:

0x64 Single-byte encoding
0x81 0x64 Multi-byte encoding
0x82 0x00 0x64 Multi-byte encoding

CER and DER limit the options BER specifies for various purposes, as explained in the following sections.

Canonical Encoding Rules

Canonical Encoding Rules (CER) are intended to be used when large objects are being transmitted; when all the object data isn't available; or when object sizes aren't known at transmission time. CER uses the same encoding rules as BER, with the following provisions:

Constructed types must use an indefinite length encoding.
Primitive types must use the fewest encoding bytes necessary to express the object size. For example, an object with a length of 100 can give the length only as a single byte, 0x64. Any other length expressions are illegal.

Restrictions are also imposed on string and set encodings, but they aren't covered here. For more information, see Chapter 9 of the X.690-207 standard.

Distinguished Encoding Rules

Distinguished Encoding Rules (DER) are intended to be used for smaller objects in which all bytes for objects are available and the lengths of objects are known at transmission time. DER imposes the following provisions on the basic BER encoding rules:

All objects must have a definite length encoding; there are no indefinite length objects (and, therefore, no EOC sequences on objects encoded with DER).
The length encoding must use the fewest bytes necessary for expressing a size (as with CER).

Vulnerabilities in BER, CER, and DER Implementations

Now that you know how objects are encoded in BER, you might have an idea of possible vulnerabilities in typical implementations. As you can see, BER implementations can be complex, and there are many small pitfalls that can happen easily. The following sections explain a few of the most common.

Tag Encodings

Tags contain multiple fields, some combinations of which are illegal in certain incarnations of BER. For example, in CER, an octet string of less than or equal to 1,000 bytes must be encoded using a primitive form rather than a constructed form. Is this rule really enforced? Depending on what code you're examining, this rule could be important. For example, an IDS decoding ASN.1 data might apply CER rules strictly, decide this data is erroneous input, and not continue to analyze object data; the end implementation, on the other hand, might be more relaxed and accept the input. Apart from these situations, failure to adhere to the specification strictly might not cause security-relevant consequences.

Another potential issue with tag encodings is that you might trick an implementation into reading more bytes than are available in the data stream being read, as shown in this example:

int decode_tag(unsigned char *ptr, int *length,                int *constructed, int *class) {     int c, tagnum;     *length = 1;     c = *ptr++;     *class = (c & C0) >> 6;     *constructed = (c & 0x20) ? 1 : 0;     tagnum = c & 0x1F;     if(tagnum != 31)         return tagnum;     for(tagnum = 0, (*length)++; (c = *ptr) & 0x80;         ptr++, (*length)++){         tagnum <<= 7;         tagnum |= (c & 0x7F);     }     return tagnum; } int decode_asn1_object(unsigned char *buffer, size_t length) {     int constructed, header_length, class, tag;     tag = decode_tag(buffer, &header_length,                      &constructed, &class);     length -= header_length;     buffer += header_length;     ... do more stuff ... }

This code has a simple error; the header_length can be made longer than length in decode_asn1_object(), which leads to an integer underflow on length. This error results in processing random data from the process heap or possibly memory corruption.

Length Encodings

Many ASN.1 vulnerabilities have been uncovered in length encoding in the past. A few things might go wrong in this process. First, in multibyte length encodings, the first byte indicates how many length bytes follow. You might run into vulnerabilities if the length field is made to be more bytes than are left in the data stream (similar to the tag encoding vulnerability examined previously).

Second, when using the extended length-encoding value, you can specify 32-bit integers; as you already know, doing so can lead to all sorts of problems, usually integer overflows or signed issues. Integer overflows are common when the length value is rounded before an allocation is made. For example, eEye discovered this overflow in the Microsoft ASN.1 implementation. Some annotated assembly code taken from the eEye advisory (www.eeye.com/html/research/advisories/AD20040210-2.html) is shown:

76195338 mov eax, [ebp-18h] ; = length of simple bit string 7619533B cmp eax, ebx ; (EBX = 0) 7619533D jz short 7619539A ; skip this bit string if empty 7619533F cmp [ebp+14h], ebx ; = no-copy flag 76195342 jnz short 761953AF ; don't concatenate if no-copy 76195344 mov ecx, [esi] ; = count of accumulated bits 76195346 lea eax, [ecx+eax+7] ; *** INTEGER OVERFLOW *** 7619534A shr eax, 3 ; div by 8 to get size in bytes 7619534D push eax 7619534E push dword ptr [esi+4] 76195351 push dword ptr [ebp-4] 76195354 call DecMemReAlloc ; allocates a zero-byte block

In this code, the 32-bit length taken from the ASN.1 header (stored in eax in this code) is added to the amount of accumulated (already read) bytes plus 7. The data is a bit string, so you need to add 7 and then divide by 8 to find the number of bytes required (because lengths are specified in bits for a bit string). Triggering an integer overflow causes DecMemReAlloc() to allocate a 0-byte block, which isn't adequate to hold the amount of data subsequently copied into it.

Signed issues are also likely in ASN.1 length interpreting. OpenSSL used to contain a number of vulnerabilities of this type, as discussed in Chapter 6 in the section on signed integer vulnerabilities.

Packed Encoding Rules (PER)

Packed Encoding Rules (PER) is quite different from the BER encoding scheme you've already seen. It's designed as a more compact alternative to BER. PER can represent data objects by using bit fields rather than bytes as the basic data unit. PER can be used only to encode values of a single ASN.1 type. ASN.1 objects encoded with PER consist of three fields described in the following sections: preamble, length, and contents.

Preamble

A preamble is a bit map used when dealing with sequence, set, and set-of data types. It indicates which optional fields of a complex structure are present.

Length

The length encoding for data elements in PER is a little more complex than in BER because you're dealing with bit fields, and a few more rules are involved in PER's length-decoding specification. The length field can represent a size in bytes, bits, or a count of data elements, depending on the type of data being encoded.

There are two types of encoding: aligned variants (those aligned on octet boundaries) and unaligned variants (those not necessarily aligned on octet boundaries). Lengths for data fields can also be constrained (by enforcing a maximum and minimum length), semiconstrained (enforcing only a maximum or minimum length), or unconstrained (allowing any length of data to be specified). An important note: The program decoding a PER bit stream must already know the structure of an incoming ASN.1 stream so that it knows how to decode the length. The program must know whether the length data represents a constrained or unconstrained length and what the boundaries are for constrained lengths; otherwise, it's impossible to know the true value the length represents.

Unconstrained Lengths

For an unconstrained length, the following encoding is used:

If the length to be encoded is less than 128, you can encode it in a single byte, with the top bit set to 0 and the lower 7 bits used to encode the length.
If the length is larger than 127 but less than 16KB, two octets are used; the first octet has the two most significant bits set to 1 and 0. The length is then encoded in the remaining 6 bits of the first octet and the entire second octet.
If the length is 16KB or larger, a single octet is supplied with the two most significant bits set to 1 and the lower 6 bits encoding a value from 1 to 4. That value is then multiplied by 16KB to find the real length, so a maximum of 64KB can be represented with this one byte. Because lengths can be larger than that or be a value that's not a multiple of 16KB, any remaining data can follow this length-value pair by using the same encoding rules. So a value of 64KB + 2 would be split up into two length-value fields, one with a length of 64KB followed by 64KB of data and the next field with a length of 2 followed by 2 bytes of data.

Constrained and Semiconstrained Lengths

A constrained length is encoded as a bit field; its size varies depending on the range of lengths that can be supplied. There are several different ways to encode constrained lengths, depending on the range. The length is encoded as "length lower bound," which conserves space and prevents users from being able to specify illegal length values for constrained numbers. In general, a constrained length is encoded by determining the range of values (per the ASN.1 specification for the data being transmitted), and then using a bit field that's the exact size required to represent that range. For example, say a field can be between 1,000 and 1,008 bytes. The range of lengths that can be supplied is 8, so the bit field would be 3 bits.

Note

This discussion is a slight oversimplification of how constrained lengths are encoded, but it's fine for the purposes of this chapter. Interested readers can refer to Clause 10.5 of the PER specification (X.691-0207) for full details.

Vulnerabilities in PER

PER implementations can have a variety of integer-related issues, as in BER. The problems in PER are a little more restricted, however, especially for constrained values. Even for unconstrained lengths, you're limited to sending sequences of 64KB chunks, which can prevent integer overflows from occurring. Implementations that make extensive use of 16-bit integers are definitely at high risk, however, as they can be made to wrapparticularly because the length attribute might represent a count of elements (so an allocation would multiply the count by the size of each element). Errors in decoding lengths could also result in integer overflows of 16-bit integers. Specifically, unconstrained lengths allow you to specify large blocks of data in 64KB chunks, and each chunk has a size determined by getting the bottom 6 bits of the octet and multiplying it by 16KB. You're supposed to encode only a value of 1 to 4, but the implementation might not enforce this rule, as in the following example:

#define LENGTH_16K (1024 * 16) unsigned short decode_length(PER_BUFFER *buffer) {     if(GetBits(buffer,1) == 0)         return GetBits(buffer, 7);     if(GetBits(buffer,1) == 0)         return GetBits(buffer, 14);     return GetBits(buffer, 6) * LENGTH_16K; } unsigned char *decode_octetstring(PER_BUFFER *buffer) {     unsigned char *bytes;     unsigned long length;     length = decode_length(buffer);     bytes = (unsigned char *)calloc(length+1,                                     sizeof(unsigned char));     if(!bytes)         return NULL;     decode_bytes(bytes, buffer, length);     return bytes; }

In this example, no verification is done to ensure that the low 6 bits of a large object encode only a value between 1 and 4 (inclusive). By specifying a larger value, the multiplication of 16KB causes truncation of the high 16 bits of the result (because decode_length() returns a 16-bit integer).

Another thing to be wary of is checking return values incorrectly. Take a look at the previous example modified slightly:

#define LENGTH_16K (1024 * 16) int decode_length(PER_BUFFER *buffer) {     if(bytes_left(buffer) <= 0)         return -1;     if(GetBits(buffer,1) == 0)             return GetBits(buffer, 7);     if(GetBits(buffer,1) == 0){         if(bytes_left(buffer) < 2)             return -1;             return GetBits(buffer, 14);     }     return GetBits(buffer, 6) * LENGTH_16K; } unsigned char *decode_octetstring(PER_BUFFER *buffer) {     unsigned char *bytes;     unsigned long length;     length = decode_length(buffer);     bytes = (unsigned char *)calloc(length+1,                                     sizeof(unsigned char));     if(!bytes)         return NULL;     decode_bytes(bytes, buffer, length);     return bytes; }

In this example, you can't trigger a 16-bit integer wrap because decode_length() returns an integer; however, the function now returns -1 on error, which isn't checked for. In fact, if an error is returned, the -1 is passed as a length to calloc(). It's then added to 1, resulting in 0 bytes allocated, followed by a large copy in decode_bytes().

XML Encoding Rules

XML Encoding Rules (XER) provides a standard for encoding ASN.1 in XML documents. XML is complex markup language, and basic XML rules aren't covered in this section. XER is quite different from the other encoding formats; it's a textual representation of ASN.1 objects, as opposed to the other encoding formats, which are binary. Therefore, the problems you run into with XER are likely to be far different.

Note

Should you be confronted with the task of auditing an XER implementation, you'll probably need to analyze the XML implementation to ensure that the code is secure. After all, if the XML parser is broken, it doesn't matter what XER bugs you might fix because the underlying XML parser can be attacked directly.

An XER-encoded object consists of two parts: an XML prolog and an XML document element that describes a single ASN.1 object. The XML document element contains the actual ASN.1 object data. It's simply encoded by using standard document element conventions in XML. The XML prolog doesn't have to be used. If it is, it's a standard XML header tag, which might look like this:

<?xml version = "1.0" encoding="UTF-8">

XER Vulnerabilities

The most likely vulnerabilities in XER are obviously text-based errorssimple buffer overflows or pointer arithmetic bugs. When auditing XER implementations, remember that programs that exchange data by using XER are often exposing a huge codebase to untrusted data. This applies not just to XER but to the XML implementation and encoding schemes for transmitting and storing XML data. In particular, check the UTF encoding schemes for encoding Unicode codepoints, which are discussed in depth in Chapter 8.

Table 16-1. ASN.1 Universal Data Types

Basic Encoding Rules

Identifier

Figure 16-14. BER identifier fields

Length

Contents

End of Character

Canonical Encoding and Distinguished Encoding

Canonical Encoding Rules

Distinguished Encoding Rules

Vulnerabilities in BER, CER, and DER Implementations

Tag Encodings

Length Encodings

Packed Encoding Rules (PER)

Preamble

Length

Unconstrained Lengths

Constrained and Semiconstrained Lengths

Vulnerabilities in PER

XML Encoding Rules

XER Vulnerabilities