5. Signed and Unsigned Data Types | Win32 API Programming with Visual Basic

5. Signed and Unsigned Data Types


		In this chapter, we will take a close look at signed and unsigned data types. The topic is one of special importance in view of the prevalence of unsigned data types in the Win32 API, their general absence in Visual Basic, and the frequent difficulty of translating an unsigned value into its corresponding signed value.


	Signed and Unsigned Representations


		We have seen that VC++ and the Win32 API use both signed and unsigned integral data types, whereas VB has only one unsigned type, the Byte data type. This can create a problem when an API function either expects or returns an unsigned data type. To understand just what is involved, we need to take a look at the internal workings of these data types and how they are represented in memory.


		The place to start is with some carefully defined terminology. We will couch our examples in terms of 16-bit words, but what we say applies equally to words of any length.


		A 16-bit word is simply a string of 16 binary bits, as in:


		w = 1111000011110000


		The key point is that a binary word is not a number until and unless it is given an interpretation as a number. It is just a string of bits.


		There are two common ways in which integers are represented as 16-bit words in most computers, including PCs: the unsigned representation is used to represent only nonnegative integers, and the two's complement signed representation is used to represent both negative and nonnegative integers. The latter is generally abbreviated in Microsoft's documentation as the signed representation, and we may do so as well, although there are other types of signed representations (including the one's complement signed representation and the signed magnitude representation).


		It is important to understand that it is not the number itself that is signed or unsigned; it is the representation of the number as a binary word that is signed or unsigned. Numbers are neither signed nor unsigned. (A number can be positive, negative, or zero, but that is not the same thing as signed or unsigned; all numbers have a sign.) Thus, the commonly used term signed integer is highly misleading. It should be read as signed representation of the integer. Nevertheless, this terminology is so common and so convenient that we will use it as well.


		When we declare an integer variable in Visual Basic and give it a value, as in:


		Dim i As Integer
		i = 5


		VB represents the integer using the two's complement signed representation. In this, we have no choice. Put another way, when VB interprets a binary word as an integer, it does so using the two's complement binary representation of that word. Period. On the other hand, VC++ is more flexible, allowing us to choose the representation, as in:


		unsigned int ui;
		ui = 65000;
		int i;          // or signed int i;
		i = -30000;


		Here ui is an unsigned integer, that is, ui is represented in memory by VC++ as a (32-bit) binary word using the unsigned representation. On the other hand, i is a signed integer, that is, i is represented in memory using the two's complement signed representation.
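
		The point that one binary word can stand for two different numbers is easy to check in VC++. The following is only a minimal sketch: the names as_signed16 and demo_interpretations are hypothetical, and it assumes that the fixed-width types from <cstdint> are available. It reads the word w shown earlier under both interpretations without changing a single bit:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read the same 16 bits as a two's complement signed value. std::memcpy
// copies the bytes unchanged, so no numeric conversion takes place.
inline std::int16_t as_signed16(std::uint16_t word) {
    std::int16_t result;
    std::memcpy(&result, &word, sizeof result);
    return result;
}

inline void demo_interpretations() {
    const std::uint16_t w = 0xF0F0;    // 1111 0000 1111 0000
    assert(w == 61680);                // unsigned interpretation
    assert(as_signed16(w) == -3856);   // signed interpretation
}
```

		The bits of w never change here; only the type through which they are read changes.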


	Why Two Different Representations?


		The reason for using an unsigned representation for integers is simple: by using an unsigned representation, we can represent larger positive integers than when using a signed representation. In exchange, we give up the ability to represent negative numbers.


		In particular, a 16-bit word that uses the two's complement signed representation can represent any integer in the range -32768 to 32767, whereas a 16-bit word that uses the unsigned representation can represent integers in the range 0 to 65535.


		There are compelling reasons to include both signed and unsigned representations in a programming language. It is probably not necessary to comment on the fact that a language would be hampered significantly if it could not represent negative numbers. On the other hand, an unsigned representation is useful for two reasons.


		When we want to do arithmetic with positive numbers only, we get a larger range of values using an unsigned representation. This happens with addresses, for instance. If the word size of a computer is, say, 32 bits, then the most natural way to access all 2³² possible memory addresses is by using an unsigned representation.


		Many numeric data types do not require the use of arithmetic. For instance, window handles are 32-bit numbers, but it makes no sense to add, subtract, or otherwise manipulate these numbers. They are strictly for identification purposes. Thus, the HANDLE data type is an unsigned long data type.


		It is time that we consider how the two's complement signed and unsigned interpretations actually work.


	Unsigned Representation


		When a 16-bit word is interpreted as unsigned, we simply count up from 0 in binary, as in the following list (the double arrow stands for "represents"):


		0000 0000 0000 0000 ↔ 0
		0000 0000 0000 0001 ↔ 1
		0000 0000 0000 0010 ↔ 2
		. . .
		0111 1111 1111 1111 ↔ 2^15 - 1 = 32767
		1000 0000 0000 0000 ↔ 2^15 = 32768
		. . .
		1111 1111 1111 1111 ↔ 2^16 - 1 = 65535


		Thus, the unsigned interpretation of 16-bit words allows us to represent all integers in the range 0 to 2¹⁶ - 1 = 65535.


		As you probably know, each position in a binary number represents a power of 2, just as each position in a decimal number represents a power of 10. Table 5-1 is a template for creating unsigned representations of numbers. Each column represents a successive power of 2.

Table 5-1. A Template for Unsigned Representations
2¹⁵	2¹⁴	2¹³	2¹²	2¹¹	2¹⁰	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
32768	16384	8192	4096	2048	1024	512	256	128	64	32	16	8	4	2	1


		Table 5-2 shows an example of filling in this template to find the unsigned representation of the integer 50000. By successively subtracting powers of 2, starting with the largest one that fits, we arrive at:


		50000 = 32768 + 16384 + 512 + 256 + 64 + 16


		Next, we place these numbers in the third row of Table 5-2 and then put 1s underneath the numbers and 0s everywhere else. This gives a fourth row in Table 5-2, from which we get the unsigned representation:


		1100 0011 0101 0000 ↔ 50000 (unsigned)

Table 5-2. An Unsigned Example: Representing 50000
2¹⁵	2¹⁴	2¹³	2¹²	2¹¹	2¹⁰	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
32768	16384	8192	4096	2048	1024	512	256	128	64	32	16	8	4	2	1
32768	16384					512	256		64		16
1	1	0	0	0	0	1	1	0	1	0	1	0	0	0	0
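
		The subtract-the-largest-power procedure used to fill in the table can be sketched in VC++ as follows. This is an illustration under assumptions, not a library routine: the names unsigned_bits16 and demo_unsigned_bits are hypothetical, and the grouping into blocks of four bits simply follows the layout used in this chapter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Build the 16-bit unsigned representation of n by successively
// subtracting powers of 2, starting with 2^15.
inline std::string unsigned_bits16(std::uint16_t n) {
    std::string bits;
    unsigned remaining = n;
    for (int k = 15; k >= 0; --k) {
        const unsigned power = 1u << k;
        if (remaining >= power) {   // this power of 2 fits: write a 1
            bits += '1';
            remaining -= power;
        } else {                    // it does not fit: write a 0
            bits += '0';
        }
        if (k % 4 == 0 && k != 0) {
            bits += ' ';            // separate groups of four bits
        }
    }
    return bits;
}

inline void demo_unsigned_bits() {
    assert(unsigned_bits16(50000) == "1100 0011 0101 0000");
}
```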


	Signed Representation


		The strategy used in two's complement representation is to use the leftmost bit as an indicator of the sign of the number. The leftmost bit is called the sign bit. If the sign bit is a 0, the word is interpreted as a nonnegative integer. If the sign bit is 1, the number is interpreted as a negative integer:


		0xxx xxxx xxxx xxxx ↔ nonnegative integer
		1xxx xxxx xxxx xxxx ↔ negative integer


		*The Signed-Magnitude Representation*


		The most obvious way to fill in the other bits is with the magnitude (or absolute value) of the number. For example, to represent the positive number 5, we would write:


		0000 0000 0000 0101 ↔ 5


		since the binary representation of 5 is 101. For the negative number -5, we would simply change the sign bit:


		1000 0000 0000 0101 ↔ -5


		This method of representing both positive and negative numbers is called the signed-magnitude representation. It is very simple, but not very useful. One problem is that arithmetic with numbers represented in this way requires taking special cases based on the sign of the number. (Just try adding the binary representations for 5 and -5.) Also, there are two representations of the number 0 (+0 and -0):


		0000 0000 0000 0000 ↔ 0 (+0)
		1000 0000 0000 0000 ↔ 0 (-0)
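
		Both drawbacks can be seen in a few lines of VC++. The sketch below is an assumption-laden illustration: signed_magnitude_value and demo_signed_magnitude are hypothetical names, and the decoder simply treats bit 15 as the sign and bits 0 through 14 as the magnitude.

```cpp
#include <cassert>
#include <cstdint>

// Decode a 16-bit word under the signed-magnitude convention:
// bit 15 is the sign, bits 0..14 hold the absolute value.
inline int signed_magnitude_value(std::uint16_t word) {
    const int magnitude = word & 0x7FFF;
    return (word & 0x8000) ? -magnitude : magnitude;
}

inline void demo_signed_magnitude() {
    const std::uint16_t plus5 = 0x0005;    // 0000 0000 0000 0101
    const std::uint16_t minus5 = 0x8005;   // 1000 0000 0000 0101
    assert(signed_magnitude_value(plus5) == 5);
    assert(signed_magnitude_value(minus5) == -5);

    // Two different words both decode to zero.
    assert(signed_magnitude_value(0x0000) == 0);
    assert(signed_magnitude_value(0x8000) == 0);

    // Ordinary binary addition of the two words does not give zero:
    // 0x0005 + 0x8005 = 0x800A, which decodes to -10.
    const std::uint16_t sum = static_cast<std::uint16_t>(plus5 + minus5);
    assert(sum == 0x800A);
    assert(signed_magnitude_value(sum) == -10);
}
```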


		*The Two's Complement Representation*


		The two's complement representation is a much better approach and is used by most modern computers. It is easy to describe using a table. The analog of Table 5-1 for the two's complement signed interpretation is Table 5-3. The only difference between this table and Table 5-1 is the negative sign in the first column.

Table 5-3. A Template for Two's Complement Signed Representations
-2¹⁵	2¹⁴	2¹³	2¹²	2¹¹	2¹⁰	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
-32768	16384	8192	4096	2048	1024	512	256	128	64	32	16	8	4	2	1


		To illustrate, Table 5-4 computes the signed representation of the integer -15536. Note that it is the same as the unsigned representation of 50000:


		1100 0011 0101 0000 ↔ -15536 (signed)

Table 5-4. A Signed Example: Representing -15536
-2¹⁵	2¹⁴	2¹³	2¹²	2¹¹	2¹⁰	2⁹	2⁸	2⁷	2⁶	2⁵	2⁴	2³	2²	2¹	2⁰
-32768	16384	8192	4096	2048	1024	512	256	128	64	32	16	8	4	2	1
-32768	16384					512	256		64		16
1	1	0	0	0	0	1	1	0	1	0	1	0	0	0	0


		Since the only difference between Table 5-1 and Table 5-3 is that the numbers in the first column are negative, it is clear that a binary word with a sign bit of 0 represents the same integer using either the signed or the unsigned representation. Put another way, for integers in the range 0 to 32767, the two representations are identical.


		Also, the positive numbers in the first row of Table 5-3 sum to only 32767, so they can never outweigh the -32768 contributed by the sign bit (the sum of the entire row is -1). It follows that a number is negative if and only if its sign bit is 1.
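
		The two templates translate directly into code. In the following VC++ sketch (the function names are hypothetical, chosen only for illustration), the two decoders differ in exactly one place: the weight given to bit 15.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned template: every column k counts as +2^k.
inline std::int32_t unsigned_value16(std::uint16_t word) {
    std::int32_t value = 0;
    for (int k = 0; k <= 15; ++k) {
        if ((word >> k) & 1) value += std::int32_t{1} << k;
    }
    return value;
}

// Two's complement template: columns 0..14 count as +2^k,
// but the sign column counts as -2^15.
inline std::int32_t twos_complement_value16(std::uint16_t word) {
    std::int32_t value = 0;
    for (int k = 0; k <= 14; ++k) {
        if ((word >> k) & 1) value += std::int32_t{1} << k;
    }
    if ((word >> 15) & 1) value -= std::int32_t{1} << 15;
    return value;
}

inline void demo_templates() {
    const std::uint16_t w = 0xC350;   // 1100 0011 0101 0000
    assert(unsigned_value16(w) == 50000);
    assert(twos_complement_value16(w) == -15536);
    // With a sign bit of 0, both interpretations agree.
    assert(unsigned_value16(0x7FFF) == twos_complement_value16(0x7FFF));
}
```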


		Following is a list that shows how signed representation works. The list is ordered by increasing binary word. Note the sudden change from positive to negative integers in the middle of the list.


		0000 0000 0000 0000 ↔ 0
		0000 0000 0000 0001 ↔ 1
		0000 0000 0000 0010 ↔ 2
		. . .
		0111 1111 1111 1111 ↔ 2^15 - 1 = 32767     ' positive
		1000 0000 0000 0000 ↔ -2^15 = -32768       ' negative
		1000 0000 0000 0001 ↔ -2^15 + 1 = -32767
		. . .
		1111 1111 1111 1101 ↔ -3
		1111 1111 1111 1110 ↔ -2
		1111 1111 1111 1111 ↔ -1


		*Why Is It Called Two's Complement?*


		The reason that it is called this has to do with how we take the negative of a number that is represented in this form. Consider any number x written in two's complement form. Let us use the number in Table 5-4 (x = -15536):


		x ↔ 1100 0011 0101 0000


		Consider now the ordinary complement of this binary word; that is, the word obtained by changing all 0s to 1s and all 1s to 0s:


		x^c ↔ 0011 1100 1010 1111


		Adding the two binary numbers gives:


		x + x^c ↔ 1111 1111 1111 1111


		Note that this will be the result no matter what number we start with.


		But the binary word consisting of all 1s is the representation of the number -1, so we have:


		x + x^c = -1


		from which it follows that:


		x + (x^c + 1) = 0


		or:


		-x = x^c + 1


		Thus, to get the negative of a number, we take the complement of its signed representation and then add 1. The resulting binary word is called the two's complement of the original binary word. In other words, to get the negative of a number that is represented in two's complement form, just take the two's complement of the number's binary representation.
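
		The complement-and-add-1 rule can be tried out in VC++ on the word from Table 5-4. This is a minimal sketch with hypothetical helper names (complement16, twos_complement16), working on raw 16-bit words rather than on signed variables:

```cpp
#include <cassert>
#include <cstdint>

// Ordinary complement: flip every bit of the 16-bit word.
inline std::uint16_t complement16(std::uint16_t word) {
    return static_cast<std::uint16_t>(~word);
}

// Two's complement: flip every bit, then add 1. The cast discards
// any carry out of bit 15.
inline std::uint16_t twos_complement16(std::uint16_t word) {
    return static_cast<std::uint16_t>(complement16(word) + 1u);
}

inline void demo_negation() {
    const std::uint16_t x = 0xC350;            // represents -15536
    assert(complement16(x) == 0x3CAF);         // 0011 1100 1010 1111
    assert(x + complement16(x) == 0xFFFF);     // all 1s, which represents -1
    assert(twos_complement16(x) == 0x3CB0);    // represents 15536
    assert(twos_complement16(0x3CB0) == x);    // negating again restores x
}
```

		One edge case is worth noting: the word 1000 0000 0000 0000 (-32768) is its own two's complement, because 32768 has no 16-bit signed representation.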


	Translating Between Signed and Unsigned Representations


		Now we come to the heart of the matter: translation between signed and unsigned representations. There are two issues to consider.


		First, we may need to pass to an API function a number that is too large to fit in the corresponding VB signed data type. For instance, we may need to pass a 16-bit representation of a number in the upper "unsigned" range 32768 to 65535, say for example the number 50000. In VC++, we could simply write:


		unsigned short usVar;
		usVar = 50000;


		but in VB, the code:


		Dim iVar As Integer
		iVar = 50000


		will produce an Overflow runtime error. Note that we cannot use the code:


		Dim lVar As Long
		lVar = 50000


		because the function is expecting a 16-bit binary word.


		The second problem is the reverse. Suppose, for example, that an API function wants to return a 16-bit value in the range 32768 to 65535, such as 50000. Of course, the return value must be in a VB variable (since we are working in VB). But VB will interpret the variable as having a signed data type. In fact, it will interpret the number as -15536 because, as we have seen, this number has the same signed representation as the number 50000 has unsigned representation. So, the question is: "How do we recover the intended value?"


		These problems are easy to solve if we look at them in the correct light. Referring to Figure 5-1, the point is that VB will give a 16-bit binary word a signed interpretation, whereas Win32 will give it an unsigned interpretation (we are assuming here that Win32 is expecting or returning an unsigned short integer).


		Figure 5-1. Passing numbers between VB and Win32


		As in Figure 5-1, if w is a 16-bit binary word, let us write un(w) to denote the number obtained by thinking of w as an unsigned representation and si(w) as the number obtained by thinking of w as a signed representation. Thus, from our previous example in Table 5-2 and Table 5-4, we have:


		un(1100 0011 0101 0000) = 50000
		si(1100 0011 0101 0000) = -15536


		The point to keep in mind is that we are actually passing or receiving a binary word, not a number. VB and Win32 will both interpret this binary word as a number. The difficulty comes when they use different interpretations. VB uses the signed integer interpretation, and Win32 (we are assuming) uses the unsigned short integer interpretation.


		Thus, referring to Figure 5-1, in passing a number un(w) to an API function, we need to tell VB to pass the number si(w), since Win32 will interpret the binary word w that is actually passed (on the stack) as un(w). Conversely, in receiving a number, VB will see it as si(w), and we need to make the translation to un(w), which, by the way, will require using a larger VB data type to hold the value.


		So the whole problem boils down to translating between si(w) and un(w).


		We can see how to make these translations by noting that the only difference between Table 5-1 and Table 5-3 is the negative sign in the first column. Accordingly, there are two cases to consider.


		The first case is when the number si(w) is nonnegative or, equivalently, the number un(w) is in the lower half of the unsigned range (0 to 32767). (Whether we are passing or receiving, we will know one of these numbers!) In this case, the sign bit of w is 0. Hence, as we have seen:


		un(w) = si(w)


		Thus, in this case, we can use an ordinary VB integer to pass the number, and, in the other direction, the return value in a VB integer is the actual number (no changes are necessary).


		On the other hand, suppose that the number si(w) is negative or, equivalently, un(w) is in the upper range 32768 to 65535. In this case, the sign bit of w is 1. This bit, being in the first column of Table 5-4, contributes a total of 2¹⁵ to the number un(w). On the other hand, it contributes a total of -2¹⁵ to the number si(w). Since the contributions from all other columns are the same in both un(w) and si(w), subtracting out the contributions from the first column should produce equal values, that is:


		un(w) - 2^15 = si(w) - (-2^15)


		From this, a little algebra gives the two formulas:


		un(w) = si(w) + 2^16
		si(w) = un(w) - 2^16


		These formulas are the key to all. We can now summarize.


		*Integers*


		To pass a number un(w) in the range 0 to 65535 in a VB integer variable, put the number si(w) in the variable. In the other direction, if a VB integer variable receives a number and VB shows this number to be si(w), then the number passed is actually un(w). Here is the relationship between si(w) and un(w).


		For si(w) >= 0 or 0 <= un(w) <= 32767 (= 2¹⁵-1):


		un(w) = si(w)


		For si(w) < 0 or 32768 <= un(w) <= 65535 (= 2¹⁶-1):


		un(w) = si(w) + 2^16
		si(w) = un(w) - 2^16


		Sometimes a picture is worth a thousand words. Figure 5-2 shows the situation here. When a number lies in the range that is common to the signed and unsigned ranges (0 to 32767), then no changes are required when sending or receiving the number. To send a number in the upper unsigned range, subtract 2¹⁶ to bring it into the signed range before sending the number. When receiving a number in the lower signed range, add 2¹⁶ to get the actual number sent (in the unsigned range).


		Figure 5-2. Translating between signed and unsigned integers
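
		The 16-bit rules can be checked on the VC++ side, where both types exist. The following sketch uses hypothetical function names and does the arithmetic in a wider 32-bit type, much as VB code would use a Long as the intermediate:

```cpp
#include <cassert>
#include <cstdint>

// si(w) from un(w): the value to place in a 16-bit signed variable so
// that the receiver, reading the word as unsigned, sees un.
inline std::int16_t signed_from_unsigned16(std::uint16_t un) {
    const std::int32_t value = un;   // widen before doing arithmetic
    return static_cast<std::int16_t>(un <= 32767 ? value : value - 65536);
}

// un(w) from si(w): the value actually sent when a 16-bit signed
// variable shows si.
inline std::uint16_t unsigned_from_signed16(std::int16_t si) {
    const std::int32_t value = si;
    return static_cast<std::uint16_t>(si >= 0 ? value : value + 65536);
}

inline void demo_translate16() {
    assert(signed_from_unsigned16(50000) == -15536);
    assert(unsigned_from_signed16(-15536) == 50000);
    assert(signed_from_unsigned16(32767) == 32767);   // common range
}
```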


		*Longs*


		Of course, the same principle applies to 32-bit longs.


		To pass a number un(w) in the range 0 to 2³²-1 in a VB long variable, put the number si(w) in the variable. In the other direction, if a VB long variable receives a number and VB shows this number to be si(w), then the number passed is actually un(w). Here is the relationship between si(w) and un(w):


		For si(w) >= 0 or 0 <= un(w) <= 2³¹-1:


		un(w) = si(w)


		For si(w) < 0 or 2³¹ <= un(w) <= 2³²-1:


		un(w) = si(w) + 2^32
		si(w) = un(w) - 2^32


		Figure 5-3 illustrates the translation process.


		Figure 5-3. Translating between signed and unsigned longs
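
		For 32-bit values the intermediate arithmetic needs more than 32 bits, since 2^32 itself does not fit in a long. The sketch below (hypothetical names again) uses a 64-bit intermediate for that purpose:

```cpp
#include <cassert>
#include <cstdint>

inline std::int32_t signed_from_unsigned32(std::uint32_t un) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    const std::int64_t value = un;   // widen to 64 bits first
    return static_cast<std::int32_t>(value < two32 / 2 ? value
                                                       : value - two32);
}

inline std::uint32_t unsigned_from_signed32(std::int32_t si) {
    const std::int64_t two32 = std::int64_t{1} << 32;
    const std::int64_t value = si;
    return static_cast<std::uint32_t>(value >= 0 ? value : value + two32);
}

inline void demo_translate32() {
    assert(signed_from_unsigned32(3000000000u) == -1294967296);
    assert(unsigned_from_signed32(-1294967296) == 3000000000u);
    assert(signed_from_unsigned32(4294967295u) == -1);
}
```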


		*Bytes*


		The situation for bytes is actually the reverse of that for integers and longs, since the VB Byte type is unsigned. The problem here occurs when the API function expects or returns a signed byte. Nevertheless, the principle is exactly the same.


		To pass a number si(w) in the range -128 to 127 in a VB byte variable, put the number un(w) in the variable. In the other direction, if a VB byte variable receives a number and VB shows this number to be un(w), then the number passed is actually si(w). Here is the relationship between si(w) and un(w):


		For si(w) >= 0 or 0 <= un(w) <= 127 (=2⁷-1):


		un(w) = si(w)


		For si(w) < 0 or 128 <= un(w) <= 255 (=2⁸-1):


		un(w) = si(w) + 2^8
		si(w) = un(w) - 2^8


		Figure 5-4 illustrates the translation process.


		Figure 5-4. Translating signed and unsigned bytes
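
		The byte case follows the same pattern with 2^8 in place of 2^16. A brief VC++ sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <cstdint>

// un(w) from si(w): the value to place in an unsigned byte so that a
// receiver expecting a signed char sees si.
inline std::uint8_t unsigned_from_signed8(std::int8_t si) {
    const int value = si;
    return static_cast<std::uint8_t>(si >= 0 ? value : value + 256);
}

// si(w) from un(w): the signed value meant when an unsigned byte
// shows un.
inline std::int8_t signed_from_unsigned8(std::uint8_t un) {
    const int value = un;
    return static_cast<std::int8_t>(un <= 127 ? value : value - 256);
}

inline void demo_translate8() {
    assert(unsigned_from_signed8(-128) == 128);
    assert(unsigned_from_signed8(-1) == 255);
    assert(signed_from_unsigned8(200) == -56);
    assert(signed_from_unsigned8(127) == 127);   // common range
}
```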


		*Examples*


		Here are some examples:


		Pass a number in the range 0 to 65535 that currently resides in a VB long variable lng to an API function with an unsigned short parameter:


		APIFunction(unsigned short param)


		with VB declaration:


		Declare APIFunction(param As Integer)


		Solution:


		Dim param As Integer   ' same size as the API function's parameter
		If lng >= 0 And lng <= 32767 Then
		    param = lng
		ElseIf lng >= 32768 And lng <= 65535 Then
		    param = CInt(lng - 2^16)
		Else
		    MsgBox "Value out of range for an unsigned short", vbCritical
		End If
		Call APIFunction(param)


		Pass a number in the range 0 to 2³²-1 that currently resides in a VB Currency variable cVar to an API function with an unsigned int or unsigned long parameter:


		APIFunction(unsigned int param)


		with VB declaration:


		Declare APIFunction(param As Long)


		Solution:


		Dim param As Long   ' same size as the API function's parameter
		If cVar >= 0 And cVar <= 2^31 - 1 Then
		    param = cVar
		ElseIf cVar >= 2^31 And cVar <= 2^32 - 1 Then
		    param = CLng(cVar - 2^32)
		Else
		    MsgBox "Value out of range for an unsigned int", vbCritical
		End If
		Call APIFunction(param)


		In the next example, the situation is reversed: the API function expects a signed value but the VB Byte data type is unsigned.


		Pass a number in the range -128 to 127 currently residing in a VB integer variable iVar to an API function with a signed char parameter (recall that the VB Byte type is unsigned):


		APIFunction(signed char param)


		with VB declaration:


		Declare APIFunction(param As Byte)


		Solution:


		Dim param As Byte   ' same size as the API function's parameter
		If iVar >= 0 And iVar <= 127 Then
		    param = iVar
		ElseIf iVar >= -128 And iVar <= -1 Then
		    param = CByte(iVar + 2^8)
		End If
		Call APIFunction(param)


		Receive a number in the range 0 to 65535 in a VB integer variable iVar, from an API function that has an OUT parameter of type unsigned short:


		APIFunction(unsigned short param)


		with VB declaration:


		Declare APIFunction(param As Integer)


		Solution:


		Dim lRealValue As Long   ' to hold the real value passed to VB
		If iVar >= -32768 And iVar <= -1 Then
		    lRealValue = CLng(iVar) + 2^16
		ElseIf iVar >= 0 And iVar <= 32767 Then
		    lRealValue = CLng(iVar)
		End If


		Receive a number in the range 0 to 2³²-1 in a VB long variable lVar, from an API function that has an OUT parameter of type unsigned int:


		APIFunction(unsigned int param)


		with VB declaration:


		Declare APIFunction(param As Long)


		Solution:


		Dim cRealValue As Currency   ' to hold the real value passed to VB
		If lVar >= -2^31 And lVar <= -1 Then
		    cRealValue = CCur(lVar) + 2^32
		ElseIf lVar >= 0 And lVar <= 2^31 - 1 Then
		    cRealValue = CCur(lVar)
		End If


		Receive a number in the range -128 to 127 in a VB byte variable bVar, from an API function that has an OUT parameter of type signed char:


		APIFunction(signed char param)


		with VB declaration:


		Declare APIFunction(param As Byte)


		Solution:


		Dim iRealValue As Integer   ' to hold the real value passed to VB
		If bVar >= 128 And bVar <= 255 Then
		    iRealValue = CInt(bVar) - 2^8
		ElseIf bVar >= 0 And bVar <= 127 Then
		    iRealValue = CInt(bVar)
		End If


	Converting Between Word Lengths


		Let us conclude our discussion of signed data types with the issue of converting between word lengths. This issue does not arise in API programming, so you may skip it if desired.


		To illustrate, suppose we have a number in the signed integer range -32768 to 32767 and we want to place it in a Long variable. What does VB do to the 16-bit signed representation of the number to get a 32-bit signed representation?


		If the number is positive, the answer is as expected: VB just puts 16 additional 0s on the left. For instance,


		0000 0000 0000 0101 ↔ 5
		0000 0000 0000 0000 0000 0000 0000 0101 ↔ 5


		On the other hand, what about a negative number?


		As an example, consider the negative number -32765, with signed representation:


		1000 0000 0000 0011 ↔ -32765


		Putting 16 0s on the left would produce a positive number, so this is not correct. Also, just setting the new sign bit (and filling the remaining new positions with 0s) does not help. The word:


		1000 0000 0000 0000 1000 0000 0000 0011


		represents:


		-2^31 + 2^15 + 2 + 1 = -2147450877


		which is certainly not -32765.


		Suppose instead that we put 16 1s on the left, changing:


		1000 0000 0000 0011 ↔ -32765


		to:


		1111 1111 1111 1111 1000 0000 0000 0011 ↔ x


		To compute the value of x, we look at the contributions of the new bits.


		Since the original sign bit contributes -2¹⁵ to the number -32765, but now contributes 2¹⁵ to the number x, the increase in going from -32765 to x from this bit alone is:


		2 * (2^15) = 2^16


		In addition, the new 1s in positions 16 through 30 contribute an increase of:


		2^16 + 2^17 + ... + 2^30


		to the value of x. Finally, the 31st bit, which is the new sign bit, contributes a negative quantity -2³¹. Adding up all of the changes gives the net change in going from -32765 to x:


		2*2^15 + 2^16 + 2^17 + ... + 2^30 - 2^31


		Some algebra that I guess you would prefer that I omit shows that this net increase is actually 0:


		2*2^15 + 2^16 + 2^17 + ... + 2^30 - 2^31 = 0


		In other words, there is no change. Hence, x = -32765!


		We have shown that adding 16 1s on the left does not change the number. Put another way, to get the 32-bit signed representation of a negative number from the number's 16-bit signed representation, we just put 16 1s on the left.


		We can combine both cases (positive and negative numbers) as follows:


		To get the 32-bit signed representation of a number from the number's 16-bit signed representation, just copy the sign bit (whether it be a 0 or a 1) to the left 16 times. This process is called sign extension.
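
		Sign extension is simple to express at the bit level. The following VC++ sketch is an illustration only: sign_extend16, as_signed32, and demo_sign_extension are hypothetical names, and as_signed32 merely reads a 32-bit word as a two's complement value without altering it.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the sign bit (bit 15) into the 16 new high-order positions.
inline std::uint32_t sign_extend16(std::uint16_t word) {
    const bool negative = (word & 0x8000u) != 0;
    const std::uint32_t upper = negative ? 0xFFFF0000u : 0x00000000u;
    return upper | word;
}

// Read a 32-bit word as a two's complement value, bits unchanged.
inline std::int32_t as_signed32(std::uint32_t word) {
    std::int32_t result;
    std::memcpy(&result, &word, sizeof result);
    return result;
}

inline void demo_sign_extension() {
    assert(sign_extend16(0x0005) == 0x00000005u);   // 5 stays 5
    assert(sign_extend16(0x8003) == 0xFFFF8003u);   // sixteen 1s on the left
    assert(as_signed32(sign_extend16(0x8003)) == -32765);
    // Setting only the new sign bit gives a different number.
    assert(as_signed32(0x80008003u) == -2147450877);
}
```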