Exercises | Itanium® Architecture for Programmers. Understanding 64-Bit Processors and EPIC Principles

1:	Suppose that a computer architecture with a 16-bit instruction word allocates a 4-bit field for opcodes and two 6-bit fields for operands. What is the maximum number of opcodes? What is the maximum number of locations that can be accessed by direct addressing?
2:	What is the maximum overall effective number of detailed opcodes, taking into account not only the major opcode field of instructions but also the template targeting of instructions to one of five kinds of specialized CPU functional units in an implementation of Itanium architecture? How many bits would have been necessary to achieve the same effect in an ordinary opcode field, as more typically seen in other instruction set architectures?
3:	What criterion based upon binary representation can an assembler use in order to decide whether the Itanium instruction `add r1=number,r3` should be expressed as an `adds` machine instruction with a 14-bit immediate constant or an `addl` machine instruction with a 22-bit immediate constant? For `addl`, what further constraint must be enforced?
4:	The Itanium instructions that add and subtract can use immediate addressing for one source operand. Discuss whether it would be possible in some hypothetical computer architecture to use immediate addressing for a destination operand. Would that be a useful or a dangerous scheme?
5:	The assembler syntax for the Itanium `shladd` instruction specifies the degree of shifting as `count2`. If this instruction follows the general pattern, the count would be encoded in a 2-bit field somewhere within the 41-bit instruction. How is it possible to encode shift amounts of 1, 2, 3, and 4 positions using only a 2-bit number?
6:	Write two different sequences of two `shladd` instructions that will compute and leave as a result in register `r15` a value that is 10 times the value presently in register `r14`. Which method would not need another intermediate scratch register and thus might be preferable?
7:	What value would a negation operation on the value -2^(n-1) in an n-bit two's complement representation produce? Explain why this particular operation causes overflow.
8:	What Itanium instruction would you use, and how, if you wanted to compute the address of a given element of a byte array?
9:	Try to subtract 3 from 2 in a 3-bit two's complement representation. Can the result be represented in 3 bits, or is there an overflow condition?
10:	Investigate the first instruction bundle produced by the assembler for SQUARES, explaining fully the encoding of the specific instruction in (a) slot 0 and (b) slot 1.
11:	Consider Itanium program A that stores its floating-point data and its integer data at separate contiguous ranges of addresses, and program B that intermingles its floating-point and integer data. Which program should execute faster? Why?
12:	Rewrite the program HEXNUM in the following ways: to use the post-increment feature of the Itanium load instruction; to handle an octal expression instead of a hexadecimal expression; and/or to handle a decimal expression (somewhat more challenging).
13:	For a computer using a single cache, the effective memory access time (the average time it takes to reference a variable held in memory) is given by t_eff = t_cache + (1 - h) × t_main, where t_eff is the effective memory access time, h is the hit ratio, and t_cache and t_main are the cache and main memory access times, on the assumption that the hardware first looks in the cache and then in main memory if there is a cache miss. Using this relationship, show that t_eff = 0.3 t_main for a cache that is five times faster than main memory with a hit ratio of 90%. How would things change if the hit ratio were 95%?
14:	Consider a PC with just two levels of cache. One is part of the processor and the second is external to it. Suppose that the relative performance of each level is 1 for the internal cache, 3 for the external cache, and 7 for the main memory. That is, if the internal cache operates at 10 ns, the second level would operate at 30 ns, and main memory would operate at 70 ns. Assuming a hit ratio of 90% for the first-level cache and a 95% hit ratio for the second-level cache, what is the effective access time for memory references? Generalize the formula given in exercise 13 to apply to a two-level cache as just described.
15:	Suppose that quad word addresses `0x30000` and `0x30008` initially contain hexadecimal values `0x0123456789abcdef` and `0xfedcba9876543210` and that register Gr₂₀ contains the number 1. Assume little-endian storage. Describe the final contents of the full quad words at addresses `0x30000` and `0x30008` after the following instructions have executed: `st4 {effective address 0x30000} = r20` `st4 {effective address 0x3000C} = r20`
16:	Suppose that quad word addresses `0x30000` and `0x30008` initially contain hexadecimal values `0x0123456789abcdef` and `0xfedcba9876543210`. Assume little-endian storage. State the quad word value contained in register Gr₂₀ after each of the following instructions has executed: `ld4 r20 = {effective address 0x30000}` `ld4 r20 = {effective address 0x3000C}`
17:	Compare and contrast the capabilities and limitations of the `movl` instruction and the `mov` pseudo-op for initializing a general register with an integer quantity.
18:	Write fragments of data and text segments in Itanium assembly language (not a complete program) for the transaction that must take place to the schematic database described in Section 4.5.5 when widgets are sold and the inventory must be updated. Assume that register `r32` contains the index for a type of widget (i.e., record or row number) and register `r33` contains the number of units sold.
19:	Write and test an Itanium assembly language program to subtract two 4-element vectors.
20:	Suggest why the register in the autoincrement deferred addressing mode advances by 4 in the VAX version.
21:	Schematically write out an Itanium instruction sequence that would be equivalent to the processing in a CISC architecture for a source operand specified as `-(Rx)`.
22:	Write an Itanium assembly language program that will fetch two double words of data found at the symbolic memory locations `VAL1` and `VAL2` into separate general registers, negate each value, and store each double word back into its original memory location. Assign an arbitrary initial positive value to `VAL1` and a negative value to `VAL2`. Test thoroughly using the debugger.
23:	(Reference hunt) Seek out appropriate, reliable reference material that enables you to add a column in Table 4-5 for another processor architecture.
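
The arithmetic behind exercises 13, 15, and 16 can be checked with a short script. What follows is a minimal Python sketch and is not taken from the book: the names `effective_access_time` and `LittleEndianMemory` are hypothetical helpers introduced here, the memory model assumes that a 4-byte load zero-extends into the 64-bit register, and the sample address and data values are deliberately different from those in the exercises.

```python
def effective_access_time(t_cache, t_main, hit_ratio):
    """Single-cache formula from exercise 13: t_eff = t_cache + (1 - h) * t_main."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    return t_cache + (1.0 - hit_ratio) * t_main


class LittleEndianMemory:
    """Byte-addressable memory with little-endian stores and loads (hypothetical helper)."""

    def __init__(self):
        self.cells = {}  # maps address -> byte value; unset bytes read as 0

    def store(self, address, value, size):
        # Keep only the low `size` bytes of the register value, as st1/st2/st4/st8 do.
        mask = (1 << (8 * size)) - 1
        data = (value & mask).to_bytes(size, "little")
        for offset, byte in enumerate(data):
            self.cells[address + offset] = byte

    def load(self, address, size):
        # Assumption: the loaded value is zero-extended to the full register width.
        data = bytes(self.cells.get(address + offset, 0) for offset in range(size))
        return int.from_bytes(data, "little")


# Exercise 13 setup: measure times in units of t_main, so a cache that is
# five times faster than main memory has t_cache = 0.2.
t_main = 1.0
t_cache = t_main / 5
t_eff_90 = effective_access_time(t_cache, t_main, 0.90)
print(f"h = 0.90: t_eff = {t_eff_90:.2f} t_main")

# Sample data only (not the values from exercises 15 and 16).
mem = LittleEndianMemory()
mem.store(0x1000, 0x1122334455667788, 8)   # quad word store
mem.store(0x1004, 0xAABBCCDD, 4)           # overwrites the upper four bytes
quad_after = mem.load(0x1000, 8)
low_word = mem.load(0x1000, 4)
print(f"quad word at 0x1000: {quad_after:#018x}")
print(f"ld4 from 0x1000:     {low_word:#018x}")
```

To check your own answers, call `effective_access_time` with a different hit ratio, or load the initial quad words from exercises 15 and 16 into a `LittleEndianMemory` and replay the `st4` and `ld4` operations by hand before comparing with the script.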