Exercises | Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles

1:	Why do you think that MISD has been the least-used category in Flynn's classification of computing systems?
2:	(Manual search) The PA-RISC `PERMUTE` instruction selects any combination of four 16-bit segments from a source register and places that combination in a result register. Find out which Itanium integer parallel instruction (Table 12-2) has corresponding capabilities.
3:	Suppose that registers `r14` and `r15` contain the ASCII strings `"00000000"` and `"99999999"` respectively, and that register `r16` contains an arbitrary string of eight characters. Find out whether the comments in the following code sequence are appropriate:
	pcmp1.gt  r8 = r14,r16          // if some char < "0" or
	pcmp1.gt  r9 = r16,r15;;        // if some char > "9" then
	cmp.ne    p6,p0 = r8,r0         // p6 = true or
	cmp.ne    p7,p0 = r9,r0         // p7 = true; so that
	(p6) br.cond.spnt error         // this branch executes or
	(p7) br.cond.spnt error;;       // this branch executes
	Explain why two instruction groups are required here.
4:	Provide an algebraic rationalization for one of the multiplication algorithms in Section 12.3.2, citing the reference sources that you consult for understanding either the method or the Itanium instructions.
5:	Define the parity of a 64-bit value as 0 if the total number of 1 bits is even or as 1 if the total number of 1 bits is odd. Write a concise Itanium instruction sequence that computes in register `ret0` the parity of the value in register `in0`: (a) using the `popcnt r1 = r3` instruction, which sets register `r1` equal to the number of 1 bits found in register `r3`; or (b) using any Itanium instructions except `popcnt`.
6:	(Manual search) Describe the operation of the `czx` Itanium instructions. Discuss how they could be used in a sequence that: (a) determines the length of a null-terminated string; or (b) determines the address of the last byte of a null-terminated string.
7:	(Manual search) Floating-point register `f1` contains one copy of the constant +1.0, which enables the implementation of `fadd` and `fmpy` as pseudo-ops of the fundamental three-operand fused multiply-add operation for full-width data. Show how to use the `fpack` instruction to place two copies of +1.0 into a chosen floating-point register and then explain how to derive floating-point parallel addition and subtraction as special cases of the `fpma`, `fpms`, and/or `fpnma` instructions. Explain why register `f1` would not suit this purpose.
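To make the segment selection described in Exercise 2 concrete, the following C sketch models the operation as the exercise words it: each of the four 16-bit fields of the result is a copy of any one of the four 16-bit fields of the source. The function name `permute16`, its four selector parameters, and the numbering of fields from the least significant end are assumptions made for illustration only; they are not PA-RISC or Itanium syntax.

```c
#include <stdint.h>

/* Hypothetical model of a four-way 16-bit permute (not real PA-RISC or
   Itanium syntax). Assumption: field 0 is the least significant 16 bits.
   Result field i receives a copy of source field sel_i (only the low two
   bits of each selector are used). */
uint64_t permute16(uint64_t src, unsigned sel3, unsigned sel2,
                   unsigned sel1, unsigned sel0)
{
    const unsigned sel[4] = { sel0, sel1, sel2, sel3 };
    uint64_t result = 0;

    for (unsigned i = 0; i < 4; i++) {
        /* Extract the chosen source field ... */
        uint64_t field = (src >> (16 * (sel[i] & 3u))) & 0xFFFFu;
        /* ... and deposit it in result field i. */
        result |= field << (16 * i);
    }
    return result;
}
```

With this numbering, `permute16(x, 3, 2, 1, 0)` returns `x` unchanged, `permute16(x, 0, 1, 2, 3)` reverses the order of the four fields, and `permute16(x, 0, 0, 0, 0)` replicates the lowest field four times. A candidate Itanium instruction should be able to reproduce every such selection.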
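When checking the comments in Exercise 3, it may help to experiment with a small C model of the parallel compare. The sketch below assumes the behavior given in the Itanium instruction reference for `pcmp1.gt`: the eight byte pairs are compared independently as signed values, and each result byte is set to all ones where the first operand's byte is greater and to zero otherwise. The function name `pcmp1_gt` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical C model of "pcmp1.gt r1 = r2,r3" (a = r2, b = r3).
   Assumption: byte lanes are compared as signed 8-bit values; a lane of
   the result is 0xFF where a's byte > b's byte, and 0x00 otherwise. */
uint64_t pcmp1_gt(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        /* Flipping the sign bit turns a signed comparison into an
           unsigned one, which avoids implementation-defined casts. */
        unsigned x = (unsigned)((a >> (8 * i)) & 0xFFu) ^ 0x80u;
        unsigned y = (unsigned)((b >> (8 * i)) & 0xFFu) ^ 0x80u;

        if (x > y)
            result |= (uint64_t)0xFF << (8 * i);
    }
    return result;
}
```

For example, comparing eight copies of `"0"` (0x30) against a value whose second-lowest byte is `"/"` (0x2F) yields 0x000000000000FF00, which a subsequent `cmp.ne` against `r0` would detect as nonzero.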
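For Exercise 5, a direct C transcription of the parity definition can serve as a reference against which an Itanium instruction sequence is checked. This is only a sketch of the definition, deliberately written as a plain bit-by-bit count rather than as a model of either requested solution; the name `parity64` is hypothetical.

```c
#include <stdint.h>

/* Reference definition of parity from Exercise 5:
   0 if the number of 1 bits is even, 1 if it is odd.
   Counts the bits one at a time; not intended to be efficient. */
unsigned parity64(uint64_t value)
{
    unsigned ones = 0;

    for (int bit = 0; bit < 64; bit++)
        ones += (unsigned)((value >> bit) & 1u);

    return ones & 1u;   /* low bit of the count is the parity */
}
```

Under this definition, `parity64(0)` is 0, `parity64(7)` is 1, and a value with all 64 bits set has parity 0.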