Exercises

1:

Compare and critique the hardware support for control of flow through software routines that is provided in various ways by computer architectures. What is special about the approach taken for Itanium architecture?

2:

Defend or refute the proposition that predicate register bits in Itanium architecture are no better than condition code bits in traditional architectures (e.g., VAX).

3:

In Alpha architecture, the various sorts of relational test (equality and signed or unsigned inequality) are built into the opcodes of conditional branch instructions, and there are separate forms of branch instructions for integer and floating-point data. In Itanium architecture, the relational tests are built into the opcodes of compare instructions, and there are separate forms of compare instructions for integer and floating-point data. Can you suggest any reason why the latter may be superior?

4:

Do signed and unsigned cmp instructions behave the same if the two source operands both "look negative"? Explain why or why not.

5:

Suggest why the Itanium architecture does not have a cmp.neu instruction.

6:

One sometimes sees in a high-level language a deeply nested if...then if...then construct, where the THEN-block at the deepest nesting depth N contains an extensive block of contingent instructions. Show schematically what this looks like if transcribed directly into Itanium assembly language. What might be the advantage of employing some predicated unconditional compare instructions?

7:

Why should the branch instruction at the bottom of a counted loop have a "taken" hint?

8:

Explain the execution of an unconditional branch instruction whose offset value is 0.

9:

Write a program for vector subtraction, that is, one that subtracts corresponding components of two lists of numbers. Store the result of U - V in a third vector, X.

10:

Modify the DOTLOOP program to use a loop with a stopping condition based on an address comparison. Discuss the factors that determine whether the modified program could perform better than the original DOTLOOP on traditional and Itanium processors.

11:

Show how you would have written the comment fields of instructions in the body of the loop if you had originally expressed the DOTLOOP program like the final variant in Section 5.5.1.

12:

Write a program that counts the number of uppercase ASCII letters in a null-terminated string in memory.
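
A reference model can help when checking an assembly solution to this exercise against known answers. The following is a minimal sketch in Python, not the Itanium program the exercise asks for; the function name count_uppercase is hypothetical, and it assumes the string is held as bytes with a zero byte marking its end.

```python
def count_uppercase(buf: bytes) -> int:
    """Count bytes in the range 'A'..'Z' before the first zero byte."""
    count = 0
    for byte in buf:
        if byte == 0:               # null terminator: stop scanning
            break
        if 0x41 <= byte <= 0x5A:    # ASCII 'A' (0x41) through 'Z' (0x5A)
            count += 1
    return count
```

The two-sided range test corresponds to the pair of comparisons an assembly version would need for each character.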

13:

Write a program using the br.cloop and shladd instructions (but not pmpy2) that will multiply a number N by 97. Compare the figure of merit (latency in the loop) of your program to that of a similar program that would use a general register instead of the architectural loop count register as the counter for loop control. (You do not have to write this latter program.)

14:

What adaptations could be made in the scheme for a four-way choice in Section 5.7.2 if there is only a three-way choice to be made?

15:

Which letter (A, B, C, or D) in the scheme for a four-way choice in Section 5.7.3 should be assigned to the code block that is known a priori to be the most probable to execute? Explain why.

16:

Critique the comparative merits and drawbacks of the two methods in Sections 5.7.2 and 5.7.3 for programming a four-way mutually exclusive choice, on the assumption that all four code blocks are:

  1. quite brief; or

  2. rather lengthy.

17:

The signum function, SGN(x), is defined as +1, 0, or -1 according to whether the argument x is positive with some magnitude, zero, or negative with some magnitude. Write what you believe to be the shortest and fastest possible implementation of an Itanium instruction sequence computing the signum function for x in Gr9 and the result in Gr8. You do not have to write a complete program.
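
To check the results of an instruction sequence for this exercise, the definition above can be written as a small reference model. This is only a sketch in Python with a hypothetical function name, sgn; it says nothing about which Itanium instructions are shortest or fastest.

```python
def sgn(x: int) -> int:
    """Return +1, 0, or -1 for positive, zero, or negative x."""
    # Each comparison yields True (1) or False (0); their difference is the sign.
    return (x > 0) - (x < 0)
```

Useful test inputs include zero, +1, -1, and the extreme signed 64-bit values.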

18:

Write a program that finds the least value in an array of signed integers stored as double words. For loop control, use a method based on address pointer comparison.

19:

Analyze instruction latency in the loop of the MAXIMUM program.

20:

The GNU assembler rewrites the two compare instructions in the MAXIMUM program as cmp.ltu p0,p6=r14,r15 and cmp.lt p7,p0=r8,r9. Explain why.

21:

Design a routine for computing the polynomial N^3 + N based on the finite differences algorithm for cubes, following these revised specifications:

  1. Provide a quad word for M, where M is the maximum number of instances to be computed.

  2. Apart from M, the program must use dynamic instead of assembly-time (re)initialization.

  3. Let label top denote the first executable instruction beyond truly global initializations.

  4. Use a central loop for the repetitive part of the algorithm. Think carefully how many times this loop should run in relation to M.

  5. Store the results in successive quad word memory locations beginning at label POLY.

  6. Design the program to take up the fewest possible bundles of executable instructions.

Test the program using the debugger with M=8. Inspect the contents of the reserved storage locations to demonstrate the validity of your program. Analyze the expected overall latency in the loop in your program.
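
When inspecting memory in the debugger, it helps to know in advance what values should appear at POLY. The following Python sketch produces them with a finite-difference recurrence and is meant only as a checking aid. It assumes the instances are N = 1, 2, ..., M; the starting value, the function name poly_by_differences, and its parameters are assumptions for illustration, not part of the exercise.

```python
def poly_by_differences(m: int, start: int = 1) -> list[int]:
    """Return N^3 + N for N = start .. start+m-1 using additions only in the loop."""
    n = start
    value = n**3 + n            # p(N)
    d1 = 3 * n * n + 3 * n + 2  # p(N+1) - p(N)
    d2 = 6 * n + 6              # first difference of d1
    d3 = 6                      # constant third difference
    results = []
    for _ in range(m):
        results.append(value)
        value += d1             # advance p to the next N
        d1 += d2                # advance the first difference
        d2 += d3                # advance the second difference
    return results
```

Under the stated assumption, M = 8 yields 2, 10, 30, 68, 130, 222, 350, 520. If your program starts at N = 0 instead, pass start=0 to obtain the matching list.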



Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles
Year: 2003
Pages: 223
