�ݺ�ߣ

• A RISC style, register-register instruction set architecture.
•Supports compiler-based exploitation of ILP.
•Includes variety of features to improve performance, such as
register windows and a rotating floating-point register (FPR) stack.

• 128 64-bit general purpose registers, GPRs (shortly 65 bits wide)
• 128 82-bit floating point registers (FPRs), which has 2 extra exponent
bits over the standard 80-bit IEEE format
• 64 1-bit predicate registers
• 8 64-bit branch registers
• Variety of registers for system control, memory mapping, performance
counters, communication with OS

 Integer registers help in procedural calls using a register stack mechanism.
Registers 32-128 are used as the register stack and each procedure is allocated
registers 0-96 for its use.
 The Current Frame Pointer (CFM) points to the set of registers to be used by a
procedure. The frame has 2 parts,
Local area, for local storage
Output area, to pass values to any called procedure
The alloc instruction specifies the size of these areas.
 On a procedural call, the CFM is updated so that the R32 of the called procedure
points to the first register of the output area of the called procedure. This update
enables passing the parameters from the caller to the callee. The callee executes the
alloc instruction to allocate the required no.of local and output registers.

Special load and store instructions are available for saving and restoring the register
stack, and special hardware called register stack engine handles overflow of the register
stack.
Three other sets of registers,
Floating point registers : Hold floating point data
Branch registers : Hold the branch destination address for indirect branches
Predicate registers : Hold predicates, which control the execution of
predicate instructions
 Both integer and FPRs support register rotation for registers 32 to 128.
 Register rotation eases the task of allocating registers in software-pipelined loops.
When combined with predication, it is possible to avoid the need for unrolling and
for separate prologue and epilogue code for software-pipelined code.

IA -64 uses two different concepts to achieve the benefits of implicit
parallelism and the ease of instruction decode.
Implicit parallelism is achieved by placing instructions into instruction
groups.
The fixed formatting of multiple instructions is achieved using the
concept of bundle, which has three instructions.

• A sequence of consecutive instructions with o register data dependences
among them.
• All the instructions in a group are executed in parallel, if sufficient
hardware resources and existed and if any dependences through memory
were preserved.
• Can be long, but the compiler must specify the boundary between two
instruction groups.
• This boundary is indicated by placing a stop between two instructions that
support of different groups.

• IA-64 instructions are encoded into bundles which are 128-bits wide.
• Each bundle consists of a 5-bit template field and three instructions, each 41 bits
in length.
• The template field of a bundle specifies what type of execution units each
instruction in the bundle requires. It describes the presence of any stops associated
with the bundle and the execution unit type.
Execution unit Instruction type Instruction
I A Integer ALU
I Non-ALU integer
M M Memory access
F F Floating point
B B Branches
L+X L+X Extended

• Five primary instruction classes, A, I, M, F, B
• Each IA instruction is 41 bits in length.
• The higher order 4 bits , together with the bundle bits that specify the
execution unit slot, are used as the major opcode.
• The lower-order 6 bits of every instruction are used for specifying the
predicate register that guards the instruction.

•An instruction predicated by specifying a predicate register, whose identity is
placed in the lower 6 bits of each instruction field.
•Predicate registers are set by compare and test instructions. A compare
instruction specifies one of ten different comparison tests and two predicate
registers as destination. The two predicate registers are written with the result of
the comparison or some logical function that combines two tests.
•This capability allows multiple comparisons to be supported in one instruction.

• Supported by providing NAT (Not A Thing) bits for GPRs and NATV bits (Not A
Thing Value) for FPRs.
• A deferred exception is resolved in two ways.
1. If a non-speculative instruction receives a NAT/ NATVal as a source operand, it
generates an immediate and unrecoverable exception.
2. A chk.s instruction can be used to detect the presence of NAT or NATVal and
branch to a routine designed by the compiler to recover from the speculative
operation.
• Memory reference support in IA-64 uses a concept called advanced load. An
advanced load is a load that has been speculatively moved above store on which it is
dependent. To speculatively perform a load, ld.a instruction.
• Executing this instruction creates an entry in a special table called ALAT.

• The ALAT stores both the register destination of the load and the address of
the accessed memory location. When a store is executed, an associative lookup
against the active ALAT entries is performed.
• Before any non-speculative instruction uses the value generated by an
advanced load or a value derived from the result of an advanced load, an
explicit check is required.
• The check specifies the destination register of the advanced load. If the
ALAT for that register is still valid, the speculation was legal and the only
effect of the check is to clear the ALAT entry. If the check fails, the action
depends on which of two different types of checks was employed (ld.c/chk.a).

�ݺ�ߣ

Intel IA 64

More Related Content

Intel IA 64