INSTRUCTION LEVEL PARALLELISM
PRESENTED BY KAMRAN ASHRAF
13-NTU-4009
INTRODUCTION
• Instruction-level parallelism (ILP) is a measure of how many operations in a computer program can be performed simultaneously, i.e., "in parallel".
WHAT IS A PARALLEL INSTRUCTION?
• Parallel instructions are a set of instructions that do not depend on each other and can therefore be executed at the same time.
• Hierarchy of parallelism:
  • Bit-level parallelism, e.g., a 16-bit add on an 8-bit processor
  • Instruction-level parallelism
  • Loop-level parallelism (see the sketch after this list)
      for (i = 1; i <= 1000; i = i + 1)
          x[i] = x[i] + y[i];
  • Thread-level parallelism, e.g., multi-core computers
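The loop-level case above is worth a closer look: iteration i touches only x[i] and y[i], so no iteration depends on another, and a compiler can unroll the loop to expose several independent additions to the hardware at once. A minimal sketch in C (the unroll factor of 4 is an arbitrary illustrative choice):

  #include <stddef.h>

  /* Original loop from the slide: every iteration is independent. */
  void add_arrays(double *x, const double *y, size_t n) {
      for (size_t i = 1; i <= n; i++)
          x[i] = x[i] + y[i];
  }

  /* Unrolled by 4: the four additions in the body do not depend on
   * one another, so a pipelined/superscalar CPU can overlap them.  */
  void add_arrays_unrolled(double *x, const double *y, size_t n) {
      size_t i = 1;
      for (; i + 3 <= n; i += 4) {
          x[i]     += y[i];
          x[i + 1] += y[i + 1];
          x[i + 2] += y[i + 2];
          x[i + 3] += y[i + 3];
      }
      for (; i <= n; i++)        /* leftover iterations */
          x[i] += y[i];
  }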
EXAMPLE
Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
• Operation 3 depends on the results "e" and "f" produced by operations 1 and 2, so "g" cannot be calculated until both of them are available.
• Operations 1 and 2, however, do not depend on any other operation, so they can be computed simultaneously.
• If we assume each operation completes in one unit of time, these three instructions finish in a total of two units of time, giving an ILP of 3/2 = 1.5, i.e., 1.5 times faster than purely sequential execution (which would take three units).
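The schedule can be written out directly; a minimal C rendering of the slide's three operations:

  int a = 1, b = 2, c = 3, d = 4;   /* example inputs */
  /* cycle 1: operations 1 and 2 are independent -> run together */
  int e = a + b;                    /* operation 1 */
  int f = c + d;                    /* operation 2 */
  /* cycle 2: operation 3 needs both e and f -> must wait */
  int g = e * f;                    /* operation 3 */
  /* 3 operations in 2 cycles -> ILP = 3/2 = 1.5 */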
WHY ILP?
• One of the goals of compiler and processor designers is to identify and exploit as much ILP as possible.
• Ordinary programs are written to execute instructions in sequence: one after the other, in the order written by the programmer.
• ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed.
ILP TECHNIQUES
Micro-architectural techniques that exploit ILP include:
• Instruction pipelining
• Superscalar execution
• Out-of-order execution
• Register renaming
• Speculative execution
• Branch prediction
INSTRUCTION PIPELINE
• An instruction pipeline is a technique used in the design of modern microprocessors, microcontrollers and CPUs to increase their instruction throughput (the number of instructions that can be executed in a unit of time).
PIPELINING
• The main idea is to divide the processing of a CPU instruction into a series of independent steps ("microinstructions"), with storage at the end of each step.
• This allows the CPU's control logic to accept a new instruction once per slowest-step time, which is much shorter than the time needed to process an entire instruction as a single step. For example, five 2 ns stages deliver one result every 2 ns once the pipeline is full, versus one every 10 ns without pipelining.
EXAMPLE
• For example, the classic RISC pipeline is broken into five stages, with a set of flip-flops between each stage, as follows:
  • Instruction fetch (IF)
  • Instruction decode & register fetch (ID)
  • Execute (EX)
  • Memory access (MEM)
  • Register write back (WB)
• In the usual pipeline diagram (not reproduced here), the vertical axis lists successive instructions and the horizontal axis is time; within any single clock-cycle column, the earliest instruction is in the write-back (WB) stage and the latest is undergoing instruction fetch.
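A self-contained sketch that prints such a diagram for the five stages above (the instruction count of 8 is an arbitrary illustrative choice):

  #include <stdio.h>

  int main(void) {
      const char *stage[] = {"IF", "ID", "EX", "MEM", "WB"};
      enum { STAGES = 5, INSNS = 8 };

      /* One row per instruction, one column per clock cycle:
       * instruction i enters the pipeline at cycle i and is in
       * stage (cycle - i) until it retires after STAGES cycles. */
      for (int i = 0; i < INSNS; i++) {
          printf("insn %d: ", i);
          for (int cycle = 0; cycle < INSNS + STAGES - 1; cycle++) {
              int s = cycle - i;
              printf("%-4s", (s >= 0 && s < STAGES) ? stage[s] : ".");
          }
          printf("\n");
      }
      return 0;
  }

Once the pipeline is full (from the fifth cycle onward), one instruction completes every cycle.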
SUPERSCALAR
• A superscalar CPU architecture implements ILP within a single processor, allowing higher instruction throughput at the same clock rate.
WHY SUPERSCALAR?
• A superscalar processor executes more than one instruction during a clock cycle.
• It simultaneously dispatches multiple instructions to multiple redundant functional units built into the processor.
• Each functional unit is not a separate CPU core but an execution resource inside the CPU, such as an arithmetic logic unit (ALU), a floating-point unit (FPU), a bit shifter, or a multiplier.
EXAMPLE
• A simple superscalar pipeline (figure not reproduced here): by fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.
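What a two-wide machine can exploit is easiest to see with two independent dependence chains. This snippet is not from the slides; it is an illustrative C sketch:

  #include <stddef.h>

  long sum_two_chains(const long *a, size_t n) {
      /* Two independent dependence chains: a 2-wide superscalar
       * can issue one add from each chain in the same cycle.    */
      long sum_even = 0, sum_odd = 0;
      size_t i = 0;
      for (; i + 1 < n; i += 2) {
          sum_even += a[i];        /* chain 1 */
          sum_odd  += a[i + 1];    /* chain 2, independent of chain 1 */
      }
      if (i < n)                   /* odd-length remainder */
          sum_even += a[i];
      return sum_even + sum_odd;   /* the chains meet only here */
  }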
OUT-OF-ORDER EXECUTION
• Out-of-order execution (OoOE) is a technique used in most high-performance microprocessors.
• The key concept is to allow the processor to avoid a class of delays that occur when the data needed to perform an operation are unavailable.
• Most modern CPU designs include support for out-of-order execution.
STEPS
• Out-of-order processors break up the processing of instructions into these steps (a simplified simulation sketch follows the list):
  • Instruction fetch.
  • Instruction dispatch to an instruction queue (also called an instruction buffer).
  • The instruction waits in the queue until its input operands are available.
  • The instruction is issued to the appropriate functional unit and executed by that unit.
  • The results are queued (in the re-order buffer).
  • Only after all older instructions have written their results back to the register file is this result written back to the register (in-order retirement).
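A heavily simplified C sketch of these steps, assuming single-cycle functional units and reusing the e/f/g program from the earlier example (the register encoding and data structures are invented for illustration, not taken from any real core):

  #include <stdbool.h>
  #include <stdio.h>

  #define NREGS 8

  typedef struct { int dst, src1, src2; bool issued, done; } Insn;

  int main(void) {
      /* e = a + b; f = c + d; g = e * f, with a..d in r0..r3:
       * r4 = r0 + r1;  r5 = r2 + r3;  r6 = r4 * r5            */
      Insn q[] = { {4, 0, 1, false, false},
                   {5, 2, 3, false, false},
                   {6, 4, 5, false, false} };
      int n = 3, retired = 0;
      bool ready[NREGS] = {true, true, true, true};  /* inputs ready */

      for (int cycle = 1; retired < n; cycle++) {
          bool finished[NREGS] = {false};
          /* Issue any waiting instruction whose operands are ready. */
          for (int i = 0; i < n; i++) {
              if (!q[i].issued && ready[q[i].src1] && ready[q[i].src2]) {
                  q[i].issued = q[i].done = true;
                  finished[q[i].dst] = true;  /* visible next cycle */
                  printf("cycle %d: issued insn %d (writes r%d)\n",
                         cycle, i, q[i].dst);
              }
          }
          /* Results become visible at the end of the cycle. */
          for (int r = 0; r < NREGS; r++)
              if (finished[r]) ready[r] = true;
          /* Retire in order, oldest first (the re-order buffer). */
          while (retired < n && q[retired].done)
              printf("cycle %d: retired insn %d\n", cycle, retired++);
      }
      return 0;
  }

Instructions 1 and 2 issue together in cycle 1, while instruction 3 waits in the queue for r4 and r5 and issues in cycle 2, matching the ILP-of-1.5 schedule from the earlier example.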
OTHER ILP TECHNIQUES
• Register renaming is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers, in order to enable out-of-order execution.
• Speculative execution allows the execution of complete instructions or parts of instructions before it is certain whether that execution will be required.
• Branch prediction is used to avoid the delays incurred while control dependencies are resolved: the predictor determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not (a sketch follows).
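As one concrete realization of the idea (the slides do not name a specific scheme), here is a minimal two-bit saturating-counter predictor, a classic hardware design, sketched in C:

  #include <stdbool.h>

  /* Two-bit saturating counter: states 0-1 predict not-taken,
   * states 2-3 predict taken. A real CPU keeps a table of these,
   * indexed by low-order bits of the branch address.            */
  typedef struct { unsigned counter; /* 0..3 */ } Predictor;

  bool predict(const Predictor *p) {
      return p->counter >= 2;
  }

  void update(Predictor *p, bool taken) {
      if (taken  && p->counter < 3) p->counter++;
      if (!taken && p->counter > 0) p->counter--;
  }

The two-bit hysteresis means a single surprise outcome (e.g., a loop's final, not-taken iteration) does not flip the prediction for the next run of the loop.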
THANKS

  • 15. OTHER ILP TECHNIQUES Register renaming which is a technique used to avoid unnecessary serialization of program operations caused by the reuse of registers by those operations, in order to enable out-of-order execution. Speculative execution which allow the execution of complete instructions or parts of instructions before being sure whether this execution is required. Branch prediction which is used to avoid delays cause of control dependencies to be resolved. Branch prediction determines whether a conditional branch (jump) in the instruction flow of a program is likely to be taken or not.