CS 465
Computer Architecture
Fall 2009
Lecture 01: Introduction
Daniel Barbará (cs.gmu.edu/~dbarbara)
[Adapted from Computer Organization and Design,
Patterson & Hennessy, © 2005, UCB]
Course Administration
 Instructor: Daniel Barbará
dbarbara@gmu.edu
4420 Eng. Bldg.
 Text: Required: Computer Organization & Design:
The Hardware/Software Interface, Patterson &
Hennessy, 4th Edition
Grading Information
 Grade determinants
 Midterm Exam ~25%
 Final Exam 1 ~35%
 Homeworks ~40%
- Due at the beginning of class (or, if it's code to be submitted
electronically, by 17:00 on the due date). No late assignments
will be accepted.
 Course prerequisites
 grade of C or better in CS 367
Acknowledgements
 Slides adapted from Dr. Zhong
 Contributions from Dr. Setia
 Slides also adopt materials from many other universities
 IMPORTANT:
- Slides are not intended as a replacement for the text
- You spent the money on the book, please read it!
Course Topics (Tentative)
 Instruction set architecture (Chapter 2)
 MIPS
 Arithmetic operations & data (Chapter 3)
 System performance (Chapter 4)
 Processor (Chapter 5)
 Datapath and control
 Pipelining to improve performance (Chapter 6)
 Memory hierarchy (Chapter 7)
 I/O (Chapter 8)
Focus of the Course
 How computers work
 MIPS instruction set architecture
 The implementation of MIPS instruction set architecture: MIPS
processor design
 Issues affecting modern processors
 Pipelining: processor performance improvement
 Cache: memory system, I/O systems
Why Learn Computer Architecture?
 You want to call yourself a computer scientist
 Computer architecture impacts every other aspect of computer science
 You need to make a purchasing decision or offer expert advice
 You want to build software people use, and sell many, many copies
(need performance)
 Both hardware and software affect performance
- Algorithm determines number of source-level statements
- Language/compiler/architecture determine machine instructions (Chapter 2
and 3)
- Processor/memory determine how fast instructions are executed (Chapter 5,
6, and 7)
- Assessing and understanding performance (Chapter 4)
Outline Today
 Course logistics
 Computer architectures overview
 Trends in computer architectures
Computer Systems
 Software
 Application software: Word Processors, Email, Internet
Browsers, Games
 Systems software: Compilers, Operating Systems
 Hardware
 CPU
 Memory
 I/O devices (mouse, keyboard, display, disks, networks,..)
[Figure: layers of software. Applications software (e.g., LaTeX) runs on top of systems software: compilers (gcc), assemblers (as), and operating systems (virtual memory, file system, I/O device drivers).]
D. Barbará
[Figure: the instruction set as the interface between software and hardware.]
Instruction Set Architecture
 One of the most important abstractions is ISA
 A critical interface between HW and SW
 Example: MIPS
 Desired properties
 Convenience (from software side)
 Efficiency (from hardware side)
What is Computer Architecture
 Programmer's view: a pleasant environment
 Operating system's view: a set of resources (hw
& sw)
 System architecture view: a set of components
 Compiler's view: an instruction set architecture
with OS help
 Microprocessor architecture view: a set of
functional units
 VLSI designer's view: a set of transistors
implementing logic
 Mechanical engineer's view: a heater!
What is Computer Architecture
 Patterson & Hennessy: Computer
architecture = Instruction set architecture
+ Machine organization + Hardware
 For this course, computer architecture
mainly refers to ISA (Instruction Set
Architecture)
 Programmer-visible, serves as the boundary
between the software and hardware
 Modern ISA examples: MIPS, SPARC,
PowerPC, DEC Alpha
Organization and Hardware
 Organization: high-level aspects of a computer's
design
 Principal components: memory, CPU, I/O, …
 How components are interconnected
 How information flows between components
 E.g. AMD Opteron 64 and Intel Pentium 4: same ISA
but different organizations
 Hardware: detailed logic design and the
packaging technology of a computer
 E.g. Pentium 4 and Mobile Pentium 4: nearly identical
organizations but different hardware details
Types of computers and their applications
 Desktop
 Run third-party software
 Office to home applications
 30 years old
 Servers
 Modern version of what used to be called mainframes,
minicomputers and supercomputers
 Large workloads
 Built using the same technology in desktops but higher capacity
- Expandable
- Scalable
- Reliable
 Large spectrum: from low-end (file storage, small businesses) to
supercomputers (high end scientific and engineering
applications)
- Gigabytes to Terabytes to Petabytes of storage
 Examples: file servers, web servers, database servers
Types of computers
 Embedded
 Microprocessors everywhere! (washing machines, cell phones,
automobiles, video games)
 Run one or a few applications
 Specialized hardware integrated with the application (not your
common processor)
 Usually stringent limitations (battery power)
 Low tolerance for failure (you don't want your airplane avionics to
fail!)
 Becoming ubiquitous
 Engineered using processor cores
- The core allows the engineer to integrate other functions into the
processor for fabrication on the same chip
- Using hardware description languages: Verilog, VHDL
Where is the Market?
Millions of computers sold, by type:
Year   Embedded   Desktop   Servers
1998      290        93        3
1999      488       114        3
2000      892       135        4
2001      862       129        4
2002     1122       131        5
In this class you will learn
 How programs written in a high-level language (e.g.,
Java) translate into the language of the hardware and
how the hardware executes them.
 The interface between software and hardware and how
software instructs hardware to perform the needed
functions.
 The factors that determine the performance of a program
 The techniques that hardware designers employ to
improve performance.
As a consequence, you will understand what features may
make one computer design better than another for a
particular application
High-level to Machine Language
High-level language program
(in C)
Assembly language program
(for MIPS)
Binary machine language program
(for MIPS)
Compiler
Assembler
Evolution
 In the beginning there were only bits and people spent
countless hours trying to program in machine language
01100011001 011001110100
 Finally before everybody went insane, the assembler
was invented: write in mnemonics called assembly
language and let the assembler translate (a one to one
translation)
Add A,B
 This wasn't for everybody, obviously (imagine how
modern applications would have been possible in
assembly), so high-level languages were born (and with
them compilers to translate to assembly, a one-to-many
translation)
C= A*(SQRT(B)+3.0)
THE BIG IDEA
 Levels of abstraction: each layer provides its own
(simplified) view and hides the details of the next.
Instruction Set Architecture (ISA)
 ISA: An abstract interface between the hardware and the
lowest level software of a machine that encompasses all
the information necessary to write a machine language
program that will run correctly, including instructions,
registers, memory access, I/O, and so on.
"... the attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data flows and
controls, the logic design, and the physical implementation."
- Amdahl, Blaauw, and Brooks, 1964
 Enables implementations of varying cost and performance to run
identical software
 ABI (application binary interface): The user portion of the
instruction set plus the operating system interfaces used
by application programmers. Defines a standard for
binary portability across computers.
ISA Type Sales
[Figure: stacked bar chart of processors sold (millions, 0 to 1400) per year, 1998 to 2002, by ISA: ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other. Values in the chart are approximate; see the text for correct values.]
Organization of a computer
Anatomy of Computer
[Figure: a personal computer. The processor contains control (the "brain") and the datapath (the "brawn"); memory is where programs and data live when running; devices provide input (keyboard, mouse) and output (display, printer), plus disk, where programs and data live when not running.]
 5 classic components
 Datapath: performs arithmetic operation
 Control: guides the operation of other components based on the user
instructions
PC Motherboard Closeup
Inside the Pentium 4
Moore's Law
 In 1965, Gordon Moore predicted that the number of
transistors that can be integrated on a die would double
every 18 to 24 months (i.e., grow exponentially with
time).
 Amazingly visionary: the million transistor/chip barrier was
crossed in the 1980s.
 2300 transistors, 1 MHz clock (Intel 4004), 1971
 16 million transistors (UltraSPARC III)
 42 million transistors, 2 GHz clock (Intel Xeon), 2001
 55 million transistors, 3 GHz, 130 nm technology, 250 mm² die
(Intel Pentium 4), 2004
 140 million transistors (HP PA-8500)
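As a rough check on the doubling period, two of the dated data points above (2,300 transistors in 1971 and 42 million in 2001) can be plugged into an exponential growth model. This is only a sketch: the function name is made up for illustration, and two data points cannot confirm the trend on their own.

```python
import math


def doubling_time_months(n_start, n_end, years):
    """Months per doubling, assuming smooth exponential growth."""
    doublings = math.log2(n_end / n_start)
    return 12 * years / doublings


# Intel 4004 (1971) to Intel Xeon (2001), figures taken from the slide
months = doubling_time_months(2300, 42e6, 2001 - 1971)
print(f"about {months:.0f} months per doubling")
```

Under these assumptions the result comes out a little over two years, close to the upper end of the 18 to 24 month range quoted above.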
Processor Performance Increase
[Figure: processor performance (SPECint, log scale from 1 to 10,000) versus year, 1987 to 2003. Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600, DEC Alpha 21264A/667, Intel Xeon/2000, Intel Pentium 4/3000.]
Trend: Microprocessor Capacity
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year, 1970 to 2000, tracking Moore's Law: i4004, i8080, i8086, i80286, i80386, i80486, Pentium.]
 CMOS improvements:
- Die size: 2X every 3 yrs
- Line width: halve / 7 yrs
 Transistor counts:
- Sparc Ultra: 5.2 million
- Pentium Pro: 5.5 million
- PowerPC 620: 6.9 million
- Alpha 21164: 9.3 million
- Alpha 21264: 15 million
- Pentium 4: 55 million
- Itanium II: 241 million
Moore's Law
 "Cramming More Components onto Integrated Circuits"
 Gordon Moore, Electronics, 1965
 # of transistors per cost-effective integrated circuit doubles every 18 months
Transistor capacity doubles every 18-24 months
Speed 2x / 1.5 years (since '85);
100X performance in last decade
Trend: Microprocessor Performance
Memory
 Dynamic Random Access Memory (DRAM)
 The choice for main memory
 Volatile (contents go away when power is lost)
 Fast
 Relatively small
 DRAM capacity: 2x / 2 years (since '96);
64x size improvement in last decade
 Static Random Access Memory (SRAM)
 The choice for cache
 Much faster than DRAM, but less dense and more costly
 Magnetic disks
 The choice for secondary memory
 Non-volatile
 Slower
 Relatively large
 Capacity: 2x / 1 year (since '97);
250X size in last decade
 Solid state (Flash) memory
 The choice for embedded computers
 Non-volatile
Memory
 Optical disks
 Removable, therefore very large
 Slower than disks
 Magnetic tape
 Even slower
 Sequential (non-random) access
 The choice for archival
DRAM Capacity Growth
[Figure: DRAM capacity per chip (Kbit, log scale) versus year of introduction, 1976 to 2002. Generations plotted: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M.]
Trend: Memory Capacity
[Figure: bits per DRAM chip (log scale, 1,000 to 1,000,000,000) versus year, 1970 to 2000.]
year size (Mbit)
1980 0.0625
1983 0.25
1986 1
1989 4
1992 16
1996 64
1998 128
2000 256
2002 512
2006 2048
 Now 1.4X/yr, or 2X every 2 years.
 more than 10000X since 1980!
Growth of capacity per chip
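The two growth claims above can be checked against the year/size table. A minimal sketch, with the table values copied as data and a helper name (`annual_growth`) chosen here for illustration:

```python
# (year, capacity in Mbit) pairs copied from the table above
dram = [
    (1980, 0.0625), (1983, 0.25), (1986, 1), (1989, 4), (1992, 16),
    (1996, 64), (1998, 128), (2000, 256), (2002, 512), (2006, 2048),
]


def annual_growth(points, start_year, end_year):
    """Average yearly growth factor between two years present in the table."""
    sizes = dict(points)
    years = end_year - start_year
    return (sizes[end_year] / sizes[start_year]) ** (1 / years)


total_growth = dram[-1][1] / dram[0][1]        # 1980 to 2006
recent_rate = annual_growth(dram, 1996, 2006)  # the "now" period

print(f"total growth since 1980: {total_growth:.0f}x")
print(f"growth per year, 1996 to 2006: {recent_rate:.2f}x")
```

The 1996 to 2006 rate works out to the square root of two per year, which is the same statement as doubling every two years.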
(Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta = 10^24)
Come up with a clever mnemonic, fame!
Dramatic Technology Change
 State-of-the-art PC when you graduate:
(at least)
 Processor clock speed: 5000 MegaHertz
(5.0 GigaHertz)
 Memory capacity: 4000 MegaBytes
(4.0 GigaBytes)
 Disk capacity: 2000 GigaBytes
(2.0 TeraBytes)
 New units! Mega => Giga, Giga => Tera
Example Machine Organization
 Workstation design target
 25% of cost on processor
 25% of cost on memory (minimum memory size)
 Rest on I/O devices, power supplies, box
[Figure: the five components of a computer: the CPU (control and datapath), memory, and input and output devices.]
MIPS R3000 Instruction Set Architecture
 Instruction Categories
 Load/Store
 Computational
 Jump and Branch
 Floating Point
- coprocessor
 Memory Management
 Special
Registers: R0 - R31, PC, HI, LO
3 Instruction Formats: all 32 bits wide
  OP | rs | rt | rd | sa | funct
  OP | rs | rt | immediate
  OP | jump target
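To make the first format concrete, here is a small sketch that packs its fields into one 32-bit word. The slide only names the fields; the widths used here (op 6, rs 5, rt 5, rd 5, sa 5, funct 6) are the standard MIPS32 ones, and the register numbers and funct code in the example are likewise taken from the standard MIPS conventions rather than from the slide. The function name is illustrative.

```python
def encode_r_type(rs, rt, rd, sa, funct, op=0):
    """Pack the fields of the first (register) format into a 32-bit word.

    Assumed widths: op 6, rs 5, rt 5, rd 5, sa 5, funct 6 (standard MIPS32).
    """
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (sa, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# add $t0, $s1, $s2: rd = $t0 (8), rs = $s1 (17), rt = $s2 (18), funct = 0x20
word = encode_r_type(rs=17, rt=18, rd=8, sa=0, funct=0x20)
print(f"0x{word:08x}")  # 0x02324020
```

The other two formats would pack the same way, with a 16-bit immediate or a 26-bit jump target after the fixed-width leading fields.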
Defining Performance
 Which airplane has the best performance?
[Figure: four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747, and Boeing 777: passenger capacity (0 to 500), cruising range (0 to 10,000 miles), cruising speed (0 to 1,500 mph), and passengers x mph (0 to 400,000).]
§1.4 Performance
Response Time and Throughput
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
- e.g., tasks/transactions/… per hour
 How are response time and throughput affected by
 Replacing the processor with a faster version?
 Adding more processors?
 We'll focus on response time for now
Relative Performance
 Define Performance = 1/Execution Time
 X is n times faster than Y
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
 Example: time taken to run a program
 10s on A, 15s on B
 Execution Time_B / Execution Time_A
= 15s / 10s = 1.5
 So A is 1.5 times faster than B
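The same ratio as a tiny Python sketch (the function name is illustrative):

```python
def times_faster(time_x, time_y):
    """How many times faster X is than Y: execution time of Y over that of X."""
    return time_y / time_x


n = times_faster(time_x=10.0, time_y=15.0)  # A takes 10 s, B takes 15 s
print(n)  # 1.5
```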
Measuring Execution Time
 Elapsed time
 Total response time, including all aspects
- Processing, I/O, OS overhead, idle time
 Determines system performance
 CPU time
 Time spent processing a given job
- Discounts I/O time, other jobs' shares
 Comprises user CPU time and system CPU time
 Different programs are affected differently by CPU and system
performance
CPU Clocking
 Operation of digital hardware governed by a constant-rate clock
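[Figure: clock waveform. In each clock cycle, data transfer and computation are followed by a state update; the clock period is the length of one cycle.]
 Clock period: duration of a clock cycle
 e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
 Clock frequency (rate): cycles per second
 e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz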
CPU Time
 Performance improved by
 Reducing number of clock cycles
 Increasing clock rate
 Hardware designer must often trade off clock rate against cycle
count
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
CPU Time Example
 Computer A: 2GHz clock, 10s CPU time
 Designing Computer B
 Aim for 6s CPU time
 Can do faster clock, but causes 1.2 × clock cycles
 How fast must Computer B clock be?
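$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^9$$
$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}} = \frac{1.2 \times 20 \times 10^9}{6\,\text{s}} = \frac{24 \times 10^9}{6\,\text{s}} = 4\,\text{GHz}$$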
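The Computer B calculation can be replayed step by step. A minimal sketch using only the numbers given in the example; the helper names are illustrative:

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles, target_time_s):
    """Clock rate needed to finish the given cycles in the target time."""
    return cycles / target_time_s


cycles_a = clock_cycles(10, 2e9)            # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                   # faster clock costs 1.2x the cycles
rate_b = required_clock_rate(cycles_b, 6)   # aim for 6 s on Computer B

print(f"Computer B needs about {rate_b / 1e9:.1f} GHz")
```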
Instruction Count and CPI
 Instruction Count for a program
 Determined by program, ISA and compiler
 Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI
- Average CPI affected by instruction mix
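$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$
$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$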
CPI Example
 Computer A: Cycle Time = 250ps, CPI = 2.0
 Computer B: Cycle Time = 500ps, CPI = 1.2
 Same ISA
 Which is faster, and by how much?
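$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$
$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$
 A is faster
$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$
 by this much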
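Because both computers run the same ISA, the instruction count I cancels and only the time per instruction matters. A short sketch of that comparison (function name chosen here for illustration):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


t_a = time_per_instruction_ps(2.0, 250)   # Computer A
t_b = time_per_instruction_ps(1.2, 500)   # Computer B
ratio = t_b / t_a                         # how many times faster A is

print(f"A: {t_a:.0f} ps, B: {t_b:.0f} ps, A is {ratio:.1f}x faster")
```

Note that B has the lower CPI yet is still slower, because its cycle time is twice as long.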
CPI in More Detail
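 If different instruction classes take different numbers of
cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$
 Weighted average CPI
$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$
where Instruction Count_i / Instruction Count is the relative frequency of class i.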
CPI Example
 Alternative compiled code sequences using instructions in classes A,
B, C
Class              A  B  C
CPI for class      1  2  3
IC in sequence 1   2  1  2
IC in sequence 2   4  1  1
 Sequence 1: IC = 5
 Clock Cycles
= 2×1 + 1×2 + 2×3
= 10
 Avg. CPI = 10/5 = 2.0
 Sequence 2: IC = 6
 Clock Cycles
= 4×1 + 1×2 + 1×3
= 9
 Avg. CPI = 9/6 = 1.5
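The two sequences can be compared with a few lines of Python. This is a sketch of the weighted-average CPI calculation using the class CPIs and instruction counts from the example; the names are illustrative.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def cycles_and_avg_cpi(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    ic = sum(counts.values())
    cycles = sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())
    return ic, cycles, cycles / ic


seq1 = cycles_and_avg_cpi({"A": 2, "B": 1, "C": 2})
seq2 = cycles_and_avg_cpi({"A": 4, "B": 1, "C": 1})

print(seq1)  # (5, 10, 2.0)
print(seq2)  # (6, 9, 1.5)
```

Sequence 2 executes more instructions but fewer cycles, so instruction count alone does not decide which sequence is faster.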
Performance Summary
 Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
Power Trends
 In CMOS IC technology
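§1.5 The Power Wall
$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$
[Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000]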
Reducing Power
 Suppose a new CPU has
 85% of capacitive load of old CPU
 15% voltage and 15% frequency reduction
$$\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$$
 The power wall
 We can't reduce voltage further
 We can't remove more heat
 How else can we improve performance?
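The power ratio for the new CPU follows directly from the CMOS power relation (power proportional to capacitive load × voltage² × frequency). A minimal sketch, with a function name made up for illustration:

```python
def relative_power(cap_scale, volt_scale, freq_scale):
    """P_new / P_old when power is proportional to C * V^2 * f."""
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency
ratio = relative_power(0.85, 0.85, 0.85)
print(f"{ratio:.2f}")  # 0.52
```

Voltage enters squared, which is why lowering it was historically the most effective lever.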
Uniprocessor Performance
§1.6 The Sea Change: The Switch to Multiprocessors
Constrained by power, instruction-level parallelism,
memory latency
Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Compare with instruction level parallelism
- Hardware executes multiple instructions at once
- Hidden from the programmer
 Hard to do
- Programming for performance
- Load balancing
- Optimizing communication and synchronization
SPEC CPU Benchmark
 Programs used to measure performance
 Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)
 Develops benchmarks for CPU, I/O, Web, …
 SPEC CPU2006
 Elapsed time to execute a selection of programs
- Negligible I/O, so focuses on CPU performance
 Normalize relative to reference machine
 Summarize as geometric mean of performance ratios
- CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
CINT2006 for Opteron X4 2356
Name         Description                     IC×10^9    CPI   Tc (ns)  Exec time  Ref time  SPECratio
perl         Interpreted string processing     2,118    0.75    0.40        637     9,777       15.3
bzip2        Block-sorting compression         2,389    0.85    0.40        817     9,650       11.8
gcc          GNU C Compiler                    1,050    1.72    0.40        724     8,050       11.1
mcf          Combinatorial optimization          336   10.00    0.40      1,345     9,120        6.8
go           Go game (AI)                      1,658    1.09    0.40        721    10,490       14.6
hmmer        Search gene sequence              2,783    0.80    0.40        890     9,330       10.5
sjeng        Chess game (AI)                   2,176    0.96    0.40        837    12,100       14.5
libquantum   Quantum computer simulation       1,623    1.61    0.40      1,047    20,720       19.8
h264avc      Video compression                 3,102    0.80    0.40        993    22,130       22.3
omnetpp      Discrete event simulation           587    2.94    0.40        690     6,250        9.1
astar        Games/path finding                1,082    1.79    0.40        773     7,020        9.1
xalancbmk    XML parsing                       1,058    2.70    0.40      1,143     6,900        6.0
Geometric mean                                                                                  11.7
High cache miss rates
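The summary row can be reproduced from the SPECratio column, and any execution time from IC × CPI × Tc. A sketch with the ratios copied from the table above; the helper names are illustrative, and small differences from the table come from the rounded CPI values.

```python
import math

# SPECratio column, perl through xalancbmk
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]


def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))


def exec_time_s(ic_billions, cpi, tc_ns):
    """IC x 10^9 instructions, CPI cycles each, Tc x 10^-9 s per cycle.

    The powers of ten cancel, so the product is already in seconds.
    """
    return ic_billions * cpi * tc_ns


gm = geometric_mean(spec_ratios)
perl_time = exec_time_s(2118, 0.75, 0.40)

print(f"geometric mean: {gm:.1f}")
print(f"perl: about {perl_time:.0f} s (table lists 637 s)")
```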
SPEC Power Benchmark
 Power consumption of server at different workload levels
 Performance: ssj_ops/sec
 Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \frac{\sum_{i=0}^{10} \text{ssj\_ops}_i}{\sum_{i=0}^{10} \text{power}_i}$$
SPECpower_ssj2008 for X4
Target Load %   Performance (ssj_ops/sec)   Average Power (Watts)
100%                      231,867                    295
90%                       211,282                    286
80%                       185,803                    275
70%                       163,427                    265
60%                       140,160                    256
50%                       118,324                    246
40%                        92,035                    233
30%                        70,500                    222
20%                        47,126                    206
10%                        23,066                    180
0%                              0                    141
Overall sum             1,283,590                  2,605
∑ssj_ops / ∑power                                    493
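The overall figure is the sum of performance across all eleven load levels divided by the sum of average power. A short sketch with the two columns copied from the table, listed from 100% load down to 0% (the 40% row is read as 92,035):

```python
ssj_ops = [231867, 211282, 185803, 163427, 140160, 118324,
           92035, 70500, 47126, 23066, 0]
watts = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]

total_ops = sum(ssj_ops)
total_watts = sum(watts)
overall = total_ops / total_watts

print(total_ops, total_watts, round(overall))  # 1283590 2605 493
```

The idle row contributes 141 W and no work, so power drawn at low load pulls the overall score down.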
Pitfall: Amdahl's Law
 Improving an aspect of a computer and expecting a proportional
improvement in overall performance
§1.8 Fallacies and Pitfalls
$$T_{improved} = \frac{T_{affected}}{\text{improvement factor}} + T_{unaffected}$$
 Example: multiply accounts for 80s/100s
 How much improvement in multiply performance to get 5× overall?
$$20 = \frac{80}{n} + 20$$
 Can't be done!
 Corollary: make the common case fast
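The multiply example can be turned around to solve for the improvement factor. A minimal sketch (function names are illustrative); the 4× target in the second call is a hypothetical added here to show a case that does have a solution.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: only the affected part is divided by the factor."""
    return t_affected / factor + t_unaffected


def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to reach target_time, or None if impossible."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # the unaffected part alone already uses up the target
    return t_affected / remaining


# 100 s total, 80 s of it in multiply
five_x = required_factor(80, 20, 100 / 5)   # None: can't be done
four_x = required_factor(80, 20, 100 / 4)   # hypothetical 4x target: 16.0

print(five_x, four_x)
```

Even the achievable 4× overall target needs multiply to run 16 times faster, since the 20 s that is not multiply sets the floor.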
Fallacy: Low Power at Idle
 Look back at X4 power benchmark
 At 100% load: 295W
 At 50% load: 246W (83%)
 At 10% load: 180W (61%)
 Google data center
 Mostly operates at 10% to 50% load
 At 100% load less than 1% of the time
 Consider designing processors to make power
proportional to load
Pitfall: MIPS as a Performance Metric
 MIPS: Millions of Instructions Per Second
 Doesn't account for
- Differences in ISAs between computers
- Differences in complexity between instructions
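$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Instruction count}}{\frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$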
 CPI varies between programs on a given CPU
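A quick sketch of why the rating moves with CPI. The 4 GHz clock and the two CPI values are hypothetical numbers chosen for illustration, not measurements from the slides.

```python
def mips_rating(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Same hypothetical 4 GHz CPU running two programs with different CPI
low_cpi_program = mips_rating(4e9, 1.0)
high_cpi_program = mips_rating(4e9, 2.0)

print(low_cpi_program, high_cpi_program)  # 4000.0 2000.0
```

The same processor gets two different MIPS ratings depending on the program, so a single MIPS number says little about the machine.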
Concluding Remarks
 Cost/performance is improving
 Due to underlying technology development
 Hierarchical layers of abstraction
 In both hardware and software
 Instruction set architecture
 The hardware/software interface
 Execution time: the best performance measure
 Power is a limiting factor
 Use parallelism to improve performance
§1.9 Concluding Remarks

DR. Ram Kumar Pathak
20230109_NLDL_Tutorial_Tan.pdf data analysis
20230109_NLDL_Tutorial_Tan.pdf data analysis20230109_NLDL_Tutorial_Tan.pdf data analysis
20230109_NLDL_Tutorial_Tan.pdf data analysis
aitaghavi

CS465Lec1.ppt computer architecture in the fall term

  • 1. CS 465 Computer Architecture, Fall 2009. Lecture 01: Introduction. Daniel Barbará (cs.gmu.edu/~dbarbara) [Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, UCB]
  • 2. Course Administration. Instructor: Daniel Barbará, dbarbara@gmu.edu, 4420 Eng. Bldg. Text (required): Computer Organization & Design: The Hardware/Software Interface, Patterson & Hennessy, 4th Edition
  • 3. Grading Information. Grade determinants: Midterm Exam ~25%, Final Exam 1 ~35%, Homeworks ~40% - Due at the beginning of class (or, if it's code to be submitted electronically, by 17:00 on the due date). No late assignments will be accepted. Course prerequisites: grade of C or better in CS 367
  • 4. Acknowledgements. Slides adopted from Dr. Zhong. Contributions from Dr. Setia. Slides also adopt materials from many other universities. IMPORTANT: - Slides are not intended as replacement for the text - You spent the money on the book, please read it!
  • 5. Course Topics (Tentative) Instruction set architecture (Chapter 2) MIPS Arithmetic operations & data (Chapter 3) System performance (Chapter 4) Processor (Chapter 5) Datapath and control Pipelining to improve performance (Chapter 6) Memory hierarchy (Chapter 7) I/O (Chapter 8)
  • 6. Focus of the Course How computers work MIPS instruction set architecture The implementation of MIPS instruction set architecture MIPS processor design Issues affecting modern processors Pipelining processor performance improvement Cache memory system, I/O systems
  • 7. Why Learn Computer Architecture? You want to call yourself a computer scientist. Computer architecture impacts every other aspect of computer science. You need to make a purchasing decision or offer expert advice. You want to build software people use and sell many, many copies (need performance). Both hardware and software affect performance: - Algorithm determines number of source-level statements - Language/compiler/architecture determine machine instructions (Chapters 2 and 3) - Processor/memory determine how fast instructions are executed (Chapters 5, 6, and 7) - Assessing and understanding performance (Chapter 4)
  • 8. Outline Today Course logistics Computer architectures overview Trends in computer architectures
  • 9. Computer Systems Software Application software Word Processors, Email, Internet Browsers, Games Systems software Compilers, Operating Systems Hardware CPU Memory I/O devices (mouse, keyboard, display, disks, networks,..)
  • 11. D. Barbará. Instruction Set Architecture [Diagram: software / instruction set / hardware] One of the most important abstractions is the ISA: a critical interface between HW and SW. Example: MIPS. Desired properties: convenience (from the software side), efficiency (from the hardware side)
  • 12. What is Computer Architecture? Programmer's view: a pleasant environment. Operating system's view: a set of resources (hw & sw). System architecture view: a set of components. Compiler's view: an instruction set architecture with OS help. Microprocessor architecture view: a set of functional units. VLSI designer's view: a set of transistors implementing logic. Mechanical engineer's view: a heater!
  • 13. What is Computer Architecture? Patterson & Hennessy: Computer architecture = Instruction set architecture + Machine organization + Hardware. For this course, computer architecture mainly refers to the ISA (Instruction Set Architecture): programmer-visible, it serves as the boundary between the software and hardware. Modern ISA examples: MIPS, SPARC, PowerPC, DEC Alpha
  • 14. Organization and Hardware. Organization: high-level aspects of a computer's design. Principal components: memory, CPU, I/O, ... How components are interconnected. How information flows between components. E.g., AMD Opteron 64 and Intel Pentium 4: same ISA but different organizations. Hardware: detailed logic design and the packaging technology of a computer. E.g., Pentium 4 and Mobile Pentium 4: nearly identical organizations but different hardware details
  • 15. Types of computers and their applications. Desktop: run third-party software; office to home applications; 30 years old. Servers: modern version of what used to be called mainframes, minicomputers and supercomputers; large workloads; built using the same technology as desktops but with higher capacity - Expandable - Scalable - Reliable. Large spectrum: from low-end (file storage, small businesses) to supercomputers (high-end scientific and engineering applications) - Gigabytes to Terabytes to Petabytes of storage. Examples: file servers, web servers, database servers
  • 16. Types of computers. Embedded: microprocessors everywhere! (washing machines, cell phones, automobiles, video games). Run one or a few applications. Specialized hardware integrated with the application (not your common processor). Usually stringent limitations (battery power). Low tolerance for failure (you don't want your airplane avionics to fail!). Becoming ubiquitous. Engineered using processor cores - The core allows the engineer to integrate other functions into the processor for fabrication on the same chip - Using hardware description languages: Verilog, VHDL
  • 17. Where is the Market? Millions of computers sold per year, by category (values read from the chart's data labels):
      Year   Embedded   Desktop   Servers
      1998        290        93         3
      1999        488       114         3
      2000        892       135         4
      2001        862       129         4
      2002       1122       131         5
  • 18. In this class you will learn How programs written in a high-level language (e.g., Java) translate into the language of the hardware and how the hardware executes them. The interface between software and hardware and how software instructs hardware to perform the needed functions. The factors that determine the performance of a program The techniques that hardware designers employ to improve performance. As a consequence, you will understand what features may make one computer design better than another for a particular application
  • 19. High-level to Machine Language High-level language program (in C) Assembly language program (for MIPS) Binary machine language program (for MIPS) Compiler Assembler
  • 20. Evolution. In the beginning there were only bits, and people spent countless hours trying to program in machine language: 01100011001 011001110100. Finally, before everybody went insane, the assembler was invented: write in mnemonics called assembly language and let the assembler translate (a one-to-one translation): Add A,B. This wasn't for everybody, obviously (imagine how modern applications would have been possible in assembly), so high-level languages were born (and with them compilers to translate to assembly, a many-to-one translation): C= A*(SQRT(B)+3.0)
  • 21. THE BIG IDEA Levels of abstraction: each layer provides its own (simplified) view and hides the details of the next.
  • 22. Instruction Set Architecture (ISA). ISA: An abstract interface between the hardware and the lowest-level software of a machine that encompasses all the information necessary to write a machine language program that will run correctly, including instructions, registers, memory access, I/O, and so on. "... the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964) Enables implementations of varying cost and performance to run identical software. ABI (application binary interface): The user portion of the instruction set plus the operating system interfaces used by application programmers. Defines a standard for binary portability across computers.
  • 23. ISA Type Sales [Bar chart: millions of processors sold per year, 1998 to 2002, by ISA: ARM, IA-32, MIPS, Motorola 68K, PowerPC, Hitachi SH, SPARC, Other. PowerPoint comic bar chart with approximate values (see text for correct values).]
  • 24. Organization of a computer
  • 25. Anatomy of a Computer [Diagram: personal computer made of Processor (Control, the "brain"; Datapath, the "brawn"), Memory (where programs and data live when running), and Devices: Input (keyboard, mouse), Output (display, printer), Disk (where programs and data live when not running)] 5 classic components. Datapath: performs arithmetic operations. Control: guides the operation of other components based on the user instructions
  • 28. Moore's Law. In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months (i.e., grow exponentially with time). Amazingly visionary: the million transistor/chip barrier was crossed in the 1980s. 2300 transistors, 1 MHz clock (Intel 4004), 1971. 16 million transistors (Ultra Sparc III). 42 million transistors, 2 GHz clock (Intel Xeon), 2001. 55 million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4), 2004. 140 million transistors (HP PA-8500)
  • 29. Processor Performance Increase [Chart: performance (SPECint, log scale from 1 to 10000) versus year, 1987 to 2003. Machines plotted: SUN-4/260, MIPS M/120, MIPS M2000, IBM RS6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600, DEC Alpha 21264A/667, Intel Xeon/2000, Intel Pentium 4/3000.]
  • 30. Moore's Law Trend: Microprocessor Capacity [Chart: transistors per chip (log scale, 1000 to 100000000) versus year, 1970 to 2000. Intel line: i4004, i8080, i8086, i80286, i80386, i80486, Pentium.] Other data points: Sparc Ultra: 5.2 million; Pentium Pro: 5.5 million; PowerPC 620: 6.9 million; Alpha 21164: 9.3 million; Alpha 21264: 15 million; Pentium 4: 55 million; Itanium II: 241 million. CMOS improvements: die size 2X every 3 yrs; line width halves every 7 yrs
  • 31. Moore's Law. "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965. # of transistors per cost-effective integrated circuit doubles every 18 months. Transistor capacity doubles every 18-24 months. Speed 2x / 1.5 years (since '85); 100X performance in last decade
  • 33. Memory. Dynamic Random Access Memory (DRAM): the choice for main memory; volatile (contents go away when power is lost); fast; relatively small. DRAM capacity: 2x / 2 years (since '96); 64x size improvement in last decade. Static Random Access Memory (SRAM): the choice for cache; much faster than DRAM, but less dense and more costly. Magnetic disks: the choice for secondary memory; non-volatile; slower; relatively large. Capacity: 2x / 1 year (since '97); 250X size in last decade. Solid state (Flash) memory: the choice for embedded computers; non-volatile
  • 34. Memory Optical disks Removable, therefore very large Slower than disks Magnetic tape Even slower Sequential (non-random) access The choice for archival
  • 35. DRAM Capacity Growth [Chart: Kbit capacity (log scale, 10 to 1000000) versus year of introduction, 1976 to 2002. Chip generations plotted: 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M.]
  • 36. Trend: Memory Capacity. Growth of capacity per chip [Chart: bits per chip (log scale) versus year, 1970 to 2000.]
      Year   Size (Mbit)
      1980   0.0625
      1983   0.25
      1986   1
      1989   4
      1992   16
      1996   64
      1998   128
      2000   256
      2002   512
      2006   2048
      Now 1.4X/yr, or 2X every 2 years. More than 10000X since 1980!
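As a rough check on the growth figures on this slide, the Python sketch below computes the total growth in capacity per chip between the first and last table entries and the implied average yearly factor. The function name is illustrative and not from the slides, and the calculation assumes steady exponential growth over the whole period.

```python
def average_annual_growth(size_start, size_end, years):
    """Average yearly growth factor, assuming steady exponential growth."""
    return (size_end / size_start) ** (1.0 / years)

# Capacity per chip in Mbit, taken from the slide's year/size table.
total_growth = 2048 / 0.0625                      # 1980 -> 2006
rate = average_annual_growth(0.0625, 2048, 2006 - 1980)

print(f"total growth: {total_growth:.0f}X")
print(f"average growth per year: {rate:.2f}X")
```

The total is consistent with the slide's "more than 10000X since 1980".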
  • 37. Dramatic Technology Change. State-of-the-art PC when you graduate (at least): Processor clock speed: 5000 MegaHertz (5.0 GigaHertz). Memory capacity: 4000 MegaBytes (4.0 GigaBytes). Disk capacity: 2000 GigaBytes (2.0 TeraBytes). New units! Mega => Giga, Giga => Tera. (Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta = 10^24) Come up with a clever mnemonic, fame!
  • 38. Example Machine Organization. Workstation design target: 25% of cost on processor; 25% of cost on memory (minimum memory size); rest on I/O devices, power supplies, box. [Diagram: Computer = CPU (Control, Datapath) + Memory + Devices (Input, Output)]
  • 39. MIPS R3000 Instruction Set Architecture. Instruction categories: Load/Store; Computational; Jump and Branch; Floating Point (coprocessor); Memory Management; Special. Registers: R0 - R31, PC, HI, LO. 3 instruction formats, all 32 bits wide:
      OP | rs | rt | rd | sa | funct
      OP | rs | rt | immediate
      OP | jump target
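To make the first of the three formats concrete, here is a minimal Python sketch that packs the fields OP, rs, rt, rd, sa and funct into one 32-bit word. The slide does not give field widths; the widths used here (6, 5, 5, 5, 5, 6 bits) are the standard MIPS R-format layout, and the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) and the funct code 0x20 for `add` follow the usual MIPS conventions. The function name is illustrative.

```python
def encode_r_type(rs, rt, rd, sa, funct, op=0):
    """Pack the six R-format fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

# add $t0, $s1, $s2  ->  rd = $t0 (8), rs = $s1 (17), rt = $s2 (18), funct = 0x20
word = encode_r_type(rs=17, rt=18, rd=8, sa=0, funct=0x20)
print(f"0x{word:08x}")  # 0x02324020
```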
  • 40. Defining Performance (§1.4 Performance). Which airplane has the best performance? [Four bar charts comparing the Douglas DC-8-50, BAC/Sud Concorde, Boeing 747 and Boeing 777: Passenger Capacity; Cruising Range (miles); Cruising Speed (mph); Passengers x mph.]
  • 41. Response Time and Throughput. Response time: how long it takes to do a task. Throughput: total work done per unit time - e.g., tasks/transactions/... per hour. How are response time and throughput affected by: replacing the processor with a faster version? Adding more processors? We'll focus on response time for now
  • 42. Relative Performance. Define Performance = 1/Execution Time. "X is n times faster than Y": $\dfrac{\text{Performance}_X}{\text{Performance}_Y} = \dfrac{\text{Execution time}_Y}{\text{Execution time}_X} = n$. Example: time taken to run a program is 10s on A, 15s on B. $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$, so A is 1.5 times faster than B
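The definition above can be sketched in a few lines of Python. The function name is an illustrative choice, not from the slides; the numbers are the slide's own example.

```python
def relative_performance(time_x, time_y):
    """Return n such that X is n times faster than Y (same time unit for both)."""
    # Performance = 1 / execution time, so Perf_X / Perf_Y = Time_Y / Time_X.
    return time_y / time_x

n = relative_performance(time_x=10.0, time_y=15.0)  # A takes 10 s, B takes 15 s
print(n)  # 1.5
```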
  • 43. Measuring Execution Time. Elapsed time: total response time, including all aspects - Processing, I/O, OS overhead, idle time. Determines system performance. CPU time: time spent processing a given job - Discounts I/O time, other jobs' shares. Comprises user CPU time and system CPU time. Different programs are affected differently by CPU and system performance
  • 44. CPU Clocking. Operation of digital hardware is governed by a constant-rate clock. [Timing diagram: clock cycles; within each clock period, data transfer and computation, then update state.] Clock period: duration of a clock cycle, e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s. Clock frequency (rate): cycles per second, e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
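Clock period and clock rate are reciprocals of each other, which is why the slide's 250 ps period corresponds to its 4.0 GHz rate. A minimal Python sketch (the function name is illustrative):

```python
def clock_rate_hz(clock_period_s):
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / clock_period_s

rate = clock_rate_hz(250e-12)      # 250 ps period
print(f"{rate / 1e9:.1f} GHz")     # 4.0 GHz
```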
  • 45. CPU Time. $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \dfrac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$. Performance improved by: reducing number of clock cycles; increasing clock rate. Hardware designer must often trade off clock rate against cycle count
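  • 46. CPU Time Example. Computer A: 2GHz clock, 10s CPU time. Designing Computer B: aim for 6s CPU time; can do faster clock, but causes 1.2 × clock cycles. How fast must Computer B's clock be? $\text{Clock Rate}_B = \dfrac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \dfrac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$; $\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$; $\text{Clock Rate}_B = \dfrac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \dfrac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$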
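The same calculation, written as a short Python sketch so the intermediate quantities are explicit. Function and parameter names are illustrative; the inputs are the slide's numbers.

```python
def required_clock_rate(cpu_time_a, clock_rate_a, cycle_factor, target_time_b):
    """Clock rate B needs to finish in target_time_b seconds."""
    clock_cycles_a = cpu_time_a * clock_rate_a        # cycles = time x rate
    clock_cycles_b = cycle_factor * clock_cycles_a    # B needs 1.2x the cycles
    return clock_cycles_b / target_time_b             # rate = cycles / time

rate_b = required_clock_rate(cpu_time_a=10.0, clock_rate_a=2e9,
                             cycle_factor=1.2, target_time_b=6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```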
  • 47. Instruction Count and CPI. $\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$; $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \dfrac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$. Instruction Count for a program: determined by program, ISA and compiler. Average cycles per instruction: determined by CPU hardware. If different instructions have different CPI - Average CPI affected by instruction mix
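  • 48. CPI Example. Computer A: Cycle Time = 250ps, CPI = 2.0. Computer B: Cycle Time = 500ps, CPI = 1.2. Same ISA. Which is faster, and by how much? $\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$ (A is faster); $\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$; $\dfrac{\text{CPU Time}_B}{\text{CPU Time}_A} = \dfrac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$ (by this much)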
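Because both machines run the same ISA, the instruction count I cancels, so it is enough to compare the time per instruction. A minimal Python sketch of that comparison (names are illustrative):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps

time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500)
speedup = time_b / time_a   # how many times faster A is than B

print(f"A: {time_a:.0f} ps, B: {time_b:.0f} ps, A faster by {speedup:.1f}x")
```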
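  • 49. CPI in More Detail. If different instruction classes take different numbers of cycles: $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$. Weighted average CPI: $\text{CPI} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \dfrac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$, where $\dfrac{\text{Instruction Count}_i}{\text{Instruction Count}}$ is the relative frequency of class $i$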
  • 50. CPI Example. Alternative compiled code sequences using instructions in classes A, B, C:
      Class               A   B   C
      CPI for class       1   2   3
      IC in sequence 1    2   1   2
      IC in sequence 2    4   1   1
      Sequence 1: IC = 5; Clock Cycles = 2×1 + 1×2 + 2×3 = 10; Avg. CPI = 10/5 = 2.0
      Sequence 2: IC = 6; Clock Cycles = 4×1 + 1×2 + 1×3 = 9; Avg. CPI = 9/6 = 1.5
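The weighted-average CPI calculation for the two sequences can be sketched in Python as follows. The dictionary layout and function names are illustrative choices; the per-class CPI values and instruction counts are the slide's.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

def total_cycles(counts):
    """Clock cycles = sum over classes of CPI_i x instruction count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())

def average_cpi(counts):
    """Weighted average CPI = clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())

sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

print(total_cycles(sequence_1), average_cpi(sequence_1))  # 10 2.0
print(total_cycles(sequence_2), average_cpi(sequence_2))  # 9 1.5
```

Sequence 2 executes more instructions but needs fewer cycles, which is why instruction count alone is not a reliable predictor of execution time.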
  • 51. Performance Summary. The BIG Picture: $\text{CPU Time} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Clock cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Clock cycle}}$. Performance depends on: Algorithm: affects IC, possibly CPI. Programming language: affects IC, CPI. Compiler: affects IC, CPI. Instruction set architecture: affects IC, CPI, Tc
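  • 52. Power Trends (§1.5 The Power Wall). In CMOS IC technology: $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$. Slide annotations: power ×30; voltage 5V → 1V; frequency ×1000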
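  • 53. Reducing Power. Suppose a new CPU has 85% of the capacitive load of the old CPU, and 15% voltage and 15% frequency reduction: $\dfrac{P_{\text{new}}}{P_{\text{old}}} = \dfrac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^2 \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$. The power wall: we can't reduce voltage further; we can't remove more heat. How else can we improve performance?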
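A small Python sketch of the same ratio, using the power relation from the previous slide (Power = Capacitive load × Voltage² × Frequency). The old CPU's values are normalized to 1, which is an assumption made only for convenience since they cancel in the ratio; the function name is illustrative.

```python
def dynamic_power(capacitive_load, voltage, frequency):
    """Power = capacitive load x voltage^2 x frequency (CMOS relation from the slides)."""
    return capacitive_load * voltage ** 2 * frequency

p_old = dynamic_power(1.0, 1.0, 1.0)        # normalized old CPU
p_new = dynamic_power(0.85, 0.85, 0.85)     # 85% load, 15% lower voltage and frequency
ratio = p_new / p_old

print(f"{ratio:.2f}")  # 0.52
```

Voltage enters squared, so the three 15% reductions combine as 0.85 to the fourth power rather than the third.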
  • 55. Multiprocessors Multicore microprocessors More than one processor per chip Requires explicitly parallel programming Compare with instruction level parallelism - Hardware executes multiple instructions at once - Hidden from the programmer Hard to do - Programming for performance - Load balancing - Optimizing communication and synchronization
  • 56. SPEC CPU Benchmark. Programs used to measure performance; supposedly typical of actual workload. Standard Performance Evaluation Corp (SPEC) develops benchmarks for CPU, I/O, Web, ... SPEC CPU2006: elapsed time to execute a selection of programs - Negligible I/O, so focuses on CPU performance. Normalize relative to reference machine. Summarize as geometric mean of performance ratios - CINT2006 (integer) and CFP2006 (floating-point): $\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$
  • 57. CINT2006 for Opteron X4 2356 (slide annotation: high cache miss rates)
      Name        Description                     IC×10^9   CPI     Tc (ns)   Exec time   Ref time   SPECratio
      perl        Interpreted string processing   2,118     0.75    0.40      637         9,777      15.3
      bzip2       Block-sorting compression       2,389     0.85    0.40      817         9,650      11.8
      gcc         GNU C Compiler                  1,050     1.72    0.40      724         8,050      11.1
      mcf         Combinatorial optimization      336       10.00   0.40      1,345       9,120      6.8
      go          Go game (AI)                    1,658     1.09    0.40      721         10,490     14.6
      hmmer       Search gene sequence            2,783     0.80    0.40      890         9,330      10.5
      sjeng       Chess game (AI)                 2,176     0.96    0.40      837         12,100     14.5
      libquantum  Quantum computer simulation     1,623     1.61    0.40      1,047       20,720     19.8
      h264avc     Video compression               3,102     0.80    0.40      993         22,130     22.3
      omnetpp     Discrete event simulation       587       2.94    0.40      690         6,250      9.1
      astar       Games/path finding              1,082     1.79    0.40      773         7,020      9.1
      xalancbmk   XML parsing                     1,058     2.70    0.40      1,143       6,900      6.0
      Geometric mean                                                                                 11.7
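The geometric-mean summary can be reproduced from the SPECratio column with a short Python sketch. The function name is illustrative; the mean is computed through logarithms, which is one common way to take an n-th root of a product without overflow.

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n ratios, computed via the mean of logs."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# SPECratio column for the twelve CINT2006 programs on this slide.
spec_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5,
               14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

gm = geometric_mean(spec_ratios)
print(f"{gm:.1f}")
```

The result rounds to the 11.7 reported in the last row of the table.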
  • 58. SPEC Power Benchmark. Power consumption of server at different workload levels. Performance: ssj_ops/sec. Power: Watts (Joules/sec). $\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$
  • 59. SPECpower_ssj2008 for X4
      Target Load %   Performance (ssj_ops/sec)   Average Power (Watts)
      100%            231,867                     295
      90%             211,282                     286
      80%             185,803                     275
      70%             163,427                     265
      60%             140,160                     256
      50%             118,324                     246
      40%             92,035                      233
      30%             70,500                      222
      20%             47,126                      206
      10%             23,066                      180
      0%              0                           141
      Overall sum     1,283,590                   2,605
      ∑ssj_ops / ∑power = 493
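The overall figure is the sum of the performance column divided by the sum of the power column, as the following Python sketch shows. Variable names are illustrative; the data are the eleven load levels from the slide, with the 40% entry read as 92,035.

```python
# (target load %, ssj_ops/sec, average power in watts), from the slide.
measurements = [
    (100, 231_867, 295), (90, 211_282, 286), (80, 185_803, 275),
    (70, 163_427, 265), (60, 140_160, 256), (50, 118_324, 246),
    (40, 92_035, 233), (30, 70_500, 222), (20, 47_126, 206),
    (10, 23_066, 180), (0, 0, 141),
]

total_ops = sum(ops for _, ops, _ in measurements)
total_power = sum(watts for _, _, watts in measurements)
overall = total_ops / total_power   # overall ssj_ops per watt

print(total_ops, total_power, round(overall))  # 1283590 2605 493
```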
  • 60. Pitfall: Amdahl's Law (§1.8 Fallacies and Pitfalls). Improving an aspect of a computer and expecting a proportional improvement in overall performance. $T_{\text{improved}} = \dfrac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$. Example: multiply accounts for 80s/100s. How much improvement in multiply performance to get 5× overall? $20 = \dfrac{80}{n} + 20$. Can't be done! Corollary: make the common case fast
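A minimal Python sketch of the formula and of the slide's example follows. Both function names are illustrative; `required_factor` returns `None` when the target time cannot be reached because the unaffected part alone already takes that long.

```python
def improved_time(t_affected, t_unaffected, factor):
    """Amdahl's Law: T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected

def required_factor(t_affected, t_unaffected, target_time):
    """Improvement factor needed to hit target_time, or None if impossible."""
    remaining = target_time - t_unaffected
    if remaining <= 0:
        return None  # even an infinitely fast affected part is not enough
    return t_affected / remaining

# Slide example: multiply takes 80 s of 100 s; 5x overall means 20 s total.
print(required_factor(80, 20, 100 / 5))   # None: can't be done
# A 4x overall target (25 s) is reachable, with multiply 16x faster.
print(required_factor(80, 20, 100 / 4))   # 16.0
```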
  • 61. Fallacy: Low Power at Idle. Look back at the X4 power benchmark: at 100% load: 295W; at 50% load: 246W (83%); at 10% load: 180W (61%). Google data center: mostly operates at 10% to 50% load; at 100% load less than 1% of the time. Consider designing processors to make power proportional to load
  • 62. Pitfall: MIPS as a Performance Metric. MIPS: Millions of Instructions Per Second. Doesn't account for - Differences in ISAs between computers - Differences in complexity between instructions. $\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \dfrac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$. CPI varies between programs on a given CPU
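To illustrate why CPI variation makes MIPS misleading, the Python sketch below applies the formula to two programs from the CINT2006 slide on the same processor. The 2.5 GHz clock rate is derived from that slide's 0.40 ns cycle time; the function name is illustrative.

```python
def mips(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

CLOCK_RATE_HZ = 2.5e9   # 1 / 0.40 ns, the cycle time listed on the CINT2006 slide

perl_mips = mips(CLOCK_RATE_HZ, cpi=0.75)    # perl: CPI 0.75
mcf_mips = mips(CLOCK_RATE_HZ, cpi=10.00)    # mcf: CPI 10.00

print(f"perl: {perl_mips:.0f} MIPS, mcf: {mcf_mips:.0f} MIPS")
```

The same CPU yields MIPS ratings more than ten times apart depending on the program, so a single MIPS number says little about the machine.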
  • 63. Concluding Remarks (§1.9). Cost/performance is improving, due to underlying technology development. Hierarchical layers of abstraction, in both hardware and software. Instruction set architecture: the hardware/software interface. Execution time: the best performance measure. Power is a limiting factor: use parallelism to improve performance