際際滷

際際滷Share a Scribd company logo
Leakage Power Minimization using SA-Based
Gate Sizing and Threshold Voltage Assignment
Chih-Chuan, Yu
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
2
Introduction
? Low Power and High Performance
? Mobile device
? Leakage Power Rise
? ITRS Roadmap 2009 [33]
? Technology scales down
3
Leakage Power Minimization Methods
? Gate Sizing
???? ???? 『
??????? ????? 『
??????? ????????
? Threshold Voltage Assignment
? ??? 『 1/??????? ?????
? ??? 『 ????? ????
? Low Vth on critical path
? High Vth on non-critical path
4
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
5
Related Work
6
Continuous methods Discrete methods
? Linear Programming (LP)
? Geometric programming
(GP)
? Sensitivity-based Approach
? Slack and delay Budgeting
? Dynamic Programming(DP)
? Lagrangian Relaxation (LR)
? Linear Programming (LP)
? Simulated Annealing (SA)
Continuous Methods
? Linear Programming (LP)
? Linear delay model
? The selection of gates is defined as linear function
? Geometric programming (GP)
? Polynomial delay model
7
Discrete Methods
? Sensitivity-based approach
? Score and Rank gates according to a defined sensitivity
? Iteratively select the best gate for optimization until no improvement can be
made
? Slack and delay budgeting
? Allocate a slack budget to each gate
? Use the slack budget to trade the power for each gate.
? Dynamic Programming (DP)
? Use decision stage and cost-to-go function.
8
Discrete Methods (cont.)
? Lagrangian Relaxation (LR)
? Covert constrained problem to unconstrained one.
? Lagrange multiplier
? Linear Programming (LP)
? The selection of gates is implemented by assigning value to a binary variable:
1 is chosen and 0 otherwise.
? Simulated Annealing (SA)
? Probabilistic method for finding a good approximation to the global optimum
9
Related Work Comparison
Methodology Pros Cons
Continuous
Sizing
LP
Fast
Modeling Error
Mapping IssueGP
Discrete
Sizing
Sensitivity Local optimal
Slack & Delay
Ignore delay interaction
LP
DP Solution space explosion
LR Large scale Solution Oscillate
SA
Global optimal
Approximation
Fast solution space
exploration
10
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
11
Motivational Example
12
Solution u1 u2 u3
Timing
Violation
Total
Leakage
Power
Solution 1 s10 s06 s04 -2.32 26
Solution 2 s10 s06 f04 0 86
Solution 3 s10 s06 m04 0 38
n2n1
oa oa oa
n3 n4
50ps
u1 u2 u3
Problem Formulation
? Inputs:
? Standard Cell Library
? Gate-level Netlist
? Timing Constraints
? Interconnect Parasitics
? Outputs:
? The selection of each cell¨s sizes and threshold voltage
? Objective:
? Satisfy all performance constraints
? Minimize total leakage power
13
Performance constraints
? Slack violation:
? At PO and DFF inputs, it exists negative slack.
? Slew(Transition time) violation:
? At PO and cell input pins, the transition time is larger than the max limit
transition time.
? Max-load violation:
? At cell output pins, the fan-out load summation is larger than the cell¨s max
capacitance.
14
Problem Assumptions
? Interconnect parasitics are modeled as lumped capacitance.
? Sequential sizing is not allowed.
? Only one selection for sequential cells.
? Ideal clock network
? No clock buffer, zero skew, and clock net has zero lumped capacitance.
15
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
16
Proposed Methodology
? Phase I: Iterative Algorithm for Initial Solution
? Initial solution that satisfies the timing requirement
? Phase II: Simulated-Annealing-Based Algorithm
? Leakage power minimization
17
Phase I: Pseudo Code
Iterative Algorithm: upsize cells for feasible solution
Inputs: netlist, cell library, timing constraints, and interconnect parasitics
Outputs: each cell¨s size and threshold voltage assignment
Step 1: Count the visited times of the cells traced by negative-slack paths
Step 2: Sort by each cell counter
Step 3: Iterative upsizing in above-defined order
18
Phase I: Pseudo Code (Step 1)
Step 1: Count the visited times of the cells traced by negative-slack paths
Run timing engine to calculate each cell¨s slack;
Initialize each cell¨s counter to zero;
Initialize each cell¨s to smallest type-size;
foreach (negative-slack paths)
foreach (cells in the selected path)
if (selected cell has negative slack)
Increase selected cell¨s counter;
19
Phase I: Pseudo Code (Step 2 & 3)
Step 2: Sort by each cell counter
Sort cell order by each cell¨s counter, from larger to small;
Step 3: Iterative upsizing in above-defined order
do
foreach (cell from above-defined order)
if (selected cell has negative slack)
while (selected cell has larger type-size)
if (new Pleakage < old Pleakage)
Update type-size;
until (no negative slack)
20
Phase II: Simulated-Annealing-Based
1. Solution representation:
? The set of size and type of each cell.
2. Solution perturbation:
? Randomly pick a cell and change its size and threshold voltage assignment.
3. Cost function:
? Total leakage power.
4. Annealing schedule: (next slide)
21
Phase II: SA ! Temperature check
22
IF T > ε
THEN NEXT_ITER
ELSE
THEN FINISHED
FINISHED
START
initialization
T > ε
Find new solution
accept?
Update current solution
Update temperature(T)
update
T?
Yes
No
Yes
Yes
No
No
Phase II: SA ! New solution
23
1. Randomly pick cell
2. Randomly pick new type
and size
3. Call timer and Recalculate
cost
FINISHED
START
initialization
T > ε
Find new solution
accept?
Update current solution
Update temperature(T)
update
T?
Yes
No
Yes
Yes
No
No
Phase II: SA ! Solution acceptance
24
IF Cnew < Clast
IF Cnew < Cbest
THEN state = UPD
ELSE state = NEW
ELSE IF A.Prob. > Random
THEN state = ACP
ELSE state = REJ
? ?0,1expProb.Accept. *TK
C
??
??
old
oldnew
C
CC
C
)( ?
??
? ?1,0Random?
FINISHED
START
initialization
T > ε
Find new solution
accept?
Update current solution
Update temperature(T)
update
T?
Yes
No
Yes
Yes
No
No
Phase II: SA ! Solution update
25
FINISHED
START
initialization
T > ε
Find new solution
accept?
Update current solution
Update temperature(T)
update
T?
Yes
No
Yes
Yes
No
No
IF state = UPD or NEW or
ACP
THEN Slast = Snew
ELSE
THEN Slast = Slast
Phase II: SA ! Temperature update
26
IF γ > φ
THEN DROP_TEMP
ELSE
THEN NEXT_ITER
γ is the counter of successive
state ^Reject ̄
φ is a constant variable
FINISHED
START
initialization
T > ε
Find new solution
accept?
Update current solution
Update temperature(T)
update
T?
Yes
No
Yes
Yes
No
No
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
27
Experimental Results
? Experimental Setting
? Standard Library
? Timing Engine
? Acceptance Probability
? Benchmark
? The Trend of Leakage Power Minimization
? Cost Comparison
28
Standard Library
? Cell Library in Synopsys Liberty format
? Combinational cells:
? 11 Footprints:
? in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12 and oa22
? Each cell has 30 options
? 3 threshold voltage type and 10 gate size
? Sequential cells:
? 1 Footprints: ms08
29
Power, Capacitance, & Delay LUBs
30
Footprint:
in01
Leakage Power
(uW)
Capacitance
(fF)
Delay Time
(ps)
Vt Type
Gate Size
s m f s m f s m f
1 1 4 16 12.8 14.4 16 11.7 10.7 9.1
3 3 12 48 38.4 43.2 48 8.2 7.2 6.5
4 4 16 64 51.2 57.6 64 6.5 5.7 5.2
6 6 24 96 76.8 86.4 96 6.5 5.7 5.2
8 8 32 128 102.4 115.2 128 6.5 5.7 5.2
Delay time Look-Up Table
? Delay time = f(input slew, output load)
? 2D Linear Interpolation
31
Slew(ps)
Loads (fF)
5 10 15 20 25 30
0 6.5 7.6 8.8 10.0 11.1 12.3
1 7.8 9.0 10.2 11.4 12.6 13.8
2 9.1 10.3 11.5 12.8 14.0 15.2
3 10.4 11.7 12.9 14.2 15.5 16.7
Timing Engine
Runtime
(second/iteration)
PrimeTime?
Full
Functional Timer
Incremental Update
and Full Functional
Timer
DMA 10.00 1.50 0.00087
pci_bridge32 11.00 1.90 0.00096
des_perf 33.00 4.60 0.00067
vga_lcd 44.00 6.80 0.00208
b19 63.00 9.70 0.00238
leon3mp 375.00 26.30 0.01582
netcard 393.00 38.40 0.06694
32
Acceptance Probability
0
0.2
0.4
0.6
0.8
1
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
0.40
0.45
0.50
0.55
0.60
0.65
0.70
0.75
0.80
0.85
0.90
0.95
1.00
AcceptanceProbability
ΔC
33
High K
Low K
T*K
C
expProb.Accept.
??
?
old
C
old
CnewC
C
)( ?
??
Benchmark
Design # IO pins # Comb cells # Seq Cells # Total Cells
DMA 959 23K 2K 25K
pci_bridge32 361 30K 3K 33K
des_perf 374 102K 9K 111K
vga_lcd 184 148K 17K 165K
b19 47 213K 7K 219K
leon3mp 333 540K 109K 649K
netcard 1,846 861K 98K 959K
34
The Trend of Leakage Power Minimization
35
0
200000
400000
600000
800000
1000000
1200000
1
65
129
193
257
321
385
449
513
577
641
705
769
833
897
961
1025
leakagepower(μW)
iteration*18K
DMA
0
500000
1000000
1500000
2000000
1
67
133
199
265
331
397
463
529
595
661
727
793
859
925
991
1057
leakagepower(μW)
iteration*16K
pci_bridge32
0
1000000
2000000
3000000
4000000
5000000
1
105
209
313
417
521
625
729
833
937
1041
1145
1249
1353
1457
1561
1665
leakagepower(μW)
iteration*23K
des_perf
0
500000
1000000
1500000
2000000
1
69
137
205
273
341
409
477
545
613
681
749
817
885
953
1021
1089
leakagepower(μW)
iteration*15K
vga_lcd
The Trend of Leakage Power Minimization
(cont.)
36
0
500000
1000000
1500000
1
59
117
175
233
291
349
407
465
523
581
639
697
755
813
871
929
leakagepower(μW)
iteration*16K
b19
0
1000000
2000000
3000000
4000000
5000000
6000000
1
8
15
22
29
36
43
50
57
64
71
78
85
92
99
106
leakagepower(μW)
iteration*14K
netcard
0
1000000
2000000
3000000
4000000
5000000
6000000
1
21
41
61
81
101
121
141
161
181
201
221
241
261
281
301
321
leakagepower(μW)
iteration*15K
leon3mp
Cost Comparison
37
)
35
#
(*15
K
gates
RounduphhRuntime ??
3.71E+05
1.54E+06
2.05E+05
1.58E+05
1.47E+05
2.15E+05
4.51E+05
3.68E+05
0.E+00 5.E+05 1.E+06 2.E+06 2.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
DMA
3.51E+05
1.71E+06
2.03E+05
1.15E+05
1.16E+05
6.96E+05
2.26E+05
2.88E+05
0.E+00 5.E+05 1.E+06 2.E+06 2.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
pci_bridge32
1.54E+06
4.15E+06
6.74E+05
8.84E+05
6.97E+05
9.47E+05
2.28E+06
1.13E+06
0.E+00 2.E+06 4.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
des_perf
4.00E+05
1.47E+06
4.15E+05
3.78E+05
3.91E+05
4.63E+05
6.44E+05
7.53E+05
0.E+00 5.E+05 1.E+06 2.E+06 2.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
vga_lcd
◎ 73%
Cost Comparison (cont.)
38
7.32E+05
1.34E+06
6.27E+05
6.14E+05
7.36E+05
7.58E+05
8.62E+05
5.02E+06
0.E+00 2.E+06 4.E+06 6.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
b19
3.90E+06
4.78E+06
1.77E+06
1.97E+06
1.94E+06
1.81E+06
2.10E+06
2.00E+06
0.E+00 2.E+06 4.E+06 6.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
netcard
2.28E+06
5.40E+06
1.42E+06
1.79E+06
2.96E+06
1.47E+06
1.88E+06
1.92E+06
0.E+00 2.E+06 4.E+06 6.E+06
IR+SA
IR
NTUgs
UFRGS-BRAZIL
PowerValve
Goldilocks
eOPT
CUsizer
Total Leakage Power (μWatt)
leon3mp
Outline
? Introduction
? Related Work
? Problem Formulation
? Proposed Methodology
? Experimental Results
? Conclusion and Future Work
39
Conclusion
? An iterative algorithm is the necessary to initialization. Without using
it, the SA approach may not converge in fixed runtime.
? Our approach can reach a feasible solution in the same magnitude of
related works in all benchmarks.
? In some cases, our approach is resulted in a better solution than
previous work and reduce more than 70 % leakage power from initial
solution in sharp time.
40
Future Work
? Much realistic RC network model
? The leakage power minimization of the sequential circuit
41
Q&A
Thank you!
42

More Related Content

Leakage Power Minimization using SA-Based Gate Sizing and Threshold Voltage Assignment

  • 1. Leakage Power Minimization using SA-Based Gate Sizing and Threshold Voltage Assignment Chih-Chuan, Yu
  • 2. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 2
  • 3. Introduction ? Low Power and High Performance ? Mobile device ? Leakage Power Rise ? ITRS Roadmap 2009 [33] ? Technology scales down 3
  • 4. Leakage Power Minimization Methods ? Gate Sizing ???? ???? 『 ??????? ????? 『 ??????? ???????? ? Threshold Voltage Assignment ? ??? 『 1/??????? ????? ? ??? 『 ????? ???? ? Low Vth on critical path ? High Vth on non-critical path 4
  • 5. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 5
  • 6. Related Work 6 Continuous methods Discrete methods ? Linear Programming (LP) ? Geometric programming (GP) ? Sensitivity-based Approach ? Slack and delay Budgeting ? Dynamic Programming(DP) ? Lagrangian Relaxation (LR) ? Linear Programming (LP) ? Simulated Annealing (SA)
  • 7. Continuous Methods ? Linear Programming (LP) ? Linear delay model ? The selection of gates is defined as linear function ? Geometric programming (GP) ? Polynomial delay model 7
  • 8. Discrete Methods ? Sensitivity-based approach ? Score and Rank gates according to a defined sensitivity ? Iteratively select the best gate for optimization until no improvement can be made ? Slack and delay budgeting ? Allocate a slack budget to each gate ? Use the slack budget to trade the power for each gate. ? Dynamic Programming (DP) ? Use decision stage and cost-to-go function. 8
  • 9. Discrete Methods (cont.) ? Lagrangian Relaxation (LR) ? Covert constrained problem to unconstrained one. ? Lagrange multiplier ? Linear Programming (LP) ? The selection of gates is implemented by assigning value to a binary variable: 1 is chosen and 0 otherwise. ? Simulated Annealing (SA) ? Probabilistic method for finding a good approximation to the global optimum 9
  • 10. Related Work Comparison Methodology Pros Cons Continuous Sizing LP Fast Modeling Error Mapping IssueGP Discrete Sizing Sensitivity Local optimal Slack & Delay Ignore delay interaction LP DP Solution space explosion LR Large scale Solution Oscillate SA Global optimal Approximation Fast solution space exploration 10
  • 11. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 11
  • 12. Motivational Example 12 Solution u1 u2 u3 Timing Violation Total Leakage Power Solution 1 s10 s06 s04 -2.32 26 Solution 2 s10 s06 f04 0 86 Solution 3 s10 s06 m04 0 38 n2n1 oa oa oa n3 n4 50ps u1 u2 u3
  • 13. Problem Formulation ? Inputs: ? Standard Cell Library ? Gate-level Netlist ? Timing Constraints ? Interconnect Parasitics ? Outputs: ? The selection of each cell¨s sizes and threshold voltage ? Objective: ? Satisfy all performance constraints ? Minimize total leakage power 13
  • 14. Performance constraints ? Slack violation: ? At PO and DFF inputs, it exists negative slack. ? Slew(Transition time) violation: ? At PO and cell input pins, the transition time is larger than the max limit transition time. ? Max-load violation: ? At cell output pins, the fan-out load summation is larger than the cell¨s max capacitance. 14
  • 15. Problem Assumptions ? Interconnect parasitics are modeled as lumped capacitance. ? Sequential sizing is not allowed. ? Only one selection for sequential cells. ? Ideal clock network ? No clock buffer, zero skew, and clock net has zero lumped capacitance. 15
  • 16. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 16
  • 17. Proposed Methodology ? Phase I: Iterative Algorithm for Initial Solution ? Initial solution that satisfies the timing requirement ? Phase II: Simulated-Annealing-Based Algorithm ? Leakage power minimization 17
  • 18. Phase I: Pseudo Code Iterative Algorithm: upsize cells for feasible solution Inputs: netlist, cell library, timing constraints, and interconnect parasitics Outputs: each cell¨s size and threshold voltage assignment Step 1: Count the visited times of the cells traced by negative-slack paths Step 2: Sort by each cell counter Step 3: Iterative upsizing in above-defined order 18
  • 19. Phase I: Pseudo Code (Step 1) Step 1: Count the visited times of the cells traced by negative-slack paths Run timing engine to calculate each cell¨s slack; Initialize each cell¨s counter to zero; Initialize each cell¨s to smallest type-size; foreach (negative-slack paths) foreach (cells in the selected path) if (selected cell has negative slack) Increase selected cell¨s counter; 19
  • 20. Phase I: Pseudo Code (Step 2 & 3) Step 2: Sort by each cell counter Sort cell order by each cell¨s counter, from larger to small; Step 3: Iterative upsizing in above-defined order do foreach (cell from above-defined order) if (selected cell has negative slack) while (selected cell has larger type-size) if (new Pleakage < old Pleakage) Update type-size; until (no negative slack) 20
  • 21. Phase II: Simulated-Annealing-Based 1. Solution representation: ? The set of size and type of each cell. 2. Solution perturbation: ? Randomly pick a cell and change its size and threshold voltage assignment. 3. Cost function: ? Total leakage power. 4. Annealing schedule: (next slide) 21
  • 22. Phase II: SA ! Temperature check 22 IF T > ε THEN NEXT_ITER ELSE THEN FINISHED FINISHED START initialization T > ε Find new solution accept? Update current solution Update temperature(T) update T? Yes No Yes Yes No No
  • 23. Phase II: SA ! New solution 23 1. Randomly pick cell 2. Randomly pick new type and size 3. Call timer and Recalculate cost FINISHED START initialization T > ε Find new solution accept? Update current solution Update temperature(T) update T? Yes No Yes Yes No No
  • 24. Phase II: SA ! Solution acceptance 24 IF Cnew < Clast IF Cnew < Cbest THEN state = UPD ELSE state = NEW ELSE IF A.Prob. > Random THEN state = ACP ELSE state = REJ ? ?0,1expProb.Accept. *TK C ?? ?? old oldnew C CC C )( ? ?? ? ?1,0Random? FINISHED START initialization T > ε Find new solution accept? Update current solution Update temperature(T) update T? Yes No Yes Yes No No
  • 25. Phase II: SA ! Solution update 25 FINISHED START initialization T > ε Find new solution accept? Update current solution Update temperature(T) update T? Yes No Yes Yes No No IF state = UPD or NEW or ACP THEN Slast = Snew ELSE THEN Slast = Slast
  • 26. Phase II: SA ! Temperature update 26 IF γ > φ THEN DROP_TEMP ELSE THEN NEXT_ITER γ is the counter of successive state ^Reject ̄ φ is a constant variable FINISHED START initialization T > ε Find new solution accept? Update current solution Update temperature(T) update T? Yes No Yes Yes No No
  • 27. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 27
  • 28. Experimental Results ? Experimental Setting ? Standard Library ? Timing Engine ? Acceptance Probability ? Benchmark ? The Trend of Leakage Power Minimization ? Cost Comparison 28
  • 29. Standard Library ? Cell Library in Synopsys Liberty format ? Combinational cells: ? 11 Footprints: ? in01, na02, na03, na04, no02, no03, no04, ao12, ao22, oa12 and oa22 ? Each cell has 30 options ? 3 threshold voltage type and 10 gate size ? Sequential cells: ? 1 Footprints: ms08 29
  • 30. Power, Capacitance, & Delay LUBs 30 Footprint: in01 Leakage Power (uW) Capacitance (fF) Delay Time (ps) Vt Type Gate Size s m f s m f s m f 1 1 4 16 12.8 14.4 16 11.7 10.7 9.1 3 3 12 48 38.4 43.2 48 8.2 7.2 6.5 4 4 16 64 51.2 57.6 64 6.5 5.7 5.2 6 6 24 96 76.8 86.4 96 6.5 5.7 5.2 8 8 32 128 102.4 115.2 128 6.5 5.7 5.2
  • 31. Delay time Look-Up Table ? Delay time = f(input slew, output load) ? 2D Linear Interpolation 31 Slew(ps) Loads (fF) 5 10 15 20 25 30 0 6.5 7.6 8.8 10.0 11.1 12.3 1 7.8 9.0 10.2 11.4 12.6 13.8 2 9.1 10.3 11.5 12.8 14.0 15.2 3 10.4 11.7 12.9 14.2 15.5 16.7
  • 32. Timing Engine Runtime (second/iteration) PrimeTime? Full Functional Timer Incremental Update and Full Functional Timer DMA 10.00 1.50 0.00087 pci_bridge32 11.00 1.90 0.00096 des_perf 33.00 4.60 0.00067 vga_lcd 44.00 6.80 0.00208 b19 63.00 9.70 0.00238 leon3mp 375.00 26.30 0.01582 netcard 393.00 38.40 0.06694 32
  • 34. Benchmark Design # IO pins # Comb cells # Seq Cells # Total Cells DMA 959 23K 2K 25K pci_bridge32 361 30K 3K 33K des_perf 374 102K 9K 111K vga_lcd 184 148K 17K 165K b19 47 213K 7K 219K leon3mp 333 540K 109K 649K netcard 1,846 861K 98K 959K 34
  • 35. The Trend of Leakage Power Minimization 35 0 200000 400000 600000 800000 1000000 1200000 1 65 129 193 257 321 385 449 513 577 641 705 769 833 897 961 1025 leakagepower(μW) iteration*18K DMA 0 500000 1000000 1500000 2000000 1 67 133 199 265 331 397 463 529 595 661 727 793 859 925 991 1057 leakagepower(μW) iteration*16K pci_bridge32 0 1000000 2000000 3000000 4000000 5000000 1 105 209 313 417 521 625 729 833 937 1041 1145 1249 1353 1457 1561 1665 leakagepower(μW) iteration*23K des_perf 0 500000 1000000 1500000 2000000 1 69 137 205 273 341 409 477 545 613 681 749 817 885 953 1021 1089 leakagepower(μW) iteration*15K vga_lcd
  • 36. The Trend of Leakage Power Minimization (cont.) 36 0 500000 1000000 1500000 1 59 117 175 233 291 349 407 465 523 581 639 697 755 813 871 929 leakagepower(μW) iteration*16K b19 0 1000000 2000000 3000000 4000000 5000000 6000000 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 leakagepower(μW) iteration*14K netcard 0 1000000 2000000 3000000 4000000 5000000 6000000 1 21 41 61 81 101 121 141 161 181 201 221 241 261 281 301 321 leakagepower(μW) iteration*15K leon3mp
  • 37. Cost Comparison 37 ) 35 # (*15 K gates RounduphhRuntime ?? 3.71E+05 1.54E+06 2.05E+05 1.58E+05 1.47E+05 2.15E+05 4.51E+05 3.68E+05 0.E+00 5.E+05 1.E+06 2.E+06 2.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) DMA 3.51E+05 1.71E+06 2.03E+05 1.15E+05 1.16E+05 6.96E+05 2.26E+05 2.88E+05 0.E+00 5.E+05 1.E+06 2.E+06 2.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) pci_bridge32 1.54E+06 4.15E+06 6.74E+05 8.84E+05 6.97E+05 9.47E+05 2.28E+06 1.13E+06 0.E+00 2.E+06 4.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) des_perf 4.00E+05 1.47E+06 4.15E+05 3.78E+05 3.91E+05 4.63E+05 6.44E+05 7.53E+05 0.E+00 5.E+05 1.E+06 2.E+06 2.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) vga_lcd ◎ 73%
  • 38. Cost Comparison (cont.) 38 7.32E+05 1.34E+06 6.27E+05 6.14E+05 7.36E+05 7.58E+05 8.62E+05 5.02E+06 0.E+00 2.E+06 4.E+06 6.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) b19 3.90E+06 4.78E+06 1.77E+06 1.97E+06 1.94E+06 1.81E+06 2.10E+06 2.00E+06 0.E+00 2.E+06 4.E+06 6.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) netcard 2.28E+06 5.40E+06 1.42E+06 1.79E+06 2.96E+06 1.47E+06 1.88E+06 1.92E+06 0.E+00 2.E+06 4.E+06 6.E+06 IR+SA IR NTUgs UFRGS-BRAZIL PowerValve Goldilocks eOPT CUsizer Total Leakage Power (μWatt) leon3mp
  • 39. Outline ? Introduction ? Related Work ? Problem Formulation ? Proposed Methodology ? Experimental Results ? Conclusion and Future Work 39
  • 40. Conclusion ? An iterative algorithm is the necessary to initialization. Without using it, the SA approach may not converge in fixed runtime. ? Our approach can reach a feasible solution in the same magnitude of related works in all benchmarks. ? In some cases, our approach is resulted in a better solution than previous work and reduce more than 70 % leakage power from initial solution in sharp time. 40
  • 41. Future Work ? Much realistic RC network model ? The leakage power minimization of the sequential circuit 41

Editor's Notes

  • #3: 遍枠頁 Introduction 議何蛍
  • #4: F書議徨a瞳喩凪頁返隔b崔埆輳{ 詰債嬬 才 互丼嬬 議惹簧圈 S广親室議M業唱Au殻議處M息送侭夛撹議唹來厚頁晩嶷勣。 功 ITRS roadmap 壓 2009定議鷂耿儘験息送侭夛撹議壓 2016定 奐紗 為蛍岻400。 咀緩週詰息送議柊払∧盃慴議h}。
  • #5: 遇壓@鐙猟嶄厘駻 gate sizing 才 threshold voltage assignment 恬蚓週詰 leakage power 議圭隈。 Gate 議樫雁唹啜陳楞Γ嗽才息送撹屎曳咀緩x喘m議樫雁嬬p富息送峙。 総翌Threshold Voltage Assignmentt頁旋喘y峙叉通慚圍互Vth delay^L徽頁息送^弌辛參喘豢non-critical path。 遇詰 Vth辛喘豢critical path參憲栽timing requirement.
  • #6: 俊和栖頁犢慫仂慎腸新
  • #7: Gate sizing and threshold voltage assignmnet 琲P議冩梢90定旗_兵厮幟u鞭欺嶷咀緩υ擴四議朕剖侭戻竃議光N圭隈遇麼勣辛蛍百齊蛍e蛆黯議continuous methods才嘔議 discrete methods。 Continuous method 麼勣嗤 linear programming 才 geometric programming 彪N圭隈。 Discrete methods t嗤 sensitivity-based approach, slack and delay budgeting 吉鎗N圭隈。 參和厘玉初B光N圭隈K拝壓緩何蛍議恷瘁恂匯弌Y。
  • #8: 壓continuous method 嶄 linear programming power model才 gate selection協x格來痕機 遇 geometric programming M匯化 power model 協x撹 謹塀痕機 Modeling Error: misleads optimization due to the inaccuracy of delay and power models. Mapping Issue: makes no guarantee on mapping a continuous solution to a discrete one.
  • #9: 壓 discrete gate sizing 議圭隈嶄遍枠心欺 sensitivity- based approach麿協x阻光徭蒙議 sensitivity 功@協xo耽gateu蛍K拝電會x竃恷互蛍議gateK恂 leakage minimization 岷欺o隈壅晒。 壅輅 slack and delay budgeting 麿孀竃光gate議slack ^寄slack 議 gate Q撹 leakage power議gate。 及眉何蛍頁駻DP 麿頁circuit烝^卆會Q協assignment
  • #10: LR 頁 constrained problem DQ撹 unconstrained problem 肇箔larangian multiplier議盾 LP 嗤e豢continuous method size and threshold voltage議x喘個撹匯binary variable躅榁rounding} SA t頁 壓曜諮議^殻嶄嗤l周仇俊鞭^餓議盾參箔俊除恷煮盾議盾遇拝m喘壓x柊議盾腎g(discrete larger search space)
  • #11: @燕鯉双竃光圭隈議髪c辛參心欺LP, GP, Sensitivity, Slack & delay @ラ圭隈議c頁箔盾rg玉DPt頁嗤丼楕仇孀竃盾LR辛參喘豢寄樫オ}SAt辛參孀欺畠囃恷煮盾(global optimal)議除貌盾 徽continuous sizing頁modeling error才 mapping issues念宀delay才power model撹曳^音粉_議來痕毅瘁宀t嗤辛嬬o隈孀欺x柊議盾 Sensitivity鞭溲^囃來恷煮盾(local optimal) Slack&Delay 才 LP @圭隈策待阻 効gate侭夛撹議delay interaction DP鞭溲 盾腎g議寄弌 LRt辛參壓箔盾^殻囑欺寳Uo隈辺浸鮨壓^囃來恷煮盾(local optimal) SAt俶勣嬬鮨貳ニそ眇實g議圭隈
  • #12: 俊和輅 Problem Formulation議何蛍厘}楚晒參旋蛍裂才箔盾
  • #13: 遍枠頁匯 motivational example 軆矚gate size and threshold voltage assignment 議娼舞。 辛參心欺眉堪俊議inverters 隼瘁 n1 恠欺 n4 議恷寄rg 50ps 厘壓u3x喘音揖議threshold voltage type 辛參心欺 solution 1 `郡阻timing requirement 嗤negative slack Solution2才soultion3 脅嗤壓_欺rg渣藤凪嶄solution議leakage power 曳^弌 咀緩厘岑祇 m議 assignment 音峪嬬欺M怎rg渣藤呀嬬鮟亀預送
  • #14: 壅輅problem formulation議input 膨何蛍蛍e頁 standard cell library(Synopsis format), gate-level netlist(verilog fromat 協x耽cell議兆Q才net議B俊), timing constraints (SDC format 渣clock period才clock input name 嗤input/output delay 參式 PI議transition time才PO議load) 才 interconnect parasitics (SPEF format 協x耽 net 議 篠伏否) 隼瘁output頁壓design嶄議光cell 恷瘁議 gate size 才 threshold voltage 議x喘 厘議朕吠拝M怎performance constraints 才 恷弌晒 total leakage power Performance constrains 厘壓和匯誘唹頭盾
  • #15: Performance constraints 頁勣閲窒@眉violation蛍e slack, slew, max-load violations Slack violation 頁峺飛壓PO 才 DFF議input 嗤negative slack Slew violation t頁峺飛壓PO才cell input pins 嗤slew 寄豢 協x壓library嶄議 max limitation slew 恷瘁 max-load violation 頁壓cell output pins 飛cell fan-out load議才 寄豢 @ cell議恷寄否峙 Ps: pico second (10^-12) Ff: femto farad (10^-15)
  • #16: K拝ξ議h廠恂阻乂邪O參閲窒h廠^豢}s。 遍枠 network札珎BY議篠伏否協x lumped capacitance參圭宴麻。 及屈厘峪combinational circuit恬assignmentsequential sizing音深]。 及眉邪Oclock network 頁尖誦毅匆祥頁]嗤 clock buffer, ]嗤 skew 參式 clock net ]嗤lumped capacitance
  • #17: 俊和栖頁初府厘断戻竃議譜柴圭隈
  • #18: 壓厘議O圭隈嶄蛍撹A粁。 及匯A粁頁\喘 iterative algorithm 箔竃匯辛參M怎 timing requirement議兜兵峙 壅住喇及屈A粁議 Simulated annealing algorithm 恬M匯化議週詰息送峙。
  • #19: 及匯A粁議Iterative algorithm Input 才 output 壓岻念議formulations 嗤戻^。 辛蛍撹眉化E厘壓俊和躓塚錦案f苧光化E。
  • #20: 及匯化厘議朕議頁孀竃耽cell 瓜ラlnegative slack揃trace ^ 遍枠厘枠麻肇光cell議rg餓cell countO0光 cellO撹恷弌議type size 孀欺侭嗤議negative slack 揃彴隼瘁肇心壓@揃忭狼鎮 cell 飛頁negative slack 祥 cell counter 紗匯
  • #21: 及屈化功麻^議峙肇恂電會。瓜trace^埆謹肝議cell電會埆念中。(飛匯咤t孚topology order。) 恷瘁匯化E厘卆孚議電會肇z臥cell頁倦嗤negative slack邪泌嗤祥Q撹^寄議type size岷欺憲栽timing constraints. (匯協]嗤violation? A:云輅杯topology order囑欺@}瘁躰喩詛化E祥辛參閲窒。)
  • #22: 及屈A粁SA厘蛍膨何蛍盾 遍枠 SA 議 solution 頁峺光cell侭喘議type size 及屈 solution議_喨 SCx函匯cell K個麿議 type size 及眉 cost function 頁 total leakage power 恷瘁議annealing schedule 壓flow chart 輻f苧
  • #23: z臥惷畔之饅亀bound 頁阻祥Y崩
  • #24: SCx函匯cell K個麿議 type size 柵出 timer 嶷仟麻cost
  • #25: 隼瘁登獎音勣俊鞭仟議 solution 邪泌仟議cost 曳 f議cost 弌 壅肇登猜之餘 best cost躓頂達嗤阻祥stateOUPDK効仟欺best cost。]嗤阻祥stateONEW。 壅輅else議何蛍肇麻 匯 01岻g議y丘 才 acceptance probability。 acceptance probability 頁匯峺戯 exponential 議 (_____) 肝圭 盾delta cost (豢fcost議個曳箭)
  • #26: 心頁陳state Q協勣音勣効仟 current solution 欺 last solution
  • #27: 心拙e議state REJ欺阻]欺阻週悄 γ: gamma φ: phi
  • #29: O協 (library, timing engine, AP, benchmark) 曳^
  • #30: Combinational Sequential 音深]
  • #31: Power才 capacitance Delay 郡來
  • #32: Delay 頁功slew load臥燕 屈S 來伐綏
  • #33: Netcard 酔 6認蔚 DMA pci 酔 1f蔚 Des 酔 5f蔚 Vga b19 loen 酔 3f蔚
  • #34: AP巷塀 念中嗤戻^ 峺戯 low k high k 嘔 音揖K K湊寄音叟辺殖runtime渣董 K湊弌SA豢local optimal
  • #35: yY 7M Cell count 2f 欺 95f
  • #42: RC network Sequential gate sizing