際際滷

際際滷Share a Scribd company logo
A Quantitative Study of Irregular Programs on
                    GPUs




                  By
           Prashant Momale
              IIT Kanpur

              Guided By
         Prof. S. K. Aggarwal
Introduction

Regular vs Irregular Algorithms
- Regular Programs
(i) operate on large vectors or matrices
(ii) access them in statically predictable ways


- These codes often have high computational Demands
- exhibit extensive data parallelism
- access memory in a streaming fashion, and require little synchronization
i.e. Matrix Multiplication
Introduction(Continue...)

Irregular Programs

- build, traverse, and update irregular data structures such as trees, graphs, and priority
   queues
i.e. domains like n-body simulation, data mining, decisions problems that use Boolean
   satisfiability, optimization theory, social networks
- more difficult to parallelize
- more challenging to map to GPUs than regular programs
Introduction(Continue...)

Many Questions to be solved

- Several GPU implementation of irregular programs have been published but very little
   is known about them
- Some questions do not have clear answers like
(i) Does irregularity really manifest itself as a binary property?
(ii) How is the irregularity behavior of an application influenced by its input, if at all?
(iii) Does an increase in irregularity necessarily degrade performance or might it help in
     certain cases?
- Answers to above questions are really important to understand the behavior of irregular
   programs
Irregularity

Regular Programs
- Control flow and memory access are not data dependent
Ex. In matrix multiplication, knowing source code, starting address and input size and
   without knowing any matrix elements we can predict the behavior



Irregular Programs
- Control flow and memory access are data dependent
- Input values determine the program's behavior
Ex. Binary Search Tree implementation
   The values and the order in which they are processed affect the control flow and
   memory references
Irregularity (Continued....)

Warp Concept

- GPU contains processing elements (PEs) and tightly coupled PEs form a streaming
 multiprocessor (SM).
- Each PE in an SM can run an independent thread of instructions
- The PEs in each SM execute vector instructions that conditionally operate on 32 data
   items.
- A set of 32 threads that run together in this fashion is called a warp.
Irregularity (Continued....)

Control Flow Irregularity
- Sometimes all threads in warp can not perform same instruction.
- Threads automatically get subdivided into sets
- Threads from set performs same instruction
- But sets get executed in serial manner until they re-converge.


Situation where not all threads in warp follow the same control flow is call Thread
   Divergence.
This is a Control Flow Irregularity
Irregularity (Continued....)

Memory Access Irregularity
- Coalesced memory transaction
- When memory access is not coalesced, hardware has to perform many memory
   transactions, one after the other, compared to coalesced access.
This is how Memory Access Irregularity can lower the performance.


- Bank Conflict : Warp can simultaneously access 32 words in shared memory as long
   as they reside in different banks. If more than one word is touched within a bank
   bank conflict occurs.
Bank Conflict is another reason of memory access irregularity
Metrics of Irregularity


(i) Control Flow Irregularity


           CFI = (divergent branches ) / (executed instructions)


(ii)Memory-Access Irregularity


           MAI = ( replayed instructions) / ( issued instructions)
Metrics of Irregularity(Continued...)

- Both metrics ranges from 0% to 100%
- Higher the values higher is the irregularity
- CFI is usually low
- They are independent of runtime
- Both metric s measure irregularity at warp level


These metrics do not classify a program as regular or irregular. Rather, they
  measure the Degree of Irregularity
Results and Analysis
- Analysis of observations about the irregularity exhibited by various CUDA kernels has
   be presented.
- Investigated the effect of different program inputs
- Effect of optimizations on programs
- Variability of the results between different runs
 (i) on same GPU
 (ii) on different GPU
(Benchmarks Used :
Irregular - BFS, Barnes Hut, Data Compression, Delaunay Mesh Refinement,
            Points-to Analysis, Survey Propagation, Single Source Shortest Path, TSP
 Regular - Black Scholes, Histogram, Monte Carlo, Matrix Multiplication, N-Body )
Results and Analysis(Continued....)
Amount of Irregularity

- CFI is usually very low. For above benchmarks it is less than 4.1%
- Most of the programs can not strictly classified as regular or irregular
- Two irregularities appear to be independent of each other
- Irregular control flow generally implies irregular memory access
Results and Analysis(Continued....)

Input Sensitivity
- Input sensitivity is very difficult to predict
- Difficult to do it in application independent way


(i) Input Oblivious - Irregularity remains largely constant for different inputs
(ii) Input-type Dependent - Irregularity varies largely across different types of inputs
                            rather than within a single type
(iii) Input Dependent  Irregularity varies as size of the input varies
Results and Analysis(Continued....)


(iii) Arithmetic Precision 
    Change from single precision to double precision increases CFI and MAI for small
      inputs but decreases both for medium and large inputs
    But the change is very small.
    - It indicates that change in arithmetic precision does not affect the irregularity of
        program.
Results and Analysis(Continued....)

Variability

- Observed for several kernels on different GPUs and same GPUs for multiple runs


Irregularities are quite stable for same GPU and vary somewhat between distinct
   GPUs
Conclusion

- There is no type of programs as regular or irregular
- Irregularity is not necessarily bad for the performance
- By definition, irregular programs are data dependent but deferent inputs yield similar
   degrees of irregularity
- Irregularity does no vary much between distinct GPUs


It is expected that above conclusions hold across a broad range of CUDA-capable GPUs
     and hope that it will increase the understanding of the behavior of irregular GPU
     applications.
References

Paper : A Quantitative Study of Irregular Programs on GPUs
             By - Rupesh Nasre, Keshav Pingali, Martin Burtscher
                  Texas State University
             Published in  IEEE International Symposium on
                            Workload Characterization ( IISWC '13 )
Results and Analysis(Continued....)
Effect of Optimizations and Arithmetic Precision

(i) Regular version of one program reads records from global memory but in optimized
    version if calculates the record values on the fly.
  - This actually increase the Control Flow Irregularity
  - But faster is the performance because computations are cheaper than reading values
   from global memory.


(ii) In optimized Single Source Shortest Path algorithm, nodes which are logically close
    to each other are kept close in memory.
   - It increase the Memory-Access Irregularity but increases the spatial locality
Ad

Recommended

Distributed computing by Dr.C.R.Dhivyaa, Assistant Professor, Kongu Engineeri...
Distributed computing by Dr.C.R.Dhivyaa, Assistant Professor, Kongu Engineeri...
Dhivyaa C.R
A novel algorithm to protect and manage memory locations
A novel algorithm to protect and manage memory locations
iosrjce
Risk Assessment of Construction Projects Using Network Based Adaptive Fuzzy S...
Risk Assessment of Construction Projects Using Network Based Adaptive Fuzzy S...
Hani Nelly Sukma
TYBSC CS SEM 5 AI NOTES
TYBSC CS SEM 5 AI NOTES
WE-IT TUTORIALS
Hybrid systems
Hybrid systems
Dr. C.V. Suresh Babu
Applications of hybrid systems
Applications of hybrid systems
Dr. C.V. Suresh Babu
Flow control in computer
Flow control in computer
rud_d_rcks
Alft
Alft
Vincenzo De Florio
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
inside-BigData.com
TR-CIS-0420-09 BobZigon
TR-CIS-0420-09 BobZigon
Bob Zigon
Catastrophic Cancellation
Catastrophic Cancellation
C4Media
High performance GPU computing with Ruby
High performance GPU computing with Ruby
Prasun Anand
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
npinto
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
npinto
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
npinto
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Silvio Cesare
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Subhajit Sahu
PraveenBOUT++
PraveenBOUT++
Praveen Narayanan
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
CSCJournals
Paralell
Paralell
Mark Vicuna
Fahroo - Computational Mathematics - Spring Review 2012
Fahroo - Computational Mathematics - Spring Review 2012
The Air Force Office of Scientific Research
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
Scott Donald
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
Testing the Numerical Precisions Required to Execute Real World Programs
Testing the Numerical Precisions Required to Execute Real World Programs
ijseajournal
L09.pdf
L09.pdf
TRNHONGLINHBCHCM
A Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen Donigian
Data Con LA
Thinking in parallel ab tuladev
Thinking in parallel ab tuladev
Pavel Tsukanov
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
Naoki Shibata
Overview of Off Boarding in Odoo 18 Employees
Overview of Off Boarding in Odoo 18 Employees
Celine George
Wax Moon, Richmond, VA. Terrence McPherson
Wax Moon, Richmond, VA. Terrence McPherson
TerrenceMcPherson1

More Related Content

Similar to Irregular Programs on GPU (20)

Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
inside-BigData.com
TR-CIS-0420-09 BobZigon
TR-CIS-0420-09 BobZigon
Bob Zigon
Catastrophic Cancellation
Catastrophic Cancellation
C4Media
High performance GPU computing with Ruby
High performance GPU computing with Ruby
Prasun Anand
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
npinto
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
npinto
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
npinto
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Silvio Cesare
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Subhajit Sahu
PraveenBOUT++
PraveenBOUT++
Praveen Narayanan
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
CSCJournals
Paralell
Paralell
Mark Vicuna
Fahroo - Computational Mathematics - Spring Review 2012
Fahroo - Computational Mathematics - Spring Review 2012
The Air Force Office of Scientific Research
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
Scott Donald
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
Testing the Numerical Precisions Required to Execute Real World Programs
Testing the Numerical Precisions Required to Execute Real World Programs
ijseajournal
L09.pdf
L09.pdf
TRNHONGLINHBCHCM
A Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen Donigian
Data Con LA
Thinking in parallel ab tuladev
Thinking in parallel ab tuladev
Pavel Tsukanov
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
Naoki Shibata
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi...
inside-BigData.com
TR-CIS-0420-09 BobZigon
TR-CIS-0420-09 BobZigon
Bob Zigon
Catastrophic Cancellation
Catastrophic Cancellation
C4Media
High performance GPU computing with Ruby
High performance GPU computing with Ruby
Prasun Anand
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
[Harvard CS264] 11b - Analysis-Driven Performance Optimization with CUDA (Cli...
npinto
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
[Harvard CS264] 11a - Programming the Memory Hierarchy with Sequoia (Mike Bau...
npinto
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
[Harvard CS264] 03 - Introduction to GPU Computing, CUDA Basics
npinto
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Detecting Bugs in Binaries Using Decompilation and Data Flow Analysis
Silvio Cesare
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Is Multicore Hardware For General-Purpose Parallel Processing Broken? : Notes
Subhajit Sahu
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
Hybrid Model Based Testing Tool Architecture for Exascale Computing System
CSCJournals
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
A Novel Analysis Space For Pointer Analysis And Its Application For Bug Finding
Scott Donald
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
Testing the Numerical Precisions Required to Execute Real World Programs
Testing the Numerical Precisions Required to Execute Real World Programs
ijseajournal
A Gentle Introduction to GPU Computing by Armen Donigian
A Gentle Introduction to GPU Computing by Armen Donigian
Data Con LA
Thinking in parallel ab tuladev
Thinking in parallel ab tuladev
Pavel Tsukanov
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
(Paper) Efficient Evaluation Methods of Elementary Functions Suitable for SIM...
Naoki Shibata

Recently uploaded (20)

Overview of Off Boarding in Odoo 18 Employees
Overview of Off Boarding in Odoo 18 Employees
Celine George
Wax Moon, Richmond, VA. Terrence McPherson
Wax Moon, Richmond, VA. Terrence McPherson
TerrenceMcPherson1
How to Manage Multi Language for Invoice in Odoo 18
How to Manage Multi Language for Invoice in Odoo 18
Celine George
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
Kweku Zurek
BINARY files CSV files JSON files with example.pptx
BINARY files CSV files JSON files with example.pptx
Ramakrishna Reddy Bijjam
How to Manage Different Customer Addresses in Odoo 18 Accounting
How to Manage Different Customer Addresses in Odoo 18 Accounting
Celine George
Health Care Planning and Organization of Health Care at Various Levels Unit...
Health Care Planning and Organization of Health Care at Various Levels Unit...
RAKESH SAJJAN
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
Ultimatewinner0342
Communicable Diseases and National Health Programs Unit 9 | B.Sc Nursing 5t...
Communicable Diseases and National Health Programs Unit 9 | B.Sc Nursing 5t...
RAKESH SAJJAN
Plate Tectonic Boundaries and Continental Drift Theory
Plate Tectonic Boundaries and Continental Drift Theory
Marie
How to Manage Inventory Movement in Odoo 18 POS
How to Manage Inventory Movement in Odoo 18 POS
Celine George
ECONOMICS, DISASTER MANAGEMENT, ROAD SAFETY - STUDY MATERIAL [10TH]
ECONOMICS, DISASTER MANAGEMENT, ROAD SAFETY - STUDY MATERIAL [10TH]
SHERAZ AHMAD LONE
ENGLISH_Q1_W1 PowerPoint grade 3 quarter 1 week 1
ENGLISH_Q1_W1 PowerPoint grade 3 quarter 1 week 1
jutaydeonne
Paper 109 | Archetypal Journeys in Interstellar: Exploring Universal Themes...
Paper 109 | Archetypal Journeys in Interstellar: Exploring Universal Themes...
Rajdeep Bavaliya
The Man In The Back Exceptional Delaware.pdf
The Man In The Back Exceptional Delaware.pdf
dennisongomezk
Birnagar High School Platinum Jubilee Quiz.pptx
Birnagar High School Platinum Jubilee Quiz.pptx
Sourav Kr Podder
Non-Communicable Diseases and National Health Programs Unit 10 | B.Sc Nursi...
Non-Communicable Diseases and National Health Programs Unit 10 | B.Sc Nursi...
RAKESH SAJJAN
2025 June Year 9 Presentation: Subject selection.pptx
2025 June Year 9 Presentation: Subject selection.pptx
mansk2
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
TechSoup
This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
Overview of Off Boarding in Odoo 18 Employees
Overview of Off Boarding in Odoo 18 Employees
Celine George
Wax Moon, Richmond, VA. Terrence McPherson
Wax Moon, Richmond, VA. Terrence McPherson
TerrenceMcPherson1
How to Manage Multi Language for Invoice in Odoo 18
How to Manage Multi Language for Invoice in Odoo 18
Celine George
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
University of Ghana Cracks Down on Misconduct: Over 100 Students Sanctioned
Kweku Zurek
BINARY files CSV files JSON files with example.pptx
BINARY files CSV files JSON files with example.pptx
Ramakrishna Reddy Bijjam
How to Manage Different Customer Addresses in Odoo 18 Accounting
How to Manage Different Customer Addresses in Odoo 18 Accounting
Celine George
Health Care Planning and Organization of Health Care at Various Levels Unit...
Health Care Planning and Organization of Health Care at Various Levels Unit...
RAKESH SAJJAN
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
LAZY SUNDAY QUIZ "A GENERAL QUIZ" JUNE 2025 SMC QUIZ CLUB, SILCHAR MEDICAL CO...
Ultimatewinner0342
Communicable Diseases and National Health Programs Unit 9 | B.Sc Nursing 5t...
Communicable Diseases and National Health Programs Unit 9 | B.Sc Nursing 5t...
RAKESH SAJJAN
Plate Tectonic Boundaries and Continental Drift Theory
Plate Tectonic Boundaries and Continental Drift Theory
Marie
How to Manage Inventory Movement in Odoo 18 POS
How to Manage Inventory Movement in Odoo 18 POS
Celine George
ECONOMICS, DISASTER MANAGEMENT, ROAD SAFETY - STUDY MATERIAL [10TH]
ECONOMICS, DISASTER MANAGEMENT, ROAD SAFETY - STUDY MATERIAL [10TH]
SHERAZ AHMAD LONE
ENGLISH_Q1_W1 PowerPoint grade 3 quarter 1 week 1
ENGLISH_Q1_W1 PowerPoint grade 3 quarter 1 week 1
jutaydeonne
Paper 109 | Archetypal Journeys in Interstellar: Exploring Universal Themes...
Paper 109 | Archetypal Journeys in Interstellar: Exploring Universal Themes...
Rajdeep Bavaliya
The Man In The Back Exceptional Delaware.pdf
The Man In The Back Exceptional Delaware.pdf
dennisongomezk
Birnagar High School Platinum Jubilee Quiz.pptx
Birnagar High School Platinum Jubilee Quiz.pptx
Sourav Kr Podder
Non-Communicable Diseases and National Health Programs Unit 10 | B.Sc Nursi...
Non-Communicable Diseases and National Health Programs Unit 10 | B.Sc Nursi...
RAKESH SAJJAN
2025 June Year 9 Presentation: Subject selection.pptx
2025 June Year 9 Presentation: Subject selection.pptx
mansk2
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
ABCs of Bookkeeping for Nonprofits TechSoup.pdf
TechSoup
This is why students from these 44 institutions have not received National Se...
This is why students from these 44 institutions have not received National Se...
Kweku Zurek
Ad

Irregular Programs on GPU

  • 1. A Quantitative Study of Irregular Programs on GPUs By Prashant Momale IIT Kanpur Guided By Prof. S. K. Aggarwal
  • 2. Introduction Regular vs Irregular Algorithms - Regular Programs (i) operate on large vectors or matrices (ii) access them in statically predictable ways - These codes often have high computational Demands - exhibit extensive data parallelism - access memory in a streaming fashion, and require little synchronization i.e. Matrix Multiplication
  • 3. Introduction(Continue...) Irregular Programs - build, traverse, and update irregular data structures such as trees, graphs, and priority queues i.e. domains like n-body simulation, data mining, decisions problems that use Boolean satisfiability, optimization theory, social networks - more difficult to parallelize - more challenging to map to GPUs than regular programs
  • 4. Introduction(Continue...) Many Questions to be solved - Several GPU implementation of irregular programs have been published but very little is known about them - Some questions do not have clear answers like (i) Does irregularity really manifest itself as a binary property? (ii) How is the irregularity behavior of an application influenced by its input, if at all? (iii) Does an increase in irregularity necessarily degrade performance or might it help in certain cases? - Answers to above questions are really important to understand the behavior of irregular programs
  • 5. Irregularity Regular Programs - Control flow and memory access are not data dependent Ex. In matrix multiplication, knowing source code, starting address and input size and without knowing any matrix elements we can predict the behavior Irregular Programs - Control flow and memory access are data dependent - Input values determine the program's behavior Ex. Binary Search Tree implementation The values and the order in which they are processed affect the control flow and memory references
  • 6. Irregularity (Continued....) Warp Concept - GPU contains processing elements (PEs) and tightly coupled PEs form a streaming multiprocessor (SM). - Each PE in an SM can run an independent thread of instructions - The PEs in each SM execute vector instructions that conditionally operate on 32 data items. - A set of 32 threads that run together in this fashion is called a warp.
  • 7. Irregularity (Continued....) Control Flow Irregularity - Sometimes all threads in warp can not perform same instruction. - Threads automatically get subdivided into sets - Threads from set performs same instruction - But sets get executed in serial manner until they re-converge. Situation where not all threads in warp follow the same control flow is call Thread Divergence. This is a Control Flow Irregularity
  • 8. Irregularity (Continued....) Memory Access Irregularity - Coalesced memory transaction - When memory access is not coalesced, hardware has to perform many memory transactions, one after the other, compared to coalesced access. This is how Memory Access Irregularity can lower the performance. - Bank Conflict : Warp can simultaneously access 32 words in shared memory as long as they reside in different banks. If more than one word is touched within a bank bank conflict occurs. Bank Conflict is another reason of memory access irregularity
  • 9. Metrics of Irregularity (i) Control Flow Irregularity CFI = (divergent branches ) / (executed instructions) (ii)Memory-Access Irregularity MAI = ( replayed instructions) / ( issued instructions)
  • 10. Metrics of Irregularity(Continued...) - Both metrics ranges from 0% to 100% - Higher the values higher is the irregularity - CFI is usually low - They are independent of runtime - Both metric s measure irregularity at warp level These metrics do not classify a program as regular or irregular. Rather, they measure the Degree of Irregularity
  • 11. Results and Analysis - Analysis of observations about the irregularity exhibited by various CUDA kernels has be presented. - Investigated the effect of different program inputs - Effect of optimizations on programs - Variability of the results between different runs (i) on same GPU (ii) on different GPU (Benchmarks Used : Irregular - BFS, Barnes Hut, Data Compression, Delaunay Mesh Refinement, Points-to Analysis, Survey Propagation, Single Source Shortest Path, TSP Regular - Black Scholes, Histogram, Monte Carlo, Matrix Multiplication, N-Body )
  • 12. Results and Analysis(Continued....) Amount of Irregularity - CFI is usually very low. For above benchmarks it is less than 4.1% - Most of the programs can not strictly classified as regular or irregular - Two irregularities appear to be independent of each other - Irregular control flow generally implies irregular memory access
  • 13. Results and Analysis(Continued....) Input Sensitivity - Input sensitivity is very difficult to predict - Difficult to do it in application independent way (i) Input Oblivious - Irregularity remains largely constant for different inputs (ii) Input-type Dependent - Irregularity varies largely across different types of inputs rather than within a single type (iii) Input Dependent Irregularity varies as size of the input varies
  • 14. Results and Analysis(Continued....) (iii) Arithmetic Precision Change from single precision to double precision increases CFI and MAI for small inputs but decreases both for medium and large inputs But the change is very small. - It indicates that change in arithmetic precision does not affect the irregularity of program.
  • 15. Results and Analysis(Continued....) Variability - Observed for several kernels on different GPUs and same GPUs for multiple runs Irregularities are quite stable for same GPU and vary somewhat between distinct GPUs
  • 16. Conclusion - There is no type of programs as regular or irregular - Irregularity is not necessarily bad for the performance - By definition, irregular programs are data dependent but deferent inputs yield similar degrees of irregularity - Irregularity does no vary much between distinct GPUs It is expected that above conclusions hold across a broad range of CUDA-capable GPUs and hope that it will increase the understanding of the behavior of irregular GPU applications.
  • 17. References Paper : A Quantitative Study of Irregular Programs on GPUs By - Rupesh Nasre, Keshav Pingali, Martin Burtscher Texas State University Published in IEEE International Symposium on Workload Characterization ( IISWC '13 )
  • 18. Results and Analysis(Continued....) Effect of Optimizations and Arithmetic Precision (i) Regular version of one program reads records from global memory but in optimized version if calculates the record values on the fly. - This actually increase the Control Flow Irregularity - But faster is the performance because computations are cheaper than reading values from global memory. (ii) In optimized Single Source Shortest Path algorithm, nodes which are logically close to each other are kept close in memory. - It increase the Memory-Access Irregularity but increases the spatial locality