May 2020
OPENACC MONTHLY
HIGHLIGHTS
WHAT IS OPENACC?
main()
{
    <serial code>
    #pragma acc kernels
    {
        <parallel code>
    }
}
Add Simple Compiler Directive
SIMPLE
Directives-based programming model for parallel computing
POWERFUL & PORTABLE
Designed for performance and portability on CPUs and GPUs
Open Specification Developed by OpenACC.org Consortium
silica IFPEN, RMM-DIIS on P100
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
200 APPS* USING OpenACC
Prof. Georg Kresse
Computational Materials Physics
University of Vienna
"For VASP, OpenACC is the way forward for GPU
acceleration. Performance is similar to CUDA, and
OpenACC dramatically decreases GPU
development and maintenance efforts. We're
excited to collaborate with NVIDIA and PGI as an
early adopter of Unified Memory."
VASP
Top Quantum Chemistry and Material Science Code
* Applications in production and development
APART BUT TOGETHER:
GPU HACKATHONS GO REMOTE
The first completely digital GPU Hackathon with the San
Diego Supercomputer Center successfully concluded on
May 13th, marking a new chapter in the evolution of the
hackathon program.
Learn more about this seminal event and helpful hints to
make your next online event a success.
READ BLOG
"The training event was fabulous and made a huge difference in terms of getting
people up to speed. The overview tutorials on some of the tools such as NVIDIA Nsight
Systems and Nsight Compute, as well as bringing people together to be more
comfortable with the digital environment, mentors, and flow ahead of the event really
helped to overcome what would be natural activation barriers."
Mary Thomas, Computational Data Scientist and HPC Training Lead
San Diego Supercomputer Center
DON'T MISS THESE UPCOMING EVENTS
COMPLETE LIST OF EVENTS
Event                                                Call Closes      Event Date
CCNU GPU Hackathon (Digital)                         June 12, 2020    August 5-14, 2020
BNL GPU Hackathon                                    June 17, 2020    August 10-19, 2020
UKRSE Virtual GPU Bootcamp                           June 22, 2020    June 29-30, 2020
C-DAC GPU Hackathon                                  July 7, 2020     September 7-11, 2020
Helmholtz GPU Hackathon                              July 14, 2020    September 14-18, 2020
Swiss National Supercomputing Center (Switzerland)   July 12, 2020    September 28-October 2, 2020
New in 2020: Many of our events are happening digitally! Get the same high-touch training and
mentorship without the hassle of travel!
FROM UNDERGRAD PROJECT TO PLOS
COVER: OPENACC FOR BIOPHYSICS
Something that started as a preliminary investigation
for an undergraduate class project has become the
cover of a prestigious journal publication.
Discover how students at the University of Delaware
used OpenACC to accelerate the molecular dynamics
code PPM_One, cutting a run on a large protein complex
(approximately 11.3 million atoms) from 14 hours to
under 47 seconds, and how the work became the PLOS
Computational Biology journal cover.
READ BLOG
Image credit: Alex Bryer and Juan R. Perilla
ANNOUNCING THE NVIDIA HPC SDK
The NVIDIA HPC SDK is a comprehensive suite of C,
C++, and Fortran compilers, libraries, and tools for
GPU-accelerating HPC modeling and simulation applications. It
supports GPU programming with standard C++ and
Fortran parallel constructs, OpenACC directives, and
CUDA®.
GPU-accelerated math libraries maximize performance on
common HPC algorithms, and optimized communications
libraries enable standards-based multi-GPU and scalable
systems programming. Performance profiling and
debugging tools simplify porting and optimization of HPC
applications, and containerization tools enable easy
deployment on-premises or in the cloud.
LEARN MORE
UPDATE BRINGS NEW FEATURES
GCC 10 COMPILER RELEASED
The GNU Compiler Collection 10 has seen its first
stable release in GCC 10.1, bringing new features
and functionality that include:
- C++20 language support is in much better shape
  for the GCC compiler and libstdc++ library;
- GCC 10 introduces a static analyzer for helping
  spot more coding problems;
- Support for the OpenACC 2.6 specification;
- Support for much of the OpenMP 5.0 specification;
- OpenMP/OpenACC offloading for Radeon GPUs;
- and more.
VIEW NOW
RESOURCES
Paper: Porting and Optimizing UniFrac for GPUs
Igor Sfiligoi, Daniel McDonald, and Rob Knight
READ PAPER
UniFrac is a commonly used metric in microbiome research for comparing microbiome
profiles to one another ("beta diversity"). The recently implemented Striped UniFrac
added the capability to split the problem into many independent subproblems and exhibits
near linear scaling. In this paper we describe steps undertaken in porting and optimizing
Striped UniFrac to GPUs. We reduced the run time of computing UniFrac on the published
Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12
minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with NVIDIA
GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset
containing 113k samples reduced the run time from over one month on the CPU to less
than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080TI GPU (with minor loss in
precision). This was achieved by using OpenACC for generating the GPU offload code
and by improving the memory access patterns. A BSD-licensed implementation is
available, which produces a C shared library linkable by any programming language.
RESOURCES
Paper: A GPU-based Algorithm for Efficient LES of
High Reynolds Number Flows in Heterogeneous
CPU/GPU Supercomputers
Guillermo Oyarzun, Iason A. Chalmoukis, Georgios A. Leftheriotis,
and Athanassios A. Dimas
An optimized MPI+OpenACC implementation model that performs efficiently in CPU/GPU systems
using large-eddy simulation is presented. The code was validated for the simulation of wave boundary-
layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-
based solver was developed for the solution of the Poisson equation for pressure taking advantage of
the periodic boundary conditions. This solver was optimized for parallel execution in CPUs and
outperforms by 10 times in computational time a typical iterative preconditioned conjugate gradient
solver in GPUs. In terms of parallel performance, an overlapping strategy was developed to reduce the
overhead of performing MPI communications using GPUs. As a result, the weak scaling of the
algorithm was improved up to 30%. Finally, a large-scale simulation (Re = 2 × 10^5) using a grid of
4 × 10^8 cells was executed, and the performance of the code was analyzed. The simulation was launched
using up to 512 nodes (512 GPUs + 6144 CPU-cores) on one of the current top 10 supercomputers of
the world (Piz Daint). A comparison of the overall computational time showed that the GPU version
was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive
compared with the state-of-the-art CPU implementations, and it has the potential to take advantage of
modern supercomputing capabilities.
READ PAPER
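The abstract mentions a direct FFT-based Poisson solver that exploits periodic boundary conditions but gives no formulas. The derivation below is a textbook sketch of why periodicity makes a direct solve possible. It assumes periodicity in every direction and a uniform grid of spacing h, which may be simpler than the setup in the paper; the symbols p, f, k, and h are notation chosen for this sketch.

```latex
% Pressure Poisson equation on a periodic box of side L
\nabla^2 p = f, \qquad p(\mathbf{x} + L\,\mathbf{e}_j) = p(\mathbf{x})

% Expand both fields in Fourier modes, with wavevectors k = 2\pi m / L
p(\mathbf{x}) = \sum_{\mathbf{k}} \hat{p}_{\mathbf{k}}\, e^{i \mathbf{k}\cdot\mathbf{x}},
\qquad
f(\mathbf{x}) = \sum_{\mathbf{k}} \hat{f}_{\mathbf{k}}\, e^{i \mathbf{k}\cdot\mathbf{x}}

% The Laplacian is diagonal in this basis, so each mode decouples
-|\mathbf{k}|^2\, \hat{p}_{\mathbf{k}} = \hat{f}_{\mathbf{k}}
\quad\Longrightarrow\quad
\hat{p}_{\mathbf{k}} = -\frac{\hat{f}_{\mathbf{k}}}{|\mathbf{k}|^2},
\qquad \mathbf{k} \neq \mathbf{0}

% The mean mode is undetermined: solvability needs \hat{f}_0 = 0, and one may set
\hat{p}_{\mathbf{0}} = 0

% With second-order central differences, |k|^2 is replaced per direction
% by the modified wavenumber of the discrete operator
k_j^2 \;\longrightarrow\; \frac{2}{h^2}\bigl(1 - \cos(k_j h)\bigr)
      = \frac{4}{h^2}\sin^2\!\left(\frac{k_j h}{2}\right)
```

The solve then consists of a forward transform of f, one division per mode, and an inverse transform, with no iteration, which is the structural reason a direct solver of this kind can compete with an iterative preconditioned conjugate gradient method.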
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
JOIN TODAY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on our users' active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.
Learn more at
WWW.OPENACC.ORG
