
May 2020
OPENACC MONTHLY HIGHLIGHTS

WHAT IS OPENACC?
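int main(void)
{
    /* <serial code> */
    #pragma acc kernels
    {
        /* <parallel code>: loops in this region may be
           parallelized and offloaded by the compiler */
    }
    return 0;
}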
Add a simple compiler directive.
SIMPLE · POWERFUL & PORTABLE
A directives-based programming model for parallel computing, designed for performance and portability on CPUs and GPUs.
Open specification developed by the OpenACC.org consortium.
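To make the directive above concrete, here is a minimal SAXPY sketch (y = a·x + y) using the same `kernels` construct. The function name `saxpy` and the explicit `copyin`/`copy` data clauses are illustrative choices, not part of the slide. The `restrict` qualifiers tell the compiler the arrays do not overlap, which helps it prove the loop is safe to parallelize.

```c
#include <stddef.h>

/* SAXPY: y[i] = a * x[i] + y[i] for i in [0, n).
   The kernels directive lets an OpenACC compiler analyze the loop and, if it
   is safe, parallelize and offload it. copyin(x[0:n]) moves x to the device;
   copy(y[0:n]) moves y to the device and back. Compilers without OpenACC
   enabled ignore the pragma, so the same loop runs serially on the CPU. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Built with an OpenACC-enabled compiler (for example `nvc -acc` or `gcc -fopenacc`) the loop can be offloaded; a plain C compiler builds the same source as ordinary serial code, which is the portability point made above.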

OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
200 APPS* USING OpenACC, including ANSYS Fluent, Gaussian, VASP, LSDalton, MPAS, GAMERA, GTC, XGC, ACME, FLASH, COSMO, and Numeca
* Applications in production and development
VASP: Top Quantum Chemistry and Material Science Code
[Chart: VASP on P100, silica IFPEN benchmark, RMM-DIIS]
“For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar to CUDA, and OpenACC dramatically decreases GPU development and maintenance efforts. We’re excited to collaborate with NVIDIA and PGI as an early adopter of Unified Memory.”
Prof. Georg Kresse, Computational Materials Physics, University of Vienna

APART BUT TOGETHER: GPU HACKATHONS GO REMOTE
The first completely digital GPU Hackathon, held with the San Diego Supercomputer Center, successfully concluded on May 13th, marking a new chapter in the evolution of the hackathon program.
Read the blog to learn more about this seminal event and get helpful hints to make your next online event a success.
“The training event was fabulous and made a huge difference in terms of getting
people up to speed. The overview tutorials on some of the tools such as NVIDIA Nsight
Systems and Nsight Compute, as well as bringing people together to be more
comfortable with the digital environment, mentors, and flow ahead of the event really
helped to overcome what would be natural activation barriers.”
Mary Thomas, Computational Data Scientist and HPC Training Lead
San Diego Supercomputer Center

DON’T MISS THESE UPCOMING EVENTS
Event | Call closes | Event dates
CCNU GPU Hackathon (Digital) | June 12, 2020 | August 5-14, 2020
BNL GPU Hackathon | June 17, 2020 | August 10-19, 2020
UKRSE Virtual GPU Bootcamp | June 22, 2020 | June 29-30, 2020
C-DAC GPU Hackathon | July 7, 2020 | September 7-11, 2020
Helmholtz GPU Hackathon | July 14, 2020 | September 14-18, 2020
Swiss National Supercomputing Center (Switzerland) | July 12, 2020 | September 28-October 2, 2020
New in 2020: Many of our events are happening digitally! Get the same high-touch training and
mentorship without the hassle of travel!

FROM UNDERGRAD PROJECT TO PLOS COVER: OPENACC FOR BIOPHYSICS
Something that started as a preliminary investigation for an undergraduate class project has become the cover of a prestigious journal publication. Read the blog to discover how students at the University of Delaware used OpenACC to cut the run time of the molecular dynamics code PPM_One on a large protein complex (approximately 11.3 million atoms) from 14 hours to under 47 seconds, earning the cover of PLOS Computational Biology.
Image credit: Alex Bryer and Juan R. Perilla
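The timings quoted above imply a speedup of roughly three orders of magnitude. Treating 47 s as an upper bound on the accelerated run time:

```latex
S > \frac{14\,\text{h}}{47\,\text{s}}
  = \frac{14 \times 3600\,\text{s}}{47\,\text{s}}
  = \frac{50\,400}{47}
  \approx 1072
```

So the accelerated PPM_One run is more than 1,000 times faster than the original 14-hour run.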

ANNOUNCING THE NVIDIA HPC SDK
The NVIDIA HPC SDK™ is a comprehensive suite of C, C++, and Fortran compilers, libraries, and tools for GPU-accelerating HPC modeling and simulation applications. It supports GPU programming with standard C++ and Fortran parallel constructs, OpenACC directives, and CUDA®.
GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud.

GCC 10 COMPILER RELEASED
UPDATE BRINGS NEW FEATURES
The GNU Compiler Collection 10 has seen its first stable release in GCC 10.1, bringing new features and functionality that include:
• much better C++20 language support in the GCC compiler and the libstdc++ library;
• a new static analyzer to help spot more coding problems;
• support for the OpenACC 2.6 specification;
• support for much of the OpenMP 5.0 specification;
• OpenMP/OpenACC offloading for AMD Radeon GPUs;
• and more.
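Code can check at compile time whether OpenACC is enabled: the OpenACC specification defines the `_OPENACC` macro, with a yyyymm value naming the supported specification version, whenever OpenACC compilation is on (in GCC, via `-fopenacc`). A minimal sketch, with hypothetical function names `openacc_version` and `sum_of_squares`, that reports the macro and uses a `reduction` clause:

```c
#include <stddef.h>

/* Returns the _OPENACC macro value (yyyymm of the supported OpenACC
   specification; 201711 corresponds to OpenACC 2.6) when compiled with
   OpenACC enabled, e.g. gcc -fopenacc. Returns 0 for a non-OpenACC build. */
long openacc_version(void)
{
#ifdef _OPENACC
    return (long)_OPENACC;
#else
    return 0L;
#endif
}

/* Sum of v[i]^2. reduction(+:total) asks the compiler to combine the
   partial sums computed in parallel; without OpenACC the loop runs serially. */
double sum_of_squares(size_t n, const double *v)
{
    double total = 0.0;
    #pragma acc parallel loop reduction(+:total) copyin(v[0:n])
    for (size_t i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

Guarding version-specific code with `#ifdef _OPENACC` keeps a single source buildable by compilers with and without OpenACC support.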

RESOURCES
Paper: Porting and Optimizing UniFrac for GPUs
Igor Sfiligoi, Daniel McDonald, and Rob Knight
UniFrac is a commonly used metric in microbiome research for comparing microbiome
profiles to one another ("beta diversity"). The recently implemented Striped UniFrac
added the capability to split the problem into many independent subproblems and exhibits
near linear scaling. In this paper we describe steps undertaken in porting and optimizing
Striped UniFrac to GPUs. We reduced the run time of computing UniFrac on the published
Earth Microbiome Project dataset from 13 hours on an Intel Xeon E5-2680 v4 CPU to 12
minutes on an NVIDIA Tesla V100 GPU, and to about one hour on a laptop with an NVIDIA
GTX 1050 (with minor loss in precision). Computing UniFrac on a larger dataset
containing 113k samples reduced the run time from over one month on the CPU to less
than 2 hours on the V100 and 9 hours on an NVIDIA RTX 2080 Ti GPU (with minor loss in
precision). This was achieved by using OpenACC for generating the GPU offload code
and by improving the memory access patterns. A BSD-licensed implementation is
available, which produces a C shared library linkable by any programming language.
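The abstract credits two ingredients: OpenACC offload and improved memory access patterns. The sketch below illustrates both on a deliberately simplified, hypothetical stripe kernel; it is not the authors' code, and the function name `stripe_distances`, the data layout, and the weighted-L1 formula are assumptions for illustration. Stripe `s` pairs each sample `k` with sample `(k + s + 1) % n_samples`, so all stripes are independent, and storing per-branch values with the sample index contiguous means neighboring GPU threads (consecutive `k`) typically read neighboring addresses.

```c
#include <stddef.h>

/* Hypothetical simplified stripe kernel (not the paper's implementation).
   emb:     n_branches x n_samples values, sample index contiguous:
            emb[b * n_samples + k].
   lengths: one length per branch.
   dist:    n_stripes x n_samples output; dist[s * n_samples + k] is a
            length-weighted L1 distance between sample k and its partner
            sample (k + s + 1) % n_samples. */
void stripe_distances(size_t n_branches, size_t n_samples, size_t n_stripes,
                      const double *restrict emb,
                      const double *restrict lengths,
                      double *restrict dist)
{
    #pragma acc parallel loop collapse(2) \
        copyin(emb[0:n_branches * n_samples], lengths[0:n_branches]) \
        copyout(dist[0:n_stripes * n_samples])
    for (size_t s = 0; s < n_stripes; ++s) {
        for (size_t k = 0; k < n_samples; ++k) {
            size_t partner = (k + s + 1) % n_samples;
            double acc = 0.0;
            /* Branch loop runs sequentially inside each (s, k) work item. */
            #pragma acc loop seq
            for (size_t b = 0; b < n_branches; ++b) {
                double d = emb[b * n_samples + k] - emb[b * n_samples + partner];
                acc += lengths[b] * (d < 0.0 ? -d : d);  /* |d| without libm */
            }
            dist[s * n_samples + k] = acc;
        }
    }
}
```

The sample-contiguous layout is the design choice that matters here: with the opposite layout (`emb[k * n_branches + b]`), consecutive threads would read addresses `n_branches` elements apart.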

RESOURCES
Paper: A GPU-based Algorithm for Efficient LES of High Reynolds Number Flows in Heterogeneous CPU/GPU Supercomputers
Guillermo Oyarzun, Iason A. Chalmoukis, Georgios A. Leftheriotis, and Athanassios A. Dimas
An optimized MPI+OpenACC implementation model that performs efficiently on CPU/GPU systems using large-eddy simulation is presented. The code was validated for the simulation of wave boundary-layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-based solver was developed for the solution of the Poisson equation for pressure, taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution on CPUs and is 10 times faster in computational time than a typical iterative preconditioned conjugate gradient solver on GPUs. In terms of parallel performance, an overlapping strategy was developed to reduce the overhead of performing MPI communications using GPUs. As a result, the weak scaling of the algorithm was improved by up to 30%. Finally, a large-scale simulation (Re = 2 × 10⁵) using a grid of 4 × 10⁸ cells was executed, and the performance of the code was analyzed. The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU cores) on one of the current top 10 supercomputers in the world (Piz Daint). A comparison of the overall computational time showed that the GPU version was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive with state-of-the-art CPU implementations, and it has the potential to take advantage of modern supercomputing capabilities.
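The abstract attributes the direct solver to the periodic boundary conditions. As a one-dimensional sketch of why that works, assume a uniform grid of N points with spacing Δx and a second-order central difference for the pressure Laplacian (both are assumptions; the abstract does not state the discretization). Fourier modes are then eigenvectors of the discrete operator, so each mode can be solved independently after an FFT:

```latex
% Periodic grid: j = 0, ..., N-1, with p_{j+N} = p_j
p_j = \sum_{k=0}^{N-1} \hat{p}_k \, e^{2\pi i jk/N},
\qquad
\frac{p_{j+1} - 2p_j + p_{j-1}}{\Delta x^2}
  = \sum_{k=0}^{N-1} \lambda_k \, \hat{p}_k \, e^{2\pi i jk/N},

\lambda_k = \frac{2\cos(2\pi k/N) - 2}{\Delta x^2}
          = -\frac{4}{\Delta x^2}\,\sin^2\!\left(\frac{\pi k}{N}\right).

% Solving (p_{j+1} - 2p_j + p_{j-1})/\Delta x^2 = f_j mode by mode:
\hat{p}_k = \frac{\hat{f}_k}{\lambda_k} \quad (k \neq 0),
\qquad
\hat{p}_0 \ \text{set by a reference pressure (solvable only if } \hat{f}_0 = 0\text{)}.
```

Under these assumptions the solve costs a forward FFT, one division per mode, and an inverse FFT, with no iterations; the paper's actual scheme (for example, how any non-periodic direction is handled) may differ.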

STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on its users’ active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.
