This document provides a monthly highlights summary of OpenACC:
- OpenACC is a programming model for parallel computing on CPUs and GPUs using compiler directives to add parallelism to existing serial code.
- OpenACC is seeing wide adoption across major HPC applications and allows performance portability between CPU and GPU.
- The document highlights recent optimizations, events, publications and resources around OpenACC programming.
WHAT IS OPENACC?
int main(void)
{
    /* <serial code> */
    #pragma acc kernels
    {
        /* <parallel code> */
    }
    return 0;
}
SIMPLE: Add a simple compiler directive.
POWERFUL & PORTABLE: A directives-based programming model for parallel computing, designed for performance and portability on CPUs and GPUs.
Open specification developed by the OpenACC.org Consortium.
(Chart: VASP benchmark, silica IFPEN, RMM-DIIS on P100)
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
200 APPS* USING OpenACC
Prof. Georg Kresse
Computational Materials Physics
University of Vienna
"For VASP, OpenACC is the way forward for GPU
acceleration. Performance is similar to CUDA, and
OpenACC dramatically decreases GPU
development and maintenance efforts. We're
excited to collaborate with NVIDIA and PGI as an
early adopter of Unified Memory."
VASP
Top Quantum Chemistry and Material Science Code
* Applications in production and development
Something that started as a preliminary investigation
for an undergraduate class project has become the
cover of a prestigious journal, PLOS Computational Biology.
Discover how students at the University of Delaware
used OpenACC to accelerate the molecular dynamics
code PPM_One, cutting the run time for a large protein
complex (approximately 11.3 million atoms) from 14 hours
to under 47 seconds.
FROM UNDERGRAD PROJECT TO PLOS
COVER: OPENACC FOR BIOPHYSICS
Image credit: Alex Bryer and Juan R. Perilla
DON'T MISS THESE UPCOMING EVENTS
- C-DAC GPU Hackathon: call closes July 7, 2020; event September 7-11, 2020
- Helmholtz GPU Hackathon: call closes July 14, 2020; event September 14-18, 2020
- USTC GPU Bootcamp (Digital): call closes July 21, 2020; event September 21-25, 2020
- Swiss National Supercomputing Center (Digital): call closes July 12, 2020; event September 21-30, 2020
- NASA GPU Hackathon: call closes July 31, 2020; event October 5-10, 2020
- NCHC GPU Hackathon (Digital): call closes July 31, 2020; event October 12-16, 2020
New in 2020: Many of our events are happening digitally! Get the same high-touch training and
mentorship without the hassle of travel!
HPC Summit Digital brings leaders, developers,
scientists, and researchers from around the world
together to engage with technical experts, ask tough
questions, provide feedback, and learn from their
peers in an interactive, online setting.
Join us to learn about new trends and innovations,
engage with experts, and get answers to all your
technical questions in our webinars and breakout
forums.
Best of all, it's free.
LEARN FROM LEADERS IN HPC AT
HPC SUMMIT DIGITAL
The recently concluded GPU Hackathon, hosted by the
San Diego Supercomputer Center (SDSC) and held in
partnership with the Oak Ridge Leadership Computing
Facility (OLCF), NVIDIA, and the National Energy
Research Scientific Computing Center (NERSC), brought
together seven teams from multiple disciplines in a
newly launched, completely remote digital event format.
By collaborating with mentors who are experts in GPU
programming, these seven teams from 11 institutions,
many of them with little to no prior GPU experience,
worked to port and optimize their applications. Every
team achieved a speed-up.
2020 GPU HACKATHON SERIES KICKS OFF
The Programming Models group at Barcelona
Supercomputing Center (BSC) has published
a new release (version 2020.06) of the
OmpSs-2 programming model. The release
introduces several major new features, such as a
compiler based on LLVM, an integrated
tracing tool, and support for OpenACC
kernels. Moreover, BSC has optimized the
scheduler infrastructure, the memory allocator
and the discrete dependency system to
improve performance and scalability of
OmpSs-2 applications on many-core systems.
UPDATE BRINGS NEW FEATURES
OMPSS-2 NEW RELEASE
UPCOMING TALKS
FortranCon 2020: Highly Parallel Fortran and
OpenACC Directives
Jeff Larkin and Michael Wolfe
Fortran has long been the language of computational math and science
and it has outlived many of the computer architectures on which it has
been used. Modern Fortran must be able to run on modern, highly
parallel, heterogeneous computer architectures. A significant number of
Fortran programmers have had success programming for
heterogeneous machines by pairing Fortran with the OpenACC
language for directives-based parallel programming. This includes
some of the most widely used Fortran applications in the world, such as
VASP and Gaussian. This presentation will discuss what makes
OpenACC a good fit for Fortran programmers and what the OpenACC
language is doing to promote the use of native language parallelism in
Fortran, such as do concurrent and coarrays.
UPCOMING TALKS
FortranCon 2020: Evolving Fortran for Emerging
Architectures: Lessons from the ICON-GPU
Atmospheric Model
William Sawyer
For decades, Fortran has been at the forefront of high performance
computing. As new architectures emerged, the Fortran standard added
constructs to exploit them, but not always with complete success.
The advent of General Purpose Graphics Processing Units (GPGPUs)
has created another conundrum. They can be programmed with an
appropriate language (CUDA, CUDA Fortran, or OpenCL) or with
directives (e.g., OpenMP 4.5 or OpenACC 3.0), each with disadvantages.
In this talk, we outline the lessons learned in porting the ICON
atmospheric model to GPUs with OpenCL, CUDA Fortran and, finally,
OpenACC, with the latter now in production at the Swiss National
Supercomputing Centre (CSCS). Now that we understand the
programming challenges, it is possible to consider new extensions to the
Fortran standard to address GPUs, which are clearly not going away any
time soon. We attempt to give some future perspectives.
RESOURCES
Paper: GPU-acceleration of A High Order Finite
Difference Code Using Curvilinear Coordinates
Marco Kupiainen, Jing Gong, Lilit Axner, Erwin Laure, and
Jan Nordström
GPU-accelerated computing is becoming a popular technology due to the
emergence of techniques such as OpenACC, which makes it easy to port
codes in their original form to GPU systems using compiler directives, and
thereby speeding up computation times relatively simply. In this study we have
developed an OpenACC implementation of the high order finite difference CFD
solver ESSENSE for simulating compressible flows. The solver is based on
summation-by-parts form difference operators, and the boundary and interface
conditions are weakly implemented using simultaneous approximation terms.
This case study focuses on porting the most time-consuming parts of the code
to GPUs, namely sparse matrix-vector multiplications and the evaluations of fluxes.
The resulting OpenACC implementation is used to simulate the Taylor-Green
vortex, producing a maximum speed-up of 61.3 on a single V100 GPU
compared to the serial CPU version.
RESOURCES
Paper: Optimization of Tensor-product Operations in
Nekbone on GPUs
Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, and
Stefano Markidis
In the CFD solver Nek5000, the computation is dominated by the evaluation of small
tensor operations. Nekbone is a proxy app for Nek5000 and has previously been
ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we
continue this effort and optimize the main tensor-product operation in Nekbone
further. Our optimization is done in CUDA and uses a different, 2D, thread structure
to make the computations layer by layer. This enables us to use loop unrolling as well
as utilize registers and shared memory efficiently. Our implementation is then
compared on both the Pascal and Volta GPU architectures to previous GPU versions
of Nekbone as well as a measured roofline. The results show that our implementation
outperforms previous GPU Nekbone implementations by 6-10%. Compared to the
measured roofline, we obtain 77-92% of the peak performance for both NVIDIA P100
and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on our users' active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.