Stay up-to-date on the latest news, events and resources for the OpenACC community. This month's highlights cover work on applications for the new Frontier supercomputer, using OpenACC for weather forecasting, upcoming GPU Hackathons and Bootcamps, and new resources!
WHAT IS OPENACC?
main()
{
    <serial code>

    #pragma acc kernels
    {
        <parallel code>
    }
}
SIMPLE: Add a simple compiler directive
POWERFUL & PORTABLE: Directives-based programming model for parallel computing, designed for performance and portability on CPUs and GPUs
Open specification developed by the OpenACC.org Consortium
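To make the pattern above concrete, here is a minimal, self-contained sketch (an assumed example, not taken from the slide): a SAXPY loop in C where a single kernels directive asks the compiler to generate parallel code for a GPU or a multicore CPU from the same source.

#include <stdlib.h>

/* Illustrative sketch only: a simple vector update showing how one
 * directive marks a loop for compiler-generated parallelization. */
void saxpy(int n, float a, float *restrict x, float *restrict y)
{
    /* The compiler analyzes the enclosed loop and offloads or
     * parallelizes it for the chosen target. */
    #pragma acc kernels
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    int n = 1 << 20;
    float *x = malloc(n * sizeof *x);
    float *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(n, 3.0f, x, y);

    free(x);
    free(y);
    return 0;
}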
[Performance chart: silica IFPEN, RMM-DIIS on P100]
OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
ANSYS Fluent
Gaussian
VASP
LSDalton
MPAS
GAMERA
GTC
XGC
ACME
FLASH
COSMO
Numeca
400+ APPS* USING OpenACC
Prof. Georg Kresse
Computational Materials Physics
University of Vienna
"For VASP, OpenACC is the way forward for GPU
acceleration. Performance is similar to CUDA, and
OpenACC dramatically decreases GPU
development and maintenance efforts. We're
excited to collaborate with NVIDIA and PGI as an
early adopter of Unified Memory."
VASP
Top Quantum Chemistry and Material Science Code
* Applications in production and development
LEARN MORE
What's it like designing an app for the world's fastest supercomputer, set to
come online in the United States in 2021? The OpenACC organization's own
Sunita Chandrasekaran is leading an elite international team in just that
task. For the past year, Chandrasekaran has led one of eight
teams working on applications for the new Frontier supercomputer being
built at Oak Ridge National Laboratory (ORNL).
PROGRAMMING THE WORLD'S SOON-TO-BE
FASTEST SUPERCOMPUTER
Large-scale (and fast) simulations that couldn't be imagined just a
few years ago are now going to become possible with the massive
compute resources that Frontier is going to offer. Not just virus
research: compute capabilities of this scale are of paramount importance
to efforts like finding a cure for Alzheimer's disease or studying
climate change.
DONT MISS THESE UPCOMING EVENTS
COMPLETE LIST OF EVENTS
Event | Call Closes | Event Date
NVIDIA/ENCCS AI for Science | February 22, 2021 | March 8-9, 2021
SEAM AI Applied Geoscience GPU Hackathon | March 1, 2021 | March 22 - April 9, 2021
UCL RITS & DiRAC AI Bootcamp | March 15, 2021 | March 29-30, 2021
EPCC GPU Hackathon 2021 | February 19, 2021 | April 19, 26-28, 2021
CCNU GPU Hackathon | February 12, 2021 | April 12, 19-21, 2021
Argonne GPU Hackathon | February 19, 2021 | April 19, 27-29, 2021
SDSC GPU Hackathon 2021 | March 4, 2021 | May 4, 11-13, 2021
Digital in 2021: Many of our events will continue to happen digitally! Get the same high-touch training and
mentorship without the hassle of travel!
READ ARTICLE
For those who follow supercomputing, weather forecasting is one
area to watch for systems designed for maximum capability.
Unlike many other areas in HPC, GPUs are not endemic in weather
forecasting; the few exceptions generally exist because the
underlying codes are GPU native.
Learn how The Weather Company had the advantage of starting
natively with GPUs for its forecasts and partnered closely with
NCAR to build the code that supports their GRAF model.
THE WEATHER COMPANY RAISES GPU,
STORAGE FORECAST
WATCH NOW
Stencil operations are used widely in HPC applications and pose an
optimization challenge on both CPUs and GPUs. On GPUs, fine-tuned
optimizations can be formulated using low-level APIs such as CUDA,
but many large established codes prefer a portable, higher-level API
such as OpenACC. Although OpenACC lacks the fine-tuning of CUDA,
it does allow for some tuning through a variety of parallelization
constructs and loop directives. Attend this webinar as Ronald M.
Caplan of Predictive Science Inc. (PSI) discusses various OpenACC
directive options to optimize the computationally heaviest stencil
operation within the production solar physics research code
Magnetohydrodynamics Around a Sphere (MAS).
ON-DEMAND WEBINAR: OPTIMIZING
STENCIL OPERATIONS WITH OPENACC
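For a sense of the kind of tuning the webinar covers, here is a rough sketch of a 5-point 2D stencil sweep in C (an assumed example, not the MAS operator); the loop-level clauses are the "knobs" OpenACC exposes in place of hand-written CUDA, and clauses such as gang, vector_length or tile can be layered on top to trade off occupancy and data reuse.

/* Illustrative 5-point stencil sweep on a flat nx*ny array (assumed example). */
void stencil_sweep(int nx, int ny, const double *restrict in, double *restrict out)
{
    /* Collapse the two loops into one parallel iteration space and
     * manage the transfers for this single sweep. */
    #pragma acc parallel loop collapse(2) copyin(in[0:nx*ny]) copy(out[0:nx*ny])
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            out[j*nx + i] = 0.25 * (in[j*nx + i - 1] + in[j*nx + i + 1]
                                  + in[(j-1)*nx + i] + in[(j+1)*nx + i]);
        }
    }
}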
RESOURCES
Paper: Tinker-HP: Accelerating Molecular Dynamics
Simulations of Large Complex Systems with
Advanced Point Dipole Polarizable Force Fields
using GPUs and Multi-GPU Systems
Olivier Adjoua, Louis Lagardère, Luc-Henri Jolly,
Arnaud Durocher, Thibaut Very, Isabelle Dupays, Zhi Wang,
Théo Jaffrelot Inizan, Frédéric Célerse, Pengyu Ren,
Jay W. Ponder, and Jean-Philip Piquemal
We present the extension of the Tinker-HP package (Lagardère et al.,
Chem. Sci., 2018, 9, 956-972) to the use of Graphics Processing Unit
(GPU) cards to accelerate molecular dynamics simulations using
polarizable many-body force fields. The new high-performance module
allows for an efficient use of single- and multi-GPU architectures ranging
from research laboratories to modern supercomputer centers. After
detailing an analysis of our general scalable strategy that relies on
OpenACC and CUDA, we discuss the various capabilities of the package.
READ PAPER
Fig. 5. Illustration of compute balance due to the list reordering.
Unbalanced computation in the first image induces an issue called warp
discrepancy: a situation where the threads belonging to the same vector do
not all follow the same instructions. Minimizing that can increase kernel
performance significantly, since it ensures load balancing among the
threads inside the vector.
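As a toy illustration of the load-balancing idea in the caption above (an assumed example, not Tinker-HP code): if adjacent iterations do very different amounts of work, threads in the same vector/warp diverge, so sorting work items by cost before the offloaded loop keeps the lanes that execute together busy for a similar number of inner iterations.

#include <stdlib.h>

typedef struct { int first; int count; } neighbor_list;   /* hypothetical layout */

static int by_count(const void *a, const void *b)
{
    return ((const neighbor_list *)a)->count - ((const neighbor_list *)b)->count;
}

void pairwise_sums(int n, int ncontrib, neighbor_list *restrict lists,
                   const double *restrict contrib, double *restrict out)
{
    /* Reorder on the host so items with similar neighbor counts sit next
     * to each other (a real code would also keep the permutation to map
     * results back to the original order). */
    qsort(lists, n, sizeof *lists, by_count);

    #pragma acc parallel loop copyin(lists[0:n], contrib[0:ncontrib]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {          /* adjacent i now have similar cost */
        double s = 0.0;
        for (int k = 0; k < lists[i].count; ++k)
            s += contrib[lists[i].first + k];
        out[i] = s;
    }
}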
RESOURCES
Paper: An Improved Framework of GPU Computing
for CFD Applications on Structured Grids using
OpenACC
Weicheng Xue, Charles W. Jackson, and Christopher J. Roy
This paper is focused on improving multi-GPU performance of a research
CFD code on structured grids. MPI and OpenACC directives are used to
scale the code up to 16 GPUs. The paper shows that using 16 P100 GPUs
and 16 V100 GPUs can be 30x and 70x faster, respectively, than 16 Xeon
E5-2680v4 CPU cores across three different test cases. A series of
performance issues related to the scaling of the multi-block CFD code are
addressed by applying various optimizations. Performance optimizations
such as the pack/unpack message method, removing temporary arrays as
arguments to procedure calls, allocating global memory for limiters and
connected boundary data, reordering non-blocking MPI Isend/Irecv and
Wait calls, reducing unnecessary implicit derived-type member data
movement between the host and the device, and the use of GPUDirect can
improve the compute utilization, memory throughput, and asynchronous
progression in the multi-block CFD code using modern programming
features.
READ PAPER
Fig. 3. A 3D domain decomposition.
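A rough sketch of the pack-then-send halo exchange pattern described in the abstract (an assumed example, not the authors' code): a block face is packed into a contiguous buffer on the device, and host_data use_device hands the device pointer straight to MPI so a GPU-aware/GPUDirect MPI can move it without staging through the host. Both q and buf are assumed to be device-resident via an enclosing acc data region.

#include <mpi.h>

void send_east_face(int ni, int nj, int nk,
                    const double *restrict q,   /* field of size ni*nj*nk */
                    double *restrict buf,       /* send buffer of size nj*nk */
                    int dest, MPI_Comm comm)
{
    /* Pack the i = ni-1 face into a contiguous buffer on the GPU. */
    #pragma acc parallel loop collapse(2) present(q, buf)
    for (int k = 0; k < nk; ++k)
        for (int j = 0; j < nj; ++j)
            buf[k * nj + j] = q[((size_t)k * nj + j) * ni + (ni - 1)];

    /* Pass the device address of buf to MPI (requires GPU-aware MPI). */
    #pragma acc host_data use_device(buf)
    MPI_Send(buf, nj * nk, MPI_DOUBLE, dest, 0, comm);
}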
RESOURCES
Presentation: OpenACC pgfortran: substantial
speedups and beyond for the O3 Condensation
algorithm for determinants and estimation
Damien Mather and Chris Scott
This milestone achievement report for NeSI project uoo02741, "Extracting D-
efficient training samples..", is an informative case study in applying open
acceleration (OpenACC) directives to MPI Fortran using PGI's smart pgfortran
compiler on NeSI's Mahuika platform. We utilise Intel MPI libraries and up to 4
of Mahuika's P100 GPUs per batch job and show (a) how substantial speedups
can be had with four additional OpenACC compiler directives, and (b) how
evolving the algorithm to optimise data locality and reduce process blocking can
achieve further substantial speedup. We also demonstrate how these can be
achieved consistently in practice across a wide variety of computing platforms,
from legacy CPUs and NVIDIA accelerator cards through to NeSI's HPC
platforms. The Condensation algorithm used in this demonstration has superior
scaling performance and immunity to ill-conditioning for both the calculation of
determinants and, by extension, the estimation of large predictive analytic
systems of linear and linearised equations, whilst retaining O(n³) computational
complexity similar to the widely used Gaussian elimination based methods.
VIEW PRESENTATION
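To illustrate the data-locality point above, here is a small sketch in C rather than the project's Fortran (an assumed example, not the uoo02741 code): keeping the working matrix resident on the GPU across all sweeps avoids re-transferring it on every iteration, which is typically where the further speedup from "optimising data locality" comes from once the loops themselves are offloaded.

#include <stddef.h>

void run_sweeps(int n, int nsweeps, double *restrict a, double *restrict rowsum)
{
    /* One transfer in, one transfer out, for the whole batch of sweeps. */
    #pragma acc data copy(a[0:n*n]) create(rowsum[0:n])
    for (int s = 0; s < nsweeps; ++s) {
        #pragma acc parallel loop present(a, rowsum)
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int j = 0; j < n; ++j)
                sum += a[(size_t)i * n + j];
            rowsum[i] = sum;
        }
        /* ...an update of a[] using rowsum[] would follow here, still on the device... */
    }
}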
RESOURCES
Website: GPUHackathons.org
Technical Resources
VISIT SITE
Explore a wealth of resources for GPU-accelerated
computing across HPC, AI and Big Data.
Review a collection of videos, presentations, GitHub repos,
tutorials, libraries and more to help you advance your skills
and expand your knowledge.
STAY IN THE KNOW:
JOIN THE OPENACC COMMUNITY
JOIN TODAY
The OpenACC specification is designed for, and
by, users, meaning that the OpenACC organization
relies on our users' active participation to shape
the specification and to educate the scientific
community on its use.
Take an active role in influencing the future of both
the OpenACC specification and the organization
itself by becoming a member of the community.