Slideshows by User: fuchsto
http://www.slideshare.net/images/logo.gif
Wed, 14 Dec 2016 12:44:39 GMT
SlideShare feed for Slideshows by User: fuchsto

A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16) /slideshow/a-multidimensional-distributed-array-abstraction-for-pgas/70133620 dashbriefpgasnarraytobiasfuchs-161214124439
DASH is a realization of the PGAS (partitioned global address space) model in the form of a C++ template library that does not require a custom PGAS (pre-)compiler. We present the DASH NArray concept, a multidimensional array abstraction designed as an underlying container for stencil and dense numerical applications. After introducing the fundamental programming concepts used in DASH, we explain how they have been extended with multidimensional capabilities in the NArray abstraction. Using matrix-matrix multiplication as a case study, we then discuss an implementation of the SUMMA algorithm for dense matrix multiplication to demonstrate how the DASH NArray facilitates portable efficiency and simplifies the design of efficient algorithms through its explicit support for locality-based operations. Finally, we evaluate the performance of the SUMMA algorithm based on the NArray abstraction against established implementations of DGEMM and PDGEMM. In combination with mechanisms for automatic optimization of logical process topology and domain decomposition, our implementation yields highly competitive results without manual tuning, significantly outperforming Intel MKL and PLASMA in node-level use cases as well as ScaLAPACK in highly distributed scenarios.
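The block structure of SUMMA mentioned in the abstract can be illustrated with a small serial simulation. This is only a sketch under stated assumptions: the `Matrix` type and the `summa_simulated` function are hypothetical names invented here and are not the DASH NArray API, and the broadcasts that a distributed implementation performs along process rows and columns are replaced by plain reads from the full matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square row-major matrix. A stand-in for a distributed container;
// this is NOT the DASH NArray type.
struct Matrix {
    std::size_t n;
    std::vector<double> data;
    explicit Matrix(std::size_t n_) : n(n_), data(n_ * n_, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * n + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * n + c]; }
};

// Serial simulation of SUMMA on a hypothetical grid x grid process grid.
// In step k, block column k of A would be broadcast along process rows and
// block row k of B along process columns; every process (pi, pj) then
// accumulates C(pi, pj) += A(pi, k) * B(k, pj) on its local block.
Matrix summa_simulated(const Matrix& a, const Matrix& b, std::size_t grid) {
    assert(a.n == b.n);
    assert(grid > 0 && a.n % grid == 0);
    const std::size_t n = a.n;
    const std::size_t bs = n / grid;  // edge length of one block
    Matrix c(n);
    for (std::size_t k = 0; k < grid; ++k) {
        for (std::size_t pi = 0; pi < grid; ++pi) {
            for (std::size_t pj = 0; pj < grid; ++pj) {
                // Local block update owned by simulated process (pi, pj).
                for (std::size_t i = pi * bs; i < (pi + 1) * bs; ++i) {
                    for (std::size_t l = k * bs; l < (k + 1) * bs; ++l) {
                        const double a_il = a.at(i, l);
                        for (std::size_t j = pj * bs; j < (pj + 1) * bs; ++j) {
                            c.at(i, j) += a_il * b.at(l, j);
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

Calling `summa_simulated(a, b, 2)` on two 2x2 matrices treats every element as its own block. The result should not depend on the grid size, since only the order of the partial sums changes.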

Wed, 14 Dec 2016 12:44:39 GMT /slideshow/a-multidimensional-distributed-array-abstraction-for-pgas/70133620 fuchsto@slideshare.net(fuchsto)
A Multidimensional Distributed Array Abstraction for PGAS (HPCC'16) from Menlo Systems GmbH
308 2 https://cdn.slidesharecdn.com/ss_thumbnails/dashbriefpgasnarraytobiasfuchs-161214124439-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16) /slideshow/dash-a-c-pgas-library-for-distributed-data-structures-and-parallel-algorithms/70133615 dashbriefoverviewtobiasfuchs-161214124426
We present DASH, a C++ template library that offers distributed data structures and parallel algorithms and implements a compiler-free PGAS (partitioned global address space) approach. DASH offers many productivity and performance features such as global-view data structures, efficient support for the owner-computes model, flexible multidimensional data distribution schemes and interoperability with STL (standard template library) algorithms. DASH also features a flexible representation of the parallel target machine and allows the exploitation of several hierarchically organized levels of locality through a concept of Teams. We evaluate DASH on a number of benchmark applications and we port a scientific proxy application using the MPI two-sided model to DASH. We find that DASH offers excellent productivity and performance and demonstrate scalability up to 9800 cores.

Wed, 14 Dec 2016 12:44:25 GMT /slideshow/dash-a-c-pgas-library-for-distributed-data-structures-and-parallel-algorithms/70133615 fuchsto@slideshare.net(fuchsto)
DASH: A C++ PGAS Library for Distributed Data Structures and Parallel Algorithms (HPCC'16) from Menlo Systems GmbH
470 5 https://cdn.slidesharecdn.com/ss_thumbnails/dashbriefoverviewtobiasfuchs-161214124426-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
DASH Locality Hierarchies (PADAL'16) /slideshow/dash-locality-hierarchies-padal16/67875756 tobias-fuchs-padal2016-161030124631
Single-node hardware design is becoming increasingly heterogeneous, and many of today’s largest HPC systems are clusters that combine accelerators in heterogeneous compute device architectures. The need for new programming abstractions on the way to the Exascale era has been widely recognized, and variants of the Partitioned Global Address Space (PGAS) programming model are discussed as a promising approach in this respect. DASH is a C++ template library that provides distributed data structures with support for hierarchical locality in a PGAS programming model. Portable efficiency, an essential goal in the design of DASH, can only be achieved with programming abstractions of hardware locality that allow data structures and algorithms to be optimized for the underlying system at compile time and run time. Established tools like LIKWID and hwloc provide reliable interfaces to query the hardware topology at node level but fail to construct a global representation of distributed locality domains and do not support accelerator architectures like Intel MIC. We present Locality Hierarchies, an abstraction of distributed, hierarchical locality represented as a modifiable data structure. The underlying model supports heterogeneous systems as a first-class use case and introduces a well-defined concept of distance for arbitrary distributed hardware hierarchies. Using common range-based algorithms as motivating examples, we explain how our approach facilitates locality-aware load balancing and process mapping on SuperMIC compute nodes.
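To make the idea of a distance in a hardware hierarchy more concrete, here is a minimal sketch of one possible definition. It is an assumption made for illustration and not necessarily the distance defined in the talk: a locality domain is identified by its path from the root of the hierarchy (for example node, NUMA domain, core), and the distance between two domains is the number of tree edges between them. The names `DomainPath`, `common_prefix_length` and `hierarchy_distance` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical identifier of a locality domain: the child indices on the
// path from the root of the hierarchy down to the domain.
using DomainPath = std::vector<int>;

// Number of leading path components shared by both domains, i.e. the depth
// of their lowest common ancestor.
std::size_t common_prefix_length(const DomainPath& a, const DomainPath& b) {
    const std::size_t limit = std::min(a.size(), b.size());
    std::size_t i = 0;
    while (i < limit && a[i] == b[i]) {
        ++i;
    }
    return i;
}

// Assumed distance: edges from a up to the lowest common ancestor plus
// edges from there down to b.
std::size_t hierarchy_distance(const DomainPath& a, const DomainPath& b) {
    const std::size_t shared = common_prefix_length(a, b);
    return (a.size() - shared) + (b.size() - shared);
}
```

Under this definition, two cores in the same NUMA domain are closer than two cores on different nodes, which is the kind of ordering a locality-aware load balancer could use when assigning work.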

Sun, 30 Oct 2016 12:46:30 GMT /slideshow/dash-locality-hierarchies-padal16/67875756 fuchsto@slideshare.net(fuchsto)
DASH Locality Hierarchies (PADAL'16) from Menlo Systems GmbH
163 3 https://cdn.slidesharecdn.com/ss_thumbnails/tobias-fuchs-padal2016-161030124631-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Expressing and Exploiting Multi-Dimensional Locality in DASH /slideshow/expressing-and-exploiting-multidimensional-locality-in-dash/63358721 zs5wh8zoteaegffrolsn-signature-dd8db26a3345db1c3d0e60b99c8abec1b4be068df553cc41ec94c680e4f01f0e-poli-160623014939
DASH is a realization of the PGAS (partitioned global address space) programming model in the form of a C++ template library. It provides a multidimensional array abstraction which is typically used as an underlying container for stencil and dense matrix operations. The efficiency of operations on a distributed multidimensional array depends strongly on the distribution of its elements to processes and on the communication strategy used to propagate values between them. Locality can only be improved by employing an optimal distribution that is specific to the implementation of the algorithm, run-time parameters such as node topology, and numerous additional aspects. Application developers typically cannot know these implications, which might also change in future releases of DASH. In the following, we identify fundamental properties of distribution patterns that are prevalent in existing HPC applications. We describe a classification scheme for multidimensional distributions based on these properties and demonstrate how distribution patterns can be optimized for locality and communication avoidance automatically and, to a great extent, at compile time.
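The effect of a distribution pattern on element placement can be sketched with the common one-dimensional blocked, cyclic and block-cyclic schemes. The code below is a generic illustration and not the DASH pattern API; `LocalPosition`, `block_cyclic`, `blocked` and `cyclic` are names assumed here. Each function maps a global index to the owning unit (process) and the element's offset in that unit's local memory.

```cpp
#include <cassert>
#include <cstddef>

// Owner and local offset of one element (hypothetical helper type).
struct LocalPosition {
    std::size_t unit;   // index of the owning process
    std::size_t index;  // offset within that process's local storage
};

// Block-cyclic distribution: blocks of `blocksize` consecutive elements are
// dealt out to the units in round-robin order.
LocalPosition block_cyclic(std::size_t g, std::size_t blocksize,
                           std::size_t units) {
    assert(blocksize > 0 && units > 0);
    const std::size_t block = g / blocksize;  // global block number
    return {block % units, (block / units) * blocksize + g % blocksize};
}

// Blocked distribution of n elements: one contiguous block per unit,
// i.e. block-cyclic with blocksize = ceil(n / units).
LocalPosition blocked(std::size_t g, std::size_t n, std::size_t units) {
    assert(units > 0 && g < n);
    const std::size_t blocksize = (n + units - 1) / units;
    return block_cyclic(g, blocksize, units);
}

// Cyclic distribution: block-cyclic with blocksize = 1.
LocalPosition cyclic(std::size_t g, std::size_t units) {
    return block_cyclic(g, 1, units);
}
```

With 10 elements on 4 units, global index 5 lands on unit 1 under both the blocked and the cyclic scheme, but on unit 2 with blocks of size 2. Differences of this kind determine which neighbouring elements are local and which require communication.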

Thu, 23 Jun 2016 01:49:39 GMT /slideshow/expressing-and-exploiting-multidimensional-locality-in-dash/63358721 fuchsto@slideshare.net(fuchsto)
Expressing and Exploiting Multi-Dimensional Locality in DASH from Menlo Systems GmbH
290 4 https://cdn.slidesharecdn.com/ss_thumbnails/zs5wh8zoteaegffrolsn-signature-dd8db26a3345db1c3d0e60b99c8abec1b4be068df553cc41ec94c680e4f01f0e-poli-160623014939-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Wait-free data structures on embedded multi-core systems /slideshow/masterseminar-fuchs-brief/41300884 masterseminarfuchs-brief-141108151204-conversion-gate02
Presentation for my master's thesis on wait-free data structures for embedded multi-core systems.

Sat, 08 Nov 2014 15:12:04 GMT /slideshow/masterseminar-fuchs-brief/41300884 fuchsto@slideshare.net(fuchsto)
Wait-free data structures on embedded multi-core systems from Menlo Systems GmbH
966 5 https://cdn.slidesharecdn.com/ss_thumbnails/masterseminarfuchs-brief-141108151204-conversion-gate02-thumbnail.jpg?width=120&height=120&fit=bounds presentation White http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
Das Spiel (Eigen, Winker, 1975) https://de.slideshare.net/slideshow/presentation-6090205/6090205 presentation-101209075951-phpapp01
Thu, 09 Dec 2010 07:59:44 GMT https://de.slideshare.net/slideshow/presentation-6090205/6090205 fuchsto@slideshare.net(fuchsto)
from Menlo Systems GmbH
1624 120 https://cdn.slidesharecdn.com/ss_thumbnails/presentation-101209075951-phpapp01-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
https://cdn.slidesharecdn.com/profile-photo-fuchsto-48x48.jpg?cb=1620284010

Research interests:
• High Performance Computing, Partitioned Global Address Space (PGAS)
• Computer networks: Software Defined Networks (SDN), Network Function Virtualization (NFV)
• Wait-free and Lock-free progress properties
• Parallel programming models
• Embedded- and real-time application development, especially digital signal processing

Industry focus:
• Software engineering for medical platforms (C++, C#)
• Communication interfaces (TCP/IP, USB, Bluetooth, ..., RS-232, ... )
• Embedded software for medical devices (ARM STM32, M16, Renesas SH-4, ...)
• Embedded multicore systems (ARM Cortex A9)
• Real-time signal processing in medical applications, mostly using RTAI

www.mnm-team.org/~fuchst