Workshop Overview
Workshop Scope
- All types of legacy-software optimizations for general-purpose processors in HPC,
- Changes to (collective) communication algorithms or implementations to enable the use of different numerical methods (for example: Lagrangian vs. Eulerian),
- Acceleration of pre-/post-processing in scientific workflows or of auxiliary tools used in HPC environments,
- Improved maintainability and performance through the use of existing production libraries,
- Revisiting and applying modern compiler (flag) techniques, performance analysis tools, moderate usage of OpenMP pragmas, etc., for performance gains,
- Manual code refactoring, such as loop transformations or changing data structures, to acknowledge the shifting ratio in memory vs. compute capabilities of modern architectures, and
- Using mixed or adaptive precision wherever possible.
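As an illustration of the last point, here is a minimal sketch of mixed-precision iterative refinement: the system is solved in float32 and the solution is corrected using float64 residuals. The matrix, sizes, and iteration count are made up for illustration.

```python
import numpy as np

# Hypothetical example problem: a well-conditioned 100x100 linear system.
rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant
b = rng.standard_normal(n)

# Solve once in low precision (fast on hardware with wide float32 units).
A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)

# Refine: residuals computed in float64, corrections solved cheaply in float32.
for _ in range(5):
    r = b - A @ x
    x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

residual = np.linalg.norm(b - A @ x)  # close to double-precision accuracy
```

In production code one would reuse a single float32 factorization for all correction solves; `np.linalg.solve` refactors on every call and is used here only to keep the sketch short.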
Contributions that we consider outside the focus of this workshop's theme, because many venues for them already exist, are:
- Software ported and tailored to specific hardware (unless these changes benefit all state-of-the-art HPC architectures),
- Applications auto-tuned for dedicated hardware,
- Proposals of entirely new optimization techniques, and
- Entire scientific software translated to another language, e.g., a conventional language such as C++ or a custom domain-specific language (with the exception of pre-/post-processing frameworks that show profound performance benefits).
Workshop Organization
- Mohamed Wahib (RWBC-OIL, AIST, Japan)
- Jens Domke (R-CCS, RIKEN, Japan)
- Artur Podobas (R-CCS, RIKEN, Japan)
- Andreas Knüpfer (ZIH, TU Dresden, Germany)
- Anshu Dubey (ANL, USA)
- Barna Bihari (LC, LLNL, USA)
- Bernd Mohr (JSC, FZJ, Germany)
- Dali Wang (CCSI, ORNL, USA)
- Daniel Molka (DLR, Germany)
- Didem Unat (Koç University, Turkey)
- Hisashi Yashiro (R-CCS, RIKEN, Japan)
- Saurabh Chawdhary (ANL, USA)
- Seyong Lee (ORNL, USA)
Important Dates
- Release 1st CFP: December 19, 2018
- Submission deadline: April 18, 2019 (23:59, AOE)
- Final submission deadline: April 25, 2019 (23:59, AOE; no extensions!)
- Author notification: May 10, 2019
- Camera ready: May 31, 2019
- Workshop date: June 20, 2019
- Timeframe: 9:00 - 13:00 (see agenda for details)
- Location: Frankfurt Marriott Hotel (Room: Alabaster 2), Hamburger Allee 2, 60486 Frankfurt am Main
Submission, Review, and Proceedings
Submissions must adhere to the following:
- Only accepted style: LNCS (see Springer's website)
- Single column format
- No modification to font size of LNCS template
- Maximum of 10 pages (min. 6) in PDF format, including figures and references
- Incorrectly formatted papers will be excluded
- Minimum 2 reviewers per submission
- Single-blind peer-review
- Review criteria: relevance to the WS, scientific method, impact on time-to-solution, novelty
Workshop Agenda
- 09:00 – 09:10 Opening remarks
- Invited Keynote
- 09:10 – 09:55:
Benoit Marchand
Abstract: TBA
- Two Paper/Invited presentations
- 10:00 – 10:30:
M.N. Farooqi, T. Nguyen, W. Zhang, A.S. Almgren, J. Shalf, and D. Unat
Abstract: Adaptive Mesh Refinement (AMR) is a computationally and memory-efficient technique for solving partial differential equations. As many supercomputers employ GPUs in their systems, AMR frameworks have to evolve to adapt to large-scale heterogeneous systems. However, it is challenging to employ multiple GPUs and achieve good scalability in AMR because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system that simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between different processing units. Our runtime is adaptive to various machine configurations and uses a host-resident data model. It helps facilitate using streams to overlap CPU-GPU data transfers with computation and increase device occupancy. We perform strong and weak scaling studies using an Advection solver on the Piz Daint supercomputer and achieve good performance.
- 10:30 – 11:00:
M. Bianco
Abstract: TBA
- 11:00 – 11:30 Morning break
- Two Paper/Invited presentations:
- 11:30 – 12:00:
N.A. Simakov, R.L. Jones-Ivey, A. Akhavan-Safaei, H. Aghakhani, M.D. Jones, A.K. Patra
Abstract: In this work, we report on strategies and results of our initial approach to the modernization of the Titan2D code. Titan2D is a geophysical mass-flow simulation code designed for modeling volcanic flows, debris avalanches, and landslides over a realistic terrain model. It solves the underlying hyperbolic system of partial differential equations using a parallel adaptive-mesh Godunov scheme. The following work was done during code refactoring and modernization. To facilitate user input, a two-level Python interface was developed. This design permits large changes at the C++ and Python low level while maintaining a stable high-level interface exposed to the end user. Multiple diverged forks implementing different material models were merged back together. The data storage layout was changed from a linked list of structures to a structure-of-arrays representation for better memory access and in preparation for further work on better utilization of vectorized instructions. The existing MPI parallelization was augmented with OpenMP parallelization. The performance of a hash table used to store references to mesh elements and nodes was improved by switching from a linked list for overflow entries to dynamic arrays, allowing the implementation of a binary search algorithm. The introduction of the new data layout made it possible to reduce the number of hash-table look-ups by replacing them with direct use of indexes from the storage class. The modifications led to an 8-9 times performance improvement for serial execution.
- 12:00 – 12:30:
A. Knüpfer
Abstract: Refactoring is always time-consuming, therefore the need to undertake this task again in the future should be minimized. We'll look at the C++ abstract parallel programming models Kokkos, Raja, and Alpaka and how they can provide performance portability between different compute devices like CPUs or accelerators today. Hopefully, they will also be somewhat future-proof for coming compute architectures, avoiding additional effort down the road. We'll also ask the question if/how they make refactoring more complicated and more expensive.
- 12:30 – 13:00 Moderated discussion and closing remarks