First International Workshop on Legacy Software Refactoring for Performance (REFAC'19) in conjunction with ISC'19

Tentative Agenda

Workshop Overview

The "First International Workshop on Legacy Software REFACtoring for Performance", held in conjunction with the ISC High Performance conference (ISC'19) in Frankfurt am Main, is the first event of its kind dedicated to the much-needed shift in focus from hardware to software to achieve performance gains. Modernizing hardware has for too long been the primary method of accelerating legacy software, and close to half of the expected performance improvement in legacy codes can be attributed to improved processor technology. More than half of this improvement was based on Moore's law and its observation that transistors will continue to become smaller every few (originally two) years. The remaining hardware improvements came from architectural innovations, such as deeper cache hierarchies, the migration to more exotic architectures (e.g. GPUs), or the utilization of larger and wider vector units (SIMD), as well as from scaling HPC systems up by giving them more processors and cores. Unfortunately, we are no longer seeing the consistent technology scaling that Gordon Moore observed. Instead, technology scaling has slowed down significantly and is expected to continue only for a few more years. Consequently, in the so-called Post-Moore era, the "performance road" forks three ways, yielding the following alternatives: (1) architectural innovations will attempt to close the performance gap, and an explosion of diverging architectures tailored for specific science domains will emerge, (2) alternative materials and technologies (e.g. non-CMOS technologies) allow the spirit of Moore's law to continue for the foreseeable future, or (3) we abandon the von Neumann paradigm altogether and move to a neuromorphic or quantum-like computer (which, in time, might or might not become practical).
Regardless of which direction we end up taking in the future, the following will hold: software and algorithmic optimizations will be transferable to the first two of the three identified directions. It is these architecture-oblivious software optimizations that are the primary scope of the proposed workshop.

Workshop Scope

We strongly encourage submissions on topics that include, but are not limited to, the following interdisciplinary research areas:

  • All types of general-purpose processor legacy-software optimizations for HPC,
  • Changes to (collective) communication algorithms or implementations to enable the use of different numerical methods (for example: Lagrangian vs. Eulerian),
  • Acceleration of pre-/post-processing in scientific workflows or auxiliary tools used in HPC environments,
  • Improved maintainability and performance through the use of existing production libraries,
  • Revisiting and applying modern compiler (flag) techniques, performance analysis tools, moderate usage of OpenMP pragmas, etc., for performance gains,
  • Manual code refactoring, such as loop transformations or changing data structures, to acknowledge the shifting ratio in memory vs. compute capabilities of modern architectures, and
  • Using mixed or adaptive precision wherever possible.
It is important to mention that all time-to-solution optimizations must be performed under the premise that the results produced by the scientific code are either 1:1 comparable, do not break numerical stability, or pass a given set of verification tests, in case the application/library includes such correctness checking. Hence, HPC experts submitting to our workshop are advised to collaborate with domain experts while performing such optimizations. Furthermore, we look forward to cost-saving estimates in the submitted manuscripts, based on CPU cycles spent by the software vs. CPU cycles saved through optimization while using realistic data/input sets.
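
As a rough illustration of the workflow described above (refactor, verify against the legacy results, then estimate the savings), the following minimal Python sketch fuses two loops into one, compares the refactored output element-wise with the legacy output, and computes a simple cycle-saving estimate. All function names, the tolerance, and the cycle counts are hypothetical and chosen for illustration only; in practice the cycle counts would be measured with a profiler or hardware counters on realistic input sets, and the acceptance criteria would be agreed with the domain experts.

```python
import math


def reference_kernel(values):
    """Legacy version: two separate passes over the data."""
    scaled = [2.0 * v for v in values]
    return [s + 1.0 for s in scaled]


def refactored_kernel(values):
    """Refactored version: both loops fused into a single pass."""
    return [2.0 * v + 1.0 for v in values]


def results_match(expected, actual, rel_tol=1e-12, abs_tol=0.0):
    """Compare refactored output with the legacy output, element by element.

    The tolerances are assumptions; a real verification test would use
    whatever criterion the application or its domain experts prescribe.
    """
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


def estimated_cycles_saved(cycles_before, cycles_after, runs_per_year):
    """Cycles saved per year, assuming both counts come from the same input set."""
    return (cycles_before - cycles_after) * runs_per_year


if __name__ == "__main__":
    data = [0.5 * i for i in range(1000)]
    ok = results_match(reference_kernel(data), refactored_kernel(data))
    print("verification passed:", ok)
    # Hypothetical measurements: cycles per run before and after refactoring.
    print("cycles saved per year:", estimated_cycles_saved(1000, 400, 10))
```

Keeping the legacy kernel available as a reference makes the check repeatable after every further refactoring step.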

Contributions that we consider to be outside the focus of this workshop's theme, because many venues for such contributions already exist, are:

  • Software ported and tailored to specific hardware (unless these changes benefit all state-of-the-art HPC architectures),
  • Applications auto-tuned for dedicated hardware,
  • Proposals of entirely new optimization techniques, and
  • Entire scientific software translated to another language, e.g. a conventional language such as C++, or a custom domain-specific language (with the exception of pre-/post-processing frameworks if they show profound performance benefits).
Potential submissions in these areas will be ranked accordingly in the peer-review process and may only be accepted if slots are available.

Workshop Organization

  • Mohamed Wahib (RWBC-OIL, AIST, Japan)
  • Jens Domke (R-CCS, RIKEN, Japan)
  • Artur Podobas (R-CCS, RIKEN, Japan)
Program Committee (tentative):
  • Andreas Knüpfer (ZIH, TU Dresden, Germany)
  • Anshu Dubey (ANL, USA)
  • Barna Bihari (LC, LLNL, USA)
  • Bernd Mohr (JSC, FZJ, Germany)
  • Dali Wang (CCSI, ORNL, USA)
  • Daniel Molka (DLR, Germany)
  • Didem Unat (Koç University, Turkey)
  • Hisashi Yashiro (R-CCS, RIKEN, Japan)
  • Saurabh Chawdhary (ANL, USA)
  • Seyong Lee (ORNL, USA)

Important Dates

The tentative timeline for the REFAC'19 workshop is as follows:
  • Release 1st CFP: December 19, 2018
  • Submission deadline: April 18, 2019 (23:59, AOE)
  • Final Submission Deadline: April 25, 2019 (23:59, AOE; No extensions!)
  • Author notification: May 10, 2019
  • Camera ready: May 31, 2019
  • Workshop date:

Submission, Review, and Proceedings

Submission website: REFAC'19 (EasyChair)

Submissions must adhere to:
  • Only accepted style: LNCS (see Springer's website)
  • Single column format
  • No modification to font size of LNCS template
  • Maximum of 10 pages (min. 6) in PDF format, including figures and references
  • Incorrectly formatted papers will be excluded
Review and notification process:
  • Minimum 2 reviewers per submission
  • Single-blind peer-review
  • Review criteria: relevance to the WS, scientific method, impact on time-to-solution, novelty
Accepted papers will be published in ISC's Workshop Proceedings (link: TBA).

Workshop Agenda

  • 09:00 – 09:10 Opening remarks

  • Invited Keynote

    • 09:10 – 09:55: Benoit Marchand
      Abstract: TBA
  • Two Paper/Invited presentations

    • 10:00 – 10:30: M.N. Farooqi, T. Nguyen, W. Zhang, A.S. Almgren, J. Shalf, and D. Unat
      Abstract: Adaptive Mesh Refinement (AMR) is a computationally and memory-efficient technique for solving partial differential equations. As many supercomputers employ GPUs in their systems, AMR frameworks have to evolve to adapt to large-scale heterogeneous systems. However, it is challenging to employ multiple GPUs and achieve good scalability in AMR because of its complex communication pattern. In this paper, we present our asynchronous AMR runtime system that simultaneously schedules tasks on both CPUs and GPUs and coordinates data movement between different processing units. Our runtime is adaptive to various machine configurations and uses a host-resident data model. It helps facilitate using streams to overlap CPU-GPU data transfers with computation and increase device occupancy. We perform strong and weak scaling studies using an advection solver on the Piz Daint supercomputer and achieve good performance.
    • 10:30 – 11:00: M. Bianco
      Abstract: TBA
  • 11:00 – 11:30 Morning break

  • Two Paper/Invited presentations

    • 11:30 – 12:00: N.A. Simakov, R.L. Jones-Ivey, A. Akhavan-Safaei, H. Aghakhani, M.D. Jones, A.K. Patra
      Abstract: In this work, we report on strategies and results of our initial approach for modernization of the Titan2D code. Titan2D is a geophysical mass flow simulation code designed for modeling volcanic flows, debris avalanches, and landslides over a realistic terrain model. It solves an underlying hyperbolic system of partial differential equations using a parallel adaptive mesh Godunov scheme. The following work was done during code refactoring and modernization. To facilitate user input, a two-level Python interface was developed. Such a design permits large changes in the C++ and low-level Python code while maintaining a stable high-level interface exposed to the end user. Multiple diverged forks implementing different material models were merged back together. The data storage layout was changed from a linked list of structures to a structure-of-arrays representation for better memory access and in preparation for further work on better utilization of vectorized instructions. The existing MPI parallelization was augmented with OpenMP parallelization. The performance of a hash table used to store mesh element and node references was improved by switching from a linked list for overflow entries to dynamic arrays, allowing the implementation of the binary search algorithm. The introduction of the new data layout made it possible to reduce the number of hash table look-ups by replacing them with direct use of indexes from the storage class. The modifications led to an 8-9 times performance improvement for serial execution.
    • 12:00 – 12:30: A. Knüpfer
      Abstract: Refactoring is always time-consuming; therefore, the need to undertake this task again in the future should be minimized. We'll look at the C++ abstract parallel programming models Kokkos, RAJA, and Alpaka and how they can provide performance portability between different compute devices like CPUs or accelerators today. Hopefully, they will also be somewhat future-proof for coming compute architectures, avoiding additional effort down the road. We'll also ask the question if/how they make refactoring more complicated and more expensive.
  • 12:30 – 13:00 Moderated discussion and closing remarks
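
One of the refactoring steps mentioned in the Titan2D abstract above, replacing linked-list overflow entries in a hash table with dynamic arrays that permit binary search, can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Titan2D implementation (which is written in C++); the class `SortedBucketTable`, its methods, and the bucket count are hypothetical.

```python
from bisect import bisect_left


class SortedBucketTable:
    """Hash table whose buckets are sorted dynamic arrays (hypothetical sketch).

    Colliding keys are kept in a sorted list per bucket, so a lookup within
    a bucket is a binary search instead of a linked-list traversal.
    """

    def __init__(self, num_buckets=64):
        # Keys and values are stored in parallel arrays, one pair per bucket.
        self._keys = [[] for _ in range(num_buckets)]
        self._values = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return hash(key) % len(self._keys)

    def insert(self, key, value):
        b = self._bucket(key)
        keys, values = self._keys[b], self._values[b]
        i = bisect_left(keys, key)  # position that keeps the bucket sorted
        if i < len(keys) and keys[i] == key:
            values[i] = value  # key already present: overwrite its value
        else:
            keys.insert(i, key)
            values.insert(i, value)

    def lookup(self, key):
        b = self._bucket(key)
        keys = self._keys[b]
        i = bisect_left(keys, key)  # binary search within the bucket
        if i < len(keys) and keys[i] == key:
            return self._values[b][i]
        return None


if __name__ == "__main__":
    table = SortedBucketTable(num_buckets=4)
    for element_id in range(20):
        table.insert(element_id, f"element-{element_id}")
    print(table.lookup(7))
    print(table.lookup(99))
```

The sketch trades slower insertion into a sorted array for faster lookups, which only pays off when lookups dominate; whether that holds for a given code is an assumption to be checked by profiling.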