Agenda – 2020 Performance, Portability, and Productivity in HPC Forum

*Start*	*End*	*Session Title*	*Speakers*
10:00 AM	10:10 AM	Day 1: AM Plenary: Introduction	Yasaman Ghadar; Nick Romero
10:10 AM	10:30 AM	Day 1: AM Plenary: An Overview of the Argonne Aurora Exascale System (Slides), (Video)	Scott Parker
10:30 AM	10:50 AM	Day 1: AM Plenary: Oak Ridge Leadership Computing: Science on Summit and Preparing for Frontier (Slides), (Video)	Tjerk Straatsma
10:50 AM	11:10 AM	Day 1: AM Plenary: Preparing Apps for the Perlmutter System and beyond at NERSC (Slides), (Video)	Jack Deslippe
11:10 AM	11:30 AM	Session 1: Talk 1 – A Comparison of GPU Programming Models (Slides), (Video)	Trey White
11:10 AM	12:10 PM	Session 2: Talk 1 – Prioritizing OpenMP Features to Provide for Performance, Portability and Productivity (Slides), (Video)	Oscar Hernandez; Vivek Kale
11:10 AM	11:30 AM	Session 3: Talk 1 – AMReX in 2020: Porting for Performance to GPGPU Systems (Slides), (Video)	Kevin Gott
11:30 AM	11:50 AM	Session 1: Talk 2 – Wrong Way: Successes, Failures, and Lessons Learned from Using the “Wrong” Programming Approach for Summit (Slides), (Video)	Philip Roth
11:30 AM	11:50 AM	Session 3: Talk 2 – An overview of FleCSI: a compile-time configurable framework designed to support multi-physics application development (Slides), (Video)	Irina Demeshko
11:50 AM	12:10 PM	Session 1: Talk 3 – The Importance of Kernels for Performance Portability, or How I Learned to Stop Looping and Love the Kernel (Slides), (Video)	Tom Scogland
11:50 AM	12:10 PM	Session 3: Talk 3 – Performance Portable Implementations for Kernels from Geometric Multigrid Methods (Slides)	JaeHyuk Kwack
12:10 PM	12:30 PM	Session 1: Talk 4 – A Study of Cross-Platform, Cross-Compiler Performance and Portability (Slides), (Video)	Veronica G. Vergara Larrea
12:10 PM	12:30 PM	Session 2: Talk 2 – Experience with using OpenACC and OpenMP to achieve performance portability for the Grid lattice QCD library (Slides), (Video)	Meifeng Lin
12:10 PM	12:30 PM	Session 3: Talk 4 – Porting QUDA from CUDA to other backends (Slides), (Video)	Xiaoyong Jin
12:30 PM	01:25 PM	Lunch Break
01:25 PM	01:45 PM	Day 1: PM Plenary: The Fruits of the Interplay between DOE Software Stacks and C++ Standardization (Video)	Daisy Hollman
01:25 PM	02:40 PM	Day 1: PM Plenary: Programming Methodologies for Performance Portability, Current State of Practice and Futures	Douglas Doerfler; Brandon Cook
01:45 PM	02:05 PM	Day 1: PM Plenary: LLNL Portability Abstractions Update (Slides), (Video)	Rich Hornung
02:05 PM	02:25 PM	Day 1: PM Plenary: Kokkos: Present and Future (Slides), (Video)	Christian Trott
02:25 PM	02:40 PM	Day 1: PM Plenary: Panel Q&A
02:40 PM	03:00 PM	Session 4: Talk 1 – Update on the performance portability of a Wilson Dslash Kernel using Kokkos and SYCL (Slides), (Video)	Balint Joo
02:40 PM	03:00 PM	Session 5: Talk 1 – Preparing performance portable QMCPACK for Exascale (Slides), (Video)	Ye Luo
02:40 PM	03:00 PM	Session 6: Talk 1 – Evaluating the performance of a portable version of HPGMG benchmark for accelerators (Slides), (Video)	Christopher Daley
03:00 PM	03:20 PM	Session 4: Talk 2 – Early SYCL results from the Bristol Performance Portability Study (Slides), (Video)	Tom Deakin
03:00 PM	03:20 PM	Session 5: Talk 2 – Performance portability experience with CUDA and OpenMP offload in GAMESS (Slides), (Video)	Colleen Bertoni
03:00 PM	03:20 PM	Session 6: Talk 2 – SU3_bench, a Micro-benchmark for Exploring Exascale Era Programming Models, Compilers and Runtimes (Slides), (Video)	Douglas Doerfler
03:20 PM	03:40 PM	Session 4: Talk 3 – Experiences tuning SYCL libraries for varied hardware (Slides), (Video)	John Lawson
03:20 PM	03:40 PM	Session 5: Talk 3 – OpenMP Offloading For Density Matrix Renormalization Group Hamiltonian Application Kernel (Slides), (Video)	Wael Elwasif
03:20 PM	03:40 PM	Session 6: Talk 3 – Performance Portability Issues for a Large-Scale Computational Fluid Dynamics Application on Emerging High-Performance Architectures (Slides), (Video)	Eric Nielsen
03:40 PM	04:00 PM	Session 4: Talk 4 – Investigation of the performance of SYCL kernels across various architectures (Slides), (Video)	Brian Homerding
03:40 PM	04:00 PM	Session 5: Talk 4 – GPU I-TASSER: Replica Exchange Monte Carlo (Slides), (Video)	Elijah MacCarthy
03:40 PM	04:00 PM	Session 6: Talk 4 – Performance and portability of abstract algebra operations in C++, Python, and Julia (Slides), (Video)	Jess Woods
04:00 PM	04:20 PM	Afternoon Break
04:20 PM	05:00 PM	Poster Session 1 – Poster 1: Comparative Performance and Porting Effort of HIP and CUDA for an Implicit Monte Carlo Code (Slides), (Video)	Alex Long
04:20 PM	05:00 PM	Poster Session 1 – Poster 2: Experiences with CUDA Streaming in Teton’s Linear Sweep (Slides), (Video)	Robert Chen
04:20 PM	05:00 PM	Poster Session 1 – Poster 3: Kokkos-HClib: enabling high-performance and resiliency for HPC systems (Slides), (Video)	Akihiro Hayashi; Nicolas Morales
04:20 PM	05:00 PM	Poster Session 1 – Poster 4: Parallel Training of Large Knowledge Graph Convolution Networks (Slides), (Video)	Hong-Jun Yoon
04:20 PM	05:00 PM	Poster Session 1 – Poster 5: Productive Auto-Tuning of GPU Codes via Iterative Machine Learning	Wu Feng
04:20 PM	05:00 PM	Poster Session 1 – Poster 6: Halo Exchange Performance on the Sierra Supercomputer (Slides), (Video)	Jason Burmark
04:20 PM	05:00 PM	Poster Session 1 – Poster 7: Accelerating Your Application I/O on HPC Systems (Slides), (Video)	Kathryn Mohror; Cameron Stanavige
04:20 PM	05:00 PM	Poster Session 1 – Poster 8: Preparing the SUNDIALS Library for Heterogeneous Architectures (Slides), (Video)	Cody Balos
10:00 AM	10:35 AM	Day 2: AM Plenary: Intel Data Parallel C++ (Slides), (Video)	Jeff Hammond
10:35 AM	11:10 AM	Day 2: AM Plenary: Lessons Learned in the Sierra Center of Excellence Migrating to Heterogeneous Computing (Slides), (Video)	David Richards
11:10 AM	11:30 AM	Session 7: Talk 1 – XGC with Kokkos/Cabana: Plasma Physics on Summit and Beyond (Slides), (Video)	Aaron Scheinberg
11:10 AM	11:30 AM	Session 8: Talk 1 – Towards Performance Portability through an Integrated Programming Eco-System for Tensor Algebra (Slides), (Video)	Roberto Gioiosa
11:10 AM	11:30 AM	Session 9: Talk 1 – Application Development and Readiness for Sierra: An MPI Challenge (Slides), (Video)	James Elliott
11:30 AM	11:50 AM	Session 7: Talk 2 – Experiences incrementally porting a large legacy finite element application to Sierra using Kokkos (Slides), (Video)	Victor Brunini
11:30 AM	11:50 AM	Session 8: Talk 2 – Enabling High-level Parallel Abstractions to Dynamical Cluster Approximation (DCA++) using HPX and GPUDirect on the Summit Supercomputer (Slides), (Video)	Weile Wei
11:30 AM	11:50 AM	Session 9: Talk 2 – What workloads are coming: new workflow mashups, HPC*AI, HPC at the Edge (Slides), (Video)	CJ Newburn
11:50 AM	12:10 PM	Session 7: Talk 3 – Fortran Language Compatibility Library for Kokkos (Slides), (Video)	Geoff Womeldorff
11:50 AM	12:10 PM	Session 8: Talk 3 – From PeleC to PeleACC, to PeleC++: What we learned porting our AMReX application to two modern GPU programming models (Slides), (Video)	Jon Rood
11:50 AM	12:10 PM	Session 9: Talk 3 – Developing Applications for Aurora (Slides), (Video)	Scott Parker
12:10 PM	12:30 PM	Session 7: Talk 4 – Resilient Kokkos: Productive and Performance Portable User-Level Checkpointing for High Performance Computing (Slides), (Video)	Nicolas Morales
12:10 PM	12:30 PM	Session 8: Talk 4 – Porting EOSPAC 6 to Sierra (Slides), (Video)	Anna Pietarila Graham
12:10 PM	12:30 PM	Session 9: Talk 4 – Modern problems require modern solutions: How modern CMake supports modern C++ in performance portability (Slides), (Video)	Jeremiah Wilke
12:30 PM	01:25 PM	Lunch Break
01:25 PM	01:35 PM	Day 2: PM Plenary: El Capitan (Slides), (Video)	Ian Karlin
01:35 PM	01:45 PM	Day 2: PM Plenary: Crossroads (Slides), (Video)	Hai Ah Nam
01:45 PM	02:20 PM	Day 2: PM Plenary: Best practices from the Summit application readiness efforts, with porting of GronOR as example (Slides), (Video)	Tjerk Straatsma
02:20 PM	02:40 PM	Session 10: Talk 1 – Achieving portability for a highly optimized GPU code for 3D Fourier Transforms at extreme problem sizes (Slides), (Video)	Kiran Ravikumar
02:20 PM	02:40 PM	Session 11: Talk 1 – Achieving Performance Portability on Hybrid GPU-CPU Architectures for a Large Scale Material Science Code: the BerkeleyGW Case Study (Slides), (Video)	Mauro Del Ben
02:20 PM	02:40 PM	Session 12: Talk 1 – Timemory: Modular Performance Analysis for HPC (Slides), (Video)	Jonathan Madsen
02:40 PM	03:00 PM	Session 10: Talk 2 – Asynchronous Programming in Modern C++ (Slides), (Video)	Hartmut Kaiser
02:40 PM	03:00 PM	Session 11: Talk 2 – Performance Portability of Remote I/O in Distributed Workflows (Slides), (Video)	Nathan Tallent
02:40 PM	03:00 PM	Session 12: Talk 2 – The Development and Uses of Metrics for Performance, Portability, and Productivity (Slides), (Video)	John Pennycook
03:00 PM	03:20 PM	Session 10: Talk 3 – Porting Numerical Linear Algebra Libraries across Exascale Hardware Platforms and Beyond (Slides), (Video)	Piotr Luszczek
03:00 PM	03:20 PM	Session 11: Talk 3 – Machine Learning Guided Optimal Use of GPU Unified Memory (Slides), (Video)	Murali Emani
03:00 PM	03:20 PM	Session 12: Talk 3 – Techniques for Holistic Quantification of Performance, Portability, and Productivity (Slides), (Video)	Jason Sewall
03:20 PM	03:40 PM	Concluding Remarks	Scott Parker
03:40 PM	04:00 PM	Afternoon Break
04:00 PM	04:40 PM	Poster Session 2 – Poster 1: Porting the NOAA global weather forecast system to the cloud (Slides)	Daniel Abdi
04:00 PM	04:40 PM	Poster Session 2 – Poster 2: SHAD: Productive Programming for High Performance Systems in Standard C++ (Slides), (Video)	Vito Giovanni Castellana
04:00 PM	04:40 PM	Poster Session 2 – Poster 3: A MetaCL Tool for Productive FPGA Programming via Automated Code Generation (Slides), (Video)	Paul Sathre
04:00 PM	04:40 PM	Poster Session 2 – Poster 4: Development of GPU accelerated Smith-Waterman kernel for metagenomics workflow (Slides), (Video)	Muaaz Awan
04:00 PM	04:40 PM	Poster Session 2 – Poster 5: Considerations for performance portability in a commercial particle-in-cell code (Slides), (Video)	Benjamin Cowan
04:00 PM	04:40 PM	Poster Session 2 – Poster 6: CAMP: Compiler Agnostic MetaProgramming, or Portable Performance at Compile Time (Slides), (Video)	Tom Scogland
04:00 PM	04:40 PM	Poster Session 2 – Poster 7: Performance Analysis of NALU on AArch64 (Slides), (Video)	Srinath Vadlamani
04:00 PM	04:40 PM	Poster Session 2 – Poster 8: A Cosine Similarity Methodology to Characterize Proxy-Parent Application Correspondence (Slides)	Omar Aaziz; Jeffery Kuehn