This presentation focuses on adding GPU support to Devito, a domain-specific
language (DSL) for finite-difference methods. Devito is already capable of generating highly
optimized finite-difference code for CPUs, including Intel Knights Landing (KNL),
ARM, and IBM POWER architectures, parallelized with OpenMP and MPI. It is
typically used as a wave propagator and to compute gradients with the
adjoint-state method in reverse time migration (RTM) and full-waveform
inversion (FWI). We evaluate a range of GPU programming models as targets for
automatic code generation in terms of simplicity, performance, and portability
across GPUs from different vendors.
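To make "wave propagator" concrete, here is the kind of stencil update such a propagator evaluates at every time step. This is a minimal sketch under stated assumptions, not Devito's own formulation: it assumes the constant-density 2D acoustic wave equation with velocity $c$, omits the source term, and uses second-order centred differences in time and space with grid spacings $h_x$, $h_y$ and time step $\Delta t$ (Devito can generate higher-order stencils).

```latex
\frac{1}{c^2}\frac{\partial^2 u}{\partial t^2} = \nabla^2 u
\quad\Longrightarrow\quad
u^{n+1}_{i,j} = 2u^{n}_{i,j} - u^{n-1}_{i,j}
  + c_{i,j}^2\,\Delta t^2 \left(
      \frac{u^{n}_{i+1,j} - 2u^{n}_{i,j} + u^{n}_{i-1,j}}{h_x^2}
    + \frac{u^{n}_{i,j+1} - 2u^{n}_{i,j} + u^{n}_{i,j-1}}{h_y^2}
  \right)
```

Each value at step $n+1$ depends only on values at steps $n$ and $n-1$. All grid points within a time step can therefore be updated independently, which is what makes the update a natural fit for OpenMP threads on CPUs and for GPUs.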
We present early performance results on GPUs using automatically generated
code with OpenMP 5 offloading. We also show how Devito's graceful degradation
feature lets HPC programming specialists augment the generated code and
hand-tune its performance. Augmentation experiments that succeed (e.g., by
reducing time to solution) can then be analyzed to identify optimization
strategies, which are subsequently incorporated into the compiler.
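For readers unfamiliar with OpenMP offloading, the following hand-written sketch shows what an offloaded finite-difference time loop can look like. It is not Devito's generated code. The function `propagate` and the macro `IDX` are hypothetical names, and the sketch rests on several assumptions: the constant-density 2D acoustic wave equation, second-order centred differences, no source term, a wavefield that starts at rest, and zero (Dirichlet) boundaries. It uses target-offload directives available in OpenMP 4.5 and later, including OpenMP 5.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Row-major index of point (i, j) on a grid with ny columns. */
#define IDX(i, j, ny) ((size_t)(i) * (size_t)(ny) + (size_t)(j))

/*
 * Hypothetical helper: advance the constant-density 2D acoustic wave
 * equation by nt time steps with second-order centred differences.
 *   vel2dt2 : c^2 * dt^2 at every grid point (nx * ny values)
 *   u0      : initial wavefield, assumed at rest (so u at step -1 equals u0)
 *             and zero on the grid boundary (Dirichlet condition)
 *   out     : receives the wavefield after nt steps
 * Returns 0 on success, -1 if a work buffer cannot be allocated.
 */
int propagate(int nx, int ny, int nt, double hx, double hy,
              const double *vel2dt2, const double *u0, double *out)
{
    assert(nx >= 3 && ny >= 3 && nt >= 0);
    size_t n = (size_t)nx * (size_t)ny;
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return -1;
    }
    memcpy(a, u0, n * sizeof *a);  /* time level n-1 */
    memcpy(b, u0, n * sizeof *b);  /* time level n   */
    memset(c, 0, n * sizeof *c);   /* time level n+1 (boundary stays zero) */

    double *up = a, *uc = b, *un = c;
    const double rx = 1.0 / (hx * hx), ry = 1.0 / (hy * hy);

    /* Keep all buffers resident on the device for the whole time loop, so
     * data moves between host and device only at the start and end. */
    #pragma omp target data map(to: vel2dt2[0:n]) map(tofrom: a[0:n], b[0:n], c[0:n])
    {
        for (int t = 0; t < nt; t++) {
            /* One offloaded kernel per time step over the interior points;
             * points are independent, so the loop nest is collapsed and
             * distributed across teams and threads. */
            #pragma omp target teams distribute parallel for collapse(2)
            for (int i = 1; i < nx - 1; i++) {
                for (int j = 1; j < ny - 1; j++) {
                    double lap =
                        (uc[IDX(i + 1, j, ny)] - 2.0 * uc[IDX(i, j, ny)] + uc[IDX(i - 1, j, ny)]) * rx +
                        (uc[IDX(i, j + 1, ny)] - 2.0 * uc[IDX(i, j, ny)] + uc[IDX(i, j - 1, ny)]) * ry;
                    un[IDX(i, j, ny)] = 2.0 * uc[IDX(i, j, ny)] - up[IDX(i, j, ny)]
                                        + vel2dt2[IDX(i, j, ny)] * lap;
                }
            }
            /* Rotate the three time levels. Only host pointers change; the
             * device copies are found through the enclosing data mapping. */
            double *tmp = up;
            up = uc;
            uc = un;
            un = tmp;
        }
    }

    memcpy(out, uc, n * sizeof *out);
    free(a); free(b); free(c);
    return 0;
}
```

Without OpenMP enabled, the compiler ignores the pragmas and the same code runs serially on the host. With an offload-capable toolchain (for example, `clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda`), the loop nest runs on the GPU. Wrapping the time loop in a single `target data` region is a deliberate choice: it avoids copying the wavefields across the host-device link at every time step.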