Section: New Results
Bridging the gap between performance and bounds of Cholesky factorization on heterogeneous platforms
Participants: Emmanuel Agullo [Inria Bordeaux Sud-Ouest], Olivier Beaumont [Inria Bordeaux Sud-Ouest], Lionel Eyraud-Dubois [Inria Bordeaux Sud-Ouest], Julien Herrmann, Suraj Kumar [Inria Bordeaux Sud-Ouest], Loris Marchal, Samuel Thibault [Inria Bordeaux Sud-Ouest].
In this work, we consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization, since it exhibits the main features of such problems. Indeed, the relative performance of CPUs and GPUs depends strongly on the subroutine: GPUs are, for instance, much more efficient at processing regular kernels such as matrix-matrix multiplications than more irregular kernels such as matrix factorization. In this context, one solution consists in relying on dynamic scheduling and resource allocation mechanisms such as the ones provided by PaRSEC or StarPU. We analyze the performance of dynamic schedulers based on both actual executions and simulations, and we investigate how adding static rules, derived from an offline analysis of the problem, to their decision process can improve their performance, up to reaching improved theoretical performance bounds that we introduce [17].
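To make the mix of kernels concrete, the following minimal sketch enumerates the tasks of a standard right-looking tiled Cholesky factorization and counts them per kernel. It is an illustration only, not the scheduling method studied in this work: the kernel names (POTRF, TRSM, SYRK, GEMM) follow the usual LAPACK/BLAS naming and are not taken from the text above, and the function names `tiled_cholesky_tasks` and `kernel_counts` are hypothetical.

```python
from collections import Counter


def tiled_cholesky_tasks(n_tiles):
    """List the tasks of a right-looking tiled Cholesky factorization.

    The matrix is assumed to be split into n_tiles x n_tiles square tiles.
    Each task is a (kernel, tile_indices) pair; no numerical work is done.
    """
    tasks = []
    for k in range(n_tiles):
        # Factorize the diagonal tile (irregular kernel).
        tasks.append(("POTRF", (k,)))
        # Triangular solves on the tiles below the diagonal tile.
        for i in range(k + 1, n_tiles):
            tasks.append(("TRSM", (i, k)))
        # Update the trailing submatrix.
        for i in range(k + 1, n_tiles):
            tasks.append(("SYRK", (i, k)))  # diagonal tile update
            for j in range(k + 1, i):
                tasks.append(("GEMM", (i, j, k)))  # off-diagonal tile update
    return tasks


def kernel_counts(n_tiles):
    """Count how many tasks of each kernel type the factorization contains."""
    return Counter(kernel for kernel, _ in tiled_cholesky_tasks(n_tiles))


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, dict(kernel_counts(n)))
```

With n tiles per dimension, the counts are n POTRF tasks, n(n-1)/2 TRSM tasks, n(n-1)/2 SYRK tasks and n(n-1)(n-2)/6 GEMM tasks, so matrix-matrix multiplications dominate as n grows while the factorization kernel stays on the critical path. This is one reason the allocation of kernels to CPUs or GPUs matters.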