Section:
Application Domains
Resource Allocation for High Performance and Cloud Computing
In the context of numerical simulations on high performance machines, optimizing data
locality and resource usage is very important for faster execution times and lower energy
consumption. This optimization can be seen as a special case of scheduling problem on
parallel resource, with several challenges. First, instances are typically large: a large
matrix factorization (with blocks) involves about tasks. Then,
HPC platforms consist of heterogeneous and unrelated resources, what is known to make
scheduling problems hard to approximate. Finally, due to co-scheduling effects and shared
communication resources, it is not realistic to accurately model the exact duration of
tasks. All these observations make it impossible to rely on static optimal solutions, and HPC
applications have gone from simple generic static allocations to runtime dynamic scheduling
strategies that make their decisions based on the current state of the platform (the location
of input data), the expected transfer and running times for the tasks, and some affinity and
priority information that have possibly been computed offline. In this context, we are
strongly involved in the design of scheduling strategies for the StarPU runtime, with two
goals: proving that it is possible to design approximation algorithms whose complexity is
extremely small (typically sub-linear in the number of ready tasks), and show that they can
be used in practice with good performance results. We are pursuing collaborations both with
teams developing the StarPU system (Storm) by designing algorithms for the generic scheduling
problems [28], and with teams developing linear algebra algorithms
over the runtime (Hiepacs), by proposing specialized algorithms for specific cases. For
example, in the case of linear algebra applications on heterogeneous platforms, we have
considered the combinatorial optimization problem associated to matrix multiplication, that
is amenable to partitioning the unit square into zones of prescribed areas while minimizing
the overall size of the boundaries. We have improved the best known approximation ratio to
1.15 in [27] and we have shown that the resulting distribution
schemes can indeed be used to design efficient implementations using StarPU
in [34].