EN FR
EN FR
New Software and Platforms
Bibliography
New Software and Platforms
Bibliography


Section: Application Domains

Resource Allocation for High Performance and Cloud Computing

In the context of numerical simulations on high performance machines, optimizing data locality and resource usage is very important for faster execution times and lower energy consumption. This optimization can be seen as a special case of scheduling problem on parallel resource, with several challenges. First, instances are typically large: a large matrix factorization (with 50×50 blocks) involves about 30·103 tasks. Then, HPC platforms consist of heterogeneous and unrelated resources, what is known to make scheduling problems hard to approximate. Finally, due to co-scheduling effects and shared communication resources, it is not realistic to accurately model the exact duration of tasks. All these observations make it impossible to rely on static optimal solutions, and HPC applications have gone from simple generic static allocations to runtime dynamic scheduling strategies that make their decisions based on the current state of the platform (the location of input data), the expected transfer and running times for the tasks, and some affinity and priority information that have possibly been computed offline. In this context, we are strongly involved in the design of scheduling strategies for the StarPU runtime, with two goals: proving that it is possible to design approximation algorithms whose complexity is extremely small (typically sub-linear in the number of ready tasks), and show that they can be used in practice with good performance results. We are pursuing collaborations both with teams developing the StarPU system (Storm) by designing algorithms for the generic scheduling problems [20], and with teams developing linear algebra algorithms over the runtime (Hiepacs), by proposing specialized algorithms for specific cases. For example, in the case of linear algebra applications on heterogeneous platforms, we have considered the combinatorial optimization problem associated to matrix multiplication, that is amenable to partitioning the unit square into zones of prescribed areas while minimizing the overall size of the boundaries. We have improved the best known approximation ratio to 1.15 in [40] and we have shown that the resulting distribution schemes can indeed be used to design efficient implementations using StarPU in [41].

In the context of Cloud Computing platforms and Data Science, we are interested in resource allocation and data placement strategies. For BigData applications running on data centers platforms, data locality is the largest contributing factor to application performance. In practice, allocation decisions are made at runtime based on strategies that tend to favor local tasks. Our goal is to assess and to improve the efficiency of these runtime strategies. In particular, we have proven that the problem of maximizing locality when allocating map tasks is amenable to a graph orientation and a semi matching problem, what enabled us to assess the efficiency of classical MapReduce allocation algorithm in [34] for the map phase and to propose a low cost algorithm to compute optimal allocation schemes. We also consider more generic VM placement problem for large-scale datacenters. This placement is often handled with naive greedy rules, whereas it is possible to propose online or offline efficient allocation algorithms, for example to optimize the reliability of all applications in a platform with faults  [48] or to co-locate applications with compatible periodic load variations  [47].