KERDATA

2025 Activity Report, Project-Team KERDATA

RNSR: 200920935W

Research center: Inria Centre at Rennes University
In partnership with: Institut national des sciences appliquées de Rennes
Team name: Enabling the Edge-Cloud-HPC Data Continuum
In collaboration with: Institut de recherche en informatique et systèmes aléatoires (IRISA)

Creation of the Project-Team: January 1, 2025

Keywords

Computer Science and Digital‌ Science

A1.1.1. Multicore, Manycore
A1.1.4. High performance computing‌
A1.1.5. Exascale
A1.1.9. Fault tolerant systems
A1.3. Distributed‌ Systems
A1.3.5. Cloud
A1.3.6. Fog, Edge
A2.6.2. Middleware‌
A3.1.2. Data management, querying and storage
A3.1.3. Distributed‌ data
A3.1.8. Big data (production, storage, transfer)
A6.2.7.‌ HPC for machine learning
A6.3. Computation-data interaction
A7.1.1.‌ Distributed algorithms
A9.2. Machine learning
A9.7. AI algorithmics‌

1 Team members, visitors, external collaborators

Research‌ Scientists

Gabriel Antoniu [Team leader, INRIA‌, Senior Researcher, HDR]
Silvina Caino‌ Lores [INRIA, ISFP]
Jakob Luettgau‌ [INRIA, Researcher, from Oct 2025]
Jakob Luettgau [‌INRIA, Starting Research‌ Position, until Sep‌‌ 2025]
Guillaume Pallez [INRIA, Researcher‌, HDR]
François‌ Tessier [INRIA,‌‌ ISFP]

Faculty Member

Alexandru Costan [INSA‌ RENNES, Associate Professor‌, until Sep 2025‌‌, HDR]

PhD Students

Robin Boezennec [‌INRIA]
Arthur Jaquard‌ [INRIA]
Theo‌‌ Jolivel [INRIA]
Cedric Prigent [INRIA‌, until Feb 2025‌]
Simon Renard [‌‌INRIA, from Oct 2025]
Alix Tremodeux‌ [UNIV RENNES,‌ from Sep 2025]‌‌
Mathis Valli [INRIA]

Technical Staff

Thomas‌ Badts [INRIA,‌ Engineer]
Julien Monniot‌‌ [INRIA, Engineer, until May 2025‌]
Jean Etienne Ndamlabin‌ Mboula [INRIA,‌‌ Engineer]

Interns and Apprentices

Remy Chiv [‌INRIA, Intern,‌ from May 2025 until‌‌ Oct 2025]
Alix Tremodeux [ENS DE‌ LYON, Intern,‌ until Feb 2025]‌‌

Administrative Assistants

Laurence Dinh [INRIA]
Armelle‌ Mozziconacci [CNRS]‌
Gunther Tessier [INRIA‌‌]

Visiting Scientists

Elias Del Pozo Punal [‌UNIV CARLOS III,‌ from Mar 2025 until‌‌ Jul 2025]
Tomasz Kanas [UNIV VARSOVIE‌, from Sep 2025‌ until Nov 2025]‌‌

2 Overall objectives

2.1 Context: the emergence of‌ the Edge-Cloud-HPC Continuum.

As witnessed in industry and science, and highlighted in strategic documents such as the European ETP4HPC Strategic Research Agenda [90], there is a clear trend to combine numerical computations, large-scale data analytics and AI techniques to improve the results and efficiency of traditional HPC applications, and to advance new applications in fields such as autonomous vehicles, digital twins, and smart buildings and towns. In a typical scenario, Edge devices create streams of input data, which are processed by data analytics and machine learning applications in the Cloud; alternatively, or in parallel, they can feed simulations on large, specialised HPC systems to provide insights and help predict some future system state. Such emerging applications typically need to be implemented as complex workflows and require the coordinated use of supercomputers, Cloud data centres and Edge-processing devices. This assembly is called the Computing Continuum (CC). It raises challenges at multiple levels. At the application/workflow level, simulations, machine learning and data-driven analytics must be bridged. At the middleware level, adequate tools must enable efficient deployment and orchestration of the workflow components across the whole distributed infrastructure. Finally, a capable resource management system must allocate a suitable set of infrastructure components to run the application workflow, preferably in a dynamic and adaptive way, taking into account the specific capabilities of each component of the underlying heterogeneous infrastructure.

While each level‌‌ exhibits specific associated challenges, there are also common,‌ cross-layer concerns, among which‌ we specifically highlight two.‌‌ The first cross-layer concern regards sustainability, understood‌ as an optimization goal‌ encompassing energy efficiency and‌‌ the reduction of the‌ environmental impact. The second cross-layer concern is related‌ to the rapid development of AI-related workflows, which‌ creates specific needs at multiple levels.

Our objective:‌ Enable the Data Continuum.

Our research project aims to address some open challenges at each of the three levels described above, while considering the two cross-layer concerns. We specifically focus on data-related challenges posed by the requirements (storage, processing, analytics) of complex workflows executed on the Edge-Cloud-HPC continuum and propose innovative algorithms and software architecture solutions towards a Data Continuum.

2.2 Application/workflow-level challenges

In‌ the current state, multitudes of software development stacks‌ are tailored to specific use cases, with no‌ guarantee of interoperability between them. This greatly impedes‌ application software development for integrated CC use cases.‌ Moreover, specific software stacks have been developed for‌ HPC (e.g., based on optimized MPI libraries able‌ to leverage high-end network interconnects), data analytics (e.g.,‌ based on Spark, designed for commodity clusters available‌ in cloud datacenters) and AI (e.g., TensorFlow or‌ PyTorch), with different requirements for their initial target‌ execution infrastructures. Components based on such software stacks‌ cannot be integrated efficiently together to support CC‌ workflows, as their assumptions about the underlying infrastructure‌ are different. Programming a complex, hybrid workflow at‌ the highest level requires the ability to consistently‌ combine such workflow components in a unified framework.‌ This requires flexible programming models and supporting environments,‌ which also safeguard performance and energy efficiency. Composability‌ (the ability to combine multiple programming models or‌ software stacks for a single application with defined‌ rules) and reproducibility of workflow execution will be‌ very valuable in this context.

2.3 Middleware-level challenges‌

Similarly, compatibility and interoperability across all parts of a CC infrastructure must be assured; in particular, this includes data formats, storage abstractions, communication, data processing and data analysis paradigms. This first requires a deep understanding of the I/O behaviour of the distributed workflows. As an illustrative example, upcoming Exascale HPC workflows deployed on supercomputers as part of the continuum will continue to highlight the lack of infrastructures and methodologies to store and analyze the huge results of running simulations, whether this storage or analysis is performed on HPC systems or on cloud-based infrastructures. This can limit the scalability potential and lead to sub-optimal usage of the computing infrastructures. Since storing all data (originating from sensors or generated by simulations) may be infeasible in some cases, new scalable approaches are needed. The goal is to enable processing and analysis of such massive data outputs on various parts of the continuum infrastructure, during and after the HPC simulations, through asynchronous I/O and in-situ or in-transit processing inside or outside the HPC system, thus avoiding the need to store all the data.

2.4 Resource management challenges

Large-scale heterogeneity must be managed in an effective and efficient way. This again cuts across compute, storage and communication systems, and the scheduling/orchestration has to optimize the mapping of workflows onto the CC resources with regard to performance and energy use. A challenge here is to enable the design of adequate data storage architectures that cope, in particular, with capacity-related or energy-related constraints affecting certain parts of the continuum in different ways (Edge, but also energy-bound supercomputers in the post-Exascale era, where sustainability is a primary consideration).

2.5 Approach,‌ methodology, platforms

KerData's global‌ approach consists in modelling,‌‌ designing, implementing and evaluating distributed algorithms and software‌ architectures to address some‌ of the data-related challenges‌‌ described above. A specific description of the research‌ questions we address is‌ provided in the next‌‌ section. We will generally focus on hybrid infrastructures‌ (Edge/Cloud/supercomputers), although some of‌ our research may not‌‌ span across the complete spectrum of the continuum.‌

Our research balances theoretical‌ modelling (thanks to the‌‌ recent arrival of Guillaume Pallez) with a predominantly‌ experimental validation methodology (traditionally‌ carried out by most‌‌ team members as part of the former KerData‌ team). Overall, to validate‌ our proposed algorithms and‌‌ architectures, we build software prototypes, then validate them‌ at large scale on‌ real testbeds and experimental‌‌ platforms.

We will strongly rely on the Grid'5000/SLICES-FR platform. Moreover, thanks to our projects and partnerships (in particular in EuroHPC projects building pre-Exascale platforms, such as ACROSS and EUPEX), we have access to reference supercomputer testbeds, such as Karolina¹ and Irene (CEA). More importantly, the team is leading Exa-DoST (2023-2029), the project of the NumPEx program focused on data-related challenges for the Exascale, as part of a national effort to design and build the software infrastructure for Jules Verne, the first Exascale machine to be installed in France. All these are excellent opportunities to validate our results on advanced, realistic platforms.

2.6 Collaboration strategy

We chose to work in close collaboration with some of the leading international academic teams in the areas of data management for Edge, Cloud and HPC systems. As an example, we have been building and maintaining a long-term, privileged partnership with Argonne National Laboratory (USA), a top player in the US HPC research field, through a series of Associate Team projects (Data@Exascale, UNIFY, UNIFY 2) in the framework of the JLESC international laboratory. More recently, we initiated collaborations with other partners, including Oak Ridge National Laboratory (USA), where the most powerful supercomputer available today (Frontier) is running; we also collaborate with DFKI (Germany), a strategic Inria partner in the AI area. In industry, formal collaborations are currently in place with ATOS/Eviden, a strategic HPC stakeholder in France, and DataDirect Networks (DDN), a major storage company, in the context of national (PEPR) and European collaborative projects.

2.7 Alignment with‌ institutional, national and European‌‌ strategies

Data-intensive applications exhibit several common requirements with respect to the need for data storage and I/O processing at very large scales, to support complex workflows combining scientific simulation and data analytics. While our past activity was already aligned with Inria's strategic objectives [62], which acknowledged HPC-Big Data convergence as one of the priorities of our institute, our project for the future goes further. It explicitly builds on the challenges identified in the latest edition of the ETP4HPC agenda [90], which highlights the evolution of HPC from a traditional supercomputer-centric vision to an enlarged vision where complex workflows are distributed across interconnected supercomputers, Clouds and Edge infrastructures. Our research program addresses some of these challenges. In addition, at the national level, our team is leading two strategic PEPR projects whose respective scientific programs have been defined based on this continuum-aware vision. The first one is Exa-DoST (Exascale Data-Oriented Software and Tools), a 6.2 M€ project within the NumPEx PEPR program (2023-2029), which aims to provide the software infrastructure for the future Exascale supercomputer expected to be installed in France in 2025 (Jules Verne). The second one is STEEL (Secure and efficient daTa storagE and procEssing on cLoud-based infrastructures), a 2.8 M€ project within the CLOUD PEPR program (2023-2030). These projects (defined for 7+ years) are structuring many of our long-term activities.

In addition, some‌ of our concrete collaborative projects involve some of‌ Inria's main strategic partners: DFKI (the main German‌ research center in artificial intelligence) through the ENGAGE‌ Inria-DFKI project started in 2022; and ATOS/Eviden, through‌ the ACROSS and EUPEX H2020 EuroHPC projects.

3‌ Research program

Figure‌ 1: Overview of the research program.

The‌ emergence of the Computing Continuum raises challenges at‌ multiple levels: at the application/workflow level, at the‌ middleware level and at the resource management level.‌ We structured our research program accordingly, in three‌ axes.

The first axis covers workflow-level/application-level research directions. It addresses questions like: how to enable workflow composition across the continuum? How to ensure the reproducibility of workflow execution? How to leverage different sources of metadata to establish a provenance chain (i.e., a record trail of the overall state of the application and its intermediate results) that builds trust in the workflow's results? How could data models support data volume and transfer reduction as a step towards resource sustainability of applications in the continuum? It also includes some more specific research directions related to the execution of distributed AI workflows across the Computing Continuum (involving parallel learning and federated learning).

The second axis‌ addresses research challenges related to middleware-level data management‌ across the continuum, where workflows combining simulation, data‌ analytics and AI are being deployed. In particular,‌ this axis plans to cover topics such as‌ I/O behaviour characterization, storage-centric hybrid infrastructure convergence, and‌ data interoperability across hybrid HPC/Cloud/Edge infrastructures. It also‌ addresses the question: how to perform in-situ data‌ analysis for post-Exascale workflows processing continuous data flows,‌ while considering both performance and energy efficiency?

Figure 2:‌ Storage heterogeneity across the Computing Continuum.

The third (lower-level) axis focuses on resource management, with a strong but not exclusive focus on storage resources. It addresses questions such as: how to provision heterogeneous storage resources across hybrid HPC/Cloud/Edge infrastructures (Figure 2)? What would be a frugal data storage architecture enabling the transition to post-Exascale workflows? How to leverage emerging storage approaches such as disaggregated storage (i.e., a set of storage units physically separated from the compute units) and computational storage (i.e., storage units augmented with some limited integrated computational capabilities)? Finally, how can resource managers and HPC evolve to better adapt to climate change?

We identified two transverse‌ (vertical) themes that are‌ present in some of‌‌ the research topics of the three (horizontal) axes:‌ artificial intelligence (as a‌ target type of workflow‌‌ to be supported, but also as an enabling‌ technique) and sustainability (including‌ aspects related to energy‌‌ efficiency, frugality and adaptation to emerging applications and‌ hardware technologies in response‌ to climate change).

3.1‌‌ Axis 1: Supporting Data-Centric Applications and Workflows Running‌ Across the Computing Continuum‌

Today, there is a need to efficiently integrate simulations, data analytics and learning, which requires interoperable solutions for data processing [90]. As an example, upcoming large-scale scientific experiments like the Square Kilometer Array (SKA)² are expected to process raw data on the order of an exabyte per day³. Processing these data volumes requires complex scientific workflows able to extract knowledge and produce insight at every stage: from the instruments and devices producing data that needs to be reduced and pre-processed in situ, to the service-oriented visualization and exploration dashboards that need to be customized for the use case of the domain scientist. Existing work on workflow composition and deployment in the continuum focuses on task-flow control and is disconnected from data patterns and structures beyond domain-specific applications [40, 31]. Moreover, general approaches for representing knowledge and provenance in the form of metadata are also lacking for such workflows.

As different communities leverage the Computing Continuum, they express the need to make their research verifiable by others. This need is exacerbated by the pervasive usage of AI, as there is increasing awareness about potential ethical and practical implications [105]. The explainability (i.e., making AI's decision-making process understandable) and transparency (i.e., ensuring clarity in AI's design, data and operation) of AI are of particular concern [30]. Advancing explainability and transparency in AI is currently an essential priority for responsible and trustworthy AI-powered applications. This requires advances in repeatability, replicability, and reproducibility (the 3R's) across the Computing Continuum [41, 111].

As AI-oriented workflows gain an increasing share, it becomes important to address the performance and scalability of distributed machine learning (ML) algorithms executed across the Computing Continuum. Methods like deep learning (DL) and federated learning (FL) leverage different technologies to produce insight from large volumes of data. Despite increasing convergence between DL and HPC [76, 44], the training of DL models remains time-consuming and resource-intensive. In FL, powerful facilities (Cloud or HPC) are used to train a global model, while the local, personalized training is typically done close to the data production sites on less powerful computational resources (Edge). This yields the challenge of managing heterogeneity (e.g., differences in computation capacity, network latency and node volatility) as well as variability in data distributions among clients, while respecting privacy requirements in the presence of potentially malicious devices.

In summary, this axis covers research directions to support the composition of scalable and reproducible workflows comprising diverse applications, from simulations to AI, while addressing the challenges of a heterogeneous environment.

Data-Centric‌ Workflow Composition in the Computing Continuum.

The scientific community has reached consensus that common interfaces for data management in the continuum are necessary [70]. Unified data abstractions can enable the interoperability of data storage and processing across the continuum and facilitate data analytics at all levels [53], alleviating the disconnect between application- and storage-oriented approaches to interoperability. However, no unified data modeling approaches exist for structuring and representing data on a logical level across the computing continuum.

The first steps in this research direction involve establishing the essential attributes needed to represent data in the different programming models coexisting in the continuum (e.g., ML models, simulation data, annotations resulting from analysis). We will systematically categorize these attributes to deliver data abstractions and models that can be specialized for different tasks. In addition, we will investigate how to embed metadata in these abstractions so that future work can explore new ways of describing, processing, and tracking data at the workflow level, which aligns with the topics of workflow instrumentation and reproducibility. In the longer term, we will also study how these data models could support data volume and transfer reduction as a step towards resource sustainability of applications in the continuum.

Enabling Reproducibility and Trustworthiness‌ in Complex Workflows Across the Computing Continuum.

Current approaches to support workflow reproducibility are based on workflow modelling [104] or simulation [26, 112]. These approaches raise important challenges in terms of specification, modelling, and validation to support reproducibility in the Computing Continuum. For example, it is increasingly difficult to model the heterogeneity and volatility of Edge devices or to assess the impact of the inherent complexity of hybrid Edge-Cloud deployments on performance. With the rise of AI workflows, the issue of reproducibility is aggravated by our limited ability to reason about the decision-making process of many machine learning models that act as a black box [33], and by the lack of comprehensive specifications of the data that needs to be collected to establish a provenance chain (i.e., a record trail of the overall state of the application and its intermediate results) that builds trust in the workflow's results [46].

We aim to tackle these challenges through a rigorous methodology supporting the automatic deployment, the complete analysis cycle and the optimization of applications on the Computing Continuum. We started to implement this methodology in E2Clab [102], a software tool for workflow lifecycle management across the Continuum. In the short term, in the framework of the STEEL project of the PEPR Cloud, we plan to investigate how to further enrich both the methodology and the tool, in order to support the next generation of scientific testbeds (e.g., SLICES-FR) and the non-trivial reproducibility of ML and DL workflows. This is a particularly challenging direction due to the increased degree of randomness (i.e., in terms of initial parameters and hyperparameter settings) incurred by such applications. We will expand E2Clab to capture provenance metadata during the execution of AI workflows, including a detailed record of data sources, processing steps, model configurations, and computational resources utilized. This provenance metadata can be leveraged not only to ensure transparency and traceability throughout the AI lifecycle, but also to conduct resource, energy and performance optimizations. In the longer term, we will work towards the definition of ontologies and taxonomies for AI workflow provenance data to build a theoretical foundation for developing provenance data management systems tailored to the different stakeholders involved in AI applications.
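
To make the notion of provenance metadata more concrete, the following minimal Python sketch shows one possible shape for such a record. It is not E2Clab's API: the `ProvenanceRecord` class, its field names, the `fingerprint` method and the `make_example_record` helper are all hypothetical, chosen only to mirror the four categories named above (data sources, processing steps, model configurations, computational resources).

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one AI workflow run (illustration only)."""
    data_sources: list[str]
    processing_steps: list[str] = field(default_factory=list)
    model_config: dict = field(default_factory=dict)
    resources: dict = field(default_factory=dict)

    def add_step(self, name: str) -> None:
        # Steps are appended in execution order to form the record trail.
        self.processing_steps.append(name)

    def fingerprint(self) -> str:
        # Deterministic hash of the record: two runs with identical
        # provenance yield the same fingerprint.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_example_record(seed: int) -> ProvenanceRecord:
    # All values below are made up for the example.
    record = ProvenanceRecord(
        data_sources=["sensor-stream-A"],
        model_config={"learning_rate": 0.01, "seed": seed},
        resources={"site": "edge", "gpus": 0},
    )
    record.add_step("normalize")
    record.add_step("train")
    return record
```

Recording the random seed alongside the hyperparameters is one simple way to account for the sources of randomness mentioned above: two runs whose records differ only in the seed produce different fingerprints and can be told apart when comparing results.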

Efficient Parallel Continual‌ Learning.

Some scenarios of DL training involve the need to assimilate new training data arriving continuously. This kind of incremental training suffers from catastrophic forgetting (i.e., new patterns are reinforced at the expense of previously acquired knowledge). Training from scratch each time new training data becomes available would result in extremely long training times and massive data accumulation. Rehearsal-based continual learning mixes samples from previous training tasks with samples from new training tasks to alleviate catastrophic forgetting, but research to date has not addressed performance and scalability of these methods [39, 101, 54, 59].

We propose asynchronous data management techniques that enable the design and implementation of a scalable distributed rehearsal buffer abstraction, which is instrumental in enabling continual learning to take advantage of data-parallel techniques. So far, this solution has been validated for class-incremental classification problems. The approach could, however, be easily applied to generative models (in which case we can simply use one class to store all representatives). This is a short-term research direction we intend to explore in the context of our new research project. In the longer term, we plan to further explore other parallelization approaches (model-based and hybrid) to address the challenges posed by evolving datasets in DL models.
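
The rehearsal principle described above can be illustrated with a small, single-process Python sketch. It is not the team's distributed, asynchronous implementation: the `RehearsalBuffer` class and the `augmented_batch` helper are hypothetical names, and the use of per-class reservoir sampling to bound memory is an assumption made for the example.

```python
import random
from collections import defaultdict


class RehearsalBuffer:
    """Single-process sketch of a per-class rehearsal buffer (illustration only)."""

    def __init__(self, per_class_capacity: int, seed: int = 0):
        self.capacity = per_class_capacity
        self.samples = defaultdict(list)   # class label -> stored samples
        self.seen = defaultdict(int)       # class label -> samples seen so far
        self.rng = random.Random(seed)

    def add(self, sample, label) -> None:
        # Reservoir sampling: every sample of a class seen so far is kept
        # with equal probability, using bounded memory.
        self.seen[label] += 1
        bucket = self.samples[label]
        if len(bucket) < self.capacity:
            bucket.append(sample)
        else:
            j = self.rng.randrange(self.seen[label])
            if j < self.capacity:
                bucket[j] = sample

    def sample(self, k: int) -> list:
        # Draw up to k (sample, label) representatives across all stored classes.
        pool = [(s, label) for label, bucket in self.samples.items() for s in bucket]
        return self.rng.sample(pool, min(k, len(pool)))


def augmented_batch(new_batch: list, buffer: RehearsalBuffer, n_rehearsal: int) -> list:
    """Mix a batch from the current task with representatives of earlier tasks."""
    mixed = list(new_batch) + buffer.sample(n_rehearsal)
    for sample, label in new_batch:
        buffer.add(sample, label)
    return mixed
```

In this sketch, the generative-model case mentioned above would amount to passing the same label for every sample, so that a single bucket stores all representatives.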

Scalable,‌ Secure and Resource-Efficient Federated‌‌ Learning.

FL aims to achieve an accuracy close to that of centralized models, but in a scalable and resource-efficient manner. At the same time, FL is subject to security threats coming from the edge of the network, since malicious peers may attempt to manipulate the learning process, compromise the privacy of other peers, or disrupt the training altogether. Clustered FL (grouping clients with similar data distributions and training personalized models in each identified cluster) is a mechanism to support low resource utilization, but existing approaches [72, 51] mainly focus on the accuracy achieved by the clustering mechanisms, overlooking system and infrastructure resource constraints like energy consumption. Meanwhile, current threat mitigation approaches [47, 116, 36, 84] rely on robust aggregation, anomaly detection and generative models to defend against poisoning attacks. Yet, they either have limited defensive capabilities due to their underlying design or are impractical to use as they rely on constraining building blocks.
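
As a generic illustration of what robust aggregation means in this context, the Python sketch below contrasts plain averaging of client updates with a coordinate-wise median. It does not reproduce any of the cited approaches, nor FedGuard; the function names and the toy update vectors are assumptions made for the example.

```python
from statistics import mean, median


def fedavg(updates: list[list[float]]) -> list[float]:
    """Plain (unweighted) averaging of client updates, coordinate by coordinate."""
    return [mean(coords) for coords in zip(*updates)]


def median_aggregate(updates: list[list[float]]) -> list[float]:
    """Coordinate-wise median: tolerates a minority of arbitrary updates."""
    return [median(coords) for coords in zip(*updates)]


# Three honest clients send similar updates; one malicious client
# sends an extreme update to poison the global model.
honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
poisoned = honest + [[100.0, -100.0]]
```

With these toy values, a single poisoned update drags the average far away from the honest updates, whereas the median stays close to them. The sketch ignores the system-level costs (communication, energy) that the text identifies as overlooked by existing work.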

In the short term, we plan to explore approaches to scalable, secure and resource-efficient FL that consider, for the first time, device heterogeneity, training accuracy and robustness against malicious activity simultaneously. A first direction consists in devising resource-constrained clustering algorithms specifically tailored for FL executed at the edge. The goal is to enable transparent adaptation to the execution environment (e.g., node volatility, malicious attacks, network congestion) by automatically tuning the FL parameters in order to improve user-defined performance metrics (e.g., energy efficiency, execution time, accuracy). Approaches based on generative models have been gaining increasing interest and have been shown to be more resilient against a wider range of attacks. In this context, we will continue to extend FedGuard [60], our novel FL framework that utilizes the generative capabilities of Conditional Variational AutoEncoders (CVAE) to effectively defend against poisoning attacks with tuneable overhead in communication and computation. We plan to enhance the robustness of this approach through new aggregation operators and to evaluate it under different levels of dataset imbalance, including highly imbalanced datasets with very few samples per client. In the longer term, a more challenging direction that we plan to explore is how FedGuard and other strategies perform in a setup where clients get access to a stream of incoming data (i.e., dynamic datasets).

3.2 Axis 2: Data-Aware Middleware Approaches for‌ the Computing Continuum

Supporting emerging scenarios over the‌ domains of “modelling and simulation”, “AI”, “Analytics” and‌ “Internet of things” (IoT) across the Computing Continuum‌ leads to new data movement challenges. This is‌ due in part to the variety of storage‌ systems and to the increasing gaps between the‌ processing paradigms that developed separately in these environments.‌ In this context, it is necessary to explore‌ the different ways in which storage models can‌ converge, notably through a thorough understanding of workloads‌ on the one hand, and efficient data-aware middleware‌ on the other. Within KerData, we propose to‌ address this problem from multiple complementary perspectives.

Firstly, we will study the I/O behavior of scientific workloads running across the Computing Continuum. Understanding what, when, where, and how I/O-intensive applications read or write data is decisive for making the right decisions, especially when it comes to scheduling. Optimization goals need to consider both performance and energy efficiency, as well as the trade-offs that may be necessary between them. We will then study I/O optimization techniques for extreme-scale workloads. Although a lot of research has been produced on this subject, the expansion to very large scale, as in the case of HPC systems, raises new challenges that must be addressed to accelerate the time to solution while maintaining sustainability. Next, we will look at the abstraction of distributed storage resources on hybrid infrastructures as a first step towards the general-purpose middleware solutions that are necessary to make the storage resources in the continuum interoperate. Finally, building on the above, we will propose a data exchange layer that can interoperate with the various platforms on the Computing Continuum. This layer will be central to the composition of hybrid workflows.

Workflow I/O Behavior Analysis Methods for Sustainability.

Understanding how workflows and applications use staging areas on the Computing Continuum is decisive for improving scheduling algorithms and deploying an optimized I/O software stack [63]. This first requires characterizing these applications and workflows from an I/O point of view, i.e., determining through performance evaluation and empirical study a relatively high-level set of characteristics that describes the data access pattern [100, 97]. Data collection and analysis will also leverage semi-supervised clustering methods and federated learning techniques from Axis 1. The result of this characterization can then be used to feed job and I/O scheduling algorithms and improve data movement efficiency. The preliminary step in this characterization is the collection of execution traces, from which detailed studies can be carried out [55]. In the field of high-performance computing, Darshan [109] is the reference tool for I/O monitoring.

We will extend our existing work on PyDarshan [88, 87], a Python library for querying Darshan log records, and develop new tools and abstractions applicable throughout the computing continuum. These will generalize "decision support services" that augment workflow execution plans with I/O and energy behavior information, in support of our resource management and system architecture research. For example, identifying the most energy-intensive operations makes it possible to determine candidates for hardware acceleration. This short-term direction is a first step towards supporting the design of future I/O scheduling strategies that consider performance/energy trade-offs, which we plan to investigate in the longer term. This line of research also supports Axis 3 on resource management.
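
The idea of reducing an execution trace to a small set of high-level access-pattern characteristics can be sketched as follows in Python. The sketch deliberately does not use Darshan or PyDarshan: the `IOEvent` record and the `characterize` function are hypothetical, and the chosen metrics (volumes, mean request size, fraction of sequential accesses) are only examples of what such a characterization might contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOEvent:
    """Hypothetical trace record; real Darshan logs use a different, richer format."""
    op: str       # "read" or "write"
    offset: int   # byte offset in the file
    size: int     # bytes transferred


def characterize(events: list[IOEvent]) -> dict:
    """Reduce a trace to a few high-level access-pattern characteristics."""
    if not events:
        raise ValueError("empty trace")
    bytes_read = sum(e.size for e in events if e.op == "read")
    bytes_written = sum(e.size for e in events if e.op == "write")
    # An access counts as sequential when it starts where the previous one ended.
    sequential = sum(
        1 for prev, cur in zip(events, events[1:])
        if cur.offset == prev.offset + prev.size
    )
    return {
        "bytes_read": bytes_read,
        "bytes_written": bytes_written,
        "mean_request_size": (bytes_read + bytes_written) / len(events),
        "sequential_fraction": sequential / max(len(events) - 1, 1),
    }
```

A summary of this kind is the sort of compact input that a job or I/O scheduler could consume; energy-related fields would have to come from additional measurements that this sketch does not model.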

Abstraction for HPC/Cloud Storage‌ Convergence.

On HPC and Cloud infrastructures, while the number of processing units has grown to meet the computing power requirements of large-scale applications, the I/O capacity as well as the I/O bandwidth per core have drastically decreased. As data management and analytics have become the critical bottleneck on large-scale systems, vendors have addressed this problem by deploying new tiers of intermediate storage between the applications and the global shared storage system, usually along with a dedicated (and sometimes proprietary) software layer. These new levels of the storage hierarchy feature various capacities, characteristics and performance levels that one has to be aware of to fully utilize them 86. This is especially true in the context of complex hybrid workflows such as in-situ analysis, visualization or code-coupling: ignoring those underlying tiers causes a serious loss of performance 71. An approach focusing on storage convergence across HPC and Cloud infrastructures is decisive to glue this deep hierarchy together, both to make the most of these new technologies and to ensure effective data sharing between components running across the computing continuum 34, 43.

Identifying a‌ good storage abstraction that is accurate enough to‌ properly describe the wide variety of devices and‌ sufficiently general to be portable on various systems‌ is crucial. In that context, we will work‌ on the development of a two-stage abstraction layer‌ above local (system) and remote (distant platform) storage‌ resources. To do so, we will follow a‌ co-design approach, whereby the HPC, Cloud, and Edge-computing‌ architectures would all benefit from an infrastructure-wide level‌ of abstraction. This work will be a continuation‌ of the research undertaken on data aggregation 115‌, 114, for which an abstraction of‌ network topology and memory and storage levels was‌ necessary for the algorithm's portability, and will also‌ build on existing work in the community on‌ resource abstraction 52, 113. In the‌ longer term, we plan to extend this library,‌ which focuses on physical resources, with a logical‌ layer. More concretely, we will build on top‌ of that library a tier-to-tier data transfer layer‌ enabling compatibility between several storage paradigms (block, file,‌ object).
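To make the idea of a tier-to-tier transfer layer over several storage paradigms more concrete, here is a minimal sketch under strong simplifying assumptions. The `StorageTier` interface, the two example tiers and the `transfer` helper are hypothetical names introduced for illustration; they do not describe the team's library.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageTier(ABC):
    """Minimal interface shared by all tiers (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class ObjectTier(StorageTier):
    """Object-style tier: flat key/value namespace kept in memory."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


class FileTier(StorageTier):
    """File-style tier: each key maps to one file under a root directory."""

    def __init__(self, root) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys are flattened so that they cannot escape the root directory.
        return self._root / key.replace("/", "_")

    def put(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()


def transfer(key: str, src: StorageTier, dst: StorageTier) -> int:
    """Copy one item between two tiers and return the number of bytes moved."""
    data = src.get(key)
    dst.put(key, data)
    return len(data)
```

Because `transfer` only relies on the shared interface, the same call moves data from an object store to a file system or the other way round. A realistic layer would also have to expose capacity, latency and locality, which this sketch leaves out.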

Exascale In-Situ Analytics.

Without a major change in practices, the increased computing capacity of the next generation of computers will lead to an explosion in the volume of data produced by numerical simulations. Managing this data, from production to analysis, is a major challenge. While it is not conceivable to do without a storage system, many experiments are aimed at reducing its use. This has led to approaches leveraging in-situ processing, in-transit processing, staging nodes, or helper cores 45, 66. All of these approaches aim to replace the usual write-read process by a means to perform analysis at the same time as simulation, a capability of particular interest to physicists. This need has led to the first implementations of in-situ or in-transit analysis systems in simulation codes, and to the creation of specific middleware for asynchronous, scalable post-Exascale systems, such as Damaris (hal-00715252), 67, 49.

Developed by the KerData team‌ since 2011, Damaris is the team's flagship software‌ for Exascale HPC. Damaris 4 proposes a middleware-level‌ approach to scalable asynchronous I/O management and real-time‌ in situ processing of data from large-scale MPI-based‌ HPC simulations. It leverages the idea of dedicated‌ cores for such tasks performed asynchronously within multicore‌ nodes. Initial feedback from application users clearly shows‌ the need to design a system that can‌ dynamically trigger the activation of new analyses during the simulation run. The‌ timing can be decided‌ either by the simulation‌‌ code or by an analysis. To maintain high‌ performance results, it is‌ also essential to appropriately‌‌ leverage the possibility to place analysis tasks on‌ GPUs. These are challenges‌ we plan to address‌‌ by extending our previous work based on the‌ Damaris approach, in the‌ context of the Exa-DoST‌‌ project of the NumPEx PEPR (2023-2030), to support‌ the needs of Exascale‌ workloads. In particular, two‌‌ applications are targeted: SKA 65 in collaboration with‌ the CNRS, Observatoire de‌ Paris and Observatoire de‌‌ la Côte d'Azur and Gysela 74, in‌ collaboration with the CEA.‌ For the longer term,‌‌ we expect additional application requirements to emerge during‌ the execution of the‌ NumPEx PEPR program (in‌‌ particular, in collaboration with the NumPEx Exa-DI project‌ 5, which has‌ set up a dedicated‌‌ process to support new applications, not identified yet).‌ We plan to contribute‌ to the support of‌‌ such applications that could exhibit new patterns with‌ respect to in situ‌ analysis.
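The dedicated-resource idea and the on-the-fly activation of an analysis can be illustrated with a small, self-contained sketch. It is an analogy only: a Python thread stands in for a dedicated core, a queue stands in for the hand-over of data, and the message kinds `"activate"` and `"data"` are invented for this example. It does not use or reproduce the Damaris API.

```python
import queue
import threading

_STOP = object()  # sentinel telling the worker to shut down


def analysis_worker(inbox, results):
    """Consume messages in order; runs apart from the simulation loop."""
    active = {}
    while True:
        msg = inbox.get()
        if msg is _STOP:
            return
        kind, *payload = msg
        if kind == "activate":
            name, fn = payload
            active[name] = fn
            results.setdefault(name, [])
        elif kind == "data":
            step, values = payload
            for name, fn in active.items():
                results[name].append((step, fn(values)))


def simulate(n_steps):
    """Toy simulation loop that never waits for the analyses to finish."""
    inbox = queue.Queue()
    results = {}
    worker = threading.Thread(target=analysis_worker, args=(inbox, results))
    worker.start()
    inbox.put(("activate", "mean", lambda v: sum(v) / len(v)))
    for step in range(n_steps):
        values = [float(step + i) for i in range(4)]  # stand-in for a compute step
        if step == 2:
            # The simulation decides to switch on an extra analysis mid-run.
            inbox.put(("activate", "max", max))
        inbox.put(("data", step, list(values)))  # hand over a copy and continue
    inbox.put(_STOP)
    worker.join()
    return results
```

Sending the activation request through the same ordered queue as the data makes it unambiguous from which step a newly activated analysis applies. In this sketch the trigger comes from the simulation; letting an analysis post an `"activate"` message itself would cover the other case mentioned above.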

Sustainable Interoperability Across‌‌ the Computing Continuum.

New endeavors towards interoperability in‌ the continuum are addressing‌ the need for common‌‌ data spaces through federated data infrastructures in the‌ cloud (e.g., Gaia-X 50‌, European Open Science‌‌ Cloud (EOSC) 37, German National Research Data‌ Infrastructure (NFDI) 75)‌ and converged research infrastructures‌‌ for leadership-class supercomputing and cloud resources (e.g., FENIX‌ 6, EuroHPC 107‌, PRACE-RI 85,‌‌ European Grid Initiative (EGI) 80). For the‌ long term, key players‌ in the public and‌‌ private sectors are making strong investments in long-term‌ strategic decisions about how‌ the computing continuum will‌‌ develop. Specifically, quantum computing is receiving massive support,‌ and one of the‌ mandates of the EuroHPC‌‌ JU is to acquire and deploy quantum technologies‌ in HPC environments once‌ they reach sufficient maturity‌‌ 7. Furthermore, other new technologies like neuromorphic‌ accelerators 56 will increase‌ the heterogeneity in future‌‌ HPC systems. These non-conventional architectures can also be‌ found in highly energy-efficient‌ IoT devices 95,‌‌ 108, fast scientific instruments for large-scale science‌ 110, and new‌ approaches for fast and‌‌ efficient artificial intelligence in cloud and HPC environments‌ 99, 89,‌ 96. Current works‌‌ on the integration of emerging technologies into existing‌ computing ecosystems focus on‌ the interoperability and performance‌‌ of algorithms without considering data-oriented optimizations, and workflow-specific‌ challenges such as task-resource‌ mapping, automation, and provenance‌‌ are rarely explored 64, 98.

Overall, the successful interoperability between the existing and projected platforms in the Computing Continuum will depend on middleware able to interoperate and execute in hybrid scenarios. In the short term, we plan to investigate the design of a data exchange layer that will be the core of a workflow composition approach. This layer will connect with established data staging and transport layers, alleviating the disconnect between raw data management and knowledge-based workflow management in the continuum for better resource balancing, economy, provenance, and data reduction. In addition, the applications themselves could use this information hub to monitor and record the progress of their individual components in smart ways, enriching existing approaches for in situ analysis and workflow reproducibility. To ensure the sustainability of workflow software solutions in an evolving and hyper-heterogeneous landscape, in the longer term we will study the data and access patterns in hybrid workflows involving the interaction with emerging hardware technologies. The final, long-term goal is to contribute to the development of an interoperable software stack by designing data models that address the challenges in encoding, arrangement, locality, and mapping to high-level data abstractions.

3.3 Axis 3: Sustainable Resource‌ Management for the Computing Continuum

With a growing number of disciplines relying on compute services in the Computing Continuum, data centers are faced with significant changes in their workload and services. In addition to “traditional” numerical simulation applications, there is a massive influx of data (sometimes coupled with remote sensors), analytical and learning applications. These applications present significant uncertainty and dynamicity in their resource requirements, due to their intrinsic behavior and data-intensive profiles. At the same time, planetary limits and the ecological crisis will also have a definite impact on the way these computing centers are managed.

Within KerData we propose to investigate next-generation resource management techniques enabled by reconfigurable software-defined hardware and by the recent convergence trend of industry standards for the flexible integration of accelerators and disaggregated memory. Memory disaggregation refers to the decoupling of physical and logical memory, resulting in the flexibility to leverage underutilized resources without needing to physically reconfigure a distributed system. To utilize the Computing Continuum as efficiently as possible, we will take advantage of compute optimization down to the lowest available levels, which is becoming feasible because open toolchains down to the chip level are maturing.

Provisioning Storage Resources‌ on Large-Scale Infrastructures.

While for years HPC systems‌ were the predominant means of meeting the requirements‌ expressed by large-scale scientific workflows, today some components‌ have moved away from supercomputers to extend across‌ the Computing Continuum. This migration has been mainly‌ motivated by the need of specialized data processing‌ such as data filtering at the Edge or‌ data analysis on Cloud infrastructures. From an I/O‌ and storage perspective, this means having to deal‌ with very different paradigms: infrastructures where direct access‌ to resources is extremely limited due to a‌ very high level of abstraction, on-premise supercomputers offering‌ a low-level approach requiring tight user control, or‌ highly-constrained devices limited in terms of access and‌ reconfiguration. One way to address that is to‌ converge the infrastructures composing the Computing Continuum by‌ exploring ways to provision storage resources distributed across‌ hybrid HPC/Cloud/Edge systems to complex scientific workflows combining‌ data production, simulation and data analysis. However, this‌ implies low-level access to systems that are sometimes‌ difficult to reach, or to resources in production.‌ Simulation is one way of exploring storage provisioning 58, 57,‌ 82.

In this‌ context, we will continue‌‌ our work on the simulation of storage systems‌ implemented within the StorAlloc‌ 91, 92 simulator.‌‌ This work has enabled us to demonstrate the‌ correct sizing and partitioning‌ of intermediate storage resources‌‌ and to work on the modeling of storage‌ systems, including the way‌ in which they distribute‌‌ data over the available storage spaces. In KerData,‌ we will be exploring‌ new methods for I/O-aware‌‌ scheduling of jobs on hybrid infrastructures 78.‌ Based on post-mortem studies‌ of storage systems, we‌‌ will also work on characterizing workflows running on‌ the Computing Continuum 106‌ in order to refine‌‌ job scheduling decisions. In the longer term, our‌ ambition is to propose‌ a calibrated and validated‌‌ simulator of storage systems distributed across the Computing‌ Continuum. This simulator will‌ enable us to predict‌‌ the I/O performance and energy cost of complex‌ workflows leveraging Edge, Cloud‌ and HPC resources. To‌‌ achieve this, we will rely on state-of-the-art simulation‌ frameworks such as WRENCH‌ 57 and SimGrid 58‌‌.
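As a small example of what modeling "the way in which storage systems distribute data over the available storage spaces" can involve, the sketch below computes how a file is spread over storage targets under plain round-robin striping, and derives a naive transfer-time estimate. This is a deliberately simplified model written for illustration; the function names and the assumption that all targets serve their share in parallel at a fixed bandwidth are ours, and it does not describe how StorAlloc, WRENCH or SimGrid model storage.

```python
def stripe_layout(file_size: int, stripe_size: int, stripe_count: int) -> list:
    """Bytes stored on each target under round-robin striping."""
    if file_size < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("sizes must be positive (file_size may be zero)")
    full_stripes, tail = divmod(file_size, stripe_size)
    # Each complete round places one full stripe on every target.
    rounds, extra = divmod(full_stripes, stripe_count)
    per_target = [
        rounds * stripe_size + (stripe_size if t < extra else 0)
        for t in range(stripe_count)
    ]
    # The final partial stripe goes to the next target in the rotation.
    if tail:
        per_target[extra] += tail
    return per_target


def transfer_time(layout: list, bandwidth_per_target: float) -> float:
    """Time to move the file if all targets work in parallel (seconds)."""
    return max(layout) / bandwidth_per_target
```

Even this toy model shows why sizing matters: the most loaded target determines the transfer time, so a stripe count that does not divide the number of stripes evenly leaves some targets idle at the end.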

Storage Disaggregation and Computational Storage.

A key‌ challenge for many large-scale‌ applications is the mismatch‌‌ between compute power, the dimension of caches and‌ buffers, and the available‌ I/O bandwidth from the‌‌ network down to the chip level. Important contributing‌ factors to this situation‌ include economies of scale‌‌ catering to markets with different needs, a prohibitively‌ expensive development process and‌ lack of manufacturing capacity‌‌ to consider more customized solutions. However, as computing,‌ memory, storage, and network‌ hardware are becoming increasingly‌‌ modular and re-configurable, it is possible to consider‌ system and storage architectures‌ that would have been‌‌ prohibitively expensive before 29, 38. The‌ enabling technologies for some‌ of these developments are‌‌ the emerging industry standards such as the Compute‌ Express Link (CXL) 73‌, 83 for more‌‌ flexible integration of accelerators and disaggregated storage, and‌ the P4 programming language‌ used in re-configurable networking‌‌ 48.

We will identify the most energy-intensive routines used on the I/O path, both from the service perspective and from the domain perspective, and curate a portable reference library of key algorithms catering to both software and hardware acceleration. We will leverage, for example, open container standards and instruction set architectures such as RISC-V that can be applied both in data centers and in resource-constrained edge contexts 117. High-level technologies such as containers facilitate the development of algorithmic improvements but often also introduce runtime overhead, while low-level hardware acceleration makes it possible to reduce the energy consumption of computations to a minimum. Unfortunately, hardware acceleration of all desired functionality is not possible because of the need to retain flexibility and because of limitations due to cost, manufacturing, and physical constraints. Instead, the routines to be hardware accelerated must be carefully selected, and the priorities shift from application to application. We will start by exploring domain- and service-specific approaches (e.g., compression, erasure coding, encryption) and then identify generalized abstractions for common functionality useful across domains. Ultimately, this research enables building reusable, abstract, and fine-grained building blocks that allow the construction of frugal computational storage architectures including the subset of functionality optimized for a particular application or workflow.

Frugal Data‌ Storage Architectures to Support Post-Exascale Workflows.

Post-exascale workflows such as digital twins and machine learning require fast access to increasing amounts of data in long-term archives, which is challenging with existing storage technologies. In particular, long-term storage is latency and bandwidth constrained, while high-performance storage systems tend to be cost, energy, and capacity constrained. A major obstacle to better utilization of existing technologies lies in the requirements of legacy applications, but as applications and workflows transition to new programming models it becomes possible to consider new storage system architectures. This creates an opportunity to research frugal data storage architectures that integrate computational storage, making it possible to avoid wait times and stress on contended resources such as the network and storage subsystems while also increasing energy efficiency through hardware acceleration.

High I/O performance and energy-efficient storage designs require taking a domain- or application-specific approach and extending computational storage to long-term data archives 73, 83, 69. By applying the methods for holistic workflow I/O behavior analysis already discussed in Axis 2, it becomes possible to identify service- and application-specific I/O bottlenecks and apply I/O acceleration building blocks. Using these identified building blocks, we will develop software libraries that allow their remote execution and/or hardware acceleration close to the data storage location. A holistic effort is necessary to combine advances and consolidation in data and workflow management, as promoted by the FAIR principles (Findable, Accessible, Interoperable, Reusable) 118, with emerging open technologies for computation and storage 29, 38. To this end, we will research low-level software and hardware support for metadata queries as well as aggregations on top of self-describing data formats needed by both the application workflow and data management communities. In particular, we will investigate the integration of emerging storage technologies that allow for highly parallel access, as found in NAND- and NVRAM-based systems as manufacturing costs go down, as well as DNA-based storage systems when array-based synthesis methods become commercially available 81, 94.

HPC‌ Resource Management Faced with the Environmental Crisis.

It is essential to consider the evolution of HPC in the face of the climate crisis, and its impact on our research topics. As in other fields, we have to consider what a "lower-tech" version of HPC would be and how to make it usable. This is the target of this axis. The current trend in HPC has been to outbid each other for new supercomputers, renewing them every 6-7 years to make them ever more powerful. However, this policy seems hardly sustainable. Regular shortages of components 42, 77 and the origin (and uniqueness) of the sources of certain materials, coupled with the geopolitical context 61, alone make this growth policy challenging. We will start by trying to evaluate the need for scale from a social perspective: what is the relation between scale and social progress? While the scientific community traditionally relies on various metrics to assess the performance of HPC systems, such as the Top500 list (based on HPL performance), HPCG, Graph500, and IO500, these metrics do not capture how HPC contributes to social progress.

Then, faced with the lack of resources, we expect manufacturers to need to take these risks into account in their future machines. American HPC laboratories already have constraints for the 2050 horizon, such as zero-emission procurement. The first trend we can therefore expect is an extension of HPC machine lifetimes. This could be followed by a move towards refurbished machines, i.e., machines that use components from other machines. These changes, and the introduction of second-hand hardware, should open up several challenges for system managers. Until now, the number of faults has grown linearly with the number of resources 35. HPC fault-tolerance mechanisms assume that the Mean Time Between Failures is large compared with the characteristic times of the system (such as the time to checkpoint data). With second-hand hardware, the number of faults may increase at a much higher rate, while machine performance would not improve since the machines are not being upgraded. This would render existing fault-tolerance mechanisms obsolete. We will therefore explore new fault-tolerance mechanisms that remain applicable in this setting. Heterogeneity linked to resource unavailability and the increased computational complexity motivate the need for a precise description of available resources. In this context, we will explore alternatives for an efficient design of resource management systems to optimize the use of these resources. In addition, non-fatal faults may be invisible, typically slowdowns or varying performance due to wear and tear. We will investigate how to detect and manage resources that slow down the calculations performed on them. Of course, one of the challenges of this axis will be to define metrics to evaluate the benefits of the various solutions. Indeed, not only is it a multi-dimensional problem (TCO analysis), but it should also account for what has long been known about optimization and the Jevons paradox 28.
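The dependence of checkpointing on a large Mean Time Between Failures can be made concrete with the classical Young/Daly first-order approximation, which is not named in this report and is used here only as a well-known reference model. For a checkpoint cost C and a platform MTBF μ, the checkpoint interval that minimizes lost time is approximately sqrt(2·μ·C), and the approximation is only meaningful when C is small compared with μ. The function names and the numeric values in the note below are illustrative assumptions.

```python
import math


def young_daly_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal time between checkpoints (same unit as inputs)."""
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_waste(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost to checkpoints and re-execution."""
    # checkpoint_cost / interval: overhead of writing checkpoints.
    # interval / (2 * mtbf): on average half an interval is lost per failure.
    return checkpoint_cost / interval + interval / (2.0 * mtbf)
```

With a hypothetical 60-second checkpoint, an MTBF of 24 hours gives a waste of roughly 4% at the optimal interval, whereas an MTBF of one hour gives about 18%. As the MTBF approaches the checkpoint time, the model itself stops being valid, which is one way to read the statement that existing mechanisms would become obsolete.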

4 Application domains‌

The KerData team investigates‌ the design and implementation‌‌ of architectures for data storage and processing across‌ clouds, HPC and edge-based‌ systems, which address the‌‌ needs of a large spectrum of applications. The‌ use cases we target‌ to validate our research‌‌ results come from the following domains.

4.1 Radio‌ astronomy

The international SKA 103 project aims to create the largest telescope in the world in order to observe a part of the universe. A very large volume of data is generated at the telescope level, pre-processed on local clusters (filtering, reduction) in real time and sent to a supercomputer (SDP) at a rate of 1 TB/s. This data feeds numerical simulation, generating 1 PB of daily output data that needs to be saved. At this stage, the computing power and storage resources required are such that machines capable of reaching the exascale become necessary. However, the efficient use of these systems raises new challenges, especially regarding data management.
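A quick back-of-the-envelope calculation puts these two figures side by side. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and an ingest rate sustained over a full 24 hours; neither assumption is stated in the project description, so the result is an order of magnitude rather than a specification.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, in bytes (assumption)
PB = 10**15  # decimal petabyte, in bytes (assumption)


def daily_volume_bytes(rate_bytes_per_s: int) -> int:
    """Volume accumulated over one day at a sustained rate."""
    return rate_bytes_per_s * SECONDS_PER_DAY


ingest_per_day = daily_volume_bytes(1 * TB)  # data arriving at the SDP
output_per_day = 1 * PB                      # data that has to be saved
reduction_factor = ingest_per_day / output_per_day

print(f"ingest: {ingest_per_day / PB:.1f} PB/day, "
      f"reduction factor: {reduction_factor:.1f}")
```

Under these assumptions, the supercomputer receives about 86.4 PB per day and keeps about 1 PB, so the processing pipeline must reduce the data volume by a factor close to 86 while keeping up with the input stream.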

In the context of the Exa-DoST project (NumPEx PEPR), for which SKA is one of the main target demonstrators, we are working on optimizing the I/O of a data processing pipeline that is a serious candidate for the radio telescope. This work has also taken the form of active participation in the ECLAT (Extreme Computing Lab for Astronomical Telescopes) joint laboratory 68.

4.2 Nuclear‌ Fusion

GYSELA-X 8 is our second use case explored in the Exa-DoST project. It is a plasma simulation code developed at CEA as part of several national and international collaborations. This code also exhibits data-related challenges with respect to the scalability of I/O, storage and in-situ processing. It is part of a demonstrator for the Alice Recoque Exascale supercomputer.

4.3 Material Science

Coddex‌ (Code de Dynamique des Discontinuités pour l'Étude des‌ cristaux) is a simulation code developed at CEA‌ that solves the equations of continuum mechanics in‌ dynamic hyperelasticity (for instance shocks or rapid loading).‌ It also incorporates the description of behavioral discontinuities‌ of change. Within the Exa-DoST project, this application‌ serves to evaluate the approaches that we propose‌ for efficient on-demand in-situ data analysis. The PhD‌ thesis of Arthur Jaquard explores this research direction.‌

5 Social and environmental responsibility

5.1 Footprint of‌ research activities

HPC and cloud facilities are expensive‌ in capital outlay (both monetary and human) and‌ in energy use and it is clear that‌ there is a related environmental impact inherent to‌ this area. Our work on Damaris supports the‌ efficient use of high performance computing resources. Damaris‌ 4 can help to minimize power needed in‌ running computationally demanding engineering applications and can reduce‌ the amount of storage used for results, thus‌ supporting environmental goals and improving the cost effectiveness‌ of running HPC systems. In addition, in the‌ new research program of the team, the whole‌ third research axis is dedicated to frugal and‌ sustainable HPC.

Another aspect worth mentioning is that our team has strong and active international collaborations which sometimes require intercontinental travel by plane. To minimize our carbon footprint, we are careful to keep a balance between a few physical meetings (necessary to maintain substantial exchanges) and remote meetings by videoconference (used in most cases, when traveling is not necessary).

5.2 Impact of research results

Our‌ scientific project includes specific research directions to address‌ challenges posed by sustainability and climate change, including‌ research on frugal storage and on ways to‌ leverage second-hand HPC hardware. There is a question‌ of what sufficient HPC would mean.

Social impact.

When considering sufficiency in‌ HPC, we need to‌ question the use of‌‌ the resources and if we can reduce them.‌ This is the main‌ challenge of the project‌‌ on result-scalability 11: we aim at proposing‌ ways to correctly resize‌ HPC computations by focusing‌‌ on an evaluation of the output rather than‌ by considering input-based scaling‌ models.

Environmental impact.

Part of our research focuses on extending the lifespan of HPC machines, in the hope that it could reduce the environmental impact of the field. We have set up a working group with different teams at Inria Rennes (PACAP, TARAN) to study the challenges that extending the life of supercomputers would raise.

6 Highlights of the year

Silvina Caino‌ Lores was the recipient‌ of an ANR JCJC‌‌ project.
François Tessier served as a Program Chair‌ of the HiPC'25 international‌ conference.
Guillaume Pallez was‌‌ appointed Associate Editor at IEEE TPDS and IEEE‌ TOPC. He was nominated‌ a member of the‌‌ steering committee of SC.
François Tessier and Alexandru Costan, with the help of Jakob Luettgau, organized the 24th IEEE International Symposium on Parallel and Distributed Computing (ISPDC) in Rennes, France.
Jakob Luettgau was hired as a‌ permanent Inria Researcher on‌ 1 October 2025.
Alexandru‌‌ Costan left the team on 1 October 2025.‌

7 Latest software developments,‌ platforms, open data

7.1‌‌ Latest software developments

7.1.1 Damaris

Keywords:
Visualization, I/O,‌ HPC, Exascale, High performance‌ computing
Scientific Description:

Damaris‌‌ is a middleware for I/O and data management‌ targeting large-scale, MPI-based HPC‌ simulations. It initially proposed‌‌ to dedicate cores for asynchronous I/O in multicore‌ nodes of recent HPC‌ platforms, with an emphasis‌‌ on ease of integration in existing simulations, efficient‌ resource usage (with the‌ use of shared memory)‌‌ and simplicity of extension through plug-ins.

Over the‌ years, Damaris has evolved‌ into a more elaborate‌‌ system, providing the possibility to use dedicated cores‌ or dedicated nodes to‌ in situ data processing‌‌ and visualization. It proposes a seamless connection to‌ the VisIt visualization framework‌ to enable in situ‌‌ visualization with minimum impact on run time. Damaris‌ provides an extremely simple‌ API and can be‌‌ easily integrated into the existing large-scale simulations.

Damaris was at the core of the PhD thesis of Matthieu Dorier, who received an Accessit to the Gilles Kahn Ph.D. Thesis Award of the SIF and the Academy of Science in 2015. Developed in the framework of our collaboration with the JLESC (Joint Laboratory for Extreme-Scale Computing), Damaris was the first software resulting from this joint lab to be validated, in 2011, for integration into the Blue Waters supercomputer project. It scaled up to 16,000 cores on Oak Ridge’s leadership supercomputer Titan (first in the Top500 supercomputer list in 2013) before being validated on other top supercomputers. Active development is currently continuing within the KerData team at Inria, where it is at the center of several collaborations with industry as well as with national and international academic partners.

Damaris has been selected as one of the key pieces of software for the NumPEx PEPR project, which aims to provide the software infrastructure for the future Exascale machine to be hosted in France in 2025 (Alice Recoque, Jules Vernes project). The capabilities of Damaris will be further studied in collaboration with CEA within the NumPEx exploratory PEPR project.
Functional‌ Description:
Damaris is a middleware for data management and in-situ visualization targeting large-scale HPC simulations. Damaris enables:
- In-situ data analysis using selected dedicated cores/nodes of the simulation platform.
- Asynchronous and fast data transfer from HPC simulations to Damaris.
- Semantic-aware dataset processing through Damaris plug-ins.
- Writing aggregated data (in HDF5 format) or visualizing them with either VisIt or ParaView.
- Dask analytics support.
Release Contributions:
v1.12.1 of Damaris provides the basis for an overhaul of the Plugin layer: adding event triggers on specific hooks, reorganizing event functioning, and enabling/adding data dependencies for events. It also includes the (previously missing) implementations of Parameter management for the string and label types, and fixes for some typos and bugs.
News of the Year:
In 2025, an extendable Scheduling layer has been added (yet to be released) to reduce communication costs. In addition, to enable dynamic analysis handling in Damaris, two main activities have been carried out (yet to be released): the development of a dynamic expression module, and an overhaul of the Plugin layer (harmonization of plugin definitions, with the possibility to pass specific data to each plugin, dynamic event creation, triggers (with conditions), event/data dependencies, and data availability across iterations with a ‘sliding window’). Furthermore, in the context of the NumPEx PEPR project, we enhanced the Damaris / PDI interoperability. We continued the development of the Damaris plugin for PDI (to be released soon), and started working on the PDI plugin in Damaris. With this, simulations instrumented with PDI (https://pdi.dev/main/) can use Damaris to perform asynchronous data analysis using dedicated resources, and those instrumented with Damaris can have full access to the PDI ecosystem. In addition, Damaris is now part of the NumPEx Software Catalog (https://numpex-pc5.gitlabpages.inria.fr/tutorials/projects/catalog/index.html).
URL:
https://project.inria.fr/damaris/
Contact:
Gabriel Antoniu
Participant:
8 anonymous‌ participants
Partner:
ENS Rennes

7.1.2 E2Clab

Name:
Edge-to-Cloud‌ lab
Keywords:
Distributed systems, Cloud, Reproducibility, Experimentation, Computing‌ Continuum, Evaluation, Large scale, Provenance
Scientific Description:

E2Clab‌ is a framework that implements a rigorous methodology‌ that provides guidelines to move from real-life application‌ workflows to representative settings of the physical infrastructure‌ underlying this application in order to accurately reproduce‌ its relevant behaviors and therefore understand and optimize‌ end-to-end performance.

E2Clab allows a rigorous analysis of‌ possible application configurations in a controlled testbed environment‌ to understand their behavior and related performance trade-offs.‌ E2Clab can be generalized to other applications in‌ the Edge-to-Cloud Continuum. E2Clab is currently used by‌ the Pl@ntNet team to understand and optimize the‌ performance of the application. It is also used by our partners from‌ Instituto Politécnico Nacional for‌ automatic experiment deployments in‌‌ the context of the SmartFastData associate team.

In an effort to enhance the reproducibility capabilities of E2Clab, we extended it to enable efficient provenance data capture across the Edge-to-Cloud Continuum. Specifically, we leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce overheads for collecting such data on the IoT/Edge. This integration makes E2Clab a promising platform for the performance optimization of applications through reproducible experiments.
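The combination of a simplified data model, grouping and compression can be sketched in a few lines. The example below is not the E2Clab implementation: the record fields, the column-wise JSON layout and the use of zlib are assumptions chosen to keep the sketch self-contained.

```python
import json
import zlib


def pack_batch(records: list) -> bytes:
    """Group provenance records into one compact, compressed payload."""
    # Simplified data model: field names are stored once for the whole
    # batch, and each record becomes a plain list of values.
    fields = sorted({key for rec in records for key in rec})
    rows = [[rec.get(f) for f in fields] for rec in records]
    raw = json.dumps({"f": fields, "r": rows}, separators=(",", ":"))
    return zlib.compress(raw.encode("utf-8"), 9)


def unpack_batch(payload: bytes) -> list:
    """Inverse of pack_batch; fields missing from a record come back as None."""
    doc = json.loads(zlib.decompress(payload).decode("utf-8"))
    return [dict(zip(doc["f"], row)) for row in doc["r"]]
```

Sending one payload per batch rather than one message per record reduces both the number of transmissions and the bytes on the wire, which matters most on constrained IoT/Edge devices. The batch size then becomes a trade-off between overhead and the delay before provenance data reaches the collector.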
Functional Description:‌
E2Clab is a framework‌ that implements a rigorous‌‌ methodology that provides guidelines to move from real-life‌ application workflows to representative‌ settings of the physical‌‌ infrastructure underlying this application in order to accurately‌ reproduce its relevant behaviors‌ and therefore understand end-to-end‌‌ performance. Understanding end-to-end performance means rigorously mapping the‌ scenario characteristics to the‌ experimental environment, identifying and‌‌ controlling the relevant configuration parameters of applications and‌ system components, and defining‌ the relevant performance metrics.‌‌
Release Contributions:

Changelog: https://gitlab.inria.fr/E2Clab/e2clab/-/blob/master/CHANGELOG.rst?ref_type=heads

Features (release 1.0.0):

(i)‌ the configuration of the‌ experimental environment, libraries and‌‌ frameworks, (ii) the mapping between the application parts‌ and machines on the‌ Edge, Fog and Cloud,‌‌ (iii) the deployment of the application on the‌ infrastructure, (iv) Edge-to-Cloud network‌ emulation, (v) the automated‌‌ execution and monitoring, (vi) the application optimization, and‌ (vii) the gathering of‌ experiment metrics.
News of‌‌ the Year:

In an effort coordinated within the PEPR Cloud, we have worked towards adapting E2Clab to run experiments leveraging commercial computing resources provided by Scaleway. Enabling users to occasionally leverage these resources would give them access to state-of-the-art GPU nodes and a wider diversity of computing resources.

Additional contributions include:
- Ongoing experiments with the ECLAT laboratory to provide experimental support to their simulation pipeline
- Improved software reliability through testing and usability through easy ssh access to deployed resources
- Documented use of the software for new use-cases such as Federated Learning.

Latest‌‌ release archive: https://gitlab.inria.fr/E2Clab/e2clab/-/releases/v3.6.0
URL:
https://e2clab.gitlabpages.inria.fr/e2clab/
Publications:
hal-04208787,‌ hal-04779813, hal-04698619,‌ hal-02916032, hal-03310540,‌‌ hal-03269852, hal-03332524, hal-03270129, hal-03338520,‌ hal-03324177, hal-03259975,‌ hal-03409405, hal-03510012,‌‌ hal-04659211
Contact:
Gabriel Antoniu
Participant:
5 anonymous participants‌

7.1.3 Fives

Name:
Simulator‌ for Scheduling on Storage‌‌ System at Scale
Keywords:
Simulation, HPC, Distributed Storage‌ Systems
Scientific Description:
Development‌ of Fives began in‌‌ 2023, given the limitations of our previous StorAlloc‌ simulator. At the end‌ of 2023, Fives is‌‌ still in active development, while its design and‌ initial results are being‌ submitted to a conference‌‌ in the field.
Functional Description:

Fives is a‌ storage resource scheduling simulator‌ for supercomputers based on‌‌ WRENCH and SimGrid, two state-of-the-art simulation frameworks. In‌ particular, Fives can model‌ a parallel file system‌‌ such as Lustre, a computing partition, and simulate‌ a set of jobs‌ performing I/O on the‌‌ resulting HPC system.

Fives is based on several‌ components. Firstly, as part‌ of the development of‌‌ this simulator, an abstraction‌ called "Compound Storage Service" was proposed to represent‌ a distributed storage system, and integrated into WRENCH.‌ Within Fives, a job model was designed to‌ represent a history of jobs and submit them‌ to the scheduler present in WRENCH. Finally, a‌ model of an existing supercomputer, Theta at Argonne‌ National Laboratory, and a reverse-engineered version of its‌ Lustre file system were developed in our simulator.‌

Experiments are underway to calibrate and validate Fives.‌
Publication:
hal-04784808
Contact:
François Tessier

7.1.4 MOSAIC

Name:‌
Merging Operations and SegmentAtion for I/O Categorization
Keywords:‌
Categorization, HPC, I/O
Scientific Description:

MOSAIC is a‌ Python categorizer that takes I/O traces as input‌ and assigns classes to describe the patterns found‌ inside.

Those classes form a general description of‌ applications' I/O activity, giving information about the temporality‌ of I/O, whether periodic operations occur, and an‌ estimation of the impact on the metadata servers.‌

One of MOSAIC's building blocks is the automatic‌ detection of recurring operations. This is achieved with‌ a clustering algorithm that groups operations sharing the‌ same characteristics (duration, I/O amount, etc.) into one‌ single recurring operation.
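As a rough illustration of grouping operations that share the same characteristics, the sketch below buckets operations by the order of magnitude of their duration and volume, then checks whether the start times inside a group are evenly spaced. It is not MOSAIC's actual algorithm: the `Op` record, the log2 bucketing rule and the 10% tolerance are assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Op:
    """Hypothetical I/O operation record (not MOSAIC's data model)."""
    start: float     # start time, in seconds
    duration: float  # duration, in seconds (must be > 0)
    nbytes: int      # amount of data moved (must be > 0)


def group_similar(ops):
    """Group operations whose duration and volume fall in the same log2 bucket."""
    groups = defaultdict(list)
    for op in ops:
        key = (round(math.log2(op.duration)), round(math.log2(op.nbytes)))
        groups[key].append(op)
    return list(groups.values())


def period_of(group, tolerance=0.1):
    """Return the mean gap between starts if the gaps are regular, else None."""
    starts = sorted(op.start for op in group)
    if len(starts) < 3:
        return None
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    avg = mean(gaps)
    # Regular if the spread of the gaps stays under the relative tolerance.
    return avg if pstdev(gaps) <= tolerance * avg else None


# Five 1 GiB writes every 10 minutes, plus one unrelated small read.
ops = [Op(start=600.0 * i, duration=2.0, nbytes=1 << 30) for i in range(5)]
ops.append(Op(start=50.0, duration=0.01, nbytes=4096))
periods = [period_of(g) for g in group_similar(ops)]
print(periods)
```

The five large writes end up in one group with a 600-second period, while the isolated read forms its own group and is not reported as recurring.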

MOSAIC automatically finds the traces‌ that were generated by the same program to‌ reduce the number of files to be processed‌ and speed up a system-scale categorization.

MOSAIC works‌ for now with traces from the Darshan monitoring‌ tool but can be easily extended to fit‌ other trace formats.

MOSAIC was used to process the 2019 traces from the Blue Waters supercomputer trace dataset (National Center for Supercomputing Applications - University of Illinois).
Functional Description:

MOSAIC is a tool‌ for categorizing HPC application storage activity. It processes‌ traces containing all application storage operations and assigns‌ classes to describe how they are performed.

MOSAIC‌ can describe when the activity is performed (when‌ the application starts, at the end, throughout the‌ execution, etc.), find if some operations are recurring‌ (e.g., saving data to a file every 10‌ minutes), and estimate the overhead caused by the‌ metadata operations.

It can analyze large datasets of‌ I/O traces coming from a supercomputer to find‌ the general behavior of the applications that were‌ carried out on the machine.
News of the‌ Year:
Support of file temperature, better detection of‌ periodic behavior and improved performance for very large‌ datasets were implemented in 2025. An intermediate data‌ format, based on the so-called Trace Event Format,‌ was developed for MOSAIC to convert traces from‌ I/O monitoring tools (such as Darshan, Recorder, and‌ so on) to a common abstraction.
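To make the idea of a common abstraction concrete, here is a minimal sketch that maps one I/O record to a "complete" event (`"ph": "X"`) of the Trace Event Format, where `ts` and `dur` are expressed in microseconds. The input record layout and the choice of mapping the job identifier to `pid` and the rank to `tid` are assumptions for illustration; the intermediate format actually used by MOSAIC may differ.

```python
import json


def to_trace_event(record):
    """Convert a hypothetical I/O record into a Trace Event 'complete' event."""
    return {
        "name": record["op"],                            # e.g. "read" or "write"
        "cat": "io",
        "ph": "X",                                       # complete event: start + duration
        "ts": round(record["start_s"] * 1_000_000),      # microseconds
        "dur": round(record["duration_s"] * 1_000_000),  # microseconds
        "pid": record["job_id"],                         # assumed mapping: job -> pid
        "tid": record["rank"],                           # assumed mapping: rank -> tid
        "args": {"bytes": record["nbytes"], "file": record["path"]},
    }


# Made-up record, as a converter for a monitoring tool might produce it.
records = [
    {"op": "write", "start_s": 12.5, "duration_s": 0.25, "job_id": 42,
     "rank": 0, "nbytes": 1048576, "path": "/scratch/out.h5"},
]
trace = {"traceEvents": [to_trace_event(r) for r in records]}
print(json.dumps(trace, indent=2))
```

One converter per monitoring tool can then target this single structure, so that the categorization itself never needs to know which tool produced the trace.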
Publication:
hal-04808300‌
Contact:
François Tessier

7.1.5 FLAdversary

Name:
Emulation of‌ Federated Learning Scenarios with Adversarial Clients
Keywords:
Federated‌ learning, Emulation, Adversarial attack
Functional Description:

Federated Learning‌ (FL) is subject to diverse threats from the‌ Edge of the network where local training runs‌ on widely distributed, heterogeneous and volatile resources.

FLAdversary‌ provides tools to dynamically introduce adversarial attacks into‌ the FL training phase. Different (model and data)‌ poisoning attacks can be introduced at the client level to emulate adversaries‌ in the FL training.‌ Several defensive strategies are‌‌ provided as baselines.
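One of the simplest data poisoning attacks is label flipping, where an adversarial client trains on deliberately mislabeled samples. The sketch below is a generic illustration of that technique and not FLAdversary's API: the function name, its parameters and the toy labels are hypothetical.

```python
import random


def flip_labels(labels, num_classes, fraction, seed=0):
    """Return a copy of `labels` in which a fraction of the entries is
    reassigned to a different, randomly chosen class."""
    rng = random.Random(seed)
    poisoned = list(labels)
    n_flip = int(len(labels) * fraction)
    # rng.sample picks distinct positions, so exactly n_flip labels change.
    for i in rng.sample(range(len(labels)), n_flip):
        choices = [c for c in range(num_classes) if c != labels[i]]
        poisoned[i] = rng.choice(choices)
    return poisoned


clean = [i % 10 for i in range(100)]  # toy local dataset with 10 classes
poisoned = flip_labels(clean, num_classes=10, fraction=0.3)
changed = sum(a != b for a, b in zip(clean, poisoned))
print(f"{changed} labels flipped out of {len(clean)}")
```

Applying such a function only on the clients designated as adversaries, and only from a given training round onwards, is one way to emulate an attack that appears dynamically during training.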
Publication:
hal-04208787
Contact:
Gabriel Antoniu‌
Partner:
DFKI (German Research‌ Center for Artificial Intelligence)‌‌

7.1.6 FLDrift

Name:
Emulation of Federated Learning Scenarios‌ with Client Drift
Keywords:‌
Federated learning, Emulation, Heterogeneous‌‌ Data
Functional Description:

When deploying Federated Learning (FL)‌ on the Computing Continuum,‌ devices are subject to‌‌ high variations in local data distributions. This limits‌ the capacity of the‌ system to generate a‌‌ single model optimized for the entire federation of‌ devices.

FLDrift provides support‌ for various Non-IID scenarios‌‌ (i.e., introducing concept-drift and label-shift between federated peers)‌ for FL experiments. Several‌ personalization/clustering strategies are provided‌‌ as baselines.
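A label-shift scenario can be emulated by giving each client samples from only a few classes. The sketch below shows one generic way to build such a partition; it is not FLDrift's implementation, and the function name and parameters are hypothetical.

```python
def label_shift_partition(labels, num_clients, classes_per_client):
    """Assign sample indices to clients so that each client only sees
    `classes_per_client` classes (classes are handed out round-robin)."""
    classes = sorted(set(labels))
    owners = {c: [] for c in classes}
    for k in range(num_clients):
        for j in range(classes_per_client):
            c = classes[(k * classes_per_client + j) % len(classes)]
            owners[c].append(k)
    parts = [[] for _ in range(num_clients)]
    for c in classes:
        if not owners[c]:
            continue  # class not assigned to any client
        indices = [i for i, y in enumerate(labels) if y == c]
        # Spread the samples of this class evenly over the clients that own it.
        for n, i in enumerate(indices):
            parts[owners[c][n % len(owners[c])]].append(i)
    return parts


labels = [i % 10 for i in range(200)]  # toy dataset with 10 balanced classes
parts = label_shift_partition(labels, num_clients=5, classes_per_client=2)
client_classes = [sorted({labels[i] for i in p}) for p in parts]
print(client_classes)
```

With 5 clients and 2 classes per client, the label distributions of the clients do not overlap at all, which is the extreme case that clustering strategies are expected to handle.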
News of the Year:
We implemented‌ several baseline clustering strategies‌ improving personalization in Federated‌‌ Learning to address client drift. FLDrift proposes 4‌ scenarios to evaluate the‌ performance of clustering approaches.‌‌ Each scenario introduces a different form of concept‌ drift between client local‌ datasets.
Publication:
hal-04779813
Contact:‌‌
Gabriel Antoniu
Partner:
DFKI (German Research Center for‌ Artificial Intelligence)

7.2 Open‌ data

7.2.1 I/O Traces‌‌

For our IPDPS'25 paper 13, we used‌ traces of I/O activity‌ from four different systems‌‌ to answer a set of questions about temporal‌ I/O behavior. To focus‌ on realistic workloads, we‌‌ gathered traces from jobs running over a period‌ of time instead of‌ profiling a limited set‌‌ of selected applications.

Two of these data sets were Darshan traces available online (from the Intrepid 9 and Blue Waters systems 10), while two others were obtained by us, using file system monitoring tools:

PlaFRIM (BeeGFS): a 192-node experimental platform in Bordeaux, monitored for 26 months (2022–2024).
SDumont (Lustre): the largest supercomputer in Latin America, monitored for 12 months (2020).

The collected file‌‌ system data was correlated with the batch scheduler‌ logs to obtain two‌ time series of I/O‌‌ bandwidth per job (for "reads" and "writes"), with‌ a value per second‌ for PlaFRIM and a‌‌ value every 15 seconds for SDumont. The two‌ datasets as well as‌ all code and instructions‌‌ on how to reproduce our experiments are provided‌ in Zenodo: https://doi.org/10.5281/zenodo.14965920.‌ As explained in the‌‌ instructions, additional information can be obtained from https://github.com/tuda-parallel/FTIO/tree/main/artifacts/ipdps25‌ for FTIO, and https://zenodo.org/records/13785395‌ for MOSAIC.
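The correlation step can be pictured as follows: each file system sample is attributed to the job that was running on the corresponding node at that time, and the samples are summed per job and per timestamp. This is only a sketch under stated assumptions (per-node samples, exclusive node allocation, hypothetical tuple layouts and made-up values); the actual processing is the one distributed in the Zenodo artifact.

```python
from collections import defaultdict


def per_job_bandwidth(samples, jobs):
    """Build one (read, write) bandwidth time series per job.

    samples: iterable of (timestamp, node, read_bw, write_bw)
    jobs:    iterable of (job_id, start, end, nodes), with end exclusive
    """
    series = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for ts, node, read_bw, write_bw in samples:
        for job_id, start, end, nodes in jobs:
            if start <= ts < end and node in nodes:
                series[job_id][ts][0] += read_bw
                series[job_id][ts][1] += write_bw
    # Sort each series by timestamp and return plain dictionaries.
    return {job: dict(sorted(points.items())) for job, points in series.items()}


# Made-up scheduler log and monitoring samples, one value every 15 seconds.
jobs = [("A", 0, 30, {"n1", "n2"}), ("B", 15, 45, {"n3"})]
samples = [
    (0, "n1", 100.0, 0.0), (0, "n2", 50.0, 10.0),
    (15, "n1", 0.0, 200.0), (15, "n3", 5.0, 5.0),
    (30, "n1", 1.0, 1.0),  # job A has ended: not attributed to any job
]
series = per_job_bandwidth(samples, jobs)
print(series)
```

The nested loop is quadratic and only suitable for small examples; on months of data, an interval index over the scheduler log would be needed.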

8 New‌‌ results

8.1 Supporting Data-Centric Applications and Workflows Running‌ Across the Computing Continuum‌

8.1.1 On the Reproducibility‌‌ Challenges of Federated Learning: Investigating the Gap between‌ Simulation, Emulation and Real-World‌ Deployments

Participants: Cédric Prigent‌‌, Alexandru Costan, Gabriel Antoniu.

Collaboration.‌
This work has been carried out in co-operation with Cédric Tedeschi (University of Rennes, MAGELLAN team), Loïc Cudennec (DGA MI) and Kate Keahey (Argonne National Laboratory), in the framework of the STEEL project of the PEPR CLOUD program and of the UNIFY 2 Associate Team with ANL, associated with the JLESC international laboratory.

Federated Learning (FL)‌‌ is an emerging paradigm for decentralized training of‌ Machine Learning models. It‌ has been the subject‌‌ of a large corpus‌ of research due to its innovative approach to‌ handling sensitive data. A common practice in the‌ FL literature is to run simulations on a‌ single compute node to assess the performance of‌ FL algorithms. While simulation enables fast prototyping and‌ validation of algorithmic concepts, it may face limitations‌ in reproducing the real system's performance in heterogeneous‌ environments such as the Computing Continuum, and particularly‌ on resource-constrained Edge devices. Conversely, emulation on distributed‌ testbeds offers more effective means to accurately reproduce‌ the performance of real-world devices. However, to the‌ best of our knowledge, no prior research has‌ investigated the differences between simulation and emulation in‌ FL experiments. In this work, we study the‌ complementarity of these approaches and discuss their respective‌ challenges, as a first step towards reproducibility of‌ FL experiments. We illustrate our study with a‌ real-life application used as a baseline: an outdoor‌ air quality forecasting framework with real-world sensors. Our‌ results show that simulation can be used to‌ accurately reproduce model performance metrics, while emulation can‌ effectively reproduce the system performance of real-world experiments.‌ Finally, we present a set of lessons learned‌ on the challenges of FL reproducibility and the‌ selection of experimental infrastructures for FL experiments and‌ applications. This work has been published as 16‌.

8.1.2 Evaluating Federated Learning Workflows Beyond Simulation:‌ A Deployment-Aware Methodology

Participants: Mathis Valli, Alexandru‌ Costan, Gabriel Antoniu.

Collaboration.
This work‌ has been carried out in co-operation with Cédric‌ Tedeschi (University of Rennes, MAGELLAN team) and Loïc‌ Cudennec (DGA MI).

Federated Learning (FL) is often evaluated in simulation, which overlooks network variability, system heterogeneity, and energy costs in geo-distributed settings. We propose a deployment-aware methodology that triangulates analytical modeling, simulation, and real-world deployments within a unified FL evaluation framework. For a given series of experimental scenarios, the methodology makes it possible to assess the consistency of performance trends across the three evaluation approaches, quantifying deviations in key metrics such as run time, communication overhead, and energy consumption. This further enables cross-validation of the reliability of multiple measurement tools, highlighting discrepancies in commonly reported metrics such as energy usage.

The methodology‌ is validated on FL workloads by comparing analytical‌ predictions and simulations against large-scale deployments on the‌ Grid’5000 testbed, spanning 51 nodes across four geographically‌ distant sites. By varying key FL components such‌ as aggregation algorithms, client sampling rates, and datasets,‌ we characterize how different FL design choices affect‌ the reliability of the three evaluation approaches. Our‌ findings reveal significant divergences: analytical models accurately capture‌ communication patterns and preserve the relative performance of‌ the scenarios, simulations reflect broad trends but often‌ lead to performance rankings of different configurations inconsistent‌ with those found through actual deployment, while only‌ the latter uncovers hidden costs, such as increased‌ energy consumption due to data imbalances.
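One way to check whether two evaluation approaches rank a set of configurations consistently is a rank correlation such as Kendall's tau. The text does not state which statistic is used in the study, so the sketch below is only an illustration of the idea, and the run times are invented for the example.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a) between two equally long score lists:
    +1 means identical rankings, -1 means fully reversed rankings."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Invented run times (seconds) of four configurations under each approach.
deployment = [120.0, 150.0, 180.0, 240.0]
analytical = [100.0, 130.0, 170.0, 210.0]  # absolute values differ, same order
simulation = [110.0, 160.0, 140.0, 230.0]  # configurations 2 and 3 are swapped

print(kendall_tau(deployment, analytical))
print(kendall_tau(deployment, simulation))
```

In this toy case the analytical model misestimates every run time yet preserves the ranking, whereas the simulation inverts one pair of configurations, which lowers its correlation with the deployment.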

This work‌ has been submitted for publication to a conference‌ (currently under evaluation).

8.1.3 Supporting SKA data processing workflows with the E2Clab approach to workflow lifecycle management across the continuum

Participants: Thomas Badts, Gabriel Antoniu.

Collaboration.‌
This work has been carried out in co-operation with Baptiste Besnard and Damien Gratadour (LIRA, Observatoire de Paris).

To support automatic deployment, the complete analysis cycle and the optimization of applications on the Computing Continuum, we have proposed the E2Clab methodology and its supporting software tool for workflow lifecycle management across the Continuum. We aim to assist the execution of the Karabo pipeline 32 for radioastronomy simulation by enabling reproducible distributed deployments and experiments on academic testbeds. The Karabo pipeline is being developed to support simulation of the future SKA radiotelescope within the ECLAT laboratory. E2Clab also provides the workflow capabilities to run optimization loops over end-to-end experiments and improve parameter discovery and fine-tuning in a complex, cross-disciplinary stack of software components ranging from distributed computing frameworks to astrophysics simulations.

This collaboration, started in 2005, is still in the exploratory stages; further work is expected in the following year.

8.1.4 Methodology for Automated‌‌ IoT Experimentation in Controlled Testbeds Prior to Real-World‌ Deployments

Participants: Elias Del‌ Pozo Punal, Silvina‌‌ Caino Lores, Thomas Badts, Gabriel Antoniu‌.

Collaboration.
This work‌ has been carried out‌‌ in co-operation with Felix Garcia-Carballeira and Alejandro Calderon‌ from University Carlos III‌ of Madrid.

Several tools and frameworks have been proposed to automate deployments in distributed systems. Infrastructure-as-Code (IaC) approaches such as Ansible, Puppet, SaltStack, or Chef are widely used to abstract low-level configuration details. In parallel, some frameworks support experiment description and execution in specific research testbeds, such as the cOntrol and Management Framework (OMF) and the OMF Measurement Library (OML) 27. Despite these advances, existing solutions often remain limited to specific domains or infrastructures, and integrating heterogeneous environments remains a challenge when considering the broader computing continuum, in particular for IoT deployments. As a result, researchers frequently rely on fragmented tools or manual procedures, which hinder the repeatability and scalability of experiments and ultimately limit the ability to perform consistent pre-deployment validation of IoT systems in controlled environments.

This work‌‌ proposes a general methodology for automated IoT experimentation‌ and validation in controlled‌ environments. The approach provides‌‌ a structured workflow for designing, deploying, and executing‌ experiments across different testbeds‌ in a reproducible, scalable‌‌ manner. It allows researchers to evaluate IoT deployments‌ through controlled simulations and‌ pre-deployment testing, bridging the‌‌ gap between conceptual design and real-world implementation.

8.2‌ Data-Aware Middleware Approaches for‌ the Computing Continuum

8.2.1‌‌ Multi-level analysis of the I/O pattern of HPC‌ applications

Participants: François Tessier‌, Théo Jolivel,‌‌ Jakob Luettgau, Julien Monniot, Gabriel Antoniu‌.

Collaboration.
This work has been carried out in close co-operation with Philippe Deniel from CEA and the Inria TADaaM team in Bordeaux, within the Exa-DoST project of the PEPR NumPEx program. It also involves a collaboration with Ahmad Tarraf from the Technical University of Darmstadt, Germany.

While the ratio of I/O performance‌ to computing power has declined by a factor‌ of 10 in the last decade 11,‌ the volume of data generated by scientific workflows‌ and applications has significantly grown. In some supercomputing‌ centers for instance, this volume has increased almost‌ 40-fold in ten years. This has made access‌ to storage resources a major bottleneck to scaling‌ up applications.

Several levers exist along the data‌ path to mitigate this burden. For example, optimizations‌ can be applied at the I/O library level‌ or within the application source code to improve‌ I/O performance. At the job scheduler level, decisions‌ can be taken when allocating resources to avoid‌ I/O interference between jobs. However, all these optimizations‌ require a good upstream understanding of application I/O‌ behavior.

In this research axis, we are working on analyzing the I/O behavior of large-scale applications at various levels. The thesis that Théo Jolivel started in October 2024 proposes to tackle this question. One approach is to exploit public datasets containing several years of I/O execution traces of applications running on supercomputers. We developed multiple methodologies and tools to pre-process those datasets, extract the relevant data, and analyze the data access behavior. In particular, we extended MOSAIC 79, a categorizer that detects I/O patterns from execution traces. MOSAIC extracts I/O operations contained in I/O traces and assigns classes to describe how I/O operations are performed throughout the execution. The description is based on three distinct axes: I/O temporality (when was data read or written?), access periodicity (are there recurring operations?), and metadata overhead (what is the impact of metadata operations?). This extended version is under submission to a conference 21 and has been presented as a poster during the annual meeting of the Exa-DoST project 22 (an updated version of this poster is also under submission for the PASC 2026 conference). A complementary work on the temporal I/O behavior of HPC applications, in collaboration with Inria Bordeaux and TU Darmstadt, has been presented at IPDPS'2025, an A-rank conference in the field 19.

8.2.2 Study‌ of I/O interference between jobs

Participants: François Tessier‌, Méline Trochon.

Collaboration.
This work has been carried out in close co-operation with the Inria TADaaM team in Bordeaux and Jean-Thomas Acquaviva from DDN, within the Exa-DoST project of the PEPR NumPEx program.

High-performance computing is a key‌ component for accelerating scientific discovery and innovation by‌ enabling rapid processing of complex simulations and large-scale‌ data analyses. As HPC applications grow in scale,‌ the performance of the underlying storage infrastructure, particularly‌ parallel file systems (PFS), becomes critical. These shared‌ systems distribute data across multiple storage targets (OST),‌ but concurrent access by multiple jobs can lead‌ to interference, reducing performance compared to isolated operations.‌ Interference varies depending on application characteristics, often degrading‌ overall bandwidth and causing significant performance variability, sometimes‌ by orders of magnitude.

In the context of Méline Trochon's PhD thesis‌ (CIFRE DDN-Inria) we studied‌ how interference impacts checkpointing,‌‌ a key fault-tolerance technique in HPC. Checkpointing involves‌ periodically saving application data‌ to persistent storage to‌‌ recover from failures. As applications handle more data,‌ checkpoint files grow larger,‌ making I/O performance even‌‌ more crucial. Interference during these operations can severely‌ affect their efficiency, highlighting‌ the need to understand‌‌ and mitigate its effects.

To do this, we‌ launched a large number‌ of experiments with an‌‌ application that simulates checkpoint phases and one or‌ more applications that simulate‌ interference. Since the checkpoint‌‌ application has fixed parameters, we looked at how‌ different configurations of interference‌ workloads may or may‌‌ not affect I/O performance and to what extent.‌ This work is currently‌ being finalized and a‌‌ paper is expected to be published in 2026.‌ A pre-print is already‌ available online 23.‌‌ This work will continue in 2026, notably through‌ the development of a‌ simulator that will allow‌‌ us to test more configurations.

8.2.3 Enabling Efficient‌ Runtime Data Analysis to‌ a Crystal Deformation Simulation‌‌

Participants: Arthur Jaquard, Silvina Caino Lores,‌ Gabriel Antoniu.

Collaboration.‌
This work has been‌‌ carried out in close co-operation with Laurent Colombet‌ (from CEA DAM) and‌ Julien Bigot (CEA/Maison de‌‌ la Simulation) within the Exa-DoST project of the‌ PEPR NumPEx program.

Exascale simulations generate massive data volumes that strain I/O and post-hoc analysis. In the framework of Arthur Jaquard's PhD thesis, we explore how in situ analysis, as supported by the Damaris in situ middleware, can benefit Coddex, a crystal deformation code, by offloading data movement and analysis to dedicated processes. This is achieved by enabling runtime extraction of key diagnostics without writing intermediate files. We evaluated tin hysteresis cases on CEA's INTI cluster (with 14 nodes, 1,728 cores) and compared against a ParaView-based post-hoc pipeline. In situ analysis eliminates per-iteration I/O stalls and reduces output time by up to 5x while preserving overall iteration time, with benefits increasing with the number of tracked variables. This work is conducted within the Exa-DoST project of the PEPR NumPEx program, which aims to build the software infrastructure for the first Exascale machine expected to be set up in France (Alice Recoque, Jules Verne project). It has been published as a poster at the SC25 conference 24.

8.3‌‌ Sustainable Resource Management for the Computing Continuum

8.3.1‌ Result-Scalability: Following the Evolution‌ of Selected Social Impact‌‌ of HPC.

Participants: Guillaume Pallez.

Collaboration.
This‌ work has been carried‌ out in collaboration with‌‌ Sally Rose Ellingson from the medical college of‌ the University of Kentucky.‌

While the scientific community traditionally relies on various computational metrics to assess the performance of HPC systems, such as the TOP500 list (based on HPL performance), HPCG, Graph500, and IO500, these metrics do not capture how HPC contributes to social progress. We propose 11 a novel approach to follow how the growth of HPC systems and the advances of HPC research address concrete social challenges. The uniqueness of these new metrics lies in their ability to not only measure the capabilities of HPC architectures but also to gauge the concrete social advancements achieved through their use: they focus on the output of the computation instead of its input. Contrary to current measures, they also promote the diversity of machines by evaluating the Pareto front created between size and result. We emphasize the need for dynamic, community-driven metrics that can evolve with emerging social needs.
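The notion of a Pareto front between size and result can be sketched as follows: a machine stays on the front if no other machine is at most as large while achieving at least as good a result, with one of the two being strictly better. The direction of each axis (smaller size preferred, larger result preferred), the tuple layout and the values below are assumptions for illustration and are not taken from the paper.

```python
def pareto_front(machines):
    """Return the names of the machines that no other machine dominates.

    machines: list of (name, size, result); smaller size and larger
    result are both considered better in this sketch.
    """
    front = []
    for name, size, result in machines:
        dominated = any(
            s <= size and r >= result and (s < size or r > result)
            for _, s, r in machines
        )
        if not dominated:
            front.append(name)
    return front


# Invented machines: (name, size in nodes, best result obtained).
machines = [
    ("A", 100, 10.0),
    ("B", 1000, 50.0),
    ("C", 1000, 30.0),  # same size as B but worse result: dominated
    ("D", 5000, 50.0),  # same result as B but larger: dominated
    ("E", 50, 2.0),
]
front = pareto_front(machines)
print(front)
```

Small machines such as E remain on the front even though their absolute result is modest, which is how such a metric can reward a diversity of machine sizes instead of ranking only the largest systems.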

8.3.2‌ Increasing the Lifetime of HPC Machines: Issues, Implications,‌ and Open Challenges

Participants: Guillaume Pallez, Robin‌ Boezennec.

Collaboration.
This work has been carried out as a large collaboration in Rennes including two different teams, PACAP and TARAN, as well as with Brice Goglin (TADaaM in Bordeaux).

Extending the lifetime of High-Performance Computing (HPC) machines is becoming an important concern for a variety of reasons. These include the environmental and human costs associated with chip manufacturing, the rising demands of AI workloads, the soaring prices of accelerator chips, political blocks, and delays in the delivery of next-generation supercomputers. As a community, we must reconsider the traditional HPC paradigm and explore new strategies for making existing HPC infrastructure viable for longer periods. In 18, we highlight the current barriers to prolonging the lifespan of HPC machines and discuss key technical and operational challenges towards this goal.

8.3.3 Improving Supercomputer Usage with Aging Awareness.

Participants:‌ Guillaume Pallez, Robin Boezennec, Alix Tremodeux‌.

The lifetime of electronic devices has a critical impact on their environmental footprint. In addition, the high demand for GPUs from AI companies has tremendously reduced their availability for supercomputing centers. Consequently, extending the lifetime of CPUs and GPUs is becoming a major issue in the High Performance Computing (HPC) domain. This contribution 12 investigates how to optimize the usage of a machine before a fatal failure, and the trade-offs with performance. The lifetime of computing devices is strongly connected with the temperature and thus with the running frequency. We investigate node frequency reconfiguration to optimize HPC usage. We estimate the benefit of a dedicated scheduling algorithm compared with a constant frequency.

We show that a correct decision can considerably increase the number of FLOP performed by a machine, with a trade-off in terms of performance. Because aging models are currently inaccurate, we consider different models and discuss the robustness of our algorithms to inaccuracy.

8.3.4‌ Priority-BF: a Task Manager for Priority-Based Scheduling

Participants:‌ Guillaume Pallez.

Collaboration.
This work has been‌ carried out with Ana Gainaru and Scott Patkin‌ (Oak Ridge National Laboratory).

The increasing demand for computational resources, particularly in High-Performance Computing environments, requires rethinking how we handle job scheduling strategies. In 14, we address the challenge of managing concurrent jobs with differing priorities on overloaded parallel systems, where strict QoS constraints are often difficult for users to define. Our solution relies on a qualitative description of priorities and draws on two key approaches: the Easy-BF algorithm and the Conservative Backfilling algorithm. This solution improves the response time for high-priority jobs by 50% without affecting the overall system utilization. We show its applicability in several critical scenarios such as High-Performance Computing (HPC) resource management and in-situ computing.
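For readers unfamiliar with backfilling, the sketch below shows one scheduling pass of the classical EASY backfilling policy, one of the two approaches mentioned above. It is not Priority-BF: priorities are ignored, and the function name, tuple layouts and job values are hypothetical. The blocked head job receives a reservation (the "shadow time"), and later jobs may start only if they do not delay it.

```python
def easy_backfill(now, free, running, queue):
    """One pass of EASY backfilling; returns the ids of the jobs started now.

    free:    number of idle processors
    running: list of (expected_end_time, procs) for running jobs
    queue:   list of (job_id, procs, walltime) in arrival order
    """
    started, queue, running = [], list(queue), list(running)
    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0][1] <= free:
        job_id, procs, walltime = queue.pop(0)
        free -= procs
        running.append((now + walltime, procs))
        started.append(job_id)
    if not queue:
        return started
    # 2. Reservation for the blocked head job: earliest time it can start
    #    (shadow) and processors left over at that time (extra).
    head_procs, avail = queue[0][1], free
    shadow, extra = float("inf"), 0
    for end, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow, extra = end, avail - head_procs
            break
    # 3. Backfill later jobs that fit now and do not delay the head job.
    for job_id, procs, walltime in queue[1:]:
        if procs > free:
            continue
        if now + walltime <= shadow:   # finishes before the reservation
            free -= procs
            started.append(job_id)
        elif procs <= extra:           # only uses processors the head job leaves idle
            free -= procs
            extra -= procs
            started.append(job_id)
    return started


# 10-processor machine: 6 busy until t=100, 4 idle; J1 (8 procs) is blocked.
queue = [("J1", 8, 50), ("J2", 4, 200), ("J3", 2, 80), ("J4", 2, 300)]
started = easy_backfill(now=0, free=4, running=[(100, 6)], queue=queue)
print(started)
```

J2 is refused because it would still hold 4 processors at t=100 and delay J1, while J3 finishes in time and J4 fits in the 2 processors that J1 leaves unused.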

8.3.5‌ Scheduling multiple task-based applications‌‌ on distributed heterogeneous computing nodes

Participants: Etienne Ndamlabin‌.

Collaboration.
This work‌ has been carried out‌‌ with Bérenger Bramas (CAMUS team, Inria Nancy).

Modern high-performance computing platforms combine extreme parallelism with growing size, complexity, and cost, making inefficient resource usage increasingly critical in terms of performance and energy. Our research addresses this challenge by focusing on the concurrent execution of multiple task-based applications on shared heterogeneous (CPU/GPU) environments. We created load-balancing heuristics to distribute the task graphs over the processing units and designed and implemented RSCHED, an adaptive scheduling framework integrated into the StarPU runtime system 93. RSCHED dynamically reorganizes resource allocation in response to application progress and completion, while jointly optimizing application makespan and resource utilization during concurrent execution. Experimental results on real applications show up to a 10× reduction in overall makespan compared to consecutive execution, while increasing resource utilization. RSCHED also highlights the benefits of system-level coordination on top of independent application schedulers, compared to unsupervised concurrent execution.

8.4 Methodological study over the practice of‌ HPC Research

The following contributions do not necessarily build on the team project but are adjacent to it. They both discuss how our community performs research, the first one by studying the reproducibility evaluation process of a large HPC conference (SC'24), and the second one by studying some claims behind the use of LLMs to generate scheduling algorithms.

8.4.1 Implementing a Reproducibility Initiative in HPC:‌ Experiences from SC24.

Participants:‌ Guillaume Pallez.

Collaboration.‌‌
This work has been carried out with Sascha‌ Hunold (University of Vienna)‌ and Judith Hill (Lawrence‌‌ Livermore National Laboratory).

Reproducibility is fundamental to scientific‌ research, but can be‌ particularly challenging in research‌‌ that involves High Performance Computing (HPC) due to‌ the unique characteristics of‌ supercomputers. Performance-based metrics such‌‌ as execution time, energy consumption, and throughput further‌ complicate reproducibility, especially on‌ shared systems. In 15‌‌, we present our experience implementing a reproducibility‌ initiative at SC24, with‌ particular emphasis on changes‌‌ made compared to prior SC conferences. We outline‌ HPC-specific challenges, describe the‌ measures adopted to address‌‌ them, and reflect on the limitations of reproducibility‌ badges. Faced with the‌ constraints of the existing‌‌ badging nomenclature, we discuss our implementation of a‌ reproducibility report, which aims‌ to provide more context‌‌ about the reproducibility of each paper. We conclude‌ by recommending that the‌ “Artifact Replicable” badge be‌‌ dropped by HPC conferences at this time, and‌ discuss alternate ways of‌ ensuring replicability evaluation.

8.4.2‌‌ An In-depth Study of LLM Contributions to the‌ Bin Packing Problem

Participants:‌ Guillaume Pallez.

Collaboration.‌‌
This work has been‌ carried out with Julien Herrmann (CNRS).

Recent studies‌ have suggested that Large Language Models (LLMs) could‌ provide interesting ideas contributing to mathematical discovery. This‌ claim was motivated by reports that LLM-based genetic‌ algorithms produced heuristics offering new insights into the‌ online bin packing problem under uniform and Weibull‌ distributions. In 20, we reassess this claim‌ through a detailed analysis of the heuristics produced‌ by LLMs, examining both their behavior and interpretability.‌ Despite being human-readable, these heuristics remain largely opaque‌ even to domain experts. Building on this analysis,‌ we propose a new class of algorithms tailored‌ to these specific bin packing instances. The derived‌ algorithms are significantly simpler, more efficient, more interpretable,‌ and more generalizable, suggesting that the considered instances‌ are themselves relatively simple. We then discuss the‌ limitations of the claim regarding LLMs' contribution to‌ this problem, which appears to rest on the‌ mistaken assumption that the instances had previously been‌ studied. Our findings instead emphasize the need for‌ rigorous validation and contextualization when assessing the scientific‌ value of LLM-generated outputs.
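As background on the problem itself, the sketch below implements Best Fit, a classical baseline heuristic for online bin packing: each arriving item is placed in the open bin with the least remaining space that can still hold it. It is neither one of the LLM-generated heuristics nor the new class of algorithms proposed in the paper, and the item sizes are invented.

```python
def best_fit(items, capacity):
    """Online Best Fit: items are packed in arrival order, each into the
    fullest bin that can still hold it; returns each bin's remaining space."""
    bins = []
    for item in items:
        candidates = [i for i, rem in enumerate(bins) if rem >= item]
        if candidates:
            target = min(candidates, key=lambda i: bins[i])
            bins[target] -= item
        else:
            bins.append(capacity - item)  # open a new bin
    return bins


remaining = best_fit([60, 50, 40, 30, 20], capacity=100)
print(len(remaining), remaining)
```

On this instance the 40 goes next to the 60 rather than next to the 50, which leaves room for the 30 and the 20 in the second bin, so two bins suffice.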

9 Partnerships and cooperations‌

9.1 International initiatives

9.1.1 Associate Teams in the‌ framework of an Inria International Lab or in‌ the framework of an Inria International Program

UNIFY‌ 2

Title:
Intelligent Unified Data Services for Hybrid‌ Workflows Combining Compute-Intensive Simulations and Data-Intensive Analytics at‌ Extreme Scales - 2
Duration:
2023 ->
Coordinator:‌
Tom PETERKA (tpeterka@mcs.anl.gov)
Partners:
- Argonne National Laboratory, Argonne (United States)
Inria contact:
Gabriel Antoniu
Summary:
For several years, we have been witnessing the emergence of complex workflows combining simulations with data analysis, potentially leveraging machine-learning techniques. Such complex workflows seem to naturally need to jointly use supercomputers interconnected with clouds and potentially Edge-based systems. This assembly is called the Computing Continuum. In a general scheme, Edge devices create streams of input data, which are processed by data analytics and machine learning applications in the Cloud, whereas simulations on large, specialised HPC systems provide insights into and prediction of future system state. The emergence of such workflows is reshaping the traditional vision on the areas involved, as described in the ETP4HPC Research Agenda published in 2020. Building software ecosystems addressing the needs of such workflows poses multiple challenges at several levels. In this context, this Associate Team will focus on three related challenges: 1) How to adequately handle the heterogeneity of storage resources within the Computing Continuum to support complex science workflows? 2) How to efficiently support deep-learning workloads across the Computing Continuum? 3) How to provide reproducibility support for experimentation across the Computing Continuum?

9.2 International research visitors

9.2.1 Visits of‌ international scientists

Swann Perarnau

Status:
Senior Scientist
Institution‌ of origin:
Argonne National Laboratory
Country:
USA
Dates:‌
Dec 8-10, 2025
Context of the visit:
Jury‌ for PhD of Robin Boezennec
Mobility program/type of‌ mobility:
lecture

9.2.2 Visits to international teams

Research‌ visits abroad

Gabriel Antoniu, Jakob Luettgau, Arthur Jaquard, Robin Boezennec

Visited institution:
Argonne National Laboratory
Country:
USA‌
Dates:
13-15 May 2025‌
Context of the visit:‌‌
Exploration of research collaboration on in situ processing‌ with Tom Peterka and‌ Orçun Yildiz.
Mobility program/type‌‌ of mobility:
Visit during the JLESC workshop.

9.3‌ European initiatives

9.3.1 H2020‌ projects

EUPEX

EUPEX project‌‌ on cordis.europa.eu

Title:
EUROPEAN PILOT FOR EXASCALE
Duration:‌
From January 1, 2022‌ to December 31, 2026‌‌
Partners:
- INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET‌ AUTOMATIQUE (INRIA), France
- GRAND‌ EQUIPEMENT NATIONAL DE CALCUL‌‌ INTENSIF (GENCI), France
- VSB - TECHNICAL UNIVERSITY OF‌ OSTRAVA (VSB - TU‌ Ostrava), Czechia
- JOHANNES GUTENBERG-UNIVERSITAT‌‌ MAINZ, Germany
- FORSCHUNGSZENTRUM JULICH GMBH (FZJ), Germany
- COMMISSARIAT‌ A L ENERGIE ATOMIQUE‌ ET AUX ENERGIES ALTERNATIVES‌‌ (CEA), France
- IDRYMA TECHNOLOGIAS KAI EREVNAS (FOUNDATION FOR RESEARCH AND TECHNOLOGY - HELLAS), Greece
- SVEUCILISTE U ZAGREBU FAKULTET‌‌ ELEKTROTEHNIKE I RACUNARSTVA (UNIZG-FER), Croatia
- UNIVERSITA DEGLI STUDI‌ DI TORINO (UNITO), Italy‌
- Consortium Ubiquitous Technologies S.c.a.r.l.‌‌ (CUBIT), Italy
- CYBELETECH, France
- UNIVERSITA DI PISA (UNIPI),‌ Italy
- GRAN SASSO SCIENCE‌ INSTITUTE (GSSI), Italy
- ISTITUTO‌‌ NAZIONALE DI ASTROFISICA (INAF), Italy
- UNIVERSITA DEGLI STUDI‌ DEL MOLISE, Italy
- E‌ 4 COMPUTER ENGINEERING SPA‌‌ (E4), Italy
- CONSIGLIO NAZIONALE DELLE RICERCHE (CNR), Italy‌
- JOHANN WOLFGANG GOETHE-UNIVERSITAET FRANKFURT‌ AM MAIN (GUF), Germany‌‌
- EUROPEAN CENTRE FOR MEDIUM-RANGE WEATHER FORECASTS (ECMWF), United‌ Kingdom
- BULL SAS (BULL),‌ France
- POLITECNICO DI MILANO‌‌ (POLIMI), Italy
- EXASCALE PERFORMANCE SYSTEMS - EXAPSYS IKE,‌ Greece
- ALMA MATER STUDIORUM‌ - UNIVERSITA DI BOLOGNA‌‌ (UNIBO), Italy
- PARTEC AG (PARTEC), Germany
- ISTITUTO NAZIONALE‌ DI GEOFISICA E VULCANOLOGIA,‌ Italy
- CINECA CONSORZIO INTERUNIVERSITARIO‌‌ (CINECA), Italy
- SECO SPA (SECO SRL), Italy
- CONSORZIO‌ INTERUNIVERSITARIO NAZIONALE PER L'INFORMATICA‌ (CINI), Italy
Inria contact:‌‌
Olivier Beaumont
Coordinator:
Etienne Walter (EVIDEN)
Summary:

The‌ EUPEX consortium aims to‌ design, build, and validate‌‌ the first EU platform for HPC, covering end-to-end‌ the spectrum of required‌ technologies with European assets:‌‌ from the architecture, processor, system software, development tools‌ to the applications. The‌ EUPEX prototype will be‌‌ designed to be open, scalable and flexible, including‌ the modular OpenSequana-compliant platform‌ and the corresponding HPC‌‌ software ecosystem for the Modular Supercomputing Architecture. Scientifically,‌ EUPEX is a vehicle‌ to prepare HPC, AI,‌‌ and Big Data processing communities for upcoming European‌ Exascale systems and technologies.‌ The hardware platform is‌‌ sized to be large enough for relevant application‌ preparation and scalability forecast,‌ and a proof of‌‌ concept for a modular architecture relying on European‌ technologies in general and‌ on European Processor Technology‌‌ (EPI) in particular. In this context, a strong‌ emphasis is put on‌ the system software stack‌‌ and the applications.

Being the first of its‌ kind, EUPEX sets the‌ ambitious challenge of gathering,‌‌ distilling and integrating European technologies that the scientific‌ and industrial partners use‌ to build a production-grade‌‌ prototype. EUPEX will lay the foundations for Europe's‌ future digital sovereignty. It‌ has the potential for‌‌ the creation of a sustainable European scientific and‌ industrial HPC ecosystem and‌ should stimulate science and‌‌ technology more than any national strategy (for numerical‌ simulation, machine learning and‌ AI, Big Data processing).‌‌

The EUPEX consortium, composed of key actors on the European HPC scene, has the capacity and the will to provide a fundamental contribution to the consolidation of the European supercomputing ecosystem. EUPEX aims to directly support an emerging and vibrant European entrepreneurial ecosystem in AI and Big Data processing that will leverage HPC as a main enabling technology.

9.3.2‌ Collaborations with Major European Organizations

Participants: Gabriel Antoniu‌, Alexandru Costan, Jakob Luettgau.

ETP4HPC:‌ Since 2019, Gabriel Antoniu has served as a‌ co-leader of the working group on Programming Environments,‌ contributing to two successive versions of the Strategic‌ Research Agenda of ETP4HPC. Alexandru Costan served as‌ a member of this working group. Jakob Luettgau‌ served as a member of the working group‌ on Data Storage and I/O. A white paper‌ of this group 25 was published in 2025.‌

9.4 National initiatives

Exa-DoST

Participants: Gabriel Antoniu, François Tessier, Julien Monniot, Jakob Luettgau, Etienne Ndamlabin, Silvina Caino Lores, Guillaume Pallez.

Exa-DoST project of the NumPEx‌ PEPR program

Title:
Data-oriented Software and Tools for‌ the Exascale
Duration:
From January 1, 2023 to‌ April 1, 2030
Partners:
- Inria
- CEA
- CNRS
- University‌ of Bordeaux
- Observatoire de Paris
- Observatoire de la Côte d'Azur
- Data Direct Networks France (DDN)
Coordinator:‌
Gabriel Antoniu (KerData Team, Inria)
Summary:
The advent of future Exascale supercomputers raises multiple data-related challenges. To enable applications to fully leverage the upcoming infrastructures, a major challenge concerns the scalability of techniques used for data storage, transfer, processing and analytics. Additional key challenges emerge from the need to adequately exploit emerging technologies for storage and processing, leading to new, more complex storage hierarchies. Finally, it now becomes necessary to support increasingly complex hybrid workflows involving at the same time simulation, analytics and learning, running at extreme scales across supercomputers interconnected to clouds and edge-based systems. The Exa-DoST project will address most of these challenges, organized in 3 areas:
- Scalable‌ storage and I/O;
- Scalable in situ processing;
- Scalable‌ smart analytics.
As part of the NumPEx program,‌ Exa-DoST targets a much higher technology readiness level‌ than previous national projects concerning the HPC software‌ stack. It will address the major data challenges‌ by proposing operational solutions co-designed and validated in‌ French and European applications. This will allow filling‌ the gap left by previous international projects to‌ ensure that French and European needs are taken‌ into account in the roadmaps for building the‌ data-oriented Exascale software stack.

STEEL

Participants: Gabriel Antoniu‌, Alexandru Costan, Jakob Luettgau, François‌ Tessier, Mathis Valli, Thomas Badts.‌

Title:
Secure and efficient daTa storagE and procEssing‌ on cLoud-based infrastructures
Duration:
From June 1, 2023 to August 31, 2030
Partners:
- Inria
- CNRS
- Institut‌ Mines Télécom (IMT)
- University of Bordeaux
- University of‌ Rennes
- INSA Rennes
- INSA Lyon
Coordinator:
Gabriel Antoniu‌ (KerData Team, Inria)
Summary:
The strong development of cloud computing since its emergence in 2007 and its massive adoption for the storage of unprecedented volumes of data in a growing number of domains have brought to light major technological challenges. In this project we will address several of these challenges, organized in three research directions. The first direction concerns the exploitation of emerging technologies for efficient storage on cloud infrastructures. We will address this challenge through NVRAM-based distributed high-performance storage solutions, placed as close as possible to data production and consumption locations (disaggregation principle), and develop strategies to optimize the trade-off between data consistency and access performance. The second direction concerns the efficient storage and processing of data on hybrid, heterogeneous infrastructures within the digital edge-cloud-supercomputer continuum. In many domains (autonomous cars, predictive maintenance, intelligent buildings, etc.) we are witnessing the emergence of hybrid workflows combining simulations, analysis of sensor data flows and machine learning. Their execution requires storage resources ranging from the edge to cloud infrastructures, and even to supercomputers, which poses challenges for unified data storage and processing. The third research direction is dedicated to confidential storage, in connection with the need to store and analyze large volumes of data of strategic interest or of a personal nature. For all of these directions, the project will take into account the need to propose and validate interoperable approaches with a potential for transfer to major French or European industrial players in cloud computing.

ECLAT‌

Participants: François Tessier,‌ Gabriel Antoniu, Théo‌‌ Jolivel, Jakob Luettgau, Thomas Badts.‌

Title:
Extreme Computing Laboratory‌ for Astronomical Telescopes
Duration:‌‌
Since May 2024
Partners:
- Inria
- CNRS
- Université de‌ Rennes
- Eviden
- Observatoire de‌ la Côte d'Azur
- Observatoire‌‌ de Paris
- Université Paris-Saclay
- CentraleSupélec
Coordinator:
Gabriel‌ Antoniu (KerData Team, Inria)‌
Summary:
ECLAT is positioned‌‌ as a center of excellence dedicated to High-Performance‌ Computing (HPC) and Artificial‌ Intelligence (AI) technologies and‌‌ techniques applied to astronomical instrumentation. This project brings‌ together sixteen partner laboratories‌ and teams around a‌‌ common roadmap, aimed at strengthening research and development‌ (R&D) collaborations. The aim‌ is to design and‌‌ build future cyber-physical systems for astronomy, capable of‌ managing, processing and optimizing‌ gigantic volumes of data.‌‌

Grid'5000

We are members of the Grid'5000 community and run experiments on the Grid'5000 platform on a daily basis.

Inria Exploratory program: Repas

Participants: Guillaume‌ Pallez.

Project Acronym:‌
REPAS
Title:
New Portrayal‌‌ of HPC Applications
Coordinator:
Guillaume Pallez
Collaboration:
In collaboration with the DATAMOVE team (Inria Grenoble)
Duration:
2022-2025
Summary:
What is the right way to represent an application in order to run it on a highly parallel (typically exascale) machine? The idea of the project is to completely review the models used in the development of scheduling algorithms and software solutions, in order to take into account the real needs of new users of HPC platforms.

10 Dissemination‌

10.1 Promoting scientific activities‌‌

10.1.1 Scientific events: organisation

General chair, scientific chair‌

François Tessier
- General co-Chair‌ of ISPDC 2025,‌‌ the 24th IEEE International‌ Symposium on Parallel and Distributed Computing (Rennes, France).‌
- Workshop co-Chair of ESSA 2025, the 6th‌ Workshop on Extreme-Scale Storage and Analysis held in‌ conjunction with IPDPS 2025 (Milan, Italy).
- Workshop co-Chair of SuperCompCloud, the 9th Workshop on Interoperability of Supercomputing and Cloud Technologies combined with OpenCHAMI, held in conjunction with ISC 2025 (Hamburg, Germany).
Alexandru Costan:
- General co-Chair of ISPDC 2025,‌ the 24th IEEE International Symposium on Parallel and‌ Distributed Computing (Rennes, France).
- Workshop co-Chair of FlexScience 2025, the 15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, held in conjunction with ACM HPDC 2025 (Notre Dame, USA).
Guillaume Pallez
- Co-General chair of IPDPS'26‌, 40th IEEE International Parallel & Distributed Processing‌ Symposium (New Orleans, USA).
- Member of the Steering‌ Committee of ICPP, International Conference on Parallel‌ Processing.
Silvina Caino Lores
- General Co-Chair of WORKS‌ 2025, the 20th Workshop on Workflows in‌ Support of Large-Scale Science, held in conjunction with‌ SC 2025 (St. Louis, USA).
Gabriel Antoniu
- Steering‌ Committee Chair of the ESSA Workshop series on‌ High-Performance Storage, held in conjunction with the IEEE‌ IPDPS conference since 2020.
- General Co-Chair of the 1st Workshop on Research Infrastructures for Experimenting across the HPC-Cloud-Edge Continuum (ContinuumRI), held in conjunction with the ACM/IEEE CCGrid 2025.

Member of the organizing‌ committees

Jakob Luettgau:
- Proceedings Chair of ISPDC 2025,‌ the 24th IEEE International Symposium on Parallel and‌ Distributed Computing (Rennes, France).
- Co-organizer of the Birds‌ of a Feather Session Ethics in HPC held‌ in conjunction with ISC 2025 (Hamburg, Germany)
- Co-organizer of the Minisymposium Ethical and Societal Considerations for Scientific Computing held in conjunction with ISC 2025 (Brugg, Switzerland)
- Co-organizer of the Birds of a‌ Feather Session: CSx4HPC: Computational Storage for High-Performance Computing‌ held in conjunction with SC 2025 (St. Louis,‌ USA)
- Co-organizer of the Birds of a Feather‌ Session: Ethics in HPC held in conjunction with‌ SC 2025 (St. Louis, USA)
Gabriel Antoniu:
- Co-Leader‌ of the Working group on Data management and‌ Computing Continuum at the InPEx workshop on Post-Exascale‌ Computing organized in Kanagawa, Japan.
Théo Jolivel:
- Web‌ Chair of ISPDC 2025, the 24th IEEE International‌ Symposium on Parallel and Distributed Computing (Rennes, France).‌
Arthur Jaquard:
- Web Chair of WORKS 2025, the‌ 20th Workshop on Workflows in Support of Large-Scale‌ Science (St. Louis, MO, USA)

10.1.2 Scientific events:‌ selection

Chair of conference program committees

François Tessier‌
- Program Co-Chair of HiPC 2025, the 32nd‌ edition of the IEEE International Conference on High‌ Performance Computing, Data, and Analytics (Hyderabad, India).

Member‌ of the conference program committees

François Tessier:
CCGrid2025,‌ ISC25 (Workshop proposals)
Gabriel Antoniu:
HPDC 2025, Cluster‌ 2025
Alexandru Costan:
SC'25 (Posters and ACM SRC‌ track), IPDPS 25 (PhD Forum), EuroPar 2025, BigData‌ 2025, HiPC 2025, CCGrid 2025

Reviewer

Théo Jolivel:
- IEEE CCGrid 2025
Mathis Valli:‌
- IEEE BigData 2025
Arthur‌ Jaquard:
- CCGrid 2025

10.1.3 Journal‌‌

Member of the editorial boards

Guillaume Pallez:
- IEEE TPDS
- IEEE TOPC‌

10.1.4 Invited talks

Guillaume Pallez:
- « Vers un calcul intensif plus sobre », organized by Laboratoire 1.5
- “Model (co)-Design and Accuracy for Resource Management in HPC” at Co-Design workshop (Osaka, Japan) co-organized by Jack Dongarra and the Chinese Academy of Sciences

François Tessier:
- "The Difficult Task of Understanding I/O Behavior on Large-scale Systems", Keynote talk at the 3rd NHR Conference, Germany

10.1.5 Leadership within the‌‌ scientific community

Gabriel Antoniu:
- Large National project management: Coordinator of Exa-DoST, one of the 5 targeted projects of the NumPEx PEPR project (started in 2023, budget: 6.2 M€). Coordinator of STEEL, one of the 7 high-priority projects of the CLOUD PEPR project (started in 2023, budget: 2.8 M€).
- ETP4HPC: Since 2019, co-leader of the working group on Programming Environments, lead co-author of the corresponding chapter of the Strategic Research Agenda of ETP4HPC.
- International lab management: Executive Director of JLESC for Inria since April 2024 (previously Vice Executive Director). JLESC is the Joint Inria-Illinois-ANL-BSC-JSC-RIKEN/AICS Laboratory for Extreme-Scale Computing. Within JLESC, he also serves as a Topic Leader for Data storage, I/O and in situ processing for Inria.
- International Working Group management: Co-Leader of the Working group on Data management and Computing Continuum within the InPEx International Post-Exascale Project.
- Team management: Head of the KerData Project-Team (Inria-INSA Rennes).
- International Associate Team management: Leader of the UNIFY2 Associate Team with Argonne National Lab (2013–2025).
François Tessier:
- Work package co-leader with Francieli Boito (Associate Professor, University of Bordeaux) within the NumPEx Exa-DoST project.
- Leader for KerData in the ECLAT joint laboratory.
Alexandru Costan:
- Work package leader of WP2 within the PEPR CLOUD STEEL project.

10.1.6 Scientific expertise

Gabriel‌‌ Antoniu:
- Evaluator for a Horizon Europe project (HORIZON-CL4-2021-HUMAN-01‌ call)
Alexandru Costan:
- Evaluator for several projects submitted to FFPlus (a European initiative highlighting and promoting the adoption of High-Performance Computing (HPC) by SMEs and start-ups across Europe)
- Member of the jury for the GDR RSD awards (Prix de thèse, Prix chercheur)

10.1.7 Research administration

François Tessier
- Member of the‌ Commission on Health, Safety‌ and Working Conditions (now‌‌ called FSS) within the Inria center of Rennes‌
Guillaume Pallez:
- Member of‌ the National Commission on‌‌ Health, Safety and Working Conditions (now called FS)‌
- Member of the Scientific‌ Board of Inria
Gabriel‌‌ Antoniu:
- Member of the Inria HRS4R Steering Committee‌ (HRS4R: European Human Resources‌ Strategy for Research)

10.2‌‌ Teaching - Supervision - Juries - Educational and‌ pedagogical outreach

10.2.1 Teaching‌

Alexandru Costan
- Bachelor: Software‌‌ Engineering and Java Programming, 28 hours (lab sessions),‌ L3, INSA Rennes.
- Bachelor:‌ Databases, 68 hours (lectures‌‌ and lab sessions), L2, INSA Rennes.
- Bachelor: Practical case studies, 24 hours (project), L3, INSA Rennes.
- Master: Big Data Storage and Processing, 28 hours (lectures, lab sessions), M1, INSA Rennes.
- Master:‌‌ Algorithms for Big Data,‌ 28 hours (lectures, lab sessions), M2, INSA Rennes.‌
- Master: Big Data Project, 28 hours (project), M2,‌ INSA Rennes.
Gabriel Antoniu:
- Master (Engineering Degree, 5th‌ year): NoSQL and Cloud technologies, 21 hours (lectures),‌ M2 level, ENSAI (École nationale supérieure de‌ la statistique et de l'analyse de l'information),‌ Bruz.
- Master: Infrastructures for Big Data, 14 hours‌ (lectures), M1 level, IBD Module, University of Rennes.‌
- Master: Cloud Computing and Big Data, 14 hours‌ (lectures), M2 level, Cloud Module, MIAGE Master Program,‌ University of Rennes.
François Tessier
- Bachelor: Computer science‌ discovery, 15 hours (lab sessions), L1 level, DIE‌ Module, ISTIC, University of Rennes.
- Master: Cloud Computing‌ and Big Data, 15 hours (lectures), M2 level,‌ Cloud Module, MIAGE Master Program, University of Rennes.‌
- Master (Engineering Degree, 4th year): Storage on Clouds,‌ 5 hours (lecture and lab session), M2 level,‌ IMT Atlantique, Rennes.
Jakob Luettgau:
- Master: Cloud and‌ Network Infrastructures (CNI), 4 hours (lectures), M2 level,‌ Master Program, University of Rennes.
Théo Jolivel:
- Master:‌ Cloud Computing and Big Data, 36 hours (lab‌ sessions), M2 level, Cloud Module, MIAGE Master Program,‌ University of Rennes.
Mathis Valli:
- Bachelor: Databases, 12‌ hours (lab sessions), L3, INSA Rennes.

10.2.2 Supervision‌

Defended PhD theses:
- Cédric Prigent, "Supporting Online Learning‌ and Inference in Parallel across the Digital Continuum",‌ thesis started in November 2021, co-advised by Alexandru‌ Costan, Gabriel Antoniu and Loïc Cudennec (DGA). Defended‌ on 25 May 2025.
- Robin Boezennec, “Reducing HPC Resource Consumption”, co-advised by Guillaume Pallez and Fanny Dufossé (Datamove, Grenoble). Defended on 10 December 2025.

PhD in progress:‌
- Mathis Valli, "Comparative Analysis of Federated Learning: Simulations‌ Versus Real-World Testbeds in dynamic settings", thesis started‌ in April 2023, co-advised by Alexandru Costan, Cédric‌ Tedeschi (Myriads) and Loïc Cudennec (DGA).
- Théo Jolivel,‌ "Modeling and Simulation of Exascale Storage Systems", thesis‌ started in October 2024, co-advised by François Tessier,‌ Gabriel Antoniu and Philippe Deniel (CEA).
- Arthur Jaquard,‌ "Dynamic in situ and in transit data analysis‌ for Exascale Computing using Damaris", thesis started in‌ October 2024, co-advised by Gabriel Antoniu, Laurent Colombet‌ (CEA), Silvina Caino-Lores, and Julien Bigot (CEA).
- Méline‌ Trochon, "Adaptive Checkpoint-Restart System with Knowledge of the‌ Network Load", CIFRE thesis started in February 2025,‌ located at Inria Bordeaux, co-supervised by Francieli Boito,‌ Brice Goglin (TADaaM - Inria Bordeaux), Jean-Thomas Acquaviva‌ (DDN) and François Tessier.
- Serge Meurrens, "Ordonnancement des‌ E/S adapté aux applications dans les systèmes HPC",‌ thesis started in December 2025, located at Inria‌ Bordeaux, co-supervised by Francieli Boito, Luan Teylo (TADaaM‌ - Inria Bordeaux) and François Tessier.
- Simon Renard, “Data Interfaces for Hybrid Quantum-Classical Computational Workflows”, thesis started in October 2025, co-supervised by Silvina Caino Lores, Gabriel Antoniu and Marc Baboulin (Inria Paris-Saclay).
- Alix Tremodeux, “Etude des conséquences du vieillissement sur les machines HPC”, thesis started in September 2025, co-supervised by Guillaume Pallez and Erven Rohou (PACAP - Inria Rennes).

Internships:
- Remy‌ Chiv, "Analyse et optimisation des entrées/sorties d'un pipeline de traitement de données‌ pour la radio-astronomie à‌ grande échelle", 5-month Master‌‌ 2 internship started in May 2025, supervised by‌ François Tessier.

10.2.3 Juries‌

Gabriel Antoniu:
- HDR: Towards Better I/O Resource Usage in HPC, Francieli Zanon Boito, Université de Bordeaux, defended on 5 December 2025.
Alexandru Costan:
- PhD: Complexity and Algorithmic results for Translocation Distances, Maria Constantinescu, University of Bucharest, defended on 29 May 2025.

11 Scientific production‌

11.1 Major publications

1 misc Gabriel Antoniu, Patrick Valduriez, Hans-Christian Hoppe and Jens Krüger. Towards Integrated Hardware/Software Ecosystems for the Edge-Cloud-HPC Continuum. 2021.
2 article Robin Boëzennec, Fanny Dufossé and Guillaume Pallez. Qualitatively Analyzing Optimization Objectives in the Design of HPC Resource Manager. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 9(4), 2024, 1-28.
3 inproceedings Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Luettgau, Julien Monniot, Ahmad Tarraf, André Carneiro and Carla Osthoff. A Deep Look Into the Temporal I/O Behavior of HPC Applications. 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Milan, Italy, June 2025.
4 article Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka and Leigh Orf. Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations. ACM Transactions on Parallel Computing 3(3), 2016, 15.
5 article Ana Gainaru, Brice Goglin, Valentin Honoré and Guillaume Pallez. Profiles of upcoming HPC Applications and their Impact on Reservation Strategies. IEEE Transactions on Parallel and Distributed Systems 32(5), May 2021, 1178-1190.
6 book Michael Malms, Laurent Cargemel, Estela Suarez, Nico Mittenzwey, Marc Duranton, Sakir Sezer, Craig Prunty, Pascale Rossé-Laurent, Maria Pérez-Harnandez, Manolis Marazakis, Guy Lonsdale, Paul Carpenter, Gabriel Antoniu, Sai Narasimharmurthy, André Brinkman, Dirk Pleiter, Utz-Uwe Haus, Jens Krueger, Hans-Christian Hoppe, Erwin Laure, Andreas Wierse, Valeria Bartsch, Kristel Michielsen, Cyril Allouche, Tobias Becker and Robert Haas. ETP4HPC's SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022. Zenodo, 2022.
7 misc Sarah Neuwirth, Philippe Deniel, Jean-Thomas Acquaviva, Martin Golasowski, Michael Hennecke, Adrian Jackson, Thomas Leibovici, Jakob Luettgau and Ramon Nou. ETP4HPC SRA 6 White Paper - I/O and Storage. January 2025.
8 inproceedings Cédric Prigent, Kate Keahey, Alexandru Costan, Loïc Cudennec and Gabriel Antoniu. On the Reproducibility Challenges of Federated Learning: Investigating the Gap between Simulation, Emulation and Real-World Deployments. CCGrid 2025 - IEEE 25th International Symposium on Cluster, Cloud and Internet Computing, Tromso, Norway, 2025, 185-194.
9 inproceedings Daniel Rosendo, Pedro Silva, Matthieu Simonin, Alexandru Costan and Gabriel Antoniu. E2Clab: Exploring the Computing Continuum through Repeatable, Replicable and Reproducible Edge-to-Cloud Experiments. Cluster 2020 - IEEE International Conference on Cluster Computing, Kobe, Japan, September 2020, 1-11.
10 inproceedings Renan Souza, Silvina Caino-Lores, Mark Coletti, Tyler J Skluzacek, Alexandru Costan, Frédéric Suter, Marta Mattoso and Rafael Ferreira da Silva. Workflow Provenance in the Computing Continuum for Responsible, Trustworthy, and Energy-Efficient AI. e-Science 2024 - 20th IEEE International Conference on e-Science, Osaka, Japan, IEEE, September 2024, 1-7.

11.2 Publications of‌ the year

International journals

11 article Sally Ellingson and Guillaume Pallez. Result-Scalability: Following the Evolution of Selected Social Impact of HPC. International Journal of High Performance Computing Applications 39(5), April 2025, 713-721.

International peer-reviewed‌ conferences

12 inproceedings Robin Boëzennec, Fanny Dufossé, Guillaume Pallez and Alix Tremodeux. Improving Supercomputer Usage with Aging Awareness. SC Workshops '25: Proceedings of the SC '25 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Sustainable Supercomputing (Workshop of SC25), St. Louis, Missouri, United States, ACM, November 2025, 1980-1989.
13 inproceedings Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Luettgau, Julien Monniot, Ahmad Tarraf, André Carneiro and Carla Osthoff. A Deep Look Into the Temporal I/O Behavior of HPC Applications. 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Milan, Italy, June 2025.
14 inproceedings Ana Gainaru, Scott Klasky and Guillaume Pallez. Priority-BF: a Task Manager for Priority-Based Scheduling. EURO-PAR 2025 - 31st International European Conference on Parallel and Distributed Computing, Dresden, Germany, 2026, 219-232.
15 inproceedings Guillaume Pallez, Judith Hill and Sascha Hunold. Implementing a Reproducibility Initiative in HPC: Experiences from SC24. REP 2025 - 3rd ACM Conference on Reproducibility and Replicability, Vancouver (British Columbia), Canada, July 2025, 1-8.
16 inproceedings Cédric Prigent, Kate Keahey, Alexandru Costan, Loïc Cudennec and Gabriel Antoniu. On the Reproducibility Challenges of Federated Learning: Investigating the Gap between Simulation, Emulation and Real-World Deployments. CCGrid 2025 - IEEE 25th International Symposium on Cluster, Cloud and Internet Computing, Tromso, Norway, 2025, 185-194.

Doctoral dissertations and habilitation theses

17 thesis Cédric Prigent. Towards Efficient and Trustworthy Federated Learning on the Computing Continuum. INSA de Rennes, May 2025.

Reports &‌‌ preprints

18 misc Robin Boëzennec, Fernando Fernandes dos Santos, Brice Goglin, Angeliki Kritikakou, Guillaume Pallez, Erven Rohou, Olivier Sentieys and Marcello Traiola. Increasing the Lifetime of HPC Machines: Issues, Implications, and Open Challenges. 2025.
19 report Francieli Boito, Luan Teylo, Mihail Popov, Théo Jolivel, François Tessier, Jakob Luettgau, Julien Monniot, Ahmad Tarraf, André Carneiro and Carla Osthoff. A deep look into the temporal I/O behavior of HPC applications - extended version. RR-9577, Inria & Labri, Univ. Bordeaux, March 2025, 1-42.
20 misc Julien Herrmann and Guillaume Pallez. An In-depth Study of LLM Contributions to the Bin Packing Problem. October 2025.
21 misc Théo Jolivel, François Tessier, Jakob Luettgau, Gabriel Antoniu and Philippe Deniel. A Methodology for System-Scale I/O Pattern Taxonomy for HPC Workloads. 2025.
22 misc Théo Jolivel and François Tessier. Gabriel Antoniu and Philippe Deniel, eds. MOSAIC: Automatic categorization of I/O patterns. November 2025.
23 misc Méline Trochon, Jean-Thomas Acquaviva, Francieli Boito, Brice Goglin, François Tessier and Luan Teylo. On the Impact of Interference from Concurrent Jobs on Checkpointing Performance. 2025.

Other scientific publications‌

24 inproceedings Arthur Jaquard. Enabling Efficient Runtime Data Analysis to a Crystal Deformation Simulation. SC 2025 - International Conference for High Performance Computing, Networking, Storage, and Analysis, Saint Louis, MO, United States, 2025.
25 misc Sarah Neuwirth, Philippe Deniel, Jean-Thomas Acquaviva, Martin Golasowski, Michael Hennecke, Adrian Jackson, Thomas Leibovici, Jakob Luettgau and Ramon Nou. ETP4HPC SRA 6 White Paper - I/O and Storage. January 2025.

11.3 Cited publications

26 article David Perez Abreu, Karima Velasquez, Marilia Curado and Edmundo Monteiro. A Comparative Analysis of Simulators for the Cloud to Fog Continuum. Simulation Modelling Practice and Theory, 2019, 102029.
27 article Fatih Abut and Mehmet Kızıldağ. Design and Implementation of a Reconfigurable Test Environment for Network Measurement Tools Based on a Control and Management Framework. Applied Sciences 15(1), 2025. URL: https://www.mdpi.com/2076-3417/15/1/487
28 book Blake Alcott, Mario Giampietro, Kozo Mayumi and John Polimeni. The Jevons paradox and the myth of resource efficiency improvements. Routledge, 2012.
29 article Alon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, Harrison Liew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, Paul Rigge, Colin Schmidt, John Wright, Jerry Zhao, Yakun Sophia Shao, Krste Asanović and Borivoje Nikolić. Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs. IEEE Micro 40(4), 2020, 10-21.
30 article Marianna Anagnostou, Olga Karvounidou, Chrysovalantou Katritzidaki, Christina Kechagia, Kyriaki Melidou, Eleni Mpeza, Ioannis Konstantinidis, Eleni Kapantai, Christos Berberidis, Ioannis Magnisalis and others. Characteristics and challenges in the industries towards responsible AI: a systematic literature review. Ethics and Information Technology 24(3), 2022, 37.
31 book Gabriel Antoniu, Patrick Valduriez, Hans-Christian Hoppe and Jens Krüger. Towards Integrated Hardware/Software Ecosystems for the Edge-Cloud-HPC Continuum. ETP4HPC White Papers, ETP4HPC: European Technology Platform for High Performance Computing, 2021.
32 software University of Applied Sciences Northwestern Switzerland. Karabo-Pipeline. v0.34.0, lic: MIT.
33 article Vijay Arya, Rachel KE Bellamy, Pin-Yu Chen, Amit Dhurandhar, Michael Hind, Samuel C Hoffman, Stephanie Houde, Q Vera Liao, Ronny Luss, Aleksandra Mojsilović and others. One explanation does not fit all: A toolkit and taxonomy of ai explainability techniques. arXiv preprint arXiv:1909.03012, 2019.
34 article Mark Asch, Terry Moore, R Badia, Micah Beck, P Beckman, T Bidot, François Bodin, Franck Cappello, A Choudhary, B De Supinski and others. Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. The International Journal of High Performance Computing Applications 32(4), 2018, 435-479.
35 phdthesis Guillaume Aupy. Resilient and energy-efficient scheduling algorithms at scale. École Normale Supérieure de Lyon, 2014.
36 inproceedings Sana Awan, Bo Luo and Fengjun Li. Contra: Defending against poisoning attacks in federated learning. Computer Security – ESORICS 2021: 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4-8, 2021, Proceedings, Part I 26, Springer, 2021, 455-475.
37 article Paul Ayris, Jean-Yves Berthou, Rachel Bruce, Stefanie Lindstaedt, Anna Monreale, Barend Mons, Yasuhiro Murayama, Caj Soedergaard, Klaus Tochtermann and Ross Wilkinson. Realising the European open science cloud. 2016.
38 inproceedings Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Aviżienis, John Wawrzynek and Krste Asanović. Chisel: constructing hardware in a Scala embedded language. Proceedings of the 49th Annual Design Automation Conference, DAC '12, New York, NY, USA, San Francisco, California, Association for Computing Machinery, 2012, 1216-1225. URL: https://doi.org/10.1145/2228360.2228584
39 article Yogesh Balaji, Mehrdad Farajtabar, Dong Yin, Alex Mott and Ang Li. The effectiveness of memory replay in large scale continual learning. arXiv preprint arXiv:2010.02418, 2020.
40 article Daniel Balouek-Thomert, Eduard Gibert Renart, Ali Reza Zamani, Anthony Simonet and Manish Parashar. Towards a computing continuum: Enabling edge-to-cloud integration for data-driven workflows. The International Journal of High Performance Computing Applications 33(6), 2019, 1159-1174.
41 article L. A. Barba and G. K. Thiruvathukal. Reproducible Research for Computing in Science Engineering. Computing in Science Engineering 19(6), 2017, 85-87.
42 misc Eamon Barrett. Taiwan’s drought is exposing just how much water chipmakers like TSMC use (and reuse). 2021.
43 inproceedings Micah Beck, Terry Moore, Piotr Luszczek and Anthony Danalis. Interoperable Convergence of Storage, Networking, and Computation. Advances in Information and Communication, Cham, Springer International Publishing, 2020, 667-690.
44 article Tal Ben-Nun and Torsten Hoefler. Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. ACM Computing Surveys 52(4), 2019, 1-43.
45 inproceedings Janine C. Bennett, Hasan Abbasi, Peer-Timo Bremer, Ray Grout, Attila Gyulassy, Tong Jin, Scott Klasky, Hemanth Kolla, Manish Parashar, Valerio Pascucci, Philippe Pebay, David Thompson, Hongfeng Yu, Fan Zhang and Jacqueline Chen. Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012, 1-9.
46 article Elisa Bertino, Suparna Bhattacharya, Elena Ferrari and Dejan Milojicic. Trustworthy AI and Data Lineage. IEEE Internet Computing 27(6), 2023, 5-6.
47 article Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in neural information processing systems 30, 2017.
48 article Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese and David Walker. P4: Programming Protocol-Independent Packet Processors. ACM SIGCOMM Computer Communication Review 44(3), July 2014, 87-95. URL: https://dl.acm.org/doi/10.1145/2656877.2656890
49 misc Joshua C Bowden, François Tessier, Charles Deltel, Simone Bnà and Gabriel Antoniu. PRACE: Partnership for Advanced Computing in Europe, eds. In-situ visualization using Damaris: the Code Saturne use case. PRACE White Paper, PRACE: Partnership for Advanced Computing in Europe, September 2021.
50 article Arnaud Braud, Gaël Fromentoux, Benoit Radier and Olivier Le Grand. The road to European digital sovereignty with Gaia-X and IDSA. IEEE network 35(2), 2021, 4-5.
51 article Christopher Briggs, Zhong Fan and Péter András. Federated learning with hierarchical clustering of local updates to improve training on non-IID data. 2020 International Joint Conference on Neural Networks (IJCNN), 2020, 1-9. URL: https://api.semanticscholar.org/CorpusID:216144447
52 inproceedings Francois Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault and Raymond Namyst. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010, 180-186.
53 inproceedings Pablo Brox, Javier Garcia-Blas, David E Singh and Jesus Carretero. DICE: Generic Data Abstraction for Enhancing the Convergence of HPC and Big Data. High Performance Computing: 8th Latin American Conference, CARLA 2021, Guadalajara, Mexico, October 6-8, 2021, Revised Selected Papers, Springer, 2022, 106-119.
54 inproceedings Pietro Buzzega, Matteo Boschini and Simone Calderara. Rethinking experience replay: A bag of tricks for continual learning. 25th International Conference on Pattern Recognition (ICPR), 2021, 2180-2187.
55 inproceedings Philip Carns, Robert Latham, Robert Ross, Kamil Iskra, Samuel Lang and Katherine Riley. 24/7 characterization of petascale I/O workloads. 2009 IEEE International Conference on Cluster Computing and Workshops, IEEE, 2009, 1-10.
56 misc Paul Carpenter, Utz-Uwe Haus, Erwin Laure, Sai Narasimhamurthy and Estela Suarez. Heterogeneity is here to stay: Challenges and Opportunities in HPC. February 2022. URL: https://www.etp4hpc.eu/pujades/files/ETP4HPC_WP_Heterogeneous-HPC_20220216.pdf
57 article Henri Casanova, Rafael Ferreira da Silva, Ryan Tanaka, Suraj Pandey, Gautam Jethwani, William Koch, Spencer Albrecht, James Oeth and Frédéric Suter. Developing Accurate and Scalable Simulators of Production Workflow Management Systems with WRENCH. Future Generation Computer Systems 112, 2020, 162-175.
58 article Henri Casanova, Arnaud Giersch, Arnaud Legrand, Martin Quinson and Frédéric Suter. Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms. Journal of Parallel and Distributed Computing 74(10), June 2014, 2899-2917.
59 article Arslan Chaudhry, Albert Gordo, Puneet K Dokania, Philip Torr and David Lopez-Paz. Using hindsight to anchor past knowledge in continual learning. arXiv preprint arXiv:2002.08165 3, 2020.
60 inproceedings Melvin Chelli, Cédric Prigent, René Schubotz, Alexandru Costan, Gabriel Antoniu, Loïc Cudennec and Philipp Slusallek. FedGuard: Selective Parameter Aggregation for Poisoning Attack Mitigation in Federated Learning. Cluster 2023 - IEEE International Conference on Cluster Computing, Santa Fe, New Mexico, United States, IEEE, October 2023, 1-10.
61 misc European Commission. Critical Raw Materials Resilience: Charting a Path towards greater Security and Sustainability. 2020.
62 misc Contrat d’objectifs et de performance 2019-2023 Entre l’État et Inria. 2019.
63 article C.S. Daley, D. Ghoshal, G.K. Lockwood, S. Dosanjh, L. Ramakrishnan and N.J. Wright. Performance characterization of scientific workflows for the optimal use of Burst Buffers. Future Generation Computer Systems 110, 2020, 468-480. URL: https://www.sciencedirect.com/science/article/pii/S0167739X16308287
64 article Advait Deshpande. Assessing the quantum-computing landscape. Communications of the ACM 65(10), 2022, 57-65.
65 article Peter E. Dewdney, Peter J. Hall, Richard T. Schilizzi and T. Joseph L. W. Lazio. The Square Kilometre Array. Proceedings of the IEEE 97(8), 2009, 1482-1496.
66 inproceedings Estelle Dirand, Laurent Colombet and Bruno Raffin. TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics. SCA18 - Supercomputing Frontiers Asia 2018, Singapore, Singapore, March 2018, 159-178.
67 article Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka and Leigh Orf. Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations. ACM Transactions on Parallel Computing 3(3), 2016, 15.
68 misc ECLAT - Extreme Computing Lab for Astronomical Telescopes. 2024. URL: https://eclat-lab.fr/
69 article Dina Fakhry, Mohamed Abdelsalam, M. Watheq El-Kharashi and Mona Safar. A Review on Computational Storage Devices and near Memory Computing for High Performance Applications. Memories - Materials, Devices, Circuits and Systems 4, July 2023, 100051. URL: https://www.sciencedirect.com/science/article/pii/S2773064623000282
70 article Ana Gainaru, Lipeng Wan, Ruonan Wang, Eric Suchyta, Jieyang Chen, Norbert Podhorszki, James Kress, David Pugmire and Scott Klasky. Understanding the Impact of Data Staging for Coupled Scientific Workflows. IEEE Transactions on Parallel and Distributed Systems 33(12), 2022, 4134-4147.
71 article Ana Gainaru, Lipeng Wan, Ruonan Wang, Eric Suchyta, Jieyang Chen, Norbert Podhorszki, James Kress, David Pugmire and Scott Klasky. Understanding the Impact of Data Staging for Coupled Scientific Workflows. IEEE Transactions on Parallel and Distributed Systems 33(12), 2022, 4134-4147.
72 article Avishek Ghosh, Jichan Chung, Dong Yin and Kannan Ramchandran. An Efficient Framework for Clustered Federated Learning. IEEE Transactions on Information Theory 68(12), 2022, 8076-8091.
73 inproceedings Donghyun Gouk, Sangwon Lee, Miryeong Kwon and Myoungsoo Jung. Direct Access, High-Performance Memory Disaggregation with DirectCXL. 2022 USENIX Annual Technical Conference (USENIX ATC 22), Carlsbad, CA, USENIX Association, July 2022, 287-294. URL: https://www.usenix.org/conference/atc22/presentation/gouk
74 article V. Grandgirard, Y. Sarazin, X. Garbet, G. Dif-Pradalier, Ph. Ghendrih, N. Crouseilles, G. Latu, E. Sonnendrücker, N. Besse and P. Bertrand. GYSELA, a full-f global gyrokinetic Semi-Lagrangian code for ITG turbulence simulations. AIP Conference Proceedings 871(1), 2006, 100-111. URL: http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.2404543
75 article Nathalie Hartl, Elena Wössner and York Sure-Vetter. Nationale Forschungsdateninfrastruktur (NFDI). Informatik Spektrum 44(5), 2021, 370-373.
76 article E. A. Huerta, Asad Khan, Edward Davis, Colleen Bushell, William D. Gropp, Daniel S. Katz, Volodymyr V. Kindratenko, Seid Koric, William T. C. Kramer, Brendan McGinty, Kenton McHenry and Aaron Saxton. Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure. Journal of Big Data 7(1), 2020, 88.
77 misc Kevin Jacobs, Sagar Chopra, Aaron Barr and Benjamin Boucher. Supply shortages and an inflexible market give rise to high power transformer lead times. 2021.
78 article Emmanuel Jeannot, Guillaume Pallez and Nicolas Vidal. IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy. International Journal of High Performance Computing Applications, 2023, 1-13.
79 inproceedings Théo Jolivel, François Tessier, Julien Monniot and Guillaume Pallez. MOSAIC: Detection and Categorization of I/O Patterns in HPC Applications. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, United States, November 2024, 1-7.
80 inproceedings Dieter Kranzlmueller, J Marco de Lucas and P Oester. The European Grid Initiative (EGI) Towards a Sustainable Grid Infrastructure. Remote Instrumentation and Virtual Laboratories: Service Architecture and Networking, Springer, 2010, 61-66.
81 article Dave Landsman and Karin Strauss. The DNA Data Storage Model. Computer 56(7), July 2023, 78-85. URL: https://ieeexplore.ieee.org/document/10154188/
82 inproceedings Adrien Lebre, Arnaud Legrand, Frédéric Suter and Pierre Veyre. Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015, 251-260.
83 misc Huaicheng Li, Daniel S. Berger, Stanko Novakovic, Lisa Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura and Ricardo Bianchini. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms. October 2022. URL: http://arxiv.org/abs/2203.00241
84 article Suyi Li, Yong Cheng, Wei Wang, Yang Liu and Tianjian Chen. Learning to Detect Malicious Clients for Robust Federated Learning. CoRR abs/2002.00211, 2020. URL: https://arxiv.org/abs/2002.00211
85 incollection Thomas Lippert, Thomas Eickermann and Dietmar Erwin. PRACE: Europe's supercomputing research infrastructure. Applications, Tools and Techniques on the Road to Exascale Computing, IOS Press, 2012, 7-18.
86 inproceedings GK. Lockwood, D. Hazen, Q. Koziol, RS. Canon, K. Antypas and J. Balewski. Storage 2020: A Vision for the Future of HPC Storage. Report: LBNL-2001072, Lawrence Berkeley National Laboratory, 2017. URL: https://escholarship.org/uc/item/744479dp#author
87 inproceedings Jakob Luettgau, Shane Snyder, Philip Carns, Justin M. Wozniak, Julian Kunkel and Thomas Ludwig. Toward Understanding I/O Behavior in HPC Workflows. 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), Dallas, TX, USA, November 2018, 64-75.
88 inproceedings Jakob Luettgau, Shane Snyder, Tyler Reddy, Nikolaus Awtrey, Kevin Harms, Jean Luca Bez, Rui Wang, Rob Latham and Philip Carns. Enabling Agile Analysis of I/O Performance Data with PyDarshan. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, SC-W '23, New York, NY, USA, Association for Computing Machinery, November 2023, 1380-1391. URL: https://doi.org/10.1145/3624062.3624207
89 article Tao Luo, Weng-Fai Wong, Rick Siow Mong Goh, Anh Tuan Do, Zhixian Chen, Haizhou Li, Wenyu Jiang and Weiyun Yau. Achieving Green AI with Energy-Efficient Deep Learning Using Neuromorphic Computing. Commun. ACM 66(7), June 2023, 52-57. URL: https://doi.org/10.1145/3588591
90 book Michael Malms, Laurent Cargemel, Estela Suarez, Nico Mittenzwey, Marc Duranton, Sakir Sezer, Craig Prunty, Pascale Rossé-Laurent, Maria Pérez-Harnandez, Manolis Marazakis, Guy Lonsdale, Paul Carpenter, Gabriel Antoniu, Sai Narasimhamurthy, André Brinkman, Dirk Pleiter, Utz-Uwe Haus, Jens Krueger, Hans-Christian Hoppe, Erwin Laure, Andreas Wierse, Valeria Bartsch, Kristel Michielsen, Cyril Allouche, Tobias Becker and Robert Haas. ETP4HPC's SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022. Zenodo, 2022.
91 inproceedings Julien Monniot, François Tessier, Matthieu Robert and Gabriel Antoniu. StorAlloc: A Simulator for Job Scheduling on Heterogeneous Storage Resources. HeteroPar 2022, Glasgow, United Kingdom, August 2022.
92 article Julien Monniot, François Tessier, Matthieu Robert and Gabriel Antoniu. Supporting dynamic allocation of heterogeneous storage resources on HPC systems. Concurrency and Computation: Practice and Experience 35(28), 2023, e7890. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.7890
93 article Etienne Ndamlabin and Berenger Bramas. RSCHED: An Effective Heterogeneous Resource Management for Simultaneous Execution of Task-Based Applications. International Journal of Advanced Computer Science and Applications 16(2), 2025.
94 article Bichlien H. Nguyen, Christopher N. Takahashi, Gagan Gupta, Jake A. Smith, Richard Rouse, Paul Berndt, Sergey Yekhanin, David P. Ward, Siena D. Ang, Patrick Garvan, Hsing-Yeh Parker, Rob Carlson, Douglas Carmean, Luis Ceze and Karin Strauss. Scaling DNA Data Storage with Nanoscale Electrode Wells. Science Advances 7(48), November 2021, eabi6714. URL: https://www.science.org/doi/10.1126/sciadv.abi6714
95 article Cosmas Ifeanyi Nwakanma, Jae-Woo Kim, Jae-Min Lee and Dong-Seong Kim. Edge AI prospect using the NeuroEdge computing system: Introducing a novel neuromorphic technology. ICT Express 7(2), 2021, 152-157.
96 article Harish Padmanaban. Quantum Computing and AI in the Cloud. Journal of Computational Intelligence and Robotics 4(1), Mar. 2024, 14-32. URL: https://thesciencebrigade.com/jcir/article/view/116
97 inproceedings Fengfeng Pan, Yinliang Yue, Jin Xiong and Daxiang Hao. I/O Characterization of Big Data Workloads in Data Centers. Big Data Benchmarks, Performance Optimization, and Emerging Hardware, Cham, Springer International Publishing, 2014, 85-97.
98 inproceedings Robert Patton, Catherine Schuman, Shruti Kulkarni, Maryam Parsa, J Parker Mitchell, N Quentin Haas, Christopher Stahl, Spencer Paulissen, Prasanna Date, Thomas Potok and others. Neuromorphic computing for autonomous racing. International Conference on Neuromorphic Systems 2021, 2021, 1-5.
99 article Mohan Raparthi. Real-Time AI Decision Making in IoT with Quantum Computing: Investigating & Exploring the Development and Implementation of Quantum-Supported AI Inference Systems for IoT Applications. Internet of Things and Edge Computing Journal 1(1), Mar. 2021, 18-27. URL: https://thesciencebrigade.com/iotecj/article/view/130
100 inproceedings Gonzalo Pedro Rodrigo Álvarez, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber and Lavanya Ramakrishnan. HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, HPDC '15, New York, NY, USA, Portland, Oregon, USA, Association for Computing Machinery, 2015, 57-60. URL: https://doi.org/10.1145/2749246.2749270
101 article David Rolnick and Arun Ahuja. Experience replay for continual learning. Advances in Neural Information Processing Systems 32, 2019.
102 inproceedings Daniel Rosendo, Pedro Silva, Matthieu Simonin, Alexandru Costan and Gabriel Antoniu. E2Clab: Exploring the Computing Continuum through Repeatable, Replicable and Reproducible Edge-to-Cloud Experiments. 2020 IEEE International Conference on Cluster Computing (CLUSTER), 2020, 176-186.
103 misc SKA - Square Kilometre Array. 2024. URL: https://www.skao.int/en
104 inproceedings Shazia Sadiq, Maria Orlowska, Wasim Sadiq and Cameron Foulger. Data Flow and Validation in Workflow Modelling. Proceedings of the 15th Australasian database conference-Volume 27, 2004, 207-214.
105 inproceedings Conrad Sanderson, Qinghua Lu, David Douglas, Xiwei Xu, Liming Zhu and Jon Whittle. Towards Implementing Responsible AI. 2022 IEEE International Conference on Big Data (Big Data), IEEE, 2022, 5076-5081.
106 article Rafael Ferreira da Silva, Henri Casanova, Anne-Cécile Orgerie, Ryan Tanaka, Ewa Deelman and Frédéric Suter. Characterizing, Modeling, and Accurately Simulating Power and Energy Consumption of I/O-intensive Scientific Workflows. Journal of Computational Science 44, 2020, 101157. URL: https://www.sciencedirect.com/science/article/pii/S1877750320304580
107 article Thomas Skordas. Toward a european exascale ecosystem: the eurohpc joint undertaking. Communications of the ACM 62(4), 2019, 70-70.
108 misc European Association on Smart Systems Integration. Strategic Research and Innovation Agenda. 2023. URL: https://ecssria.eu/ECS-SRIA%202023.pdf
109 inproceedings S. Snyder, P. Carns, K. Harms, R. Ross, G. K. Lockwood and N. J. Wright. Modular HPC I/O Characterization with Darshan. 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT), 2016, 9-17.
110 inproceedings Linghao Song, Fan Chen, Steven R Young, Catherine D Schuman, Gabriel Perdue and Thomas E Potok. Deep learning for vertex reconstruction of neutrino-nucleus interaction events with combined energy and time data. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, 3882-3886.
111 article Victoria Stodden and Sheila Miguez. Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Available at SSRN 2322276, 2013.
112 article Sergej Svorobej, Patricia Takako Endo, Malika Bendechache, Christos Filelis-Papadopoulos, Konstantinos M Giannoutakis, George A Gravvanis, Dimitrios Tzovaras, James Byrne and Theo Lynn. Simulating Fog and Edge Computing Scenarios: An Overview and Research Challenges. Future Internet 11(3), 2019, 55.
113 inproceedings Houjun Tang, Suren Byna, François Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu and Richard Warren. Toward Scalable and Asynchronous Object-Centric Data Management for HPC. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2018, 113-122.
114 inproceedings François Tessier, Paul Gressier and Venkatram Vishwanath. Optimizing Data Aggregation by Leveraging the Deep Memory Hierarchy on Large-scale Systems. Proceedings of the 2018 International Conference on Supercomputing, ICS '18, New York, NY, USA, Beijing, China, ACM, 2018, 229-239. URL: http://doi.acm.org/10.1145/3205289.3205316
115 inproceedings F. Tessier, V. Vishwanath and E. Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers. 2017 IEEE International Conference on Cluster Computing (CLUSTER), Sept 2017, 70-80.
116 inproceedings Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy and Ling Liu. Data Poisoning Attacks Against Federated Learning Systems. Computer Security – ESORICS 2020, Lecture Notes in Computer Science, Cham, Springer International Publishing, 2020, 480-501.
117 inproceedings Andrew Waterman, Yunsup Lee, Rimas Avizienis, Henry Cook, David Patterson and Krste Asanovic. The RISC-V instruction set. 2013 IEEE Hot Chips 25 Symposium (HCS), 2013, 1-1.
118 article Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J. G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A. C. 't Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao and Barend Mons. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data 3(1), March 2016, 160018. URL: https://www.nature.com/articles/sdata201618

KERDATA - 2025

KERDATA - 2025

2025Activity reportProject-Team​‌﻿﻿KERDATA

Keywords﻿​﻿﻿

Computer Science and Digital​‌﻿﻿ Science

Other Research Topics and﻿​﻿﻿ Application Domains

1 Team members,​​﻿﻿ visitors, external collaborators

Research​​​‌ Scientists

Faculty Member﻿​​﻿

PhD﻿​​﻿ Students

Technical Staff

Interns and﻿​​﻿ Apprentices

Administrative Assistants

Visiting Scientists

2 Overall objectives

2.1﻿​​﻿ Context: the emergence of​​​‌ the Edge-Cloud-HPC Continuum.

Our objective:​‌﻿﻿ Enable the Data Continuum.​​﻿﻿

2.2 Application/workflow-level challenges

2.3 Middleware-level challenges​‌﻿﻿

2.4 Resource management challenges​​﻿﻿

2.5 Approach,​​​‌ methodology, platforms

2.6 Collaboration strategy

2.7 Alignment with﻿﻿﻿‌ institutional, national and European﻿‌​‌ strategies

3​‌﻿﻿ Research program

3.1﻿‌​‌ Axis 1: Supporting Data-Centric﻿​​﻿ Applications and Workflows Running​​​‌ Across the Computing Continuum﻿﻿﻿‌

Data-Centric​​​‌ Workflow Composition in the﻿​﻿﻿ Computing Continuum.

Enabling Reproducibility and Trustworthiness​​​‌ in Complex Workflows Across﻿​﻿﻿ the Computing Continuum.

Efficient Parallel Continual﻿﻿﻿‌ Learning.

Scalable,﻿﻿﻿‌ Secure and Resource-Efficient Federated﻿‌​‌ Learning.

3.2 Axis 2:﻿​﻿﻿ Data-Aware Middleware Approaches for​‌﻿﻿ the Computing Continuum

Workflow I/O Behavior Analysis﻿​​﻿ Methods for Sustainability.

Abstraction for HPC/Cloud Storage﻿﻿﻿‌ Convergence.

Exascale In-Situ Analytics.​​﻿﻿

Sustainable Interoperability Across﻿‌​‌ the Computing Continuum.

3.3​​﻿﻿ Axis 3: Sustainable Resource​​​‌ Management for the Computing﻿​﻿﻿ Continuum

Provisioning Storage Resources​​​‌ on Large-Scale Infrastructures.

Storage Disaggregation and﻿​​﻿ Computational Storage.

Frugal Data​​​‌ Storage Architectures to Support﻿​﻿﻿ Post-Exascale Workflows.

HPC​​​‌ Resource Management Faced with﻿​﻿﻿ the Environmental Crisis.

4 Application domains​​​‌

4.1 Radio​​​‌ astronomy

4.2 Nuclear​​​‌ Fusion

4.3 Material Science

5 Social and environmental​​﻿﻿ responsibility

5.1 Footprint of​​​‌ research activities

5.2 Impact﻿​﻿﻿ of research results

Social impact.﻿​​﻿

Environmental impact.

6 Highlights of﻿​​﻿ the year

7 Latest software developments,﻿﻿﻿‌ platforms, open data

7.1﻿‌​‌ Latest software developments

7.1.1﻿​​﻿ Damaris

7.1.2 E2Clab

7.1.3 Fives

7.1.4 MOSAIC

7.1.5​​﻿﻿ FLAdversary

7.1.6 FLDrift

7.2 Open﻿﻿﻿‌ data

7.2.1 I/O Traces﻿‌​‌

8 New﻿‌​‌ results

8.1 Supporting Data-Centric﻿​​﻿ Applications and Workflows Running​​​‌ Across the Computing Continuum﻿﻿﻿‌

8.1.1 On the Reproducibility﻿‌​‌ Challenges of Federated Learning:﻿​​﻿ Investigating the Gap between​​​‌ Simulation, Emulation and Real-World﻿﻿﻿‌ Deployments

8.1.2 Evaluating Federated﻿​﻿﻿ Learning Workflows Beyond Simulation:​‌﻿﻿ A Deployment-Aware Methodology

8.1.3​​﻿﻿ Supporting SKA data processing﻿​​﻿ workflows with the E2CLab​​​‌ approach to workflow lifecycle﻿﻿﻿‌ management across the continuum﻿‌​‌

8.1.4 Methodology for Automated﻿‌​‌ IoT Experimentation in Controlled﻿​​﻿ Testbeds Prior to Real-World​​​‌ Deployments

8.2​​​‌ Data-Aware Middleware Approaches for﻿﻿﻿‌ the Computing Continuum

8.2.1﻿‌​‌ Multi-level analysis of the﻿​​﻿ I/O pattern of HPC​​​‌ applications

8.2.2 Study​‌﻿﻿ of I/O interference between​​﻿﻿ jobs

8.2.3 Enabling Efficient​​​‌ Runtime Data Analysis to﻿﻿﻿‌ a Crystal Deformation Simulation﻿‌​‌

8.3﻿‌​‌ Sustainable Resource Management for﻿​​﻿ the Computing Continuum

8.3.1​​​‌ Result-Scalability: Following the Evolution﻿﻿﻿‌ of Selected Social Impact﻿‌​‌ of HPC.

8.3.2​‌﻿﻿ Increasing the Lifetime of​​﻿﻿ HPC Machines: Issues, Implications,​​​‌ and Open Challenges

8.3.3 Improving Supercomputer Usage​​﻿﻿ with Aging Awareness.

8.3.4​​​‌ Priority-BF: a Task Manager﻿​﻿﻿ for Priority-Based Scheduling

8.3.5﻿﻿﻿‌ Scheduling multiple task-based applications﻿‌​‌ on distributed heterogeneous computing﻿​​﻿ nodes

8.4 Methodological study﻿​​﻿ over the practice of​​​‌ HPC Research

8.4.1 Implementing a﻿​​﻿ Reproducibility Initiative in HPC:​​​‌ Experiences from SC24.

8.4.2﻿‌​‌ An In-depth Study of﻿​​﻿ LLM Contributions to the​​​‌ Bin Packing Problem

9 Partnerships and cooperations​​​‌

9.1 International initiatives

2025Activity reportProject-Team‌KERDATA

Keywords

Computer Science and Digital‌ Science

Other Research Topics and Application Domains

1 Team members, visitors, external collaborators

Research‌ Scientists

Faculty Member

PhD Students

Interns and Apprentices

2.1 Context: the emergence of‌ the Edge-Cloud-HPC Continuum.

Our objective:‌ Enable the Data Continuum.

2.3 Middleware-level challenges‌

2.4 Resource management challenges

2.5 Approach,‌ methodology, platforms

2.7 Alignment with‌ institutional, national and European‌‌ strategies

3‌ Research program

3.1‌‌ Axis 1: Supporting Data-Centric Applications and Workflows Running‌ Across the Computing Continuum‌

Data-Centric‌ Workflow Composition in the Computing Continuum.

Enabling Reproducibility and Trustworthiness‌ in Complex Workflows Across the Computing Continuum.

Efficient Parallel Continual‌ Learning.

Scalable,‌ Secure and Resource-Efficient Federated‌‌ Learning.

3.2 Axis 2: Data-Aware Middleware Approaches for‌ the Computing Continuum

Workflow I/O Behavior Analysis Methods for Sustainability.

Abstraction for HPC/Cloud Storage‌ Convergence.

Exascale In-Situ Analytics.

Sustainable Interoperability Across‌‌ the Computing Continuum.

3.3 Axis 3: Sustainable Resource‌ Management for the Computing Continuum

Provisioning Storage Resources‌ on Large-Scale Infrastructures.

Storage Disaggregation and Computational Storage.

Frugal Data‌ Storage Architectures to Support Post-Exascale Workflows.

HPC‌ Resource Management Faced with the Environmental Crisis.

4 Application domains

4.1 Radio astronomy

4.2 Nuclear Fusion

5 Social and environmental responsibility

5.1 Footprint of research activities

5.2 Impact of research results

Social impact.

6 Highlights of the year

7 Latest software developments, platforms, open data

7.1 Latest software developments

7.1.1 Damaris

7.1.5 FLAdversary

7.2 Open data

7.2.1 I/O Traces

8 New results

8.1 Supporting Data-Centric Applications and Workflows Running Across the Computing Continuum

8.1.1 On the Reproducibility Challenges of Federated Learning: Investigating the Gap between Simulation, Emulation and Real-World Deployments

8.1.2 Evaluating Federated Learning Workflows Beyond Simulation: A Deployment-Aware Methodology

8.1.3 Supporting SKA data processing workflows with the E2CLab approach to workflow lifecycle management across the continuum

8.1.4 Methodology for Automated IoT Experimentation in Controlled Testbeds Prior to Real-World Deployments

8.2 Data-Aware Middleware Approaches for the Computing Continuum

8.2.1 Multi-level analysis of the I/O pattern of HPC applications

8.2.2 Study of I/O interference between jobs

8.2.3 Enabling Efficient Runtime Data Analysis to a Crystal Deformation Simulation

8.3 Sustainable Resource Management for the Computing Continuum

8.3.1 Result-Scalability: Following the Evolution of Selected Social Impact of HPC.

8.3.2 Increasing the Lifetime of HPC Machines: Issues, Implications, and Open Challenges

8.3.3 Improving Supercomputer Usage with Aging Awareness.

8.3.4 Priority-BF: a Task Manager for Priority-Based Scheduling

8.3.5 Scheduling multiple task-based applications on distributed heterogeneous computing nodes

8.4 Methodological study over the practice of HPC Research

8.4.1 Implementing a Reproducibility Initiative in HPC: Experiences from SC24.

8.4.2 An In-depth Study of LLM Contributions to the Bin Packing Problem

9 Partnerships and cooperations

9.1.1 Associate Teams in the framework of an Inria International Lab or in the framework of an Inria International Program

UNIFY 2

9.2 International research visitors

9.2.1 Visits of international scientists

Swann Perarnau

9.2.2 Visits to international teams

Research visits abroad

Gabriel Antoniu, Jakob Luettgau, Arthur Jaquard, Robin Boezennec

9.3 European initiatives

9.3.1 H2020 projects

9.3.2 Collaborations with Major European Organizations

Exa-DoST

ECLAT

Inria Exploratory program: Repas

10 Dissemination