KERDATA - 2025

2025 Activity report, Project-Team KERDATA

RNSR: 200920935W
  • Research center: Inria Centre at Rennes University
  • In partnership with: Institut national des sciences appliquées de Rennes
  • Team name: Enabling the Edge-Cloud-HPC Data Continuum
  • In collaboration with: Institut de recherche en informatique et systèmes aléatoires (IRISA)

Creation of the Project-Team: 2025 January 01


Keywords

Computer Science and Digital Science

  • A1.1.1. Multicore, Manycore
  • A1.1.4. High performance computing
  • A1.1.5. Exascale
  • A1.1.9. Fault tolerant systems
  • A1.3. Distributed Systems
  • A1.3.5. Cloud
  • A1.3.6. Fog, Edge
  • A2.6.2. Middleware
  • A3.1.2. Data management, querying and storage
  • A3.1.3. Distributed data
  • A3.1.8. Big data (production, storage, transfer)
  • A6.2.7. HPC for machine learning
  • A6.3. Computation-data interaction
  • A7.1.1. Distributed algorithms
  • A9.2. Machine learning
  • A9.7. AI algorithmics

Other Research Topics and Application Domains

  • B3.2. Climate and meteorology
  • B3.3.1. Earth and subsoil
  • B8.2. Connected city
  • B9.5.6. Data science
  • B9.8. Reproducibility
  • B9.11.1. Environmental risks

1 Team members, visitors, external collaborators

Research Scientists

  • Gabriel Antoniu [Team leader, INRIA, Senior Researcher, HDR]
  • Silvina Caino Lores [INRIA, ISFP]
  • Jakob Luettgau [INRIA, Researcher, from Oct 2025]
  • Jakob Luettgau [INRIA, Starting Research Position, until Sep 2025]
  • Guillaume Pallez [INRIA, Researcher, HDR]
  • François Tessier [INRIA, ISFP]

Faculty Member

  • Alexandru Costan [INSA RENNES, Associate Professor, until Sep 2025, HDR]

PhD Students

  • Robin Boezennec [INRIA]
  • Arthur Jaquard [INRIA]
  • Theo Jolivel [INRIA]
  • Cedric Prigent [INRIA, until Feb 2025]
  • Simon Renard [INRIA, from Oct 2025]
  • Alix Tremodeux [UNIV RENNES, from Sep 2025]
  • Mathis Valli [INRIA]

Technical Staff

  • Thomas Badts [INRIA, Engineer]
  • Julien Monniot [INRIA, Engineer, until May 2025]
  • Jean Etienne Ndamlabin Mboula [INRIA, Engineer]

Interns and Apprentices

  • Remy Chiv [INRIA, Intern, from May 2025 until Oct 2025]
  • Alix Tremodeux [ENS DE LYON, Intern, until Feb 2025]

Administrative Assistants

  • Laurence Dinh [INRIA]
  • Armelle Mozziconacci [CNRS]
  • Gunther Tessier [INRIA]

Visiting Scientists

  • Elias Del Pozo Punal [UNIV CARLOS III, from Mar 2025 until Jul 2025]
  • Tomasz Kanas [UNIV VARSOVIE, from Sep 2025 until Nov 2025]

2 Overall objectives

2.1 Context: the emergence of the Edge-Cloud-HPC Continuum

As witnessed in industry and science, and as highlighted in strategic documents such as the European ETP4HPC Strategic Research Agenda [90], there is a clear trend to combine numerical computations, large-scale data analytics and AI techniques to improve the results and efficiency of traditional HPC applications, and to advance new applications in fields such as autonomous vehicles, digital twins and smart buildings/towns. A typical scenario consists of Edge devices creating streams of input data, which are processed by data analytics and machine learning applications in the Cloud; alternatively (or in parallel), they can feed simulations on large, specialised HPC systems to provide insights and help predict future system states. Such emerging applications typically need to be implemented as complex workflows and require the coordinated use of supercomputers, Cloud data centres and Edge-processing devices. This assembly is called the Computing Continuum (CC). It raises challenges at multiple levels: at the application/workflow level, simulations, machine learning and data-driven analytics must be bridged; at the middleware level, adequate tools must enable efficient deployment and orchestration of the workflow components across the whole distributed infrastructure; finally, at the resource management level, a capable system must allocate a suitable set of infrastructure components to run the application workflow, preferably in a dynamic and adaptive way, taking into account the specific capabilities of each component of the underlying heterogeneous infrastructure.

While each level exhibits specific challenges, there are also common, cross-layer concerns, among which we highlight two. The first concerns sustainability, understood as an optimization goal encompassing energy efficiency and the reduction of environmental impact. The second relates to the rapid development of AI-related workflows, which creates specific needs at multiple levels.

Our objective: Enable the Data Continuum.

Our research project aims to address some open challenges at each of the three levels above, while considering the two transverse concerns. We specifically focus on data-related challenges posed by the requirements (storage, processing, analytics) of complex workflows executed on the Edge-Cloud-HPC continuum, and propose innovative algorithms and software architectures towards a Data Continuum.

2.2 Application/workflow-level challenges

Currently, a multitude of software development stacks are tailored to specific use cases, with no guarantee of interoperability between them. This greatly impedes application software development for integrated CC use cases. Moreover, specific software stacks have been developed for HPC (e.g., based on optimized MPI libraries able to leverage high-end network interconnects), data analytics (e.g., based on Spark, designed for commodity clusters available in cloud datacenters) and AI (e.g., TensorFlow or PyTorch), each with different requirements for its initial target execution infrastructure. Components based on such software stacks cannot be efficiently integrated to support CC workflows, as their assumptions about the underlying infrastructure differ. Programming a complex, hybrid workflow at the highest level requires the ability to consistently combine such components in a unified framework. This requires flexible programming models and supporting environments that also safeguard performance and energy efficiency. Composability (the ability to combine multiple programming models or software stacks within a single application according to defined rules) and reproducibility of workflow execution will be very valuable in this context.

2.3 Middleware-level challenges

Similarly, compatibility and interoperability across all parts of a CC infrastructure must be ensured; in particular, this includes data formats, storage abstractions, communication, and data processing and analysis paradigms. This first requires a deep understanding of the I/O behaviour of distributed workflows. As an illustrative example, upcoming Exascale HPC workflows deployed on supercomputers as part of the continuum will continue to expose the lack of infrastructures and methodologies to store and analyze the huge outputs of running simulations, whether this storage or analysis is performed on HPC systems or on cloud-based infrastructures. This can limit scalability and lead to sub-optimal usage of the computing infrastructures. Since storing all data (originating from sensors or generated by simulations) may be infeasible in some cases, new scalable approaches are needed. The goal is to enable processing and analysis of such massive data outputs on various parts of the continuum infrastructure, during and after the HPC simulations, through asynchronous I/O and in-situ or in-transit processing inside or outside the HPC system, thus avoiding storage.
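To make the in-transit idea more concrete, here is a minimal Python sketch under toy assumptions; it is not a KerData system. A hypothetical `simulate` function hands each timestep's field to an analysis thread through a bounded in-memory queue (standing in for a staging area), and the analysis keeps only compact per-step statistics, so the raw fields are never written to storage. Real in-situ/in-transit frameworks rely on shared memory, RDMA or dedicated staging nodes rather than a Python queue.

```python
import queue
import random
import statistics
import threading


def simulate(num_steps, num_cells, out_queue, seed=0):
    """Hypothetical simulation: updates a 1-D field and hands each step off."""
    rng = random.Random(seed)
    field = [0.0] * num_cells
    for step in range(num_steps):
        # Toy "physics"; builds a fresh list each step, so no copy is needed.
        field = [v + rng.gauss(0.0, 1.0) for v in field]
        out_queue.put((step, field))  # blocks only if the analysis lags behind
    out_queue.put(None)  # sentinel: no more timesteps


def analyze(in_queue, summaries):
    """In-transit consumer: keeps compact statistics and drops the raw field."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        step, field = item
        summaries.append({"step": step,
                          "mean": statistics.fmean(field),
                          "max": max(field)})


def run_pipeline(num_steps=10, num_cells=1000, staging_slots=4):
    """Run producer and consumer concurrently; return per-step summaries only."""
    staging = queue.Queue(maxsize=staging_slots)  # bounded staging area
    summaries = []
    consumer = threading.Thread(target=analyze, args=(staging, summaries))
    consumer.start()
    simulate(num_steps, num_cells, staging)
    consumer.join()
    return summaries
```

The bounded queue is the design choice that matters in this sketch: the simulation only blocks when the analysis falls more than `staging_slots` steps behind, which is one simple way to decouple computation from data processing.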

2.4 Resource management challenges

Large-scale heterogeneity must be managed effectively and efficiently. This again cuts across compute, storage and communication systems, and scheduling/orchestration has to optimize the mapping of workflows onto CC resources with regard to performance and energy use. A challenge here is to enable the design of adequate data storage architectures that cope in particular with capacity- or energy-related constraints affecting certain parts of the continuum (the Edge, but also energy-bound supercomputers in the post-Exascale era, where sustainability is a primary consideration).
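As a rough illustration of the performance/energy trade-off in mapping workflows onto continuum resources, the following Python sketch assigns each task, in order, to the resource that minimizes a weighted sum of finish time and energy. Everything here is hypothetical: the `Resource` fields, the linear time/energy model and the weight `alpha` are illustrative assumptions, not a scheduler used by the team. The cost also mixes seconds and joules; a real scheduler would normalize both terms.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float           # work units per second (hypothetical model)
    power: float           # watts drawn while busy (hypothetical model)
    ready_at: float = 0.0  # time at which the resource becomes free


def greedy_map(tasks, resources, alpha=0.5):
    """Map (task_name, work) pairs to resources, one task at a time.

    alpha=1.0 optimizes finish time only; alpha=0.0 optimizes energy only.
    Mutates each chosen resource's ready_at to account for its new load.
    """
    mapping = {}
    for name, work in tasks:
        def cost(r):
            duration = work / r.speed
            finish = r.ready_at + duration
            energy = r.power * duration
            return alpha * finish + (1 - alpha) * energy

        best = min(resources, key=cost)
        best.ready_at += work / best.speed
        mapping[name] = best.name
    return mapping
```

With a fast, power-hungry HPC node and a slow, frugal Edge device, `alpha=1.0` sends a task to the HPC node while `alpha=0.0` sends it to the Edge device.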

2.5 Approach, methodology, platforms

KerData's global approach consists of modelling, designing, implementing and evaluating distributed algorithms and software architectures to address some of the data-related challenges described above. The specific research questions we address are described in the next section. We will generally focus on hybrid infrastructures (Edge/Cloud/supercomputers), although some of our research may not span the complete spectrum of the continuum.

Our research balances theoretical modelling (strengthened by the recent arrival of Guillaume Pallez) with a predominantly experimental validation methodology (traditionally carried out by most team members as part of the former KerData team). To validate our proposed algorithms and architectures, we build software prototypes, then validate them at large scale on real testbeds and experimental platforms.

We will strongly rely on the Grid'5000/SLICES-FR platform. Moreover, thanks to our projects and partnerships (in particular in EuroHPC projects building pre-Exascale platforms, such as ACROSS and EUPEX), we have access to reference supercomputer testbeds, such as Karolina¹ and Irene (CEA). More importantly, the team is leading Exa-DoST (2023-2029), the project of the NumPEx program focused on data-related challenges for Exascale, as part of a national effort to design and build the software infrastructure for Jules Verne, the first Exascale machine to be installed in France. All these are excellent opportunities to validate our results on advanced, realistic platforms.

2.6 Collaboration strategy

We chose to work in close collaboration with some of the leading international academic teams in the area of data management for Edge, Cloud and HPC systems. As an example, we have been building and maintaining a long-term, privileged partnership with Argonne National Laboratory (USA), a top player in US HPC research, through a series of Associate Team projects (Data@Exascale, UNIFY, UNIFY 2) in the framework of the JLESC international laboratory. More recently, we initiated collaborations with Oak Ridge National Laboratory (USA), home of Frontier, one of the most powerful supercomputers in the world; we also collaborate with DFKI (Germany), a strategic Inria partner in the AI area. In industry, formal collaborations are currently in place with ATOS/Eviden, a strategic HPC stakeholder in France, and DataDirect Networks (DDN), a major storage company, in the context of national (PEPR) and European collaborative projects.

2.7 Alignment with institutional, national and European strategies

Data-intensive applications exhibit several common requirements regarding data storage and I/O processing at very large scales, to support complex workflows combining scientific simulation and data analytics. While our past activity was already aligned with Inria's strategic objectives [62], which acknowledged HPC-Big Data convergence as one of the institute's priorities, our project for the future goes further. It explicitly leverages the challenges identified in the latest edition of the ETP4HPC agenda [90], which highlights the evolution of HPC from a traditional supercomputer-centric vision to an enlarged vision where complex workflows are distributed across interconnected supercomputers, Clouds and Edge infrastructures. Our research program addresses some of these challenges. In addition, at the national level, our team is leading two strategic PEPR projects whose scientific programs have been defined based on this continuum-aware vision. The first one is Exa-DoST (Exascale Data-Oriented Software and Tools), a 6.2 M€ project within the NumPEx PEPR program (2023-2029), which aims to provide the software infrastructure for the future Exascale supercomputer expected to be installed in France in 2025 (Jules Verne). The second one is STEEL (Secure and efficient daTa storagE and procEssing on cLoud-based infrastructures), a 2.8 M€ project within the CLOUD PEPR program (2023-2030). These projects (defined for 7+ years) structure many of our long-term activities.

In addition, some of our collaborative projects involve Inria's main strategic partners: DFKI (the main German research center in artificial intelligence), through the ENGAGE Inria-DFKI project started in 2022; and ATOS/Eviden, through the ACROSS and EUPEX H2020 EuroHPC projects.

3 Research program

Figure 1: Overview of the research program.

The emergence of the Computing Continuum raises challenges at multiple levels: at the application/workflow level, at the middleware level and at the resource management level. We structured our research program accordingly, in three axes.

The first axis covers research directions at the workflow/application level. It addresses questions such as: how to enable workflow composition across the continuum? How to ensure the reproducibility of workflow execution? How to leverage different sources of metadata to establish a provenance chain (i.e., a record trail of the overall state of the application and its intermediate results) that builds trust in the workflow's results? How could data models support data volume and transfer reduction as a step towards resource sustainability of applications in the continuum? It also includes more specific research directions related to the execution of distributed AI workflows across the Computing Continuum (involving parallel learning and federated learning).

The second axis addresses research challenges related to middleware-level data management across the continuum, where workflows combining simulation, data analytics and AI are being deployed. In particular, this axis covers topics such as I/O behaviour characterization, storage-centric hybrid infrastructure convergence, and data interoperability across hybrid HPC/Cloud/Edge infrastructures. It also addresses the question: how to perform in-situ data analysis for post-Exascale workflows processing continuous data flows, while considering both performance and energy efficiency?

Figure 2: Storage heterogeneity across the Computing Continuum.

The third (lower-level) axis focuses on resource management, with a strong (but not exclusive) focus on storage resources. It addresses questions such as: how to provision heterogeneous storage resources across hybrid HPC/Cloud/Edge infrastructures (Figure 2)? What would be a frugal data storage architecture enabling the transition to post-Exascale workflows? How to leverage emerging storage approaches such as disaggregated storage (i.e., a set of storage units physically separated from the compute units) and computational storage (i.e., storage units augmented with limited integrated computational capabilities)? Finally, how can resource managers and HPC systems evolve to better adapt to climate change?

We identified two transverse (vertical) themes that are present in some of the research topics of the three (horizontal) axes: artificial intelligence (as a target type of workflow to be supported, but also as an enabling technique) and sustainability (including aspects related to energy efficiency, frugality, and adaptation to emerging applications and hardware technologies in response to climate change).

3.1 Axis 1: Supporting Data-Centric Applications and Workflows Running Across the Computing Continuum

Today, there is a need to efficiently integrate simulations, data analytics and learning, which requires interoperable solutions for data processing [90]. As an example, upcoming large-scale scientific experiments like the Square Kilometre Array (SKA)² are expected to process raw data on the order of an exabyte per day³. Processing these data volumes requires complex scientific workflows able to extract knowledge and produce insight at every stage: from the instruments and devices producing data that needs to be reduced and pre-processed in situ, to the service-oriented visualization and exploration dashboards that need to be customized for the domain scientist's use case. Existing works on workflow composition and deployment in the continuum focus on task-flow control and are disconnected from data patterns and structures beyond domain-specific applications [40, 31]. Moreover, general approaches for representing knowledge and provenance in the form of metadata are also lacking for such workflows.

As different communities leverage the Computing Continuum, they express the need to make their research verifiable by others. This need is exacerbated by the pervasive use of AI, as there is increasing awareness of its potential ethical and practical implications [105]. The explainability (i.e., making AI's decision-making process understandable) and transparency (i.e., ensuring clarity in AI's design, data and operation) of AI are particularly concerning [30]. Advancing explainability and transparency in AI is currently an essential priority for responsible and trustworthy AI-powered applications. This requires advances in repeatability, replicability, and reproducibility (the 3 Rs) across the Computing Continuum [41, 111].

As AI-oriented workflows gain an increasing share, it becomes important to address the performance and scalability of distributed machine learning (ML) algorithms executed across the Computing Continuum. Methods like deep learning (DL) and federated learning (FL) leverage different technologies to produce insight from large volumes of data. Despite increasing convergence between DL and HPC [76, 44], the training of DL models remains time-consuming and resource-intensive. In FL, powerful facilities (Cloud or HPC) are used to train a global model, while the local, personalized training is typically done close to the data production sites on less powerful computational resources (Edge). This raises the challenge of managing heterogeneity (e.g., differences in computation capacity, network latency and node volatility) as well as variability in data distributions among clients, while preserving client privacy in the presence of potentially malicious devices.
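The interplay between the global model and local training can be illustrated with the aggregation step of federated averaging (FedAvg), a standard FL algorithm that the text above does not name explicitly; the sketch below is a generic illustration, not the team's method. It assumes each client sends its locally trained weights as a flat list of floats together with its local sample count, and the server weights each contribution by that count, which is where unequal data volumes across heterogeneous clients show up.

```python
def fed_avg(client_updates):
    """Federated averaging: weight each client's model by its local sample count.

    client_updates: list of (num_samples, weights) pairs, where weights is a
    flat list of floats with the same length for every client.
    """
    total = sum(n for n, _ in client_updates)
    if total == 0:
        raise ValueError("at least one client must report samples")
    dim = len(client_updates[0][1])
    global_weights = [0.0] * dim
    for n, weights in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send models of the same size")
        for i, w in enumerate(weights):
            global_weights[i] += (n / total) * w
    return global_weights
```

For example, a client with 1 sample reporting `[0.0, 0.0]` and a client with 3 samples reporting `[4.0, 8.0]` yield a global model of `[3.0, 6.0]`.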

In summary, this axis covers research directions to support the composition of scalable and reproducible workflows comprising diverse applications (from simulations to AI), while addressing the challenges of a heterogeneous environment.

Data-Centric Workflow Composition in the Computing Continuum.

The scientific community has reached consensus that common interfaces for data management in the continuum are necessary [70]. Unified data abstractions can enable the interoperability of data storage and processing across the continuum and facilitate data analytics at all levels [53], alleviating the disconnect between application- and storage-oriented approaches to interoperability. However, no unified data modeling approaches exist for structuring and representing data at a logical level across the Computing Continuum.

The first steps in this research direction involve establishing the essential attributes needed to represent data in the different programming models coexisting in the continuum (e.g., ML models, simulation data, annotations resulting from analysis). We will systematically categorize these attributes to deliver data abstractions and models that can be specialized for different tasks. In addition, we will investigate how to embed metadata in these abstractions so that future work can explore new ways of describing, processing, and tracking data at the workflow level, which aligns with the topics of workflow instrumentation and reproducibility. In the longer term, we will also study how these data models could support data volume and transfer reduction as a step towards resource sustainability of applications in the continuum.
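To give a feel for what a specializable data abstraction with embedded metadata might look like, here is a minimal Python sketch. The class names, attributes and the dtype size table are hypothetical illustrations chosen for this example; the actual attributes are precisely what this research direction aims to establish.

```python
from dataclasses import asdict, dataclass, field
from math import prod
from typing import Any, Dict, Tuple

# Hypothetical element sizes (bytes) for the dtypes this sketch supports.
_ITEMSIZE = {"uint8": 1, "float32": 4, "float64": 8, "int64": 8}


@dataclass
class DataObject:
    """Hypothetical logical description of a piece of data in a workflow."""
    kind: str                # e.g. "ml_model", "simulation_field", "annotation"
    shape: Tuple[int, ...]   # logical dimensions, independent of storage layout
    dtype: str               # element type, a key of _ITEMSIZE
    producer: str            # workflow task that created the object
    metadata: Dict[str, Any] = field(default_factory=dict)  # free-form annotations

    def nbytes(self) -> int:
        """Logical size, usable e.g. to estimate transfer volumes."""
        return prod(self.shape) * _ITEMSIZE[self.dtype]

    def describe(self) -> Dict[str, Any]:
        """Plain-dict view that could feed a catalog or provenance record."""
        return asdict(self)


@dataclass
class SimulationField(DataObject):
    """Specialization for simulation output, adding task-specific attributes."""
    timestep: int = 0
    variable: str = ""
```

For instance, a `SimulationField` of shape `(128, 128)` with `dtype="float64"` reports `nbytes() == 131072`, and any annotations added to `metadata` travel with the object through `describe()`.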

Enabling Reproducibility and Trustworthiness in Complex Workflows Across the Computing Continuum.

Current approaches to support workflow reproducibility are based on workflow modelling [104] or simulation [26, 112]. These approaches raise important challenges in terms of specification, modelling, and validation to support reproducibility in the Computing Continuum. For example, it is increasingly difficult to model the heterogeneity and volatility of Edge devices, or to assess the impact of the inherent complexity of hybrid Edge-Cloud deployments on performance. With the rise of AI workflows, the reproducibility issue is aggravated by our limited ability to reason about the decision-making process of many machine learning models that act as black boxes [33], and by the lack of comprehensive specifications of the data that needs to be collected to establish a provenance chain (i.e., a record trail of the overall state of the application and its intermediate results) that builds trust in the workflow's results [46].

We aim to tackle these challenges through a rigorous methodology supporting the automatic deployment, the complete analysis cycle and the optimization of applications on the Computing Continuum. We started to implement this methodology in the E2Clab [102] software tool for workflow lifecycle management across the continuum. In the short term, in the framework of the STEEL project of the PEPR Cloud, we plan to investigate how to further enrich both the methodology and the tool in order to support the next generation of scientific testbeds (e.g., SLICES-FR) and the non-trivial reproducibility of ML and DL workflows. This is a particularly challenging direction due to the increased degree of randomness (i.e., in terms of initial parameters and hyperparameter settings) incurred by such applications. We will extend E2Clab to capture provenance metadata during the execution of AI workflows, including a detailed record of data sources, processing steps, model configurations, and computational resources used. This provenance metadata can be leveraged not only to ensure transparency and traceability throughout the AI lifecycle, but also to conduct resource, energy and performance optimizations. In the longer term, we will work towards the definition of ontologies and taxonomies for AI workflow provenance data, to build a theoretical foundation for developing provenance data management systems tailored to the different stakeholders involved in AI applications.
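The kind of provenance capture described above can be sketched as a Python decorator that records, for each workflow step, digests of its inputs and outputs, its keyword configuration (e.g., hyperparameters and seeds), its duration and basic resource information. This is a hypothetical illustration only: `record_provenance`, `PROVENANCE_LOG` and the record fields are not E2Clab's API, and a real system would persist records and capture far richer resource data.

```python
import functools
import hashlib
import json
import platform
import time

PROVENANCE_LOG = []  # hypothetical in-memory store; a real tool would persist records


def _digest(obj):
    """Stable short hash of inputs/outputs (sorted keys, repr for non-JSON types)."""
    payload = json.dumps(obj, sort_keys=True, default=repr).encode()
    return hashlib.sha256(payload).hexdigest()[:16]


def record_provenance(step_name):
    """Decorator logging one provenance record per call of a workflow step."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            PROVENANCE_LOG.append({
                "step": step_name,
                "function": func.__qualname__,
                "inputs_digest": _digest({"args": args, "kwargs": kwargs}),
                "config": dict(kwargs),  # e.g. hyperparameters, random seed
                "output_digest": _digest(result),
                "duration_s": time.perf_counter() - start,
                "resources": {"host": platform.node(),
                              "python": platform.python_version()},
            })
            return result
        return wrapper
    return decorator
```

Because inputs are hashed deterministically, two runs of a step with identical data and configuration produce identical `inputs_digest` values, which gives a first, cheap check for comparing executions.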

Efficient Parallel Continual Learning.

Some DL training scenarios require assimilating new training data that arrive continuously. This kind of incremental training suffers from catastrophic forgetting (i.e., new patterns are reinforced at the expense of previously acquired knowledge). Retraining from scratch each time new training data becomes available would result in extremely long training times and massive data accumulation. Rehearsal-based continual learning mixes samples from previous training tasks with samples from new training tasks to alleviate catastrophic forgetting, but research to date has not addressed the performance and scalability of these methods [39, 101, 54, 59].

We propose asynchronous data management techniques that enable the design and implementation of a scalable distributed rehearsal buffer abstraction, which is instrumental in enabling continual learning to take advantage of data-parallel techniques. So far, this solution has been validated for class-incremental classification problems. The approach could, however, easily be applied to generative models (in which case a single class can be used to store all representatives). This is a short-term research direction we intend to explore in the context of our new research project. In the longer term, we plan to further explore other parallelization approaches (model-based and hybrid) to address the challenges posed by evolving datasets in DL models.
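The core of a rehearsal buffer can be sketched on a single process as follows; this is an illustrative sketch, not the team's distributed, asynchronous implementation. It assumes a per-class capacity and uses reservoir sampling so that each class keeps a uniform sample of everything it has seen; each incoming mini-batch is then augmented with representatives drawn from the buffer before the new samples are inserted. In a data-parallel setting, each worker could hold one such buffer shard, and the distributed version would additionally fetch representatives from remote shards asynchronously. Using a single label for all samples gives the generative-model variant mentioned above.

```python
import random
from collections import defaultdict


class RehearsalBuffer:
    """Per-class buffer of representative samples, filled by reservoir sampling."""

    def __init__(self, per_class_capacity, seed=0):
        self.capacity = per_class_capacity
        self.samples = defaultdict(list)  # label -> stored samples
        self.seen = defaultdict(int)      # label -> samples observed so far
        self.rng = random.Random(seed)

    def add(self, sample, label):
        self.seen[label] += 1
        bucket = self.samples[label]
        if len(bucket) < self.capacity:
            bucket.append(sample)
        else:
            # Reservoir sampling: keep the new sample with prob capacity/seen.
            j = self.rng.randrange(self.seen[label])
            if j < self.capacity:
                bucket[j] = sample

    def sample(self, k):
        """Draw up to k (sample, label) pairs uniformly from all stored samples."""
        pool = [(x, y) for y, xs in self.samples.items() for x in xs]
        return self.rng.sample(pool, min(k, len(pool)))


def augmented_batch(batch, buffer, num_representatives):
    """Mix new (sample, label) pairs with replayed ones, then update the buffer."""
    replay = buffer.sample(num_representatives)  # drawn before inserting new data
    for x, y in batch:
        buffer.add(x, y)
    return list(batch) + replay
```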

Scalable, Secure and Resource-Efficient Federated Learning.

FL aims to achieve an accuracy close to that of centralized models, but in a scalable and resource-efficient manner. At the same time, FL is subject to security threats coming from the edge of the network, since malicious peers may attempt to manipulate the learning process, compromise the privacy of other peers, or disrupt the training altogether. Clustered FL (grouping clients with similar data distributions and training personalized models in each identified cluster) is a mechanism to support low resource utilization, but existing approaches [72, 51] mainly focus on the accuracy achieved by the clustering mechanisms, overlooking system and infrastructure resource constraints like energy consumption. Meanwhile, current threat mitigation approaches [47, 116, 36, 84] rely on robust aggregation, anomaly detection and generative models to defend against poisoning attacks. Yet, they either have limited defensive capabilities due to their underlying design, or are impractical to use because they rely on constraining building blocks.
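Robust aggregation, one of the mitigation families mentioned above, can be illustrated with the coordinate-wise median, a classic robust aggregation rule; it is shown here as a generic example, not as one of the cited approaches. Compared with a plain mean, a single poisoned update barely moves the median. However, the rule breaks down once malicious clients form a majority on a coordinate, which is one instance of the limited defensive capabilities noted above. Updates are assumed to be equal-length lists of floats.

```python
import statistics


def coordinate_mean(updates):
    """Plain averaging: every client, including a malicious one, shifts the result."""
    return [statistics.fmean(values) for values in zip(*updates)]


def coordinate_median(updates):
    """Coordinate-wise median: a minority of outliers cannot drag the result away."""
    return [statistics.median(values) for values in zip(*updates)]
```

With three honest updates close to `[1.0, 1.0]` and one poisoned update `[100.0, -100.0]`, the mean's first coordinate jumps to 25.75, while the median stays at about 1.05.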

In the short term, we plan to explore approaches to scalable, secure and resource-efficient FL that, for the first time, simultaneously consider device heterogeneity, training accuracy and robustness against malicious activity. A first direction consists in devising resource-constrained clustering algorithms specifically tailored for FL executed at the edge. The goal is to enable transparent adaptation to the execution environment (e.g., node volatility, malicious attacks, network congestion) by automatically tuning the FL parameters in order to improve user-defined performance metrics (e.g., energy efficiency, execution time, accuracy). Approaches based on generative models have been gaining interest and have been shown to be more resilient against a wider range of attacks. In this context, we will continue to extend FedGuard [60], our novel FL framework that uses the generative capabilities of Conditional Variational AutoEncoders (CVAEs) to effectively defend against poisoning attacks with tunable overhead in communication and computation. We plan to enhance the robustness of this approach through new aggregation operators and under different levels of dataset imbalance, including highly imbalanced datasets with very few samples per client. In the longer term, a more challenging direction that we plan to explore is how FedGuard and other strategies perform in a setup where clients get access to a stream of incoming data (i.e., dynamic datasets).
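A resource-constrained clustering step for FL could look like the Python sketch below. It is a hypothetical illustration, not the algorithm under development. It combines two common ingredients: clients whose model updates point in similar directions (cosine similarity above a threshold) are grouped together, and clients whose reported energy level falls below a budget are excluded from the round. The greedy single-pass scheme, the `threshold` value and the `energy` dictionary are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two update vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_clients(updates, energy, min_energy, threshold=0.8):
    """Greedily group clients whose updates resemble a cluster's first member.

    updates: dict client_id -> update vector (list of floats)
    energy: dict client_id -> remaining energy (hypothetical, normalized units)
    Returns a list of clusters, each a list of client ids.
    """
    clusters = []  # list of (representative_update, member_ids)
    for cid, update in updates.items():
        if energy[cid] < min_energy:
            continue  # hypothetical resource constraint: skip low-energy clients
        for rep, members in clusters:
            if cosine(rep, update) >= threshold:
                members.append(cid)
                break
        else:  # no similar cluster found: this client starts a new one
            clusters.append((update, [cid]))
    return [members for _, members in clusters]
```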

3.2 Axis 2: Data-Aware Middleware Approaches for the Computing Continuum

Supporting emerging scenarios in the domains of “modelling and simulation”, “AI”, “Analytics” and “Internet of Things” (IoT) across the Computing Continuum leads to new data movement challenges. This is due in part to the variety of storage systems and to the increasing gaps between the processing paradigms that developed separately in these environments. In this context, it is necessary to explore the different ways in which storage models can converge, notably through a thorough understanding of workloads on the one hand, and efficient data-aware middleware on the other. Within KerData, we propose to address this problem from multiple complementary perspectives.

First, we will study the I/O behavior of scientific workloads running across the Computing Continuum. Understanding what, when, where, and how I/O-intensive applications read or write data is decisive for making the right decisions, especially when it comes to scheduling. Optimization goals need to consider both performance and energy efficiency, and the trade-offs that may be necessary between them. We will then study I/O optimization techniques for extreme-scale workloads. Although a lot of research has been produced on this subject, scaling to very large systems, as in the case of HPC, raises new challenges that must be addressed to accelerate the time to solution while maintaining sustainability. Next, we will look at the abstraction of distributed storage resources on hybrid infrastructures, as a first step towards the general-purpose middleware solutions needed to make storage resources across the continuum interoperate. Finally, building on the above, we will propose a data exchange layer that can interoperate with the various platforms of the Computing Continuum. This layer will be central to the composition of hybrid workflows.

Workflow I/O Behavior Analysis Methods for Sustainability.

Understanding how workflows and applications use staging areas on the Computing Continuum is decisive for improving scheduling algorithms and deploying an optimized I/O software stack 63. This first requires characterizing these applications and workflows from an I/O point of view, i.e., determining, through performance evaluation and empirical study, a relatively high-level set of characteristics that describes their data access patterns 100, 97. Data collection and analysis will also leverage semi-supervised clustering methods and federated learning techniques from Axis 1. The result of this characterization can then be used to feed job and I/O scheduling algorithms and improve data movement efficiency. The preliminary step in this characterization is the collection of execution traces, from which detailed studies can be carried out 55. In the field of high-performance computing, Darshan 109 is the reference tool for I/O monitoring.
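As a rough illustration of what a "relatively high-level set of characteristics" can look like, the sketch below reduces a handful of Darshan POSIX-module counters (`POSIX_BYTES_READ`, `POSIX_READS`, `POSIX_SEQ_READS` and their write counterparts) to a read ratio, a mean request size and a sequential-access fraction, then assigns a coarse label. It assumes the counters have already been aggregated per job (for instance after loading a log with PyDarshan); the thresholds, the label scheme and the counter values are hypothetical choices for illustration, not a KerData classifier.

```python
from dataclasses import dataclass


@dataclass
class IOProfile:
    read_ratio: float         # fraction of transferred bytes that were reads
    avg_request_bytes: float  # mean size of a read or write operation
    seq_fraction: float       # fraction of operations that were sequential
    label: str


def characterize(counters: dict) -> IOProfile:
    """Summarize per-job Darshan POSIX counters into a few high-level features."""
    bytes_read = counters["POSIX_BYTES_READ"]
    bytes_written = counters["POSIX_BYTES_WRITTEN"]
    ops = counters["POSIX_READS"] + counters["POSIX_WRITES"]
    seq_ops = counters["POSIX_SEQ_READS"] + counters["POSIX_SEQ_WRITES"]
    total_bytes = bytes_read + bytes_written

    read_ratio = bytes_read / total_bytes if total_bytes else 0.0
    avg_request = total_bytes / ops if ops else 0.0
    seq_fraction = seq_ops / ops if ops else 0.0

    # Hypothetical thresholds: 80% dominance for read/write, 1 MiB for "large" requests.
    if read_ratio > 0.8:
        direction = "read-heavy"
    elif read_ratio < 0.2:
        direction = "write-heavy"
    else:
        direction = "mixed"
    size = "large" if avg_request >= 1 << 20 else "small"
    access = "sequential" if seq_fraction >= 0.5 else "random"
    return IOProfile(read_ratio, avg_request, seq_fraction, f"{direction}/{size}/{access}")


# Made-up counters for a checkpoint-like job: 8 GiB written in 2048 sequential writes.
checkpoint_job = {
    "POSIX_BYTES_READ": 0, "POSIX_BYTES_WRITTEN": 8 * 2**30,
    "POSIX_READS": 0, "POSIX_WRITES": 2048,
    "POSIX_SEQ_READS": 0, "POSIX_SEQ_WRITES": 2048,
}
profile = characterize(checkpoint_job)
print(profile.label)  # write-heavy/large/sequential
```

Feature vectors of this kind are the sort of input that the clustering and scheduling steps mentioned above could consume.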

We will extend our existing work on PyDarshan 88, 87, a Python library for querying Darshan log records, and develop new tools and abstractions applicable throughout the computing continuum. These will generalize "decision support services" that augment workflow execution plans with I/O and energy behavior information, in support of our resource management and system architecture research. For example, identifying the most energy-intensive operations makes it possible to determine candidates for hardware acceleration. This short-term direction is the first step towards supporting the design of future I/O scheduling strategies that consider performance/energy trade-offs, which we plan to investigate in the longer term. This line of research also supports Axis 3 on resource management.
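One simple way to turn per-operation energy measurements into a shortlist of acceleration candidates is a Pareto-style cut: sort operations by energy and keep the smallest prefix that covers a chosen share of the total. The sketch below assumes such per-operation energy figures are already available; the function name, operation names, joule values and the 80% default are invented for illustration.

```python
def acceleration_candidates(energy_by_op: dict, share: float = 0.8) -> list:
    """Return the fewest operations whose combined energy reaches `share` of the total.

    Operations are taken from the most to the least energy-consuming, so the
    returned list is also ordered by decreasing energy.
    """
    total = sum(energy_by_op.values())
    if total <= 0:
        return []
    ranked = sorted(energy_by_op.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for op, joules in ranked:
        if covered >= share * total:
            break
        selected.append(op)
        covered += joules
    return selected


# Hypothetical energy breakdown (joules) of the I/O path of one workflow step.
measured = {"compress": 420.0, "checksum": 90.0, "memcpy": 300.0,
            "metadata": 30.0, "encrypt": 160.0}
print(acceleration_candidates(measured))  # ['compress', 'memcpy', 'encrypt']
```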

Abstraction for HPC/Cloud Storage Convergence.

On HPC and Cloud infrastructures, while the number of processing units has grown to meet the computing power requirements of large-scale applications, the I/O capacity and the I/O bandwidth per core have drastically decreased. As data management and analytics have become the critical bottleneck on large-scale systems, vendors have addressed this problem by deploying new tiers of intermediate storage between the applications and the global shared storage system, usually along with a dedicated (and sometimes proprietary) software layer. These new levels of the storage hierarchy feature various capacities, characteristics and performance levels that one has to be aware of to fully utilize them 86. This is especially true in the context of complex hybrid workflows such as in-situ analysis, visualization or code coupling: ignoring these underlying tiers leads to a serious loss of performance 71. An approach focusing on storage convergence across HPC and Cloud infrastructures is decisive to unify this deep hierarchy, to make the most of these new technologies on the one hand, and to ensure effective data sharing between components running across the computing continuum on the other 34, 43.

Identifying a good storage abstraction that is accurate enough to properly describe the wide variety of devices and sufficiently general to be portable across systems is crucial. In that context, we will develop a two-stage abstraction layer above local (system) and remote (distant platform) storage resources. To do so, we will follow a co-design approach, whereby the HPC, Cloud, and Edge-computing architectures would all benefit from an infrastructure-wide level of abstraction. This work will continue the research undertaken on data aggregation 115, 114, for which an abstraction of network topology and of memory and storage levels was necessary for the algorithm's portability, and will also build on existing work in the community on resource abstraction 52, 113. In the longer term, we plan to extend this library, which focuses on physical resources, with a logical layer. More concretely, we will build on top of that library a tier-to-tier data transfer layer enabling compatibility between several storage paradigms (block, file, object).
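To make the idea of a two-stage abstraction concrete, the sketch below separates a physical stage (a `Tier` record describing the capacity, bandwidth, latency, location and paradigm of one storage resource) from a logical stage that picks a tier for a given dataset, optionally restricted to some storage paradigms, and estimates tier-to-tier transfer times. All class names, attributes and figures are hypothetical, and the cost model (latencies plus size over the slower bandwidth) is deliberately simplistic.

```python
from dataclasses import dataclass
from enum import Enum


class Paradigm(Enum):
    BLOCK = "block"
    FILE = "file"
    OBJECT = "object"


@dataclass(frozen=True)
class Tier:
    """Physical stage: description of one local or remote storage resource."""
    name: str
    location: str          # "local" (system) or "remote" (distant platform)
    paradigm: Paradigm
    capacity_gb: float
    bandwidth_gb_s: float
    latency_ms: float

    def access_time_s(self, size_gb: float) -> float:
        return self.latency_ms / 1000 + size_gb / self.bandwidth_gb_s


def pick_tier(tiers, size_gb, paradigms=None):
    """Logical stage: fastest tier that can hold the data, optionally filtered by paradigm."""
    candidates = [t for t in tiers
                  if t.capacity_gb >= size_gb and (paradigms is None or t.paradigm in paradigms)]
    if not candidates:
        raise ValueError(f"no tier can hold {size_gb} GB")
    return min(candidates, key=lambda t: t.access_time_s(size_gb))


def tier_to_tier_time_s(src: Tier, dst: Tier, size_gb: float) -> float:
    """Transfer estimate: both latencies plus size over the slower of the two bandwidths."""
    slowest = min(src.bandwidth_gb_s, dst.bandwidth_gb_s)
    return (src.latency_ms + dst.latency_ms) / 1000 + size_gb / slowest


# Invented figures for four tiers of a hybrid HPC/Cloud platform.
tiers = [
    Tier("nvme", "local", Paradigm.BLOCK, 1_600, 6.0, 0.1),
    Tier("burst-buffer", "remote", Paradigm.FILE, 50_000, 4.0, 1.0),
    Tier("parallel-fs", "remote", Paradigm.FILE, 1_000_000, 2.0, 5.0),
    Tier("object-store", "remote", Paradigm.OBJECT, 10_000_000, 1.0, 50.0),
]
print(pick_tier(tiers, 500).name)                      # nvme
print(pick_tier(tiers, 5_000).name)                    # burst-buffer (too large for nvme)
print(pick_tier(tiers, 500, {Paradigm.OBJECT}).name)   # object-store
```

Keeping the physical description separate from the selection logic is what would let the same logical layer be reused on platforms whose tiers differ.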

Exascale In-Situ Analytics.

Without a major change in practices, the increased computing capacity of the next generation of computers will lead to an explosion in the volume of data produced by numerical simulations. Managing this data, from production to analysis, is a major challenge. While it is not conceivable to do without a storage system, many efforts aim at reducing its use. This has given rise to approaches based on in-situ processing, in-transit processing, staging nodes, and helper cores 45, 66. All of these approaches aim to replace the usual write-then-read process with a means to perform analysis concurrently with the simulation, a capability of particular interest to physicists. This need has led to the first implementations of in-situ or in-transit analysis systems in simulation codes, and to the creation of specific middleware for asynchronous, scalable data management on post-Exascale systems, such as Damaris 67, 49.
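The contrast between write-then-read and concurrent analysis can be sketched with a producer/consumer pair: the "simulation" hands each iteration's data to a queue and keeps computing, while a separate worker, standing in for a dedicated core, analyzes the data as it arrives. This is only a toy model: it assumes Python threads and an in-memory queue, whereas middleware such as Damaris relies on dedicated processes and shared memory; the field values and the mean-based analysis are invented.

```python
import queue
import statistics
import threading


def simulation(steps: int, out: queue.Queue) -> None:
    """Toy simulation: each iteration produces a field and hands it off without waiting."""
    for it in range(steps):
        field = [float(10 * it + i) for i in range(10)]
        out.put((it, field))  # unbounded queue: no file write, no blocking on the analysis
    out.put(None)             # sentinel: the run is over


def analysis(inp: queue.Queue, results: dict) -> None:
    """Runs concurrently with the simulation, like a task on a dedicated core."""
    while True:
        item = inp.get()
        if item is None:
            break
        it, field = item
        results[it] = statistics.fmean(field)


def run_in_situ(steps: int) -> dict:
    channel: queue.Queue = queue.Queue()
    results: dict = {}
    worker = threading.Thread(target=analysis, args=(channel, results))
    worker.start()
    simulation(steps, channel)
    worker.join()
    return results


print(run_in_situ(3))  # {0: 4.5, 1: 14.5, 2: 24.5}
```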

Developed by the KerData team since 2011, Damaris is the team's flagship software for Exascale HPC. Damaris 4 proposes a middleware-level approach to scalable asynchronous I/O management and real-time in situ processing of data from large-scale MPI-based HPC simulations. It leverages the idea of dedicating cores within multicore nodes to such tasks, which are performed asynchronously. Initial feedback from application users clearly shows the need for a system that can dynamically trigger the activation of new analyses during the simulation run. The timing can be decided either by the simulation code or by an analysis. To maintain high performance, it is also essential to leverage the possibility of placing analysis tasks on GPUs. These are challenges we plan to address by extending our previous work based on the Damaris approach, in the context of the Exa-DoST project of the NumPEx PEPR (2023-2030), to support the needs of Exascale workloads. In particular, two applications are targeted: SKA 65, in collaboration with CNRS, Observatoire de Paris and Observatoire de la Côte d'Azur, and Gysela 74, in collaboration with CEA. In the longer term, we expect additional application requirements to emerge during the execution of the NumPEx PEPR program (in particular, in collaboration with the NumPEx Exa-DI project 5, which has set up a dedicated process to support new applications not yet identified). We plan to contribute to the support of such applications, which could exhibit new patterns with respect to in situ analysis.
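The requirement that new analyses can be activated during the run, either by the simulation code or by another analysis, can be sketched as a small registry consulted at every iteration. The `AnalysisRegistry` class and its methods are hypothetical and do not reflect Damaris's actual interface; the example shows a simulation-side rule (run every second iteration) and an analysis that enables a finer-grained one once an invented threshold is crossed.

```python
from typing import Callable, Dict, List, Tuple

Condition = Callable[[int, List[float]], bool]
Analysis = Callable[[int, List[float]], object]


class AnalysisRegistry:
    """Hypothetical registry of analyses that can be enabled or disabled at run time."""

    def __init__(self) -> None:
        self._active: Dict[str, Tuple[Condition, Analysis]] = {}
        self.log: List[Tuple[int, str, object]] = []

    def enable(self, name: str, condition: Condition, analysis: Analysis) -> None:
        self._active[name] = (condition, analysis)

    def disable(self, name: str) -> None:
        self._active.pop(name, None)

    def on_iteration(self, it: int, data: List[float]) -> None:
        # Iterate over a snapshot: analyses enabled now take effect from the next iteration.
        for name, (condition, analysis) in list(self._active.items()):
            if condition(it, data):
                self.log.append((it, name, analysis(it, data)))


registry = AnalysisRegistry()


def watch_peak(it: int, data: List[float]) -> float:
    peak = max(data)
    if peak > 3.0:  # the analysis itself decides to activate a finer-grained one
        registry.enable("count_above", lambda i, d: True,
                        lambda i, d: sum(x > 3.0 for x in d))
    return peak


# Simulation-side decision: run the peak monitor every second iteration.
registry.enable("peak", lambda it, d: it % 2 == 0, watch_peak)
for it in range(6):
    registry.on_iteration(it, [1.0 * it, 0.5 * it])
print(registry.log)
# [(0, 'peak', 0.0), (2, 'peak', 2.0), (4, 'peak', 4.0), (5, 'count_above', 1)]
```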

Sustainable Interoperability Across the Computing Continuum.

New endeavors towards interoperability in the continuum are addressing the need for common data spaces through federated data infrastructures in the cloud (e.g., Gaia-X 50, European Open Science Cloud (EOSC) 37, German National Research Data Infrastructure (NFDI) 75) and converged research infrastructures for leadership-class supercomputing and cloud resources (e.g., FENIX 6, EuroHPC 107, PRACE-RI 85, European Grid Initiative (EGI) 80). Looking further ahead, key players in the public and private sectors are making strong investments in strategic decisions about how the computing continuum will develop. Specifically, quantum computing is receiving massive support, and one of the mandates of the EuroHPC JU is to acquire and deploy quantum technologies in HPC environments once they reach sufficient maturity 7. Furthermore, other new technologies like neuromorphic accelerators 56 will increase the heterogeneity of future HPC systems. These non-conventional architectures can also be found in highly energy-efficient IoT devices 95, 108, fast scientific instruments for large-scale science 110, and new approaches for fast and efficient artificial intelligence in cloud and HPC environments 99, 89, 96. Current work on integrating emerging technologies into existing computing ecosystems focuses on the interoperability and performance of algorithms without considering data-oriented optimizations, and workflow-specific challenges such as task-resource mapping, automation, and provenance are rarely explored 64, 98.

Overall, successful interoperability between the existing and projected platforms in the Computing Continuum will depend on middleware able to interoperate and execute in hybrid scenarios. In the short term, we plan to investigate the design of a data exchange layer that will be the core of a workflow composition approach connecting with established data staging and transport layers. This layer aims to bridge the gap between raw data management and knowledge-based workflow management in the continuum, for better resource balancing, economy, provenance, and data reduction. In addition, the applications themselves could use this information hub to monitor and record the progress of their individual components, enriching existing approaches for in situ analysis and workflow reproducibility. To ensure the sustainability of workflow software solutions in an evolving and hyper-heterogeneous landscape, in the longer term we will study the data and access patterns of hybrid workflows that interact with emerging hardware technologies. The final, long-term goal is to contribute to the development of an interoperable software stack by designing data models that address the challenges of encoding, arrangement, locality, and mapping to high-level data abstractions.
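A minimal sketch of such an information hub, assuming a single in-memory process: components publish named data items together with the keys they were derived from, other components subscribe to updates (for example to monitor progress), and provenance can be reconstructed by walking the parent links. The `DataHub` class, its methods and the workflow stage names are illustrative assumptions, not an existing KerData API.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Record:
    component: str
    key: str
    value: object
    parents: Tuple[str, ...] = ()
    timestamp: float = field(default_factory=time.time)


class DataHub:
    """Hypothetical exchange layer: publish/subscribe plus provenance links."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}
        self._subscribers: Dict[str, List[Callable[[Record], None]]] = defaultdict(list)

    def publish(self, component: str, key: str, value: object, parents=()) -> Record:
        record = Record(component, key, value, tuple(parents))
        self._records[key] = record
        for callback in self._subscribers[key]:
            callback(record)  # e.g. progress monitoring by another component
        return record

    def subscribe(self, key: str, callback: Callable[[Record], None]) -> None:
        self._subscribers[key].append(callback)

    def get(self, key: str) -> object:
        return self._records[key].value

    def lineage(self, key: str) -> List[str]:
        """Depth-first walk of provenance links, starting from `key`."""
        order, seen, stack = [], set(), [key]
        while stack:
            current = stack.pop()
            if current in seen or current not in self._records:
                continue
            seen.add(current)
            order.append(current)
            stack.extend(self._records[current].parents)
        return order


hub = DataHub()
progress = []
hub.subscribe("reduced", lambda rec: progress.append(rec.component))
hub.publish("edge-sensor", "raw", [1.0, 2.0, 3.0])
hub.publish("filter", "reduced", [2.0, 3.0], parents=["raw"])
hub.publish("training", "model", "model-v1", parents=["reduced"])
print(progress)              # ['filter']
print(hub.lineage("model"))  # ['model', 'reduced', 'raw']
```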

3.3 Axis 3: Sustainable Resource Management for the Computing Continuum

With a growing number of disciplines relying on compute services in the Computing Continuum, data centers face significant changes in their workloads and services. In addition to “traditional” numerical simulation applications, there is a massive influx of data (sometimes coupled with remote sensors), analytical and learning applications. These applications exhibit significant uncertainty and dynamicity in their resource requirements, due to their intrinsic behavior and data-intensive profiles. At the same time, planetary limits and the ecological crisis will also have a definite impact on the way these computing centers are managed.

Within KerData, we propose to investigate next-generation resource management techniques enabled by reconfigurable software-defined hardware and by the recent convergence of industry standards for the flexible integration of accelerators and disaggregated memory. Memory disaggregation refers to decoupling physical and logical memory, which makes it possible to leverage underutilized resources without physically reconfiguring a distributed system. To use the Computing Continuum as efficiently as possible, we will take advantage of compute optimization down to the lowest available levels, which is becoming feasible as open toolchains down to the chip level mature.

Provisioning Storage Resources on Large-Scale Infrastructures.

While for years HPC systems were the predominant means of meeting the requirements of large-scale scientific workflows, today some components have moved away from supercomputers to extend across the Computing Continuum. This migration has been mainly motivated by the need for specialized data processing, such as data filtering at the Edge or data analysis on Cloud infrastructures. From an I/O and storage perspective, this means having to deal with very different paradigms: infrastructures where direct access to resources is extremely limited due to a very high level of abstraction, on-premise supercomputers offering a low-level approach requiring tight user control, or highly-constrained devices limited in terms of access and reconfiguration. One way to address this is to converge the infrastructures composing the Computing Continuum by exploring ways to provision storage resources distributed across hybrid HPC/Cloud/Edge systems to complex scientific workflows combining data production, simulation and data analysis. However, this implies low-level access to systems that are sometimes difficult to reach, or to resources in production. Simulation is one way of exploring storage provisioning 58, 57, 82.

In this context, we will continue our work on the simulation of storage systems implemented within the StorAlloc 91, 92 simulator. This work has enabled us to demonstrate the correct sizing and partitioning of intermediate storage resources and to work on the modeling of storage systems, including the way in which they distribute data over the available storage spaces. In KerData, we will explore new methods for I/O-aware scheduling of jobs on hybrid infrastructures 78. Based on post-mortem studies of storage systems, we will also work on characterizing workflows running on the Computing Continuum 106 in order to refine job scheduling decisions. In the longer term, our ambition is to propose a calibrated and validated simulator of storage systems distributed across the Computing Continuum. This simulator will enable us to predict the I/O performance and energy cost of complex workflows leveraging Edge, Cloud and HPC resources. To achieve this, we will rely on state-of-the-art simulation frameworks such as WRENCH 57 and SimGrid 58.
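The sizing and partitioning questions studied with StorAlloc can be illustrated with a toy static allocator: each job requests an amount of intermediate storage, the request is split evenly over `stripe` storage nodes chosen by a worst-fit rule (most free space first), and requests that do not fit are rejected. This is not StorAlloc's algorithm or interface; the policy, node count, capacities and request sizes are assumptions chosen only to show how striping changes which jobs are served.

```python
from typing import Dict, List, Tuple


def allocate(requests: List[Tuple[str, float]], node_capacity_gb: float,
             n_nodes: int, stripe: int = 1):
    """Static worst-fit allocation of storage requests, each split over `stripe` nodes.

    Returns (placements, rejected_jobs, free_space_per_node).
    """
    free = [node_capacity_gb] * n_nodes
    placements: Dict[str, List[int]] = {}
    rejected: List[str] = []
    for job, size_gb in requests:
        part = size_gb / stripe
        # Worst fit: nodes with the most free space first (ties keep node order).
        chosen = sorted(range(n_nodes), key=lambda i: free[i], reverse=True)[:stripe]
        if len(chosen) < stripe or any(free[i] < part for i in chosen):
            rejected.append(job)
            continue
        for i in chosen:
            free[i] -= part
        placements[job] = chosen
    return placements, rejected, free


jobs = [("A", 300), ("B", 300), ("C", 500), ("D", 200)]
print(allocate(jobs, node_capacity_gb=400, n_nodes=3, stripe=1))
# ({'A': [0], 'B': [1], 'D': [2]}, ['C'], [100.0, 100.0, 200.0])
print(allocate(jobs, node_capacity_gb=400, n_nodes=3, stripe=2))
# ({'A': [0, 1], 'B': [2, 0], 'C': [1, 2]}, ['D'], [100.0, 0.0, 0.0])
```

With these invented numbers, striping over two nodes lets the large job C fit, at the price of leaving no room for D; a simulator makes it possible to explore such trade-offs without access to production resources.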

Storage Disaggregation and Computational Storage.

A key challenge for many large-scale applications is the mismatch between compute power, the size of caches and buffers, and the available I/O bandwidth, from the network down to the chip level. Important contributing factors include economies of scale catering to markets with different needs, a prohibitively expensive development process, and a lack of manufacturing capacity for more customized solutions. However, as computing, memory, storage, and network hardware are becoming increasingly modular and reconfigurable, it is possible to consider system and storage architectures that would have been prohibitively expensive before 29, 38. Enabling technologies for some of these developments are emerging industry standards such as Compute Express Link (CXL) 73, 83 for more flexible integration of accelerators and disaggregated storage, and the P4 programming language used in reconfigurable networking 48.

We will identify the most energy-intensive routines used on the I/O path, both from the service perspective and from the domain perspective, and curate a portable reference library of key algorithms catering to both software and hardware acceleration. We will leverage, for example, open container standards and instruction set architectures such as RISC-V that can be applied both in data centers and in resource-constrained edge contexts 117. High-level technologies such as containers facilitate the development of algorithmic improvements but often also introduce runtime overhead, while low-level hardware acceleration makes it possible to reduce the energy consumption of computations to a minimum. Unfortunately, hardware acceleration of all desired functionality is not possible, because of the need to retain flexibility and because of cost, manufacturing, and physical constraints. Instead, the routines to be hardware-accelerated must be carefully selected, and the priorities shift from application to application. We will start by exploring domain- and service-specific approaches (e.g., compression, erasure coding, encryption) and then identify generalized abstractions for common functionality useful across domains. Ultimately, this research will enable reusable, abstract, and fine-grained building blocks for constructing frugal computational storage architectures that include the subset of functionality optimized for a particular application or workflow.
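Choosing which routines to harden under cost and physical constraints can be framed, as a first approximation, as a 0/1 knapsack problem: each candidate routine has an integer cost (for instance chip area or engineering effort, in arbitrary units) and an expected energy saving, and the goal is to maximize the total saving within a budget. The sketch below uses this framing with invented numbers; it is one possible formulation, not the selection method of the project.

```python
from typing import List, Tuple


def select_routines(routines: List[Tuple[str, int, int]], budget: int):
    """0/1 knapsack by dynamic programming.

    routines: (name, cost, saving) with integer cost and saving.
    Returns (best_total_saving, sorted list of selected routine names).
    """
    n = len(routines)
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, cost, saving) in enumerate(routines, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # skip routine i
            if cost <= b:                # or accelerate it, if it fits
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + saving)
    # Backtrack to recover which routines were selected.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _ = routines[i - 1]
            chosen.append(name)
            b -= cost
    return best[n][budget], sorted(chosen)


candidates = [("compression", 5, 40), ("erasure_coding", 4, 30),
              ("encryption", 3, 25), ("checksum", 1, 8)]
print(select_routines(candidates, budget=8))  # (65, ['compression', 'encryption'])
```

Because the budget is tight, the best choice is not simply the routines with the highest individual savings, which is one reason why the priorities shift from application to application.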

Frugal Data Storage Architectures to Support Post-Exascale Workflows.

Post-exascale workflows such as digital twins and machine learning require fast access to increasing amounts of data in long-term archives, which is challenging with existing storage technologies. In particular, long-term storage is latency- and bandwidth-constrained, while high-performance storage systems tend to be cost-, energy-, and capacity-constrained. A major obstacle to better utilization of existing technologies lies in the requirements of legacy applications; however, as applications and workflows transition to new programming models, it becomes possible to consider new storage system architectures. This creates an opportunity to research frugal data storage architectures that integrate computational storage, avoiding wait times and stress on contended resources such as the network and storage subsystems while also increasing energy efficiency through hardware acceleration.

High I/O performance and energy-efficient storage designs require taking a domain- or application-specific approach and extending computational storage to long-term data archives 73, 83, 69. By applying the methods for holistic workflow I/O behavior analysis discussed in Axis 2, it becomes possible to identify service- and application-specific I/O bottlenecks and to apply I/O acceleration building blocks. Using these building blocks, we will develop software libraries that allow their remote execution and/or hardware acceleration close to the data storage location. A holistic effort is necessary to combine advances and consolidation in data and workflow management (as promoted by the FAIR principles: Findable, Accessible, Interoperable, Reusable 118) with emerging open technologies for computation and storage 29, 38. To this end, we will research low-level software and hardware support for metadata queries as well as aggregations on top of self-describing data formats, which are needed by both the application workflow and data management communities. In particular, we will investigate the integration of emerging storage technologies that allow highly parallel access, as found in NAND- and NVRAM-based systems as manufacturing costs go down, as well as DNA-based storage systems once array-based synthesis methods become commercially available 81, 94.
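The benefit of pushing aggregations close to the data can be shown with a back-of-the-envelope comparison: computing a mean on the host requires shipping every value, whereas a computational storage device can return only a partial (sum, count) pair. The sketch assumes 8-byte values and 16-byte partials and ignores protocol overheads; the function names and block contents are made up for illustration.

```python
from typing import List, Tuple

VALUE_BYTES = 8     # assumed: one double-precision value
PARTIAL_BYTES = 16  # assumed: one (sum, count) pair


def host_side_mean(blocks: List[List[float]]) -> Tuple[float, int]:
    """Ship every value from storage to the host, then aggregate there."""
    values = [v for block in blocks for v in block]
    moved = len(values) * VALUE_BYTES
    return sum(values) / len(values), moved


def near_data_mean(blocks: List[List[float]]) -> Tuple[float, int]:
    """Each storage device reduces its own block and returns only a partial."""
    partials = [(sum(block), len(block)) for block in blocks]  # computed "in storage"
    moved = len(partials) * PARTIAL_BYTES
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, moved


blocks = [[1.0] * 1000, [3.0] * 1000]  # two devices, 1000 values each
print(host_side_mean(blocks))  # (2.0, 16000)
print(near_data_mean(blocks))  # (2.0, 32)
```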

HPC Resource Management Faced with the Environmental Crisis.

It is essential to consider the evolution of HPC in the face of the climate crisis, and its impact on our research topics. As in other fields, we have to consider what a "lower-tech" version of HPC would be, and how to make it usable. This is the target of this axis. The current trend in HPC has been a race for ever more powerful supercomputers, renewed every 6-7 years. However, this policy hardly seems sustainable. Regular shortages of components 42, 77, the origin (and uniqueness) of the sources of certain materials, and the geopolitical context 61 alone make this growth policy challenging. We will start by trying to evaluate the need for scale from a social perspective: what is the relation between scale and social progress? While the scientific community traditionally relies on various metrics to assess the performance of HPC systems (such as the Top500 list based on HPL performance, HPCG, Graph500, or IO500), these metrics do not capture how HPC contributes to social progress.

Then, given the scarcity of resources, we expect manufacturers to need to take these risks into account in their future machines. American HPC laboratories already have constraints for the 2050 horizon, such as zero-emission procurement. The first trend we can therefore expect is an extension of HPC machine lifetimes. This could be followed by a move towards refurbished machines, i.e., machines that use components from other machines. These changes, and the introduction of second-hand hardware, should open up several challenges for system managers. Until now, the number of faults has grown linearly with the number of resources 35. HPC fault-tolerance mechanisms assume that the Mean Time Between Failures (MTBF) is large compared to the system's characteristic times (such as the time to checkpoint data). With second-hand hardware, the number of faults may increase at a much higher rate, while machine performance would not improve since the machines are not being upgraded. This would render existing fault-tolerance mechanisms obsolete. To address this, we will explore new fault-tolerance mechanisms that could be applicable. Heterogeneity linked to resource unavailability and the increased computational complexity motivates the need for a precise description of available resources. In this context, we will explore alternatives for an efficient design of resource management systems to optimize the use of these resources. In addition, non-fatal faults may be invisible, typically slowdowns or varying performance due to wear and tear. We will investigate how to detect and manage resources that slow down the computations performed on them. Of course, one of the challenges of this axis will be to define metrics to evaluate the benefits of various solutions.
Indeed, not only is it a multi-dimensional problem (TCO analysis), but it should also consider what has long been known about optimization and the Jevons paradox 28.
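The dependency between fault rate and checkpointing cost can be quantified with the classical first-order model of Young (later refined by Daly): with a platform MTBF mu and a checkpoint cost C, the checkpoint period that minimizes waste is approximately T = sqrt(2·C·mu), and the corresponding fraction of time lost is about C/T + T/(2·mu) = sqrt(2·C/mu). Assuming independent node failures, mu is the node MTBF divided by the number of nodes. The sketch below evaluates this model with invented figures (10,000 nodes, 10-minute checkpoints, node MTBF of 10, 5 and 1 years), giving a waste of roughly 20%, 28% and 62%. The model only holds when C is much smaller than mu, and it predicts a waste of 100% or more once mu drops to 2·C, which is the regime where existing mechanisms stop working.

```python
import math

YEAR_S = 365 * 86_400


def platform_mtbf_s(node_mtbf_s: float, n_nodes: int) -> float:
    """Independent failures: the platform fails n_nodes times as often as one node."""
    return node_mtbf_s / n_nodes


def young_period_s(ckpt_s: float, mtbf_s: float) -> float:
    """First-order optimal checkpoint period (Young's approximation)."""
    return math.sqrt(2 * ckpt_s * mtbf_s)


def waste_fraction(ckpt_s: float, mtbf_s: float) -> float:
    """Time lost to checkpoints (C/T) plus expected re-execution after failures (T/(2*MTBF))."""
    period = young_period_s(ckpt_s, mtbf_s)
    return ckpt_s / period + period / (2 * mtbf_s)


CKPT_S = 600  # assumed 10-minute checkpoint
for node_mtbf_years in (10, 5, 1):
    mu = platform_mtbf_s(node_mtbf_years * YEAR_S, 10_000)
    print(f"node MTBF {node_mtbf_years:>2} y -> platform MTBF {mu / 3600:5.2f} h, "
          f"waste {waste_fraction(CKPT_S, mu):.1%}")
```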

4 Application domains

The KerData team investigates the design and implementation of architectures for data storage and processing across clouds, HPC and edge-based systems, which address the needs of a large spectrum of applications. The use cases we target to validate our research results come from the following domains.

4.1 Radio astronomy

The international SKA 103 project aims to build the largest telescope in the world in order to observe part of the universe. A very large volume of data is generated at the telescope level, pre-processed in real time on local clusters (filtering, reduction) and sent to a supercomputer (the Science Data Processor, SDP) at a rate of 1 TB/s. This data feeds numerical simulation, generating 1 PB of daily output data that needs to be saved. At this stage, the computing power and storage resources required are such that machines capable of reaching exascale become necessary. However, the efficient use of these systems raises new challenges, especially regarding data management.
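Assuming, for illustration, that the 1 TB/s rate is sustained over a full day (decimal units), these orders of magnitude work out as follows, i.e., the data retained is roughly 1% of what reaches the SDP:

```latex
\begin{aligned}
\text{ingest per day} &= 1~\mathrm{TB/s} \times 86\,400~\mathrm{s} = 86\,400~\mathrm{TB} = 86.4~\mathrm{PB},\\
\frac{\text{daily output}}{\text{daily ingest}} &= \frac{1~\mathrm{PB}}{86.4~\mathrm{PB}} \approx 1.2\,\%.
\end{aligned}
```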

In the context of the Exa-DoST project (NumPEx PEPR), for which SKA is one of the main target demonstrators, we are working on optimizing the I/O of a data processing pipeline that is a serious candidate for the radio telescope. This work has also taken the form of active participation in the ECLAT (Extreme Computing Lab for Astronomical Telescopes) joint laboratory 68.

4.2 Nuclear Fusion

GYSELA-X 8 is our second use case explored in the Exa-DoST project. It is a plasma simulation code developed at CEA as part of several national and international collaborations. This code also exhibits data-related challenges with respect to the scalability of I/O, storage and in-situ processing. It is part of a demonstrator for the Alice Recoque Exascale supercomputer.

4.3 Material Science

Coddex (Code de Dynamique des Discontinuités pour l'Étude des cristaux, a code for studying the dynamics of discontinuities in crystals) is a simulation code developed at CEA that solves the equations of continuum mechanics in dynamic hyperelasticity (for instance, shocks or rapid loading). It also describes discontinuities associated with changes in material behavior. Within the Exa-DoST project, this application serves to evaluate the approaches that we propose for efficient on-demand in-situ data analysis. The PhD thesis of Arthur Jaquard explores this research direction.

5 Social and environmental responsibility

5.1 Footprint of research activities

HPC and cloud facilities are expensive in capital outlay (both monetary and human) and in energy use, and they clearly have an inherent environmental impact. Our work on Damaris supports the efficient use of high performance computing resources. Damaris 4 can help minimize the power needed to run computationally demanding engineering applications and can reduce the amount of storage used for results, thus supporting environmental goals and improving the cost effectiveness of running HPC systems. In addition, in the new research program of the team, the whole third research axis is dedicated to frugal and sustainable HPC.

Another aspect worth mentioning is that our team has strong and active international collaborations, which sometimes require intercontinental air travel. To minimize our carbon footprint, we are careful to keep a balance between a few physical meetings (necessary to maintain substantial exchanges) and remote meetings by videoconference (used in most cases, when traveling is not necessary).

5.2 Impact of research results

Our scientific project includes specific research directions to address challenges posed by sustainability and climate change, including research on frugal storage and on ways to leverage second-hand HPC hardware. This raises the question of what sufficient HPC would mean.

Social impact.

When considering sufficiency in HPC, we need to question how resources are used and whether their use can be reduced. This is the main challenge of the project on result-scalability 11: we aim to propose ways to correctly resize HPC computations based on an evaluation of the output rather than on input-based scaling models.

Environmental impact.

Part of our research focuses on extending the lifespan of HPC machines, in the hope that this could reduce the environmental impact of the field. We have set up a working group with different teams at Inria Rennes (PACAP, TARAN) to study the challenges that extending the life of supercomputers would raise.

6 Highlights of the year

  • Silvina Caino Lores was awarded an ANR JCJC project.
  • François Tessier served as a Program Chair of the HiPC'25 international conference.
  • Guillaume Pallez was appointed Associate Editor of IEEE TPDS and IEEE TOPC. He was also nominated as a member of the SC steering committee.
  • François Tessier and Alexandru Costan, with the help of Jakob Luettgau, organized the 24th IEEE International Symposium on Parallel and Distributed Computing (ISPDC) in Rennes, France.
  • Jakob Luettgau was hired as a permanent Inria Researcher on 1 October 2025.
  • Alexandru Costan left the team on 1 October 2025.

7 Latest software developments, platforms, open data

7.1 Latest software developments

7.1.1 Damaris

  • Keywords:
    Visualization, I/O, HPC, Exascale, High performance computing
  • Scientific Description:

    Damaris is a middleware for I/O and data management targeting large-scale, MPI-based HPC simulations. It initially proposed to dedicate cores to asynchronous I/O in multicore nodes of recent HPC platforms, with an emphasis on ease of integration in existing simulations, efficient resource usage (through the use of shared memory) and simplicity of extension through plug-ins.

    Over the years, Damaris has evolved into a more elaborate system, providing the possibility to use dedicated cores or dedicated nodes for in situ data processing and visualization. It offers a seamless connection to the VisIt visualization framework to enable in situ visualization with minimal impact on run time. Damaris provides an extremely simple API and can be easily integrated into existing large-scale simulations.

    Damaris was at the core of the PhD thesis of Matthieu Dorier, who received an Accessit to the Gilles Kahn Ph.D. Thesis Award of the SIF and the Academy of Science in 2015. Developed in the framework of our collaboration with the JLESC (Joint Laboratory for Extreme-Scale Computing), Damaris was the first software resulting from this joint lab to be validated, in 2011, for integration into the Blue Waters supercomputer project. It scaled up to 16,000 cores on Oak Ridge’s leadership supercomputer Titan (first in the Top500 supercomputer list in 2013) before being validated on other top supercomputers. Active development is currently continuing within the KerData team at Inria, where it is at the center of several collaborations with industry as well as with national and international academic partners.

    Damaris has been selected as one of the key pieces of software for the NumPEx PEPR project, which aims to provide the software infrastructure for the future Exascale machine to be hosted in France in 2025 (Alice Recoque, Jules Vernes project). The capabilities of Damaris will be further studied in collaboration with CEA within the NumPEx exploratory PEPR project.

  • Functional​​​‌ Description:
    Damaris is a​ middleware for data management​‌ and in-situ visualization targeting​​ large-scale HPC simulations. Damaris​​​‌ enables: - In-situ data​ analysis by using selected​‌ dedicated cores/nodes of the​​ simulation platform. - Asynchronous​​​‌ and fast data transfer​ from HPC simulations to​‌ Damaris. - Semantic-aware dataset​​ processing through Damaris plug-ins,​​​‌ - Writing aggregated data​ (by hdf5 format) or​‌ visualizing them either by​​ VisIt or ParaView. -​​​‌ Dask analytics supports.
  • Release​ Contributions:
    v1.12.1 of Damaris​‌ provides basis for an​​ overhaul of the the​​​‌ Plugin layer: adding event​ triggers on specific hooks,​‌ reorganizing event functioning, and​​ enabling/adding data dependency for​​​‌ events. It includes also​ the (missing) implementions of​‌ the management of Parameter​​ for the string and​​​‌ label types, and the​ handling of some typos​‌ and bugs.
  • News of​​ the Year:
    In 2025, an extensible scheduling layer aimed at reducing communication costs has been added (not yet released). In addition, two main activities have been carried out (also not yet released) to give Damaris dynamic analysis capabilities: the development of a dynamic expression module, and an overhaul of the plugin layer (harmonized plugin definitions with the possibility to pass specific data to each plugin, dynamic event creation, conditional triggers, event/data dependencies, and data availability across iterations through a "sliding window"). Furthermore, in the context of the NumPEx PEPR project, we enhanced Damaris/PDI interoperability. We continued the development of the Damaris plugin for PDI (to be released soon) and started working on the PDI plugin in Damaris. With this, simulations instrumented with PDI (https://pdi.dev/main/) can use Damaris to perform asynchronous data analysis on dedicated resources, and those instrumented with Damaris can have full access to the PDI ecosystem. In addition, Damaris is now part of the NumPEx Software Catalog (https://numpex-pc5.gitlabpages.inria.fr/tutorials/projects/catalog/index.html).
  • URL:
  • Contact:
    Gabriel Antoniu
  • Participant:
    8 anonymous participants
  • Partner:
    ENS Rennes
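The dedicated-core, asynchronous data transfer described in the functional description above can be pictured, by analogy only, as a producer/consumer pattern. In the hypothetical Python sketch below, a thread stands in for the dedicated cores and a bounded queue for the shared buffer; `run_simulation` and `analysis_worker` are names invented for illustration and do not reflect the Damaris API, which is used from compiled MPI simulation codes.

```python
import queue
import threading

def analysis_worker(inbox, results):
    """Runs on the 'dedicated core': consumes iteration data and analyzes it."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel sent when the simulation ends
            break
        iteration, field = item
        results.append((iteration, max(field)))  # toy analysis: field maximum

def run_simulation(n_iterations, size):
    inbox = queue.Queue(maxsize=4)  # bounded buffer, standing in for shared memory
    results = []
    worker = threading.Thread(target=analysis_worker, args=(inbox, results))
    worker.start()
    field = [0.0] * size
    for it in range(n_iterations):
        # Toy compute step standing in for the simulation kernel.
        field = [x + it * 0.5 + i for i, x in enumerate(field)]
        # Hand off a copy; put() returns at once unless the buffer is full,
        # so the simulation does not wait for the analysis to finish.
        inbox.put((it, list(field)))
    inbox.put(None)
    worker.join()
    return results
```

A thread is used here only for brevity; Damaris instead dedicates cores or nodes of the simulation platform to this role, so analysis runs in separate processes.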

7.1.2 E2Clab

  • Name:
    Edge-to-Cloud lab
  • Keywords:
    Distributed systems, Cloud, Reproducibility, Experimentation, Computing Continuum, Evaluation, Large scale, Provenance
  • Scientific Description:

    E2Clab is a framework implementing a rigorous methodology that provides guidelines for moving from real-life application workflows to representative settings of the physical infrastructure underlying the application, in order to accurately reproduce its relevant behaviors and therefore understand and optimize end-to-end performance.

    E2Clab enables a rigorous analysis of possible application configurations in a controlled testbed environment, in order to understand their behavior and the related performance trade-offs. E2Clab can be generalized to other applications in the Edge-to-Cloud Continuum. It is currently used by the Pl@ntNet team to understand and optimize the performance of their application. It is also used by our partners from Instituto Politécnico Nacional for automatic experiment deployments in the context of the SmartFastData associate team.

    To enhance the reproducibility capabilities of E2Clab, we extended it to enable efficient provenance data capture across the Edge-to-Cloud Continuum. Specifically, we leverage simplified data models, data compression and grouping, and lightweight transmission protocols to reduce the overhead of collecting such data on IoT/Edge devices. This integration makes E2Clab a promising platform for optimizing application performance through reproducible experiments.

  • Functional Description:​​​‌
    E2Clab is a framework implementing a rigorous methodology that provides guidelines for moving from real-life application workflows to representative settings of the physical infrastructure underlying the application, in order to accurately reproduce its relevant behaviors and therefore understand end-to-end performance. Understanding end-to-end performance means rigorously mapping the scenario characteristics to the experimental environment, identifying and controlling the relevant configuration parameters of applications and system components, and defining the relevant performance metrics.
  • Release Contributions:

    Changelog: https://gitlab.inria.fr/E2Clab/e2clab/-/blob/master/CHANGELOG.rst?ref_type=heads

    Features (release 1.0.0):

    (i) the configuration of the experimental environment, libraries and frameworks, (ii) the mapping between the application parts and machines on the Edge, Fog and Cloud, (iii) the deployment of the application on the infrastructure, (iv) Edge-to-Cloud network emulation, (v) the automated execution and monitoring, (vi) the application optimization, and (vii) the gathering of experiment metrics.

  • News of‌​‌ the Year:

    In an effort coordinated within the PEPR Cloud, we have worked towards adapting E2Clab to run experiments on commercial computing resources provided by Scaleway. Enabling users to occasionally leverage Scaleway resources would give them access to state-of-the-art GPU nodes and to a wider diversity of computing resources.

    Additional contributions include:
    - ongoing experiments with the ECLAT laboratory to provide experimental support for their simulation pipeline;
    - improved software reliability through testing, and improved usability through easy SSH access to deployed resources;
    - documented use of the software for new use cases such as Federated Learning.

    Latest‌​‌ release archive: https://gitlab.inria.fr/E2Clab/e2clab/-/releases/v3.6.0

  • URL:​​
  • Publications:
    hal-04208787, hal-04779813, hal-04698619, hal-02916032, hal-03310540, hal-03269852, hal-03332524, hal-03270129, hal-03338520, hal-03324177, hal-03259975, hal-03409405, hal-03510012, hal-04659211
  • Contact:
    Gabriel Antoniu​​
  • Participant:
    5 anonymous participants​​​‌

7.1.3 Fives

  • Name:
    Simulator for Scheduling on Storage System at Scale
  • Keywords:
    Simulation, HPC, Distributed Storage Systems
  • Scientific Description:
    Development of Fives began in 2023, given the limitations of our previous StorAlloc simulator. At the end of 2023, Fives was still in active development, and its design and initial results were being submitted to a conference in the field.
  • Functional​​ Description:

    Fives is a storage resource scheduling simulator for supercomputers based on WRENCH and SimGrid, two state-of-the-art simulation frameworks. In particular, Fives can model a parallel file system such as Lustre and a computing partition, and simulate a set of jobs performing I/O on the resulting HPC system.

    Fives is based on several components. First, as part of the development of this simulator, an abstraction called "Compound Storage Service" was proposed to represent a distributed storage system, and integrated into WRENCH. Within Fives, a job model was designed to represent a history of jobs and submit them to the scheduler provided by WRENCH. Finally, a model of an existing supercomputer, Theta at Argonne National Laboratory, and a reverse-engineered version of its Lustre file system were developed in our simulator.

    Experiments are underway to calibrate and validate Fives.

  • Publication:
  • Contact:
    François Tessier

7.1.4 MOSAIC

  • Name:​​​‌
    Merging Operations and SegmentAtion for I/O Categorization
  • Keywords:
    Categorization, HPC, I/O
  • Scientific​​ Description:

    MOSAIC is a Python categorizer that takes I/O traces as input and assigns classes describing the patterns found in them.

    Those classes form a general description of an application's I/O activity, giving information about the temporality of I/O, whether periodic operations occur, and an estimate of the impact on the metadata servers.

    One of MOSAIC's building blocks is the automatic detection of recurring operations. This is achieved with a clustering algorithm that groups operations sharing the same characteristics (duration, I/O volume, etc.) into a single recurring operation.

    MOSAIC automatically identifies traces generated by the same program, which reduces the number of files to be processed and speeds up system-scale categorization.

    MOSAIC currently works with traces from the Darshan monitoring tool, but can easily be extended to other trace formats.

    MOSAIC was used to process the 2019 traces from the Blue Waters supercomputer trace dataset (National Center for Supercomputing Applications, University of Illinois).

  • Functional Description:​​

    MOSAIC is a tool for categorizing the storage activity of HPC applications. It processes traces containing all storage operations of an application and assigns classes describing how they are performed.

    MOSAIC can describe when the activity is performed (at the start of the application, at the end, throughout the execution, etc.), determine whether some operations are recurring (e.g., saving data to a file every 10 minutes), and estimate the overhead caused by metadata operations.

    It can analyze large datasets of I/O traces collected on a supercomputer to identify the general behavior of the applications run on the machine.

  • News of the​​​‌ Year:
    Support for file temperature, better detection of periodic behavior, and improved performance for very large datasets were implemented in 2025. An intermediate data format, based on the so-called Trace Event Format, was developed for MOSAIC to convert traces from I/O monitoring tools (such as Darshan and Recorder) into a common abstraction.
  • Publication:
  • Contact:
    François Tessier
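Two of the building blocks described above, grouping similar operations into a recurring one and locating a job's I/O in time, can be sketched as follows. This is a hypothetical, simplified stand-in: the greedy tolerance-based grouping replaces MOSAIC's actual clustering algorithm, and the `IOOp` record, thresholds, and class labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class IOOp:
    start: float     # seconds since job start
    duration: float  # seconds
    nbytes: int      # bytes transferred

def group_recurring(ops, rel_tol=0.2):
    """Greedy grouping: an operation joins the first group whose representative
    (first member) has a duration and volume within `rel_tol` relative difference."""
    groups = []
    for op in sorted(ops, key=lambda o: o.start):
        for g in groups:
            ref = g[0]
            if (abs(op.duration - ref.duration) <= rel_tol * ref.duration
                    and abs(op.nbytes - ref.nbytes) <= rel_tol * ref.nbytes):
                g.append(op)
                break
        else:
            groups.append([op])
    # Only groups with at least two members describe a recurring operation.
    return [g for g in groups if len(g) >= 2]

def classify_temporality(ops, job_duration, edge=0.1, share=0.8):
    """Label where a job's I/O volume is concentrated in time.
    `edge` (fraction of runtime) and `share` (fraction of bytes) are
    illustrative thresholds, not MOSAIC's parameters."""
    total = sum(op.nbytes for op in ops)
    if total == 0:
        return "no_io"
    head = sum(op.nbytes for op in ops if op.start < edge * job_duration)
    tail = sum(op.nbytes for op in ops if op.start >= (1 - edge) * job_duration)
    if head >= share * total:
        return "start"
    if tail >= share * total:
        return "end"
    if head + tail >= share * total:
        return "start_and_end"
    return "throughout"
```

For example, three checkpoint-like writes starting at t = 0, 600 and 1200 s with similar sizes and durations end up in one group, and the gaps between their start times suggest a 10-minute period.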

7.1.5​​ FLAdversary

  • Name:
    Emulation of Federated Learning Scenarios with Adversarial Clients
  • Keywords:
    Federated learning, Emulation, Adversarial attack
  • Functional Description:

    Federated Learning (FL) is exposed to diverse threats from the Edge of the network, where local training runs on widely distributed, heterogeneous and volatile resources.

    FLAdversary provides tools to dynamically introduce adversarial attacks into the FL training phase. Different poisoning attacks (on models and on data) can be introduced at the client level to emulate adversaries in FL training. Several defensive strategies are provided as baselines.

  • Publication:​​
  • Contact:
    Gabriel Antoniu​​​‌
  • Partner:
    DFKI (German Research‌ Center for Artificial Intelligence)‌​‌
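To make data poisoning and defensive baselines concrete, the Python sketch below shows two textbook examples: label flipping on a malicious client, and coordinate-wise median aggregation as a robust alternative to plain averaging. These are illustrative assumptions; the attacks and defenses actually shipped with FLAdversary, and its API, are not shown here. Model updates are represented as plain lists of floats.

```python
import statistics

def flip_labels(labels, source, target):
    """Data poisoning: a malicious client relabels `source` samples as `target`."""
    return [target if y == source else y for y in labels]

def fedavg(updates):
    """Plain averaging of client updates (lists of floats of equal length)."""
    n = len(updates)
    return [sum(col) / n for col in zip(*updates)]

def coordinate_median(updates):
    """Robust baseline: the coordinate-wise median limits the influence of a
    minority of arbitrarily scaled (model-poisoning) updates."""
    return [statistics.median(col) for col in zip(*updates)]
```

With three honest updates close to (1, 1) and one malicious update (100, -100), averaging is pulled far from the honest values, while the median stays near them.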

7.1.6 FLDrift

  • Name:
    Emulation of Federated Learning Scenarios with Client Drift
  • Keywords:
    Federated learning, Emulation, Heterogeneous Data
  • Functional Description:

    When deploying Federated Learning (FL) on the Computing Continuum, devices are subject to high variations in local data distributions. This limits the capacity of the system to produce a single model optimized for the entire federation of devices.

    FLDrift provides support for various non-IID scenarios (i.e., introducing concept drift and label shift between federated peers) for FL experiments. Several personalization/clustering strategies are provided as baselines.

  • News of​​ the Year:
    We implemented several baseline clustering strategies that improve personalization in Federated Learning to address client drift. FLDrift provides four scenarios to evaluate the performance of clustering approaches, each introducing a different form of concept drift between the clients' local datasets.
  • Publication:
  • Contact:‌​‌
    Gabriel Antoniu
  • Partner:
    DFKI​​ (German Research Center for​​​‌ Artificial Intelligence)
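One common way to create the label-shift scenarios mentioned above is to split each class across clients according to proportions drawn from a Dirichlet distribution, a technique widely used in the FL literature. The Python sketch below is an assumption about how such a split can be produced, not FLDrift's own code; the function name and parameters are hypothetical.

```python
import random

def dirichlet_label_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with label shift.

    For each class, client proportions are drawn from Dirichlet(alpha):
    a small alpha gives highly skewed (non-IID) label distributions,
    a large alpha gives nearly IID ones.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    parts = [[] for _ in range(n_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        # Dirichlet sample obtained by normalizing independent Gamma(alpha, 1) draws.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(draws) or 1.0  # guard against an all-zero draw
        props = [d / total for d in draws]
        # Turn cumulative proportions into cut points over this class's samples.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(round(acc * len(idxs)))
        bounds = [0] + cuts + [len(idxs)]
        for c in range(n_clients):
            parts[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return parts
```

Every sample is assigned to exactly one client, and the fixed seed makes the split reproducible across experiment runs.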

7.2 Open‌ data

7.2.1 I/O Traces‌​‌

For our IPDPS'25 paper 13, we used traces of I/O activity from four different systems to answer a set of questions about temporal I/O behavior. To focus on realistic workloads, we gathered traces from jobs running over a period of time instead of profiling a limited set of selected applications.

Two of these datasets were Darshan traces available online (from the Intrepid 9 and Blue Waters 10 systems), while the two others were obtained by us using file system monitoring tools:

  • PlaFRIM (BeeGFS): a 192-node experimental platform in Bordeaux, monitored for 26 months (2022–2024).
  • SDumont (Lustre): the largest supercomputer in Latin America, monitored for 12 months (2020).

The collected file system data was correlated with the batch scheduler logs to obtain two time series of I/O bandwidth per job (for reads and writes), with one value per second for PlaFRIM and one value every 15 seconds for SDumont. The two datasets, as well as all code and instructions on how to reproduce our experiments, are provided on Zenodo: https://doi.org/10.5281/zenodo.14965920. As explained in the instructions, additional information can be obtained from https://github.com/tuda-parallel/FTIO/tree/main/artifacts/ipdps25 for FTIO, and https://zenodo.org/records/13785395 for MOSAIC.
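The correlation step described above can be sketched as follows: each per-node file system sample is attributed to the job that held that node at that time, and byte counts are divided by the sampling interval to obtain bandwidth. The record layouts and the `per_job_bandwidth` function are hypothetical and assume exclusive node allocation; the actual processing code is the one provided in the Zenodo artifact.

```python
from collections import defaultdict

def per_job_bandwidth(samples, jobs, interval_s):
    """Attribute per-node I/O samples to jobs using batch scheduler logs.

    samples:    list of (node, t, read_bytes, write_bytes); byte counts cover
                one sampling interval starting at time t.
    jobs:       list of (job_id, set_of_nodes, start_t, end_t), end exclusive.
    interval_s: sampling interval in seconds (e.g. 1 or 15).
    Returns {job_id: {t: (read_Bps, write_Bps)}}, i.e. two bandwidth series per job.
    Assumes nodes are allocated exclusively, so at most one job matches a sample.
    """
    series = defaultdict(lambda: defaultdict(lambda: [0.0, 0.0]))
    for node, t, rd, wr in samples:
        for job_id, nodes, start, end in jobs:
            if node in nodes and start <= t < end:
                acc = series[job_id][t]
                acc[0] += rd / interval_s
                acc[1] += wr / interval_s
                break
    return {job: {t: tuple(v) for t, v in sorted(ts.items())}
            for job, ts in series.items()}
```

Samples from nodes that belong to no job in the log (idle nodes, or nodes outside the monitored partition) are simply left unattributed.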

8 New‌​‌ results

8.1 Supporting Data-Centric​​ Applications and Workflows Running​​​‌ Across the Computing Continuum‌

8.1.1 On the Reproducibility‌​‌ Challenges of Federated Learning:​​ Investigating the Gap between​​​‌ Simulation, Emulation and Real-World‌ Deployments

Participants: Cédric Prigent, Alexandru Costan, Gabriel Antoniu.

  • Collaboration.​​​‌
    This work has been carried out in co-operation with Cédric Tedeschi (University of Rennes, MAGELLAN team), Loïc Cudennec (DGA MI) and Kate Keahey (Argonne National Laboratory), in the framework of the STEEL project of the PEPR Cloud program and of the UNIFY 2 Associate Team with ANL, associated with the JLESC international laboratory.

Federated Learning (FL) is an emerging paradigm for decentralized training of Machine Learning models. It has been the subject of a large body of research due to its innovative approach to handling sensitive data. A common practice in the FL literature is to run simulations on a single compute node to assess the performance of FL algorithms. While simulation enables fast prototyping and validation of algorithmic concepts, it may fail to reproduce the real system's performance in heterogeneous environments such as the Computing Continuum, particularly on resource-constrained Edge devices. Conversely, emulation on distributed testbeds offers more effective means to accurately reproduce the performance of real-world devices. However, to the best of our knowledge, no prior research has investigated the differences between simulation and emulation in FL experiments. In this work, we study the complementarity of these approaches and discuss their respective challenges, as a first step towards reproducibility of FL experiments. We illustrate our study with a real-life application used as a baseline: an outdoor air quality forecasting framework with real-world sensors. Our results show that simulation can accurately reproduce model performance metrics, while emulation can effectively reproduce the system performance of real-world experiments. Finally, we present a set of lessons learned on the challenges of FL reproducibility and on the selection of experimental infrastructures for FL experiments and applications. This work has been published in 16.

8.1.2 Evaluating Federated​ Learning Workflows Beyond Simulation:​‌ A Deployment-Aware Methodology

Participants: Mathis Valli, Alexandru Costan, Gabriel Antoniu.

  • Collaboration.
    This work has been carried out in co-operation with Cédric Tedeschi (University of Rennes, MAGELLAN team) and Loïc Cudennec (DGA MI).

Federated Learning (FL) is often evaluated in simulation, which overlooks network variability, system heterogeneity, and energy costs in geo-distributed settings. We propose a deployment-aware methodology that triangulates analytical modeling, simulation, and real-world deployments within a unified FL evaluation framework. For a given series of experimental scenarios, the methodology makes it possible to assess the consistency of performance trends across the three evaluation approaches, quantifying deviations in key metrics such as run time, communication overhead, and energy consumption. This further enables cross-validation of the reliability of multiple measurement tools, highlighting discrepancies in commonly reported metrics such as energy usage.
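As an illustration of the kind of analytical model mentioned above, a first-order estimate of FL communication volume is V = 2 · ⌈C·K⌉ · S · R, where K is the number of clients, C the client sampling rate, S the model size in bytes and R the number of rounds: in every round the server sends the global model to each sampled client and receives one update of the same size back. This formula and the Python function below are illustrative assumptions (they ignore compression, retransmissions and protocol overhead), not the analytical model used in this work.

```python
import math

def fl_communication_volume(n_clients, sample_rate, model_params, rounds,
                            bytes_per_param=4):
    """First-order analytical estimate of total FL traffic, in bytes.

    Each round, the server broadcasts the global model to the sampled clients
    and receives one update of the same size from each of them. At least one
    client is assumed to be sampled per round (rounding up is an assumption).
    """
    sampled = max(1, math.ceil(sample_rate * n_clients))
    model_bytes = model_params * bytes_per_param
    return 2 * sampled * model_bytes * rounds
```

For example, 40 clients sampled at 50%, a one-million-parameter float32 model and 100 rounds give 2 × 20 × 4 MB × 100 = 16 GB of traffic, a figure that can then be compared with what simulations and deployments actually report.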

The methodology is validated on FL workloads by comparing analytical predictions and simulations against large-scale deployments on the Grid’5000 testbed, spanning 51 nodes across four geographically distant sites. By varying key FL components such as aggregation algorithms, client sampling rates, and datasets, we characterize how different FL design choices affect the reliability of the three evaluation approaches. Our findings reveal significant divergences: analytical models accurately capture communication patterns and preserve the relative performance of the scenarios; simulations reflect broad trends but often rank configurations differently from actual deployments; and only real deployments uncover hidden costs, such as increased energy consumption due to data imbalance.
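Client sampling and aggregation, two of the FL components varied above, can be illustrated with one simulated round of FedAvg (federated averaging), the kind of loop that single-node FL simulations typically execute. The sketch is hypothetical: models are plain lists of floats, and `local_update` stands for whatever local training a client performs.

```python
import random

def fedavg_round(global_model, clients, sample_rate, local_update, rng):
    """One simulated FedAvg round with client sampling.

    clients:      list of (client_id, n_samples).
    local_update: callable (model, client_id) -> new local model (list of floats).
    Local models are averaged with weights proportional to n_samples.
    """
    k = max(1, int(sample_rate * len(clients)))
    selected = rng.sample(clients, k)
    total = sum(n for _, n in selected)
    new_model = [0.0] * len(global_model)
    for cid, n in selected:
        local = local_update(global_model, cid)
        for i, w in enumerate(local):
            new_model[i] += w * n / total
    return new_model
```

Passing an explicit `random.Random` instance keeps client selection reproducible, which matters when comparing simulated rounds with deployed ones.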

This work has been submitted for publication to a conference (currently under evaluation).

8.1.3 Supporting SKA data processing workflows with the E2Clab approach to workflow lifecycle management across the continuum

Participants: Thomas Badts, Gabriel Antoniu.

  • Collaboration.​​​‌
    This work has been carried out in co-operation with Baptiste Besnard and Damien Gratadour (LIRA, Observatoire de Paris).

To support automatic deployment, the complete analysis cycle and the optimization of applications on the Computing Continuum, we have proposed the E2Clab methodology and its supporting software tool for workflow lifecycle management across the Continuum. We aim to assist the execution of the Karabo pipeline 32 for radio astronomy simulation by enabling reproducible distributed deployments and experiments on academic testbeds. The Karabo pipeline is being developed to support simulation of the future SKA radio telescope within the ECLAT laboratory. E2Clab also provides the workflow capabilities to run optimization loops over end-to-end experiments and to improve parameter discovery and fine-tuning in a complex, cross-disciplinary stack of software components ranging from distributed computing frameworks to astrophysics simulations.

This collaboration, started in 2025, is still in its exploratory stages; further work is expected in the coming year.

8.1.4 Methodology for Automated‌​‌ IoT Experimentation in Controlled​​ Testbeds Prior to Real-World​​​‌ Deployments

Participants: Elias Del Pozo Punal, Silvina Caino Lores, Thomas Badts, Gabriel Antoniu.

  • Collaboration.
    This work has been carried out in co-operation with Felix Garcia-Carballeira and Alejandro Calderon from University Carlos III of Madrid.

Several tools and frameworks have been proposed to automate deployments in distributed systems. Infrastructure-as-Code (IaC) approaches such as Ansible, Puppet, SaltStack, or Chef are widely used to abstract low-level configuration details. In parallel, some frameworks support experiment description and execution on specific research testbeds, such as the cOntrol and Management Framework (OMF) and the OMF Measurement Library (OML) 27. Despite these advances, existing solutions often remain limited to specific domains or infrastructures, and integrating heterogeneous environments remains a challenge across the broader computing continuum, in particular for IoT deployments. As a result, researchers frequently rely on fragmented tools or manual procedures, which hinder the repeatability and scalability of experiments and ultimately limit the ability to perform consistent pre-deployment validation of IoT systems in controlled environments.

This work proposes a general methodology for automated IoT experimentation and validation in controlled environments. The approach provides a structured workflow for designing, deploying, and executing experiments across different testbeds in a reproducible and scalable manner. It allows researchers to evaluate IoT deployments through controlled simulations and pre-deployment testing, bridging the gap between conceptual design and real-world implementation.

8.2​​​‌ Data-Aware Middleware Approaches for‌ the Computing Continuum

8.2.1‌​‌ Multi-level analysis of the​​ I/O pattern of HPC​​​‌ applications

Participants: François Tessier, Théo Jolivel, Jakob Luettgau, Julien Monniot, Gabriel Antoniu.

  • Collaboration.
    This work has been carried out in close co-operation with Philippe Deniel from CEA and the Inria TADaaM team in Bordeaux, within the Exa-DoST project of the PEPR NumPEx program. It also involves a collaboration with Ahmad Tarraf from the Technical University of Darmstadt, Germany.

While the ratio of I/O performance to computing power has declined by a factor of 10 over the last decade 11, the volume of data generated by scientific workflows and applications has grown significantly. In some supercomputing centers, for instance, this volume has increased almost 40-fold in ten years. This has made access to storage resources a major bottleneck when scaling up applications.

Several levers exist along the data path to mitigate this burden. For example, optimizations can be applied at the I/O library level or within the application source code to improve I/O performance. At the job scheduler level, decisions can be made when allocating resources to avoid I/O interference between jobs. However, all these optimizations require a good prior understanding of application I/O behavior.

In this research axis, we analyze the I/O behavior of large-scale applications at various levels. The PhD thesis that Théo Jolivel started in October 2024 tackles this question. One approach is to exploit public datasets containing several years of I/O execution traces of applications running on supercomputers. We developed multiple methodologies and tools to pre-process these datasets, extract the relevant data, and analyze data access behavior. In particular, we extended MOSAIC 79, a categorizer that detects I/O patterns from execution traces. MOSAIC extracts the I/O operations contained in I/O traces and assigns classes describing how I/O operations are performed throughout the execution. The description is based on three distinct axes: I/O temporality (when was data read or written?), access periodicity (are there recurring operations?), and metadata overhead (what is the impact of metadata operations?). This extended version is under submission to a conference 21 and has been presented as a poster at the annual meeting of the Exa-DoST project 22 (an updated version of this poster is also under submission to the PASC 2026 conference). A complementary work on the temporal I/O behavior of HPC applications, in collaboration with Inria Bordeaux and TU Darmstadt, has been presented at IPDPS'25, an A-rank conference in the field 19.
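Access periodicity, one of the three axes above, can be estimated from an I/O bandwidth time series with a discrete Fourier transform, in the spirit of frequency-based tools such as FTIO. The NumPy sketch below is not FTIO's or MOSAIC's implementation: the function name is hypothetical, and it simply returns the period of the strongest non-zero frequency, whereas real traces would also need windowing and a confidence check.

```python
import numpy as np

def dominant_period(bandwidth, dt):
    """Estimate the dominant period (in seconds) of an I/O bandwidth series
    sampled every `dt` seconds, from the peak of the DFT magnitude.
    The mean is removed first so the zero-frequency bin does not dominate."""
    x = np.asarray(bandwidth, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = int(np.argmax(spectrum[1:])) + 1  # skip the DC bin
    return 1.0 / freqs[k]
```

For a series sampled every 15 s, as in the SDumont dataset, passing `dt=15` directly yields the period in seconds.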

8.2.2 Study​‌ of I/O interference between​​ jobs

Participants: François Tessier, Méline Trochon.

  • Collaboration.
    This work has been carried out in close co-operation with the Inria TADaaM team in Bordeaux and with Jean-Thomas Acquaviva from DDN, within the Exa-DoST project of the PEPR NumPEx program.

High-performance computing (HPC) is a key component for accelerating scientific discovery and innovation, enabling rapid processing of complex simulations and large-scale data analyses. As HPC applications grow in scale, the performance of the underlying storage infrastructure, particularly parallel file systems (PFS), becomes critical. These shared systems distribute data across multiple storage targets (OSTs), but concurrent access by multiple jobs can lead to interference, reducing performance compared to isolated operation. Interference varies with application characteristics, often degrading overall bandwidth and causing significant performance variability, sometimes by orders of magnitude.

In the context of Méline Trochon's PhD thesis (CIFRE DDN-Inria), we studied how interference impacts checkpointing, a key fault-tolerance technique in HPC. Checkpointing consists of periodically saving application data to persistent storage in order to recover from failures. As applications handle more data, checkpoint files grow larger, making I/O performance even more crucial. Interference during these operations can severely affect their efficiency, highlighting the need to understand and mitigate its effects.
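Why interference matters for checkpointing can be quantified with Young's classic first-order approximation of the optimal checkpoint period, T ≈ √(2 C μ), where C is the time to write one checkpoint and μ the platform mean time between failures (MTBF). The sketch below models interference, as an assumption, as a uniform slowdown of the write bandwidth; it is an illustration, not the model used in this study.

```python
import math

def checkpoint_cost(ckpt_bytes, bandwidth_Bps, slowdown=1.0):
    """Time to write one checkpoint; `slowdown` >= 1 models interference
    as a uniform reduction of the available bandwidth (an assumption)."""
    return ckpt_bytes * slowdown / bandwidth_Bps

def young_interval(cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint period."""
    return math.sqrt(2.0 * cost_s * mtbf_s)

def waste_fraction(cost_s, period_s, mtbf_s):
    """First-order expected fraction of time lost: checkpoint overhead plus,
    on average, half a period of recomputation per failure."""
    return cost_s / period_s + period_s / (2.0 * mtbf_s)
```

With a 10 TB checkpoint written at 100 GB/s (C = 100 s) and a 24-hour MTBF, T ≈ 4,157 s and about 4.8% of the time is wasted; a 4× interference slowdown (C = 400 s) stretches T to about 8,314 s and doubles the waste to about 9.6%.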

To do this, we launched a large number of experiments combining an application that simulates checkpoint phases with one or more applications that generate interference. Since the checkpoint application has fixed parameters, we studied whether, and to what extent, different configurations of the interfering workloads affect its I/O performance. This work is currently being finalized and a paper is expected to be published in 2026; a preprint is already available online 23. This work will continue in 2026, notably through the development of a simulator that will allow us to test more configurations.

8.2.3 Enabling Efficient​​​‌ Runtime Data Analysis to‌ a Crystal Deformation Simulation‌​‌

Participants: Arthur Jaquard, Silvina Caino Lores, Gabriel Antoniu.

  • Collaboration.‌
    This work has been carried out in close co-operation with Laurent Colombet (CEA DAM) and Julien Bigot (CEA/Maison de la Simulation) within the Exa-DoST project of the PEPR NumPEx program.

Exascale simulations generate massive data volumes that strain I/O and post hoc analysis. In the framework of Arthur Jaquard's PhD thesis, we explore how in situ analysis, as supported by the Damaris middleware, can benefit Coddex, a crystal deformation code, by offloading data movement and analysis to dedicated processes. This enables runtime extraction of key diagnostics without writing intermediate files. We evaluated tin hysteresis cases on CEA's INTI cluster (14 nodes, 1,728 cores) and compared against a ParaView-based post hoc pipeline. In situ analysis eliminates per-iteration I/O stalls and reduces output time by up to 5x while preserving overall iteration time, with benefits increasing with the number of tracked variables. This work is conducted within the Exa-DoST project of the PEPR NumPEx program, which aims to build the software infrastructure for the first Exascale machine expected to be set up in France (Alice Recoque, Jules Verne project). It has been published as a poster at the SC25 conference 24.

8.3‌​‌ Sustainable Resource Management for​​ the Computing Continuum

8.3.1​​​‌ Result-Scalability: Following the Evolution‌ of Selected Social Impact‌​‌ of HPC.

Participants: Guillaume Pallez.

  • Collaboration.
    This work has been carried out in collaboration with Sally Rose Ellingson from the medical college of the University of Kentucky.

While the scientific community traditionally relies on various computational metrics to assess the performance of HPC systems (such as the TOP500 list, based on HPL performance, HPCG, Graph500, or IO500), these metrics do not capture how HPC contributes to social progress. We propose 11 a novel approach to follow how the growth of HPC systems and the advances of HPC research address concrete social challenges. The uniqueness of these new metrics lies in their ability not only to measure the capabilities of HPC architectures but also to gauge the concrete social advancements achieved through their use: they focus on the output of the computation instead of its input. Unlike current measures, they also promote the diversity of machines by evaluating the Pareto front between machine size and result. We emphasize the need for dynamic, community-driven metrics that can evolve with emerging social needs.
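The Pareto-front idea above can be sketched as follows: given each machine's size and the result it achieved on a given societal challenge, keep the machines that no other machine beats while using no more resources. The function, the tuple layout and the units of "size" and "result" are hypothetical illustrations, not the metric defined in this work.

```python
def pareto_front(machines):
    """Return the machines that are not dominated on (smaller size, larger result).

    machines: list of (name, size, result). A machine is dominated if another one
    is no larger and achieves at least the same result, while being strictly
    better on at least one of the two criteria.
    """
    front = []
    for name, size, result in machines:
        dominated = any(
            s <= size and r >= result and (s < size or r > result)
            for _, s, r in machines
        )
        if not dominated:
            front.append((name, size, result))
    return sorted(front, key=lambda m: m[1])
```

A small machine with a modest result can thus stay on the front next to a large machine with the best result, which is how such a metric rewards diversity rather than raw size.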

8.3.2​‌ Increasing the Lifetime of​​ HPC Machines: Issues, Implications,​​​‌ and Open Challenges

Participants: Guillaume Pallez, Robin Boezennec.

  • Collaboration.
    This work has been carried out as a large collaboration in Rennes involving two different teams, PACAP and TARAN, as well as with Brice Goglin (TADaaM, Bordeaux).

Extending the lifetime of High-Performance Computing (HPC) machines is becoming an important concern for a variety of reasons. These include the environmental and human costs associated with chip manufacturing, the rising demands of AI workloads, the soaring prices of accelerator chips, political blocks, and delays in the delivery of next-generation supercomputers. As a community, we must reconsider the traditional HPC paradigm and explore new strategies for keeping existing HPC infrastructure viable for longer periods. In 18, we highlight the current barriers to prolonging the lifespan of HPC machines and discuss key technical and operational challenges towards this goal.

8.3.3 Improving Supercomputer Usage​​ with Aging Awareness.

Participants: Guillaume Pallez, Robin Boezennec, Alix Tremodeux.

The lifetime of electronic devices has a critical impact on their environmental footprint. In addition, the high demand for GPUs from AI companies has tremendously reduced their availability for supercomputing centers. Consequently, extending the lifetime of CPUs and GPUs is becoming a major issue in the High Performance Computing (HPC) domain. This contribution 12 investigates how to optimize machine usage before a fatal failure, and the trade-offs with performance. The lifetime of computing devices is strongly connected with temperature, and thus with the running frequency. We investigate node frequency reconfiguration to optimize HPC usage, and estimate the benefit of a dedicated scheduling algorithm compared with running at a constant frequency.

We show that a correct decision can considerably increase the number of FLOP performed by a machine, with a trade-off in terms of performance. Because aging models are currently inaccurate, we consider different models and discuss the robustness of our algorithms to such inaccuracy.
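To illustrate the frequency/lifetime trade-off described above, the sketch below uses a deliberately simple, hypothetical aging model in which lifetime scales as (f0/f)^k and throughput is proportional to f, so the work delivered before failure is f times the lifetime. Neither the model nor the parameter values come from this work; because aging models are uncertain, the example compares two exponents k.

```python
def lifetime_work(freq_ghz, base_freq_ghz=2.0, base_lifetime_h=40_000.0, k=3.0):
    """Hypothetical aging model (illustration only).

    Lifetime is assumed to scale as (base_freq / freq) ** k, and throughput
    to be proportional to frequency. Returns (lifetime_h, work), where work is
    frequency x lifetime in GHz*hours, a proxy for the FLOP delivered before failure.
    """
    lifetime_h = base_lifetime_h * (base_freq_ghz / freq_ghz) ** k
    return lifetime_h, freq_ghz * lifetime_h

if __name__ == "__main__":
    # Compare constant frequencies under two aging exponents (model uncertainty).
    for k in (1.5, 3.0):
        for f in (1.6, 2.0, 2.4):
            life, work = lifetime_work(f, k=k)
            print(f"k={k} f={f} GHz lifetime={life:,.0f} h work={work:,.0f} GHz*h")
```

Under this toy model, lowering the frequency increases the total work delivered whenever k > 1, at the cost of slower jobs, while for k = 1 the total work does not depend on the frequency; the decision is therefore only as good as the aging model behind it.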

8.3.4​​​‌ Priority-BF: a Task Manager​ for Priority-Based Scheduling

Participants: Guillaume Pallez.

  • Collaboration.​​
    This work has been carried out with Ana Gainaru and Scott Patkin (Oak Ridge National Laboratory).

The increasing demand for computational resources, particularly in High-Performance Computing environments, calls for rethinking job scheduling strategies. In 14, we address the challenge of managing concurrent jobs with differing priorities on overloaded parallel systems, where strict QoS constraints are often difficult for users to define. Our solution relies on a qualitative description of priorities and draws on two key approaches: the Easy-BF algorithm and the Conservative Backfilling algorithm. This solution improves the response time of high-priority jobs by 50% without affecting overall system utilization. We show its applicability in several critical scenarios such as High-Performance Computing (HPC) resource management and in situ computing.
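For reference, the sketch below implements one pass of the classic EASY backfilling algorithm, one of the two building blocks mentioned above: jobs start in priority order until one does not fit, that head job receives a reservation at its "shadow time", and later jobs may jump ahead only if they cannot delay it. This is a textbook EASY pass written for illustration, not Priority-BF itself, whose priority handling is not shown; the tuple layouts are assumptions.

```python
def easy_backfill(now, total_procs, running, queue):
    """One scheduling pass of EASY backfilling.

    running: list of (end_time, procs) for jobs already executing.
    queue:   list of (job_id, procs, walltime) sorted by priority (head first).
    Returns the ids of the jobs to start at time `now`.
    """
    running = list(running)
    queue = list(queue)
    free = total_procs - sum(p for _, p in running)
    started = []

    # 1. Start jobs from the head of the queue as long as they fit.
    while queue and queue[0][1] <= free:
        job_id, procs, walltime = queue.pop(0)
        started.append(job_id)
        free -= procs
        running.append((now + walltime, procs))
    if not queue:
        return started

    # 2. Reserve processors for the blocked head job: its "shadow time" is the
    #    earliest time at which enough processors are released; `extra` counts
    #    processors still free at that time once the head job has started.
    head_procs = queue[0][1]
    shadow, extra = float("inf"), 0
    avail = free
    for end_time, procs in sorted(running):
        avail += procs
        if avail >= head_procs:
            shadow, extra = end_time, avail - head_procs
            break

    # 3. Backfill later jobs only if they cannot delay the head job.
    for job_id, procs, walltime in queue[1:]:
        if procs > free:
            continue
        if now + walltime <= shadow:
            started.append(job_id)  # finishes before the reservation
            free -= procs
        elif procs <= extra:
            started.append(job_id)  # uses processors the head job will not need
            free -= procs
            extra -= procs
    return started
```

Conservative backfilling differs in that every queued job, not only the head, gets a reservation, which protects all jobs from delay at the cost of fewer backfilling opportunities.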

8.3.5‌ Scheduling multiple task-based applications‌​‌ on distributed heterogeneous computing​​ nodes

Participants: Etienne Ndamlabin​​​‌.

  • Collaboration.
    This work‌ has been carried out‌​‌ with Bérenger Bramas (CAMUS​​ team, Inria Nancy).

Modern high-performance computing platforms combine extreme parallelism with growing size, complexity, and cost, which makes inefficient resource usage increasingly costly in terms of both performance and energy. Our research addresses this challenge by focusing on the concurrent execution of multiple task-based applications on shared heterogeneous (CPU/GPU) environments. We created load-balancing heuristics to distribute the task graphs over the processing units, and we designed and implemented RSCHED, an adaptive scheduling framework integrated into the StarPU runtime system 93. RSCHED dynamically reorganizes resource allocation in response to application progress and completion, while jointly optimizing application makespan and resource utilization during concurrent execution. Experimental results on real applications show up to a 10× reduction in overall makespan compared to consecutive execution, together with increased resource utilization. RSCHED also highlights the benefits of system-level coordination on top of independent application schedulers, compared with unsupervised concurrent execution.
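A common starting point for distributing task graphs over heterogeneous processing units is greedy earliest-finish-time (EFT) list scheduling. The Python sketch below is a generic version of that heuristic, written for illustration only; it is not RSCHED and does not use the StarPU API. The task names, the per-architecture execution times and the two-worker platform are hypothetical. It compares running two small applications one after the other with merging their task graphs so that they share the CPU and the GPU.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def eft_schedule(tasks, workers, ready_at=0.0):
    """Greedy earliest-finish-time list scheduling on heterogeneous workers.

    tasks:   {name: (dependencies, {"cpu": time, "gpu": time})}
    workers: worker kinds, e.g. ["cpu", "gpu"]
    Returns the makespan (finish time of the last task).
    """
    free_at = [ready_at] * len(workers)  # when each worker becomes idle
    finish = {}
    graph = {name: deps for name, (deps, _) in tasks.items()}
    for name in TopologicalSorter(graph).static_order():
        deps, cost = tasks[name]
        earliest = max((finish[d] for d in deps), default=ready_at)
        # Pick the worker on which this task would finish first.
        w = min(range(len(workers)),
                key=lambda i: max(free_at[i], earliest) + cost[workers[i]])
        finish[name] = max(free_at[w], earliest) + cost[workers[w]]
        free_at[w] = finish[name]
    return max(finish.values(), default=ready_at)


# Hypothetical platform and applications: A is GPU-friendly, B is not.
W = ["cpu", "gpu"]
app_a = {"a1": ((), {"cpu": 4, "gpu": 1}),
         "a2": (("a1",), {"cpu": 4, "gpu": 1})}
app_b = {"b1": ((), {"cpu": 2, "gpu": 2}),
         "b2": (("b1",), {"cpu": 2, "gpu": 2})}

if __name__ == "__main__":
    consecutive = eft_schedule(app_b, W, ready_at=eft_schedule(app_a, W))
    concurrent = eft_schedule({**app_a, **app_b}, W)
    print("consecutive:", consecutive, "concurrent:", concurrent)
```

On this toy input, consecutive execution finishes at t = 6 while the merged graph finishes at t = 4, because application B uses the CPU while application A keeps the GPU busy. This only illustrates why concurrent execution can help; the 10× figure above comes from the real experiments of the study.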

8.4 Methodological study of the practice of HPC research

The following contributions do not directly build on the team's project but are adjacent to it. Both discuss how our community performs research: the first by studying the reproducibility evaluation process of a major HPC conference (SC24), and the second by examining some of the claims behind the use of LLMs to generate scheduling algorithms.

8.4.1 Implementing a Reproducibility Initiative in HPC: Experiences from SC24

Participants:‌ Guillaume Pallez.

  • Collaboration.‌​‌
    This work has been​​ carried out with Sascha​​​‌ Hunold (University of Vienna)‌ and Judith Hill (Lawrence‌​‌ Livermore National Laboratory).

Reproducibility is fundamental to scientific research, but it can be particularly challenging in research that involves High Performance Computing (HPC) due to the unique characteristics of supercomputers. Performance-based metrics such as execution time, energy consumption, and throughput further complicate reproducibility, especially on shared systems. In 15, we present our experience implementing a reproducibility initiative at SC24, with particular emphasis on the changes made compared to prior SC conferences. We outline HPC-specific challenges, describe the measures adopted to address them, and reflect on the limitations of reproducibility badges. Faced with the constraints of the existing badging nomenclature, we discuss our implementation of a reproducibility report, which aims to provide more context about the reproducibility of each paper. We conclude by recommending that the “Artifact Replicable” badge be dropped by HPC conferences at this time, and discuss alternative ways of evaluating replicability.

8.4.2‌​‌ An In-depth Study of​​ LLM Contributions to the​​​‌ Bin Packing Problem

Participants:‌ Guillaume Pallez.

  • Collaboration.‌​‌
    This work has been​​​‌ carried out with Julien​ Herrmann (CNRS).

Recent studies​‌ have suggested that Large​​ Language Models (LLMs) could​​​‌ provide interesting ideas contributing​ to mathematical discovery. This​‌ claim was motivated by​​ reports that LLM-based genetic​​​‌ algorithms produced heuristics offering​ new insights into the​‌ online bin packing problem​​ under uniform and Weibull​​​‌ distributions. In 20,​ we reassess this claim​‌ through a detailed analysis​​ of the heuristics produced​​​‌ by LLMs, examining both​ their behavior and interpretability.​‌ Despite being human-readable, these​​ heuristics remain largely opaque​​​‌ even to domain experts.​ Building on this analysis,​‌ we propose a new​​ class of algorithms tailored​​​‌ to these specific bin​ packing instances. The derived​‌ algorithms are significantly simpler,​​ more efficient, more interpretable,​​​‌ and more generalizable, suggesting​ that the considered instances​‌ are themselves relatively simple.​​ We then discuss the​​​‌ limitations of the claim​ regarding LLMs' contribution to​‌ this problem, which appears​​ to rest on the​​​‌ mistaken assumption that the​ instances had previously been​‌ studied. Our findings instead​​ emphasize the need for​​​‌ rigorous validation and contextualization​ when assessing the scientific​‌ value of LLM-generated outputs.​​
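To make the setting concrete: in the online variant, items arrive one at a time and must be placed immediately, so a heuristic reduces to a rule for choosing among the open bins that still have room. The Python sketch below (our illustration, not code from 20) plugs two classical hand-written rules, First Fit and Best Fit, into such a loop; as we understand the LLM-driven setups discussed above, the generated heuristics play the role of the `score` function. Integer item sizes and capacity are assumptions made to keep the arithmetic exact.

```python
def pack_online(items, capacity, score):
    """Online bin packing: each item is placed on arrival and never moved.

    score(item, remaining) ranks the feasible open bins; the highest wins,
    and max() keeps the lowest index on ties. Returns the number of bins.
    """
    bins = []  # remaining capacity of each open bin
    for item in items:
        feasible = [i for i, r in enumerate(bins) if r >= item]
        if feasible:
            best = max(feasible, key=lambda i: score(item, bins[i]))
            bins[best] -= item
        else:
            bins.append(capacity - item)  # open a new bin
    return len(bins)


def first_fit(item, remaining):
    return 0  # all feasible bins tie, so the earliest-opened bin is chosen


def best_fit(item, remaining):
    return -(remaining - item)  # prefer the bin left with the least space


if __name__ == "__main__":
    items = [5, 7, 3, 5]
    print("First Fit:", pack_online(items, 10, first_fit))
    print("Best Fit:", pack_online(items, 10, best_fit))
```

On the item sequence [5, 7, 3, 5] with capacity 10, First Fit opens 3 bins whereas Best Fit needs only 2, because Best Fit puts the item of size 3 into the bin that it fills exactly.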

9 Partnerships and cooperations​​​‌

9.1 International initiatives

9.1.1​ Associate Teams in the​‌ framework of an Inria​​ International Lab or in​​​‌ the framework of an​ Inria International Program

UNIFY​‌ 2
  • Title:
    Intelligent Unified​​ Data Services for Hybrid​​​‌ Workflows Combining Compute-Intensive Simulations​ and Data-Intensive Analytics at​‌ Extreme Scales - 2​​
  • Duration:
    2023 ->
  • Coordinator:​​​‌
    Tom PETERKA (tpeterka@mcs.anl.gov)
  • Partners:​
    • Argonne National Laboratory, Argonne (United States)
  • Inria contact:
    Gabriel​​ Antoniu
  • Summary:
    For several years, we have been witnessing the emergence of complex workflows combining simulations with data analysis, potentially leveraging machine-learning techniques. Such complex workflows naturally need to jointly use supercomputers interconnected with clouds and potentially Edge-based systems. This assembly is called the Computing Continuum. In a general scheme, Edge devices create streams of input data, which are processed by data analytics and machine learning applications in the Cloud, whereas simulations on large, specialised HPC systems provide insights into and prediction of future system state. The emergence of such workflows is reshaping the traditional vision of the areas involved, as described in the ETP4HPC Research Agenda published in 2020. Building software ecosystems addressing the needs of such workflows poses multiple challenges at several levels. In this context, this Associate Team will focus on three related challenges: 1) How to adequately handle the heterogeneity of storage resources within the Computing Continuum to support complex science workflows? 2) How to efficiently support deep-learning workloads across the Computing Continuum? 3) How to provide reproducibility support for experimentation across the Computing Continuum?

9.2 International research​ visitors

9.2.1 Visits of​‌ international scientists

Swann Perarnau​​
  • Status
    Senior Scientist
  • Institution​​​‌ of origin:
    Argonne National​ Laboratory
  • Country:
    USA
  • Dates:​‌
    Dec 8-10, 2025
  • Context​​ of the visit:
    Jury​​​‌ for PhD of Robin​ Boezennec
  • Mobility program/type of​‌ mobility:
    lecture

9.2.2 Visits​​ to international teams

Research​​​‌ visits abroad
Gabriel Antoniu, Jakob Luettgau, Arthur Jaquard, Robin Boezennec
  • Visited institution:
    Argonne​​ National Laboratory
  • Country:
    USA​​​‌
  • Dates:
    13-15 May 2025‌
  • Context of the visit:‌​‌
    Exploration of research collaboration​​ on in situ processing​​​‌ with Tom Peterka and‌ Orçun Yildiz.
  • Mobility program/type‌​‌ of mobility:
    Visit during​​ the JLESC workshop.

9.3​​​‌ European initiatives

9.3.1 H2020‌ projects

EUPEX

EUPEX project‌​‌ on cordis.europa.eu

  • Title:
    EUROPEAN​​ PILOT FOR EXASCALE
  • Duration:​​​‌
    From January 1, 2022‌ to December 31, 2026‌​‌
  • Partners:
    • INSTITUT NATIONAL DE​​ RECHERCHE EN INFORMATIQUE ET​​​‌ AUTOMATIQUE (INRIA), France
    • GRAND‌ EQUIPEMENT NATIONAL DE CALCUL‌​‌ INTENSIF (GENCI), France
    • VSB​​ - TECHNICAL UNIVERSITY OF​​​‌ OSTRAVA (VSB - TU‌ Ostrava), Czechia
    • JOHANNES GUTENBERG-UNIVERSITAT‌​‌ MAINZ, Germany
    • FORSCHUNGSZENTRUM JULICH​​ GMBH (FZJ), Germany
    • COMMISSARIAT​​​‌ A L ENERGIE ATOMIQUE‌ ET AUX ENERGIES ALTERNATIVES‌​‌ (CEA), France
    • IDRYMA TECHNOLOGIAS KAI EREVNAS (FOUNDATION FOR RESEARCH AND TECHNOLOGY HELLAS), Greece
    • SVEUCILISTE U ZAGREBU FAKULTET‌​‌ ELEKTROTEHNIKE I RACUNARSTVA (UNIZG-FER),​​ Croatia
    • UNIVERSITA DEGLI STUDI​​​‌ DI TORINO (UNITO), Italy‌
    • Consortium Ubiquitous Technologies S.c.a.r.l.‌​‌ (CUBIT), Italy
    • CYBELETECH, France​​
    • UNIVERSITA DI PISA (UNIPI),​​​‌ Italy
    • GRAN SASSO SCIENCE‌ INSTITUTE (GSSI), Italy
    • ISTITUTO‌​‌ NAZIONALE DI ASTROFISICA (INAF),​​ Italy
    • UNIVERSITA DEGLI STUDI​​​‌ DEL MOLISE, Italy
    • E‌ 4 COMPUTER ENGINEERING SPA‌​‌ (E4), Italy
    • CONSIGLIO NAZIONALE​​ DELLE RICERCHE (CNR), Italy​​​‌
    • JOHANN WOLFGANG GOETHE-UNIVERSITAET FRANKFURT‌ AM MAIN (GUF), Germany‌​‌
    • EUROPEAN CENTRE FOR MEDIUM-RANGE​​ WEATHER FORECASTS (ECMWF), United​​​‌ Kingdom
    • BULL SAS (BULL),‌ France
    • POLITECNICO DI MILANO‌​‌ (POLIMI), Italy
    • EXASCALE PERFORMANCE​​ SYSTEMS - EXAPSYS IKE,​​​‌ Greece
    • ALMA MATER STUDIORUM‌ - UNIVERSITA DI BOLOGNA‌​‌ (UNIBO), Italy
    • PARTEC AG​​ (PARTEC), Germany
    • ISTITUTO NAZIONALE​​​‌ DI GEOFISICA E VULCANOLOGIA,‌ Italy
    • CINECA CONSORZIO INTERUNIVERSITARIO‌​‌ (CINECA), Italy
    • SECO SPA​​ (SECO SRL), Italy
    • CONSORZIO​​​‌ INTERUNIVERSITARIO NAZIONALE PER L'INFORMATICA‌ (CINI), Italy
  • Inria contact:‌​‌
    Olivier Beaumont
  • Coordinator:
    Etienne​​ Walter (EVIDEN)
  • Summary:

    The​​​‌ EUPEX consortium aims to‌ design, build, and validate‌​‌ the first EU platform​​ for HPC, covering end-to-end​​​‌ the spectrum of required‌ technologies with European assets:‌​‌ from the architecture, processor,​​ system software, development tools​​​‌ to the applications. The‌ EUPEX prototype will be‌​‌ designed to be open,​​ scalable and flexible, including​​​‌ the modular OpenSequana-compliant platform‌ and the corresponding HPC‌​‌ software ecosystem for the​​ Modular Supercomputing Architecture. Scientifically,​​​‌ EUPEX is a vehicle‌ to prepare HPC, AI,‌​‌ and Big Data processing​​ communities for upcoming European​​​‌ Exascale systems and technologies.‌ The hardware platform is‌​‌ sized to be large​​ enough for relevant application​​​‌ preparation and scalability forecast,‌ and a proof of‌​‌ concept for a modular​​ architecture relying on European​​​‌ technologies in general and‌ on European Processor Technology‌​‌ (EPI) in particular. In​​ this context, a strong​​​‌ emphasis is put on‌ the system software stack‌​‌ and the applications.

    Being​​ the first of its​​​‌ kind, EUPEX sets the‌ ambitious challenge of gathering,‌​‌ distilling and integrating European​​ technologies that the scientific​​​‌ and industrial partners use‌ to build a production-grade‌​‌ prototype. EUPEX will lay​​ the foundations for Europe's​​​‌ future digital sovereignty. It‌ has the potential for‌​‌ the creation of a​​ sustainable European scientific and​​​‌ industrial HPC ecosystem and‌ should stimulate science and‌​‌ technology more than any​​ national strategy (for numerical​​​‌ simulation, machine learning and‌ AI, Big Data processing).‌​‌

    The EUPEX consortium –​​​‌ constituted of key actors​ on the European HPC​‌ scene – has the​​ capacity and the will​​​‌ to provide a fundamental​ contribution to the consolidation​‌ of European supercomputing ecosystem.​​ EUPEX aims to directly​​​‌ support an emerging and​ vibrant European entrepreneurial ecosystem​‌ in AI and Big​​ Data processing that will​​​‌ leverage HPC as a​ main enabling technology.

9.3.2​‌ Collaborations with Major European​​ Organizations

Participants: Gabriel Antoniu​​​‌, Alexandru Costan,​ Jakob Luettgau.

ETP4HPC:​‌ Since 2019, Gabriel Antoniu​​ has served as a​​​‌ co-leader of the working​ group on Programming Environments,​‌ contributing to two successive​​ versions of the Strategic​​​‌ Research Agenda of ETP4HPC.​ Alexandru Costan served as​‌ a member of this​​ working group. Jakob Luettgau​​​‌ served as a member​ of the working group​‌ on Data Storage and​​ I/O. A white paper​​​‌ of this group 25​ was published in 2025.​‌

9.4 National initiatives

Exa-DoST​​

Participants: Gabriel Antoniu, François Tessier, Julien Monniot, Jakob Luettgau, Etienne Ndamlabin, Silvina Caino Lores, Guillaume Pallez.

Exa-DoST​ project of the NumPEx​‌ PEPR program

  • Title:
    Data-oriented​​ Software and Tools for​​​‌ the Exascale
  • Duration:
    From​ January 1, 2023 to​‌ April 1, 2030
  • Partners:​​
    • Inria
    • CEA
    • CNRS
    • University​​​‌ of Bordeaux
    • Observatoire de​ Paris
    • Observatoire de la Côte d'Azur
    • Data Direct​​ Networks France (DDN)
  • Coordinator:​​​‌
    Gabriel Antoniu (KerData Team,​ Inria)
  • Summary:

    The advent of future Exascale supercomputers raises multiple data-related challenges. To enable applications to fully leverage the upcoming infrastructures, a major challenge concerns the scalability of techniques used for data storage, transfer, processing and analytics. Additional key challenges emerge from the need to adequately exploit emerging technologies for storage and processing, leading to new, more complex storage hierarchies. Finally, it now becomes necessary to support more and more complex hybrid workflows involving at the same time simulation, analytics and learning, running at extreme scales across supercomputers interconnected with clouds and Edge-based systems. The Exa-DoST project will address most of these challenges, organized in 3 areas:

    • Scalable​​​‌ storage and I/O;
    • Scalable​ in situ processing;
    • Scalable​‌ smart analytics.

    As part​​ of the NumPEx program,​​​‌ Exa-DoST targets a much​ higher technology readiness level​‌ than previous national projects​​ concerning the HPC software​​​‌ stack. It will address​ the major data challenges​‌ by proposing operational solutions​​ co-designed and validated in​​​‌ French and European applications.​ This will allow filling​‌ the gap left by​​ previous international projects to​​​‌ ensure that French and​ European needs are taken​‌ into account in the​​ roadmaps for building the​​​‌ data-oriented Exascale software stack.​

STEEL

Participants: Gabriel Antoniu​‌, Alexandru Costan,​​ Jakob Luettgau, François​​​‌ Tessier, Mathis Valli​, Thomas Badts.​‌

  • Title:
    Secure and efficient​​ daTa storagE and procEssing​​​‌ on cLoud-based infrastructures
  • Duration:​
    From June 1, 2023 to August 31, 2030
  • Partners:
    • Inria
    • CNRS
    • Institut​​​‌ Mines Télécom (IMT)
    • University​ of Bordeaux
    • University of​‌ Rennes
    • INSA Rennes
    • INSA​​ Lyon
  • Coordinator:
    Gabriel Antoniu​​​‌ (KerData Team, Inria)
  • Summary:​
    The strong development of cloud computing since its emergence in 2007 and its massive adoption for the storage of unprecedented volumes of data in a growing number of domains have brought to light major technological challenges. In this project, we will address several of these challenges, organized in three research directions. The first direction concerns the exploitation of emerging technologies for efficient storage on cloud infrastructures. We will address this challenge through high-performance, NVRAM-based distributed storage solutions placed as close as possible to data production and consumption locations (disaggregation principle), and we will develop strategies to optimize the trade-off between data consistency and access performance. The second direction concerns the efficient storage and processing of data on hybrid, heterogeneous infrastructures within the digital edge-cloud-supercomputer continuum. In many domains (autonomous cars, predictive maintenance, intelligent buildings, etc.), we are witnessing the emergence of hybrid workflows combining simulations, analysis of sensor data flows and machine learning. Their execution requires storage resources ranging from the edge to cloud infrastructures, and even to supercomputers, which poses challenges for unified data storage and processing. The third research direction is dedicated to confidential storage, in connection with the need to store and analyze large volumes of data of strategic interest or of a personal nature. For all of these directions, the project will take into account the need to propose and validate interoperable approaches with a potential for transfer to major French or European industrial players in cloud computing.

ECLAT​​​‌

Participants: François Tessier,‌ Gabriel Antoniu, Théo‌​‌ Jolivel, Jakob Luettgau​​, Thomas Badts.​​​‌

  • Title:
    Extreme Computing Laboratory‌ for Astronomical Telescopes
  • Duration:‌​‌
    Since May 2024
  • Partners:​​
    • Inria
    • CNRS
    • Université de​​​‌ Rennes
    • Eviden
    • Observatoire de‌ la Côte d'Azur
    • Observatoire‌​‌ de Paris
    • Université Paris-Saclay​​
    • Centrale Supelec
  • Coordinator:
    Gabriel​​​‌ Antoniu (KerData Team, Inria)‌
  • Summary:
    ECLAT is positioned as a center of excellence dedicated to High-Performance Computing (HPC) and Artificial Intelligence (AI) technologies and techniques applied to astronomical instrumentation. This project brings together sixteen partner laboratories and teams around a common roadmap, aimed at strengthening research and development (R&D) collaborations. The aim is to design and build future cyber-physical systems for astronomy, capable of managing, processing and optimizing gigantic volumes of data.

Grid'5000

We are members of the Grid'5000 community and run experiments on the Grid'5000 platform on a daily basis.

Inria Exploratory​​ program: Repas

Participants: Guillaume​​​‌ Pallez.

  • Project Acronym:‌
    REPAS
  • Title:
    New Portrayal‌​‌ of HPC Applications
  • Coordinator:​​
    Guillaume Pallez
  • Collaboration:
    This​​​‌ is done in collaboration‌ with the team DATAMOVE‌​‌ (Inria Grenoble)
  • Duration:
    2022-2025​​
  • Summary:
    What is the right way to represent an application in order to run it on a highly parallel (typically exascale) machine? The idea of the project is to completely review the models used in the development of scheduling algorithms and software solutions, so as to take into account the real needs of new users of HPC platforms.

10 Dissemination‌

10.1 Promoting scientific activities‌​‌

10.1.1 Scientific events: organisation​​

General chair, scientific chair​​​‌
  • François Tessier
    • General co-Chair‌ of ISPDC 2025,‌​‌ the 24th IEEE International​​​‌ Symposium on Parallel and​ Distributed Computing (Rennes, France).​‌
    • Workshop co-Chair of ESSA​​ 2025, the 6th​​​‌ Workshop on Extreme-Scale Storage​ and Analysis held in​‌ conjunction with IPDPS 2025​​ (Milan, Italy).
    • Workshop co-Chair​​​‌ of Supercompcloud, the​ 9th Workshop on Interoperability​‌ of Supercomputing and Cloud​​ Technologies combined with OpenCHAMI​​​‌ held in conjunction with​ ISC 2025 (Hamburg, Germany).​‌
  • Alexandru Costan:
    • General co-Chair​​ of ISPDC 2025,​​​‌ the 24th IEEE International​ Symposium on Parallel and​‌ Distributed Computing (Rennes, France).​​
    • Workshop co-Chair of FlexScience 2025, the 15th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, held in conjunction with ACM HPDC 2025 (Notre Dame, USA).
  • Guillaume Pallez​​
    • Co-General chair of IPDPS'26​​​‌, 40th IEEE International​ Parallel & Distributed Processing​‌ Symposium (New Orleans, USA).​​
    • Member of the Steering​​​‌ Committee of ICPP,​ International Conference on Parallel​‌ Processing.
  • Silvina Caino Lores​​
    • General Co-Chair of WORKS​​​‌ 2025, the 20th​ Workshop on Workflows in​‌ Support of Large-Scale Science,​​ held in conjunction with​​​‌ SC 2025 (St. Louis,​ USA).
  • Gabriel Antoniu
Member of the organizing​‌ committees
  • Jakob Luettgau:
  • Gabriel Antoniu:
  • Théo Jolivel:
    • Web​​​‌ Chair of ISPDC 2025,​ the 24th IEEE International​‌ Symposium on Parallel and​​ Distributed Computing (Rennes, France).​​​‌
  • Arthur Jaquard:
    • Web Chair of WORKS 2025, the 20th Workshop on Workflows in Support of Large-Scale Science (St. Louis, MO, USA).

10.1.2 Scientific events:​‌ selection

Chair of conference​​ program committees
  • François Tessier​​​‌
    • Program Co-Chair of HiPC​ 2025, the 32nd​‌ edition of the IEEE​​ International Conference on High​​​‌ Performance Computing, Data, and​ Analytics (Hyderabad, India).
Member​‌ of the conference program​​ committees
  • François Tessier:
    CCGrid 2025, ISC 2025 (workshop proposals)
  • Gabriel​ Antoniu:
    HPDC 2025, Cluster​‌ 2025
  • Alexandru Costan:
    SC'25 (Posters and ACM SRC track), IPDPS 2025 (PhD Forum), Euro-Par 2025, BigData 2025, HiPC 2025, CCGrid 2025
Reviewer
  • Théo Jolivel:​​
    • IEEE CCGrid 2025
  • Mathis Valli:​​​‌
    • IEEE BigData 2025
  • Arthur‌ Jaquard:
    • CCGrid 2025

10.1.3 Journal‌​‌

Member of the editorial​​ boards
  • Guillaume Pallez:
    • IEEE TPDS
    • IEEE TOPC‌

10.1.4 Invited talks

  • Guillaume Pallez:
    • « Vers un calcul intensif plus sobre » (“Towards more frugal high-performance computing”), organized by Laboratoire 1.5
    • “Model (co)-Design and Accuracy for Resource Management in HPC” at the Co-Design workshop (Osaka, Japan), co-organized by Jack Dongarra and the Chinese Academy of Sciences
  • François Tessier:
    • "The Difficult Task‌ of Understanding I/O Behavior‌​‌ on Large-scale Systems", Keynote​​ talk at the 3rd​​​‌ NHR Conference, Germany‌

10.1.5 Leadership within the‌​‌ scientific community

  • Gabriel Antoniu:
    • Large national project management: Coordinator of Exa-DoST, one of the 5 targeted projects of the NumPEx PEPR program (started in 2023, budget: 6.2 M€). Coordinator of STEEL, one of the 7 high-priority projects of the CLOUD PEPR program (started in 2023, budget: 2.8 M€).
    • ETP4HPC: Since 2019,​​ co-leader of the working​​​‌ group on Programming Environments,‌ lead co-author of the‌​‌ corresponding chapter of the​​ Strategic Research Agenda of​​​‌ ETP4HPC.
    • International lab management:‌ Executive Director of JLESC‌​‌ for Inria since April​​ 2024 (previously Vice Executive​​​‌ Director). JLESC is the‌ Joint Inria-Illinois-ANL-BSC-JSC-RIKEN/AICS Laboratory for‌​‌ Extreme-Scale Computing. Within JLESC,​​ he also serves as​​​‌ a Topic Leader for‌ Data storage, I/O and‌​‌ in situ processing for​​ Inria.
    • International Working Group​​​‌ management: Co-Leader of the‌ Working group on Data‌​‌ management and Computing Continuum​​ within the InPEX International​​​‌ Post-Exascale Project.
    • Team‌ management: Head of the‌​‌ KerData Project-Team (INRIA-INSA Rennes).​​
    • International Associate Team management:​​​‌ Leader of the UNIFY2‌ Associate Team with Argonne‌​‌ National Lab (2013–2025).
  • François Tessier:
    • Work package co-leader with Francieli Boito (Associate Professor, University of Bordeaux) within the NumPEx Exa-DoST project.
    • Leader for​​​‌ KerData in the ECLAT‌ joint laboratory.
  • Alexandru Costan:
    • Work package leader​​ of WP2 within the​​​‌ PEPR CLOUD STEEL project.‌

10.1.6 Scientific expertise

  • Gabriel‌​‌ Antoniu:
    • Evaluator for a​​ Horizon Europe project (HORIZON-CL4-2021-HUMAN-01​​​‌ call)
  • Alexandru Costan:
    • Evaluator for several projects submitted to FFPlus, a European initiative highlighting and promoting the adoption of High-Performance Computing (HPC) by SMEs and start-ups across Europe
    • Member of the jury for the GDR RSD PhD Thesis Award (Prix de thèse) and Researcher Award (Prix chercheur)

10.1.7 Research administration

  • François​​ Tessier
    • Member of the​​​‌ Commission on Health, Safety‌ and Working Conditions (now‌​‌ called FSS) within the​​ Inria center of Rennes​​​‌
  • Guillaume Pallez:
    • Member of‌ the National Commission on‌​‌ Health, Safety and Working​​ Conditions (now called FS)​​​‌
    • Member of the Scientific‌ Board of Inria
  • Gabriel‌​‌ Antoniu:
    • Member of the​​ Inria HRS4R Steering Committee​​​‌ (HRS4R: European Human Resources‌ Strategy for Research)

10.2‌​‌ Teaching - Supervision -​​ Juries - Educational and​​​‌ pedagogical outreach

10.2.1 Teaching‌

  • Alexandru Costan
    • Bachelor: Software‌​‌ Engineering and Java Programming,​​ 28 hours (lab sessions),​​​‌ L3, INSA Rennes.
    • Bachelor:‌ Databases, 68 hours (lectures‌​‌ and lab sessions), L2,​​ INSA Rennes.
    • Bachelor: Practical case studies, 24 hours (project), L3, INSA Rennes.
    • Master: Big Data Storage and Processing, 28 hours (lectures, lab sessions), M1, INSA Rennes.
    • Master:‌​‌ Algorithms for Big Data,​​​‌ 28 hours (lectures, lab​ sessions), M2, INSA Rennes.​‌
    • Master: Big Data Project,​​ 28 hours (project), M2,​​​‌ INSA Rennes.
  • Gabriel Antoniu:​
    • Master (Engineering Degree, 5th​‌ year): NoSQL and Cloud​​ technologies, 21 hours (lectures),​​​‌ M2 level, ENSAI (​École nationale supérieure de​‌ la statistique et de​​ l'analyse de l'information),​​​‌ Bruz.
    • Master: Infrastructures for​ Big Data, 14 hours​‌ (lectures), M1 level, IBD​​ Module, University of Rennes.​​​‌
    • Master: Cloud Computing and​ Big Data, 14 hours​‌ (lectures), M2 level, Cloud​​ Module, MIAGE Master Program,​​​‌ University of Rennes.
  • François​ Tessier
    • Bachelor: Computer science​‌ discovery, 15 hours (lab​​ sessions), L1 level, DIE​​​‌ Module, ISTIC, University of​ Rennes.
    • Master: Cloud Computing​‌ and Big Data, 15​​ hours (lectures), M2 level,​​​‌ Cloud Module, MIAGE Master​ Program, University of Rennes.​‌
    • Master (Engineering Degree, 4th​​ year): Storage on Clouds,​​​‌ 5 hours (lecture and​ lab session), M2 level,​‌ IMT Atlantique, Rennes.
  • Jakob​​ Luettgau:
    • Master: Cloud and​​​‌ Network Infrastructures (CNI), 4​ hours (lectures), M2 level,​‌ Master Program, University of​​ Rennes.
  • Théo Jolivel:
    • Master:​​​‌ Cloud Computing and Big​ Data, 36 hours (lab​‌ sessions), M2 level, Cloud​​ Module, MIAGE Master Program,​​​‌ University of Rennes.
  • Mathis​ Valli:
    • Bachelor: Databases, 12​‌ hours (lab sessions), L3,​​ INSA Rennes.

10.2.2 Supervision​​​‌

  • Defended PhD theses:
    • Cédric​ Prigent, "Supporting Online Learning​‌ and Inference in Parallel​​ across the Digital Continuum",​​​‌ thesis started in November​ 2021, co-advised by Alexandru​‌ Costan, Gabriel Antoniu and​​ Loïc Cudennec (DGA). Defended​​​‌ on 25 May 2025.​
    • Robin Boezennec, “Reducing HPC Resource Consumption”, co-advised by Guillaume Pallez and Fanny Dufossé (Datamove, Grenoble). Defended on 10 December 2025.
  • PhD in progress:​​​‌
    • Mathis Valli, "Comparative Analysis​ of Federated Learning: Simulations​‌ Versus Real-World Testbeds in​​ dynamic settings", thesis started​​​‌ in April 2023, co-advised​ by Alexandru Costan, Cédric​‌ Tedeschi (Myriads) and Loïc​​ Cudennec (DGA).
    • Théo Jolivel,​​​‌ "Modeling and Simulation of​ Exascale Storage Systems", thesis​‌ started in October 2024,​​ co-advised by François Tessier,​​​‌ Gabriel Antoniu and Philippe​ Deniel (CEA).
    • Arthur Jaquard,​‌ "Dynamic in situ and​​ in transit data analysis​​​‌ for Exascale Computing using​ Damaris", thesis started in​‌ October 2024, co-advised by​​ Gabriel Antoniu, Laurent Colombet​​​‌ (CEA), Silvina Caino-Lores, and​ Julien Bigot (CEA).
    • Méline​‌ Trochon, "Adaptive Checkpoint-Restart System​​ with Knowledge of the​​​‌ Network Load", CIFRE thesis​ started in February 2025,​‌ located at Inria Bordeaux,​​ co-supervised by Francieli Boito,​​​‌ Brice Goglin (TADaaM -​ Inria Bordeaux), Jean-Thomas Acquaviva​‌ (DDN) and François Tessier.​​
    • Serge Meurrens, "Ordonnancement des E/S adapté aux applications dans les systèmes HPC" (Application-aware I/O scheduling in HPC systems), thesis started in December 2025, located at Inria Bordeaux, co-supervised by Francieli Boito, Luan Teylo (TADaaM - Inria Bordeaux) and François Tessier.
    • Simon Renard, “Data Interfaces for Hybrid Quantum-Classical Computational Workflows”, thesis started in October 2025, co-supervised by Silvina Caino Lores, Gabriel Antoniu and Marc Baboulin (Inria Paris-Saclay).
    • Alix Tremodeux, “Etude des conséquences du vieillissement sur les machines HPC” (Study of the consequences of aging on HPC machines), thesis started in September 2025, co-supervised by Guillaume Pallez and Erven Rohou (PACAP - Inria Rennes).
  • Internships:
    • Remy Chiv, "Analyse et optimisation des entrées/sorties d'un pipeline de traitement de données pour la radio-astronomie à grande échelle" (Analysis and optimization of the I/O of a data-processing pipeline for large-scale radio astronomy), 5-month Master 2 internship started in May 2025, supervised by François Tessier.

10.2.3 Juries‌

  • Gabriel Antoniu:
    • HDR:‌​‌ Towards Better I/O Resource​​ Usage in HPC,​​​‌ Francieli Zanon Boito, Université‌ de Bordeaux, defended on‌​‌ 5 December 2025.
  • Alexandru Costan:
    • PhD: Complexity​​​‌ and Algorithmic results for‌ Translocation Distances, Maria‌​‌ Constantinescu, University of Bucharest,​​ defended on 29 May​​​‌ 2025.

11 Scientific production‌

11.1 Major publications

11.2 Publications of​‌ the year

International journals​​

International peer-reviewed​‌ conferences

Doctoral dissertations​​ and habilitation theses

  • 17 Cédric Prigent. Towards Efficient and Trustworthy Federated Learning on the Computing Continuum. PhD thesis, INSA de Rennes, May 2025. HAL

Reports &‌​‌ preprints

Other scientific publications‌

11.3 Cited publications

  • 26​​ articleD. P.David​​​‌ Perez Abreu, K.​Karima Velasquez, M.​‌Marilia Curado and E.​​Edmundo Monteiro. A​​​‌ Comparative Analysis of Simulators​ for the Cloud to​‌ Fog Continuum.Simulation​​ Modelling Practice and Theory​​​‌2019, 102029back​ to text
  • 27 article​‌F.Fatih Abut and​​ M.Mehmet Kızıldağ.​​​‌ Design and Implementation of​ a Reconfigurable Test Environment​‌ for Network Measurement Tools​​ Based on a Control​​​‌ and Management Framework.​Applied Sciences151​‌2025, URL: https://www.mdpi.com/2076-3417/15/1/487​​DOIback to text​​​‌
  • 28 bookB.Blake​ Alcott, M.Mario​‌ Giampietro, K.Kozo​​ Mayumi and J.John​​​‌ Polimeni. The Jevons​ paradox and the myth​‌ of resource efficiency improvements​​.Routledge2012back​​​‌ to text
  • 29 article​A.Alon Amid,​‌ D.David Biancolin,​​ A.Abraham Gonzalez,​​​‌ D.Daniel Grubb,​ S.Sagar Karandikar,​‌ H.Harrison Liew,​​ A.Albert Magyar,​​​‌ H.Howard Mao,​ A.Albert Ou,​‌ N.Nathan Pemberton,​​ P.Paul Rigge,​​​‌ C.Colin Schmidt,​ J.John Wright,​‌ J.Jerry Zhao,​​ Y. S.Yakun Sophia​​​‌ Shao, K.Krste​ Asanović and B.Borivoje​‌ Nikolić. Chipyard: Integrated​​ Design, Simulation, and Implementation​​​‌ Framework for Custom SoCs​.IEEE Micro40​‌42020, 10-21​​DOIback to text​​​‌back to text
  • 30​ articleM.Marianna Anagnostou​‌, O.Olga Karvounidou​​, C.Chrysovalantou Katritzidaki​​​‌, C.Christina Kechagia​, K.Kyriaki Melidou​‌, E.Eleni Mpeza​​, I.Ioannis Konstantinidis​​​‌, E.Eleni Kapantai​, C.Christos Berberidis​‌, I.Ioannis Magnisalis​​ and others. Characteristics​​​‌ and challenges in the​ industries towards responsible AI:​‌ a systematic literature review​​.Ethics and Information​​​‌ Technology2432022​, 37back to​‌ text
  • 31 bookG.​​Gabriel Antoniu, P.​​​‌Patrick Valduriez, H.-C.​Hans-Christian Hoppe and J.​‌Jens KrÃŒger. Towards​​ Integrated Hardware/Software Ecosystems for​​​‌ the Edge-Cloud-HPC Continuum.​ETP4HPC White PapersETP4HPC:​‌ European Technology Platform for​​ High Performance Computing2021​​​‌HALDOIback to​ text
  • 32 softwareU.​‌University of Applied Sciences​​ Northwestern Switzerland. Karabo-Pipeline​​​‌.v0.34.0 lic: MIT​.back to text​‌
  • 33 Vijay Arya, Rachel K. E. Bellamy, Pin-Yu Chen, Amit Dhurandhar, Michael Hind, Samuel C. Hoffman, Stephanie Houde, Q. Vera Liao, Ronny Luss, Aleksandra Mojsilović and others. One explanation does not fit all: A toolkit and taxonomy of AI explainability techniques. arXiv preprint arXiv:1909.03012, 2019.
  • 34 Mark Asch, Terry Moore, R. Badia, Micah Beck, P. Beckman, T. Bidot, François Bodin, Franck Cappello, A. Choudhary, B. De Supinski and others. Big data and extreme-scale computing: Pathways to convergence-toward a shaping strategy for a future software and data ecosystem for scientific inquiry. The International Journal of High Performance Computing Applications, 32(4), 2018, 435-479.
  • 35 Guillaume Aupy. Resilient and energy-efficient scheduling algorithms at scale. PhD thesis, École Normale Supérieure de Lyon, 2014.
  • 36 Sana Awan, Bo Luo and Fengjun Li. Contra: Defending against poisoning attacks in federated learning. Computer Security - ESORICS 2021: 26th European Symposium on Research in Computer Security, Darmstadt, Germany, October 4-8, 2021, Proceedings, Part I, Springer, 2021, 455-475.
  • 37 Paul Ayris, Jean-Yves Berthou, Rachel Bruce, Stefanie Lindstaedt, Anna Monreale, Barend Mons, Yasuhiro Murayama, Caj Soedergaard, Klaus Tochtermann and Ross Wilkinson. Realising the European open science cloud. 2016.
  • 38 Jonathan Bachrach, Huy Vo, Brian Richards, Yunsup Lee, Andrew Waterman, Rimas Aviżienis, John Wawrzynek and Krste Asanović. Chisel: constructing hardware in a Scala embedded language. Proceedings of the 49th Annual Design Automation Conference (DAC '12), San Francisco, California, Association for Computing Machinery, New York, NY, USA, 2012, 1216-1225, URL: https://doi.org/10.1145/2228360.2228584
  • 39 Yogesh Balaji, Mehrdad Farajtabar, Dong Yin, Alex Mott and Ang Li. The effectiveness of memory replay in large scale continual learning. arXiv preprint arXiv:2010.02418, 2020.
  • 40 Daniel Balouek-Thomert, Eduard Gibert Renart, Ali Reza Zamani, Anthony Simonet and Manish Parashar. Towards a computing continuum: Enabling edge-to-cloud integration for data-driven workflows. The International Journal of High Performance Computing Applications, 33(6), 2019, 1159-1174.
  • 41 L. A. Barba and G. K. Thiruvathukal. Reproducible Research for Computing in Science & Engineering. Computing in Science & Engineering, 19(6), 2017, 85-87.
  • 42 Eamon Barrett. Taiwan’s drought is exposing just how much water chipmakers like TSMC use (and reuse). 2021.
  • 43 Micah Beck, Terry Moore, Piotr Luszczek and Anthony Danalis. Interoperable Convergence of Storage, Networking, and Computation. Advances in Information and Communication, Springer International Publishing, Cham, 2020, 667-690.
  • 44 Tal Ben-Nun and Torsten Hoefler. Demystifying parallel and distributed deep learning: An in-depth concurrency analysis. ACM Computing Surveys, 52(4), 2019, 1-43.
  • 45 Janine C. Bennett, Hasan Abbasi, Peer-Timo Bremer, Ray Grout, Attila Gyulassy, Tong Jin, Scott Klasky, Hemanth Kolla, Manish Parashar, Valerio Pascucci, Philippe Pebay, David Thompson, Hongfeng Yu, Fan Zhang and Jacqueline Chen. Combining in-situ and in-transit processing to enable extreme-scale scientific analysis. SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2012, 1-9.
  • 46 Elisa Bertino, Suparna Bhattacharya, Elena Ferrari and Dejan Milojicic. Trustworthy AI and Data Lineage. IEEE Internet Computing, 27(6), 2023, 5-6.
  • 47 Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui and Julien Stainer. Machine learning with adversaries: Byzantine tolerant gradient descent. Advances in Neural Information Processing Systems, 30, 2017.
  • 48 Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese and David Walker. P4: Programming Protocol-Independent Packet Processors. ACM SIGCOMM Computer Communication Review, 44(3), July 2014, 87-95, URL: https://dl.acm.org/doi/10.1145/2656877.2656890
  • 49 Joshua C. Bowden, François Tessier, Charles Deltel, Simone Bnà and Gabriel Antoniu. In-situ visualization using Damaris: the Code Saturne use case. PRACE White Paper, PRACE: Partnership for Advanced Computing in Europe (ed.), September 2021.
  • 50 Arnaud Braud, Gaël Fromentoux, Benoit Radier and Olivier Le Grand. The road to European digital sovereignty with Gaia-X and IDSA. IEEE Network, 35(2), 2021, 4-5.
  • 51 Christopher Briggs, Zhong Fan and Péter András. Federated learning with hierarchical clustering of local updates to improve training on non-IID data. 2020 International Joint Conference on Neural Networks (IJCNN), 2020, 1-9, URL: https://api.semanticscholar.org/CorpusID:216144447
  • 52 François Broquedis, Jérôme Clet-Ortega, Stéphanie Moreaud, Nathalie Furmento, Brice Goglin, Guillaume Mercier, Samuel Thibault and Raymond Namyst. hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, 2010, 180-186.
  • 53 Pablo Brox, Javier Garcia-Blas, David E. Singh and Jesus Carretero. DICE: Generic Data Abstraction for Enhancing the Convergence of HPC and Big Data. High Performance Computing: 8th Latin American Conference, CARLA 2021, Guadalajara, Mexico, October 6-8, 2021, Revised Selected Papers, Springer, 2022, 106-119.
  • 54 Pietro Buzzega, Matteo Boschini and Simone Calderara. Rethinking experience replay: A bag of tricks for continual learning. 25th International Conference on Pattern Recognition (ICPR), 2021, 2180-2187.
  • 55 Philip Carns, Robert Latham, Robert Ross, Kamil Iskra, Samuel Lang and Katherine Riley. 24/7 characterization of petascale I/O workloads. 2009 IEEE International Conference on Cluster Computing and Workshops, IEEE, 2009, 1-10.
  • 56 Paul Carpenter, Utz-Uwe Haus, Erwin Laure, Sai Narasimhamurthy and Estela Suarez. Heterogeneity is here to stay: Challenges and Opportunities in HPC. February 2022, URL: https://www.etp4hpc.eu/pujades/files/ETP4HPC_WP_Heterogeneous-HPC_20220216.pdf
  • 57 Henri Casanova, Rafael Ferreira da Silva, Ryan Tanaka, Suraj Pandey, Gautam Jethwani, William Koch, Spencer Albrecht, James Oeth and Frédéric Suter. Developing Accurate and Scalable Simulators of Production Workflow Management Systems with WRENCH. Future Generation Computer Systems, 112, 2020, 162-175.
  • 58 Henri Casanova, Arnaud Giersch, Arnaud Legrand, Martin Quinson and Frédéric Suter. Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms. Journal of Parallel and Distributed Computing, 74(10), June 2014, 2899-2917.
  • 59 Arslan Chaudhry, Albert Gordo, Puneet K. Dokania, Philip Torr and David Lopez-Paz. Using hindsight to anchor past knowledge in continual learning. arXiv preprint arXiv:2002.08165, 2020.
  • 60 Melvin Chelli, Cédric Prigent, René Schubotz, Alexandru Costan, Gabriel Antoniu, Loïc Cudennec and Philipp Slusallek. FedGuard: Selective Parameter Aggregation for Poisoning Attack Mitigation in Federated Learning. Cluster 2023 - IEEE International Conference on Cluster Computing, Santa Fe, New Mexico, United States, IEEE, October 2023, 1-10.
  • 61 European Commission. Critical Raw Materials Resilience: Charting a Path towards greater Security and Sustainability. 2020.
  • 62 Contrat d’objectifs et de performance 2019-2023 entre l’État et Inria [Objectives and performance contract 2019-2023 between the French State and Inria]. 2019.
  • 63 C.S. Daley, D. Ghoshal, G.K. Lockwood, S. Dosanjh, L. Ramakrishnan and N.J. Wright. Performance characterization of scientific workflows for the optimal use of Burst Buffers. Future Generation Computer Systems, 110, 2020, 468-480, URL: https://www.sciencedirect.com/science/article/pii/S0167739X16308287
  • 64 Advait Deshpande. Assessing the quantum-computing landscape. Communications of the ACM, 65(10), 2022, 57-65.
  • 65 Peter E. Dewdney, Peter J. Hall, Richard T. Schilizzi and T. Joseph L. W. Lazio. The Square Kilometre Array. Proceedings of the IEEE, 97(8), 2009, 1482-1496.
  • 66 Estelle Dirand, Laurent Colombet and Bruno Raffin. TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics. SCA18 - Supercomputing Frontiers Asia 2018, Singapore, Singapore, March 2018, 159-178.
  • 67 Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Robert Sisneros, Orcun Yildiz, Shadi Ibrahim, Tom Peterka and Leigh Orf. Damaris: Addressing Performance Variability in Data Management for Post-Petascale Simulations. ACM Transactions on Parallel Computing, 3(3), 2016, 15.
  • 68 ECLAT - Extreme Computing Lab for Astronomical Telescopes. 2024, URL: https://eclat-lab.fr/
  • 69 Dina Fakhry, Mohamed Abdelsalam, M. Watheq El-Kharashi and Mona Safar. A Review on Computational Storage Devices and near Memory Computing for High Performance Applications. Memories - Materials, Devices, Circuits and Systems, 4, July 2023, 100051, URL: https://www.sciencedirect.com/science/article/pii/S2773064623000282
  • 70 Ana Gainaru, Lipeng Wan, Ruonan Wang, Eric Suchyta, Jieyang Chen, Norbert Podhorszki, James Kress, David Pugmire and Scott Klasky. Understanding the Impact of Data Staging for Coupled Scientific Workflows. IEEE Transactions on Parallel and Distributed Systems, 33(12), 2022, 4134-4147.
  • 71 Ana Gainaru, Lipeng Wan, Ruonan Wang, Eric Suchyta, Jieyang Chen, Norbert Podhorszki, James Kress, David Pugmire and Scott Klasky. Understanding the Impact of Data Staging for Coupled Scientific Workflows. IEEE Transactions on Parallel and Distributed Systems, 33(12), 2022, 4134-4147.
  • 72 Avishek Ghosh, Jichan Chung, Dong Yin and Kannan Ramchandran. An Efficient Framework for Clustered Federated Learning. IEEE Transactions on Information Theory, 68(12), 2022, 8076-8091.
  • 73 Donghyun Gouk, Sangwon Lee, Miryeong Kwon and Myoungsoo Jung. Direct Access, High-Performance Memory Disaggregation with DirectCXL. 2022 USENIX Annual Technical Conference (USENIX ATC 22), Carlsbad, CA, USENIX Association, July 2022, 287-294, URL: https://www.usenix.org/conference/atc22/presentation/gouk
  • 74 V. Grandgirard, Y. Sarazin, X. Garbet, G. Dif-Pradalier, Ph. Ghendrih, N. Crouseilles, G. Latu, E. Sonnendrücker, N. Besse and P. Bertrand. GYSELA, a full-f global gyrokinetic Semi-Lagrangian code for ITG turbulence simulations. AIP Conference Proceedings, 871(1), 2006, 100-111, URL: http://scitation.aip.org/content/aip/proceeding/aipcp/10.1063/1.2404543
  • 75 Nathalie Hartl, Elena Wössner and York Sure-Vetter. Nationale Forschungsdateninfrastruktur (NFDI). Informatik Spektrum, 44(5), 2021, 370-373.
  • 76 E. A. Huerta, Asad Khan, Edward Davis, Colleen Bushell, William D. Gropp, Daniel S. Katz, Volodymyr V. Kindratenko, Seid Koric, William T. C. Kramer, Brendan McGinty, Kenton McHenry and Aaron Saxton. Convergence of artificial intelligence and high performance computing on NSF-supported cyberinfrastructure. Journal of Big Data, 7(1), 2020, 88.
  • 77 Kevin Jacobs, Sagar Chopra, Aaron Barr and Benjamin Boucher. Supply shortages and an inflexible market give rise to high power transformer lead times. 2021.
  • 78 Emmanuel Jeannot, Guillaume Pallez and Nicolas Vidal. IO-aware Job-Scheduling: Exploiting the Impacts of Workload Characterizations to select the Mapping Strategy. International Journal of High Performance Computing Applications, 2023, 1-13.
  • 79 Théo Jolivel, François Tessier, Julien Monniot and Guillaume Pallez. MOSAIC: Detection and Categorization of I/O Patterns in HPC Applications. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, United States, November 2024, 1-7.
  • 80 Dieter Kranzlmueller, J. Marco de Lucas and P. Oester. The European Grid Initiative (EGI): Towards a Sustainable Grid Infrastructure. Remote Instrumentation and Virtual Laboratories: Service Architecture and Networking, Springer, 2010, 61-66.
  • 81 Dave Landsman and Karin Strauss. The DNA Data Storage Model. Computer, 56(7), July 2023, 78-85, URL: https://ieeexplore.ieee.org/document/10154188/
  • 82 Adrien Lebre, Arnaud Legrand, Frédéric Suter and Pierre Veyre. Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API. 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015, 251-260.
  • 83 Huaicheng Li, Daniel S. Berger, Stanko Novakovic, Lisa Hsu, Dan Ernst, Pantea Zardoshti, Monish Shah, Samir Rajadnya, Scott Lee, Ishwar Agarwal, Mark D. Hill, Marcus Fontoura and Ricardo Bianchini. Pond: CXL-Based Memory Pooling Systems for Cloud Platforms. October 2022, URL: http://arxiv.org/abs/2203.00241
  • 84 Suyi Li, Yong Cheng, Wei Wang, Yang Liu and Tianjian Chen. Learning to Detect Malicious Clients for Robust Federated Learning. CoRR, abs/2002.00211, 2020, URL: https://arxiv.org/abs/2002.00211
  • 85 Thomas Lippert, Thomas Eickermann and Dietmar Erwin. PRACE: Europe's supercomputing research infrastructure. Applications, Tools and Techniques on the Road to Exascale Computing, IOS Press, 2012, 7-18.
  • 86 G.K. Lockwood, D. Hazen, Q. Koziol, R.S. Canon, K. Antypas and J. Balewski. Storage 2020: A Vision for the Future of HPC Storage. Report LBNL-2001072, Lawrence Berkeley National Laboratory, 2017, URL: https://escholarship.org/uc/item/744479dp#author
  • 87 Jakob Luettgau, Shane Snyder, Philip Carns, Justin M. Wozniak, Julian Kunkel and Thomas Ludwig. Toward Understanding I/O Behavior in HPC Workflows. 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS), Dallas, TX, USA, November 2018, 64-75.
  • 88 Jakob Luettgau, Shane Snyder, Tyler Reddy, Nikolaus Awtrey, Kevin Harms, Jean Luca Bez, Rui Wang, Rob Latham and Philip Carns. Enabling Agile Analysis of I/O Performance Data with PyDarshan. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W '23), Association for Computing Machinery, New York, NY, USA, November 2023, 1380-1391, URL: https://doi.org/10.1145/3624062.3624207
  • 89 Tao Luo, Weng-Fai Wong, Rick Siow Mong Goh, Anh Tuan Do, Zhixian Chen, Haizhou Li, Wenyu Jiang and Weiyun Yau. Achieving Green AI with Energy-Efficient Deep Learning Using Neuromorphic Computing. Communications of the ACM, 66(7), June 2023, 52-57, URL: https://doi.org/10.1145/3588591
  • 90 Michael Malms, Laurent Cargemel, Estela Suarez, Nico Mittenzwey, Marc Duranton, Sakir Sezer, Craig Prunty, Pascale Rossé-Laurent, Maria Pérez-Hernández, Manolis Marazakis, Guy Lonsdale, Paul Carpenter, Gabriel Antoniu, Sai Narasimhamurthy, André Brinkmann, Dirk Pleiter, Utz-Uwe Haus, Jens Krueger, Hans-Christian Hoppe, Erwin Laure, Andreas Wierse, Valeria Bartsch, Kristel Michielsen, Cyril Allouche, Tobias Becker and Robert Haas. ETP4HPC's SRA 5 - Strategic Research Agenda for High-Performance Computing in Europe - 2022. Zenodo, 2022.
  • 91 Julien Monniot, François Tessier, Matthieu Robert and Gabriel Antoniu. StorAlloc: A Simulator for Job Scheduling on Heterogeneous Storage Resources. HeteroPar 2022, Glasgow, United Kingdom, August 2022.
  • 92 Julien Monniot, François Tessier, Matthieu Robert and Gabriel Antoniu. Supporting dynamic allocation of heterogeneous storage resources on HPC systems. Concurrency and Computation: Practice and Experience, 35(28), 2023, e7890, URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/cpe.7890
  • 93 Etienne Ndamlabin and Berenger Bramas. RSCHED: An Effective Heterogeneous Resource Management for Simultaneous Execution of Task-Based Applications. International Journal of Advanced Computer Science and Applications, 16(2), 2025.
  • 94 Bichlien H. Nguyen, Christopher N. Takahashi, Gagan Gupta, Jake A. Smith, Richard Rouse, Paul Berndt, Sergey Yekhanin, David P. Ward, Siena D. Ang, Patrick Garvan, Hsing-Yeh Parker, Rob Carlson, Douglas Carmean, Luis Ceze and Karin Strauss. Scaling DNA Data Storage with Nanoscale Electrode Wells. Science Advances, 7(48), November 2021, eabi6714, URL: https://www.science.org/doi/10.1126/sciadv.abi6714
  • 95 Cosmas Ifeanyi Nwakanma, Jae-Woo Kim, Jae-Min Lee and Dong-Seong Kim. Edge AI prospect using the NeuroEdge computing system: Introducing a novel neuromorphic technology. ICT Express, 7(2), 2021, 152-157.
  • 96 Harish Padmanaban. Quantum Computing and AI in the Cloud. Journal of Computational Intelligence and Robotics, 4(1), March 2024, 14-32, URL: https://thesciencebrigade.com/jcir/article/view/116
  • 97 Fengfeng Pan, Yinliang Yue, Jin Xiong and Daxiang Hao. I/O Characterization of Big Data Workloads in Data Centers. Big Data Benchmarks, Performance Optimization, and Emerging Hardware, Springer International Publishing, Cham, 2014, 85-97.
  • 98 Robert Patton, Catherine Schuman, Shruti Kulkarni, Maryam Parsa, J. Parker Mitchell, N. Quentin Haas, Christopher Stahl, Spencer Paulissen, Prasanna Date, Thomas Potok and others. Neuromorphic computing for autonomous racing. International Conference on Neuromorphic Systems 2021, 2021, 1-5.
  • 99 Mohan Raparthi. Real-Time AI Decision Making in IoT with Quantum Computing: Investigating & Exploring the Development and Implementation of Quantum-Supported AI Inference Systems for IoT Applications. Internet of Things and Edge Computing Journal, 1(1), March 2021, 18-27, URL: https://thesciencebrigade.com/iotecj/article/view/130
  • 100 Gonzalo Pedro Rodrigo Álvarez, Per-Olov Östberg, Erik Elmroth, Katie Antypas, Richard Gerber and Lavanya Ramakrishnan. HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15), Portland, Oregon, USA, Association for Computing Machinery, New York, NY, USA, 2015, 57-60, URL: https://doi.org/10.1145/2749246.2749270
  • 101 David Rolnick and Arun Ahuja. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, 2019.
  • 102 Daniel Rosendo, Pedro Silva, Matthieu Simonin, Alexandru Costan and Gabriel Antoniu. E2Clab: Exploring the Computing Continuum through Repeatable, Replicable and Reproducible Edge-to-Cloud Experiments. 2020 IEEE International Conference on Cluster Computing (CLUSTER), 2020, 176-186.
  • 103 SKA - Square Kilometre Array. 2024, URL: https://www.skao.int/en
  • 104 Shazia Sadiq, Maria Orlowska, Wasim Sadiq and Cameron Foulger. Data Flow and Validation in Workflow Modelling. Proceedings of the 15th Australasian Database Conference - Volume 27, 2004, 207-214.
  • 105 Conrad Sanderson, Qinghua Lu, David Douglas, Xiwei Xu, Liming Zhu and Jon Whittle. Towards Implementing Responsible AI. 2022 IEEE International Conference on Big Data (Big Data), IEEE, 2022, 5076-5081.
  • 106 Rafael Ferreira da Silva, Henri Casanova, Anne-Cécile Orgerie, Ryan Tanaka, Ewa Deelman and Frédéric Suter. Characterizing, Modeling, and Accurately Simulating Power and Energy Consumption of I/O-intensive Scientific Workflows. Journal of Computational Science, 44, 2020, 101157, URL: https://www.sciencedirect.com/science/article/pii/S1877750320304580
  • 107 Thomas Skordas. Toward a European exascale ecosystem: the EuroHPC joint undertaking. Communications of the ACM, 62(4), 2019, 70.
  • 108 European Association on Smart Systems Integration. Strategic Research and Innovation Agenda. 2023, URL: https://ecssria.eu/ECS-SRIA%202023.pdf
  • 109 S. Snyder, P. Carns, K. Harms, R. Ross, G. K. Lockwood and N. J. Wright. Modular HPC I/O Characterization with Darshan. 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT), 2016, 9-17.
  • 110 Linghao Song, Fan Chen, Steven R. Young, Catherine D. Schuman, Gabriel Perdue and Thomas E. Potok. Deep learning for vertex reconstruction of neutrino-nucleus interaction events with combined energy and time data. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, 3882-3886.
  • 111 Victoria Stodden and Sheila Miguez. Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Available at SSRN 2322276, 2013.
  • 112 Sergej Svorobej, Patricia Takako Endo, Malika Bendechache, Christos Filelis-Papadopoulos, Konstantinos M. Giannoutakis, George A. Gravvanis, Dimitrios Tzovaras, James Byrne and Theo Lynn. Simulating Fog and Edge Computing Scenarios: An Overview and Research Challenges. Future Internet, 11(3), 2019, 55.
  • 113 Houjun Tang, Suren Byna, François Tessier, Teng Wang, Bin Dong, Jingqing Mu, Quincey Koziol, Jerome Soumagne, Venkatram Vishwanath, Jialin Liu and Richard Warren. Toward Scalable and Asynchronous Object-Centric Data Management for HPC. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2018, 113-122.
  • 114 François Tessier, Paul Gressier and Venkatram Vishwanath. Optimizing Data Aggregation by Leveraging the Deep Memory Hierarchy on Large-scale Systems. Proceedings of the 2018 International Conference on Supercomputing (ICS '18), Beijing, China, ACM, New York, NY, USA, 2018, 229-239, URL: http://doi.acm.org/10.1145/3205289.3205316
  • 115 F. Tessier, V. Vishwanath and E. Jeannot. TAPIOCA: An I/O Library for Optimized Topology-Aware Data Aggregation on Large-Scale Supercomputers. 2017 IEEE International Conference on Cluster Computing (CLUSTER), September 2017, 70-80.
  • 116 Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy and Ling Liu. Data Poisoning Attacks Against Federated Learning Systems. Computer Security - ESORICS 2020, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2020, 480-501.
  • 117 Andrew Waterman, Yunsup Lee, Rimas Avizienis, Henry Cook, David Patterson and Krste Asanovic. The RISC-V instruction set. 2013 IEEE Hot Chips 25 Symposium (HCS), 2013, 1.
  • 118 Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, Tim Clark, Mercè Crosas, Ingrid Dillo, Olivier Dumon, Scott Edmunds, Chris T. Evelo, Richard Finkers, Alejandra Gonzalez-Beltran, Alasdair J. G. Gray, Paul Groth, Carole Goble, Jeffrey S. Grethe, Jaap Heringa, Peter A. C. 't Hoen, Rob Hooft, Tobias Kuhn, Ruben Kok, Joost Kok, Scott J. Lusher, Maryann E. Martone, Albert Mons, Abel L. Packer, Bengt Persson, Philippe Rocca-Serra, Marco Roos, Rene van Schaik, Susanna-Assunta Sansone, Erik Schultes, Thierry Sengstag, Ted Slater, George Strawn, Morris A. Swertz, Mark Thompson, Johan van der Lei, Erik van Mulligen, Jan Velterop, Andra Waagmeester, Peter Wittenburg, Katherine Wolstencroft, Jun Zhao and Barend Mons. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Scientific Data, 3(1), March 2016, 160018, URL: https://www.nature.com/articles/sdata201618