THOTH

2025 Activity Report, Project-Team THOTH

RNSR: 201622034K

Research center: Inria Centre at Université Grenoble Alpes
Team name: Learning visual models from large-scale data
In collaboration with: Laboratoire Jean Kuntzmann (LJK)

Creation of the‌ Project-Team: 2016 March 01

Keywords

Computer Science and Digital Science

A3.4.‌ Machine learning and statistics
A5.3. Image processing and‌ analysis
A5.9. Signal processing
A6.2.6. Optimization
A8.2. Optimization‌
A9.2. Machine learning
A9.3. Signal processing
A9.7. AI‌ algorithmics
A9.11. Generative AI
A9.12. Computer vision

Other‌ Research Topics and Application Domains

B9.5.6. Data science‌

1 Team members, visitors, external collaborators

Research Scientists‌

Julien Mairal [Team leader, Inria, Senior Researcher, on secondment from the Corps des Mines, HDR]
Karteek Alahari [Inria‌, Senior Researcher,‌‌ HDR]
Michael Arbel [Inria, Researcher‌]
Pia Bideau [‌UGA, Chair]‌‌
Jocelyn Chanussot [Inria, Senior Researcher, on secondment from Grenoble INP, HDR]
Emanuele‌‌ Dalsasso [Inria, ISFP, from Dec‌ 2025]
Pierre Gaillard‌ [Inria, Researcher‌‌, HDR]
Hadrien Hendrikx [Inria,‌ Researcher]

Post-Doctoral Fellows‌

Alessia Boccalatte [UGA‌‌, Post-Doctoral Fellow, until Jan 2025]‌
Khaled Eldowa [Inria‌, Post-Doctoral Fellow,‌‌ from Oct 2025]
Charles-Gerard Lucas [Inria‌, Post-Doctoral Fellow,‌ from Oct 2025]‌‌
Giacomo Meanti [Inria, Post-Doctoral Fellow]‌
Romain Menegaux [Inria‌, until Jan 2025‌‌]
Scott Pesme [Inria, Post-Doctoral Fellow‌]

PhD Students

Yedidia‌ Agnimo [Ekimetrics,‌‌ CIFRE, from Jul 2025]
Loic Arbez‌ [GRENOBLE INP]‌
Eyal Benaroche [Meta‌‌, from Nov 2025]
Tariq Berrada Ifriqi‌ [Meta, CIFRE‌]
Theo Bodrito [‌‌Inria, until Jul 2025, with Willow‌]
Timothee Darcet [‌Meta, CIFRE,‌‌ until Feb 2025]
Fares El Khoury [‌Inria]
Renaud Gaucher‌ [Ecole Polytechnique]‌‌
Bilal Yagiz Gündeger [UGA, from Nov‌ 2025]
Vincent Herfeld‌ [ENHANCE LAB,‌‌ CIFRE, from May 2025]
Emmanuel Jehanno‌ [Inria]
Zhiqi‌ Kang [Inria,‌‌ until Sep 2025]
Paul Liautaud [Sorbonne‌ Univ]
Bianca Marin‌ Moreno [EDF,‌‌ CIFRE, until Nov 2025]
Juliette Marrie‌ [NAVER LABS Europe‌, CIFRE, until‌‌ Jul 2025]
Ieva Petrulionyte [UGA]‌
François Porcher [Meta‌, CIFRE, from‌‌ Apr 2025, with WILLOW]
Colin Prieur‌ [Univ Montpellier,‌ until Oct 2025]‌‌
Romain Seailles [ENS Paris, with Willow‌]
Amogh Tiwari [‌UGA]
Eloise Touron‌‌ [Inria]
Kenta Vert [UGA,‌ from Sep 2025]‌
Julien Zhou [Criteo‌‌, CIFRE]

Technical Staff

Juliette Bertrand [‌Inria, Engineer,‌ until Oct 2025]‌‌
Julien Horvat [UGA]
Noé Peterlongo [‌INPG SA, Engineer‌, until Jan 2025‌‌]
Thomas Ryckeboer [Inria, Engineer]‌
Mathis Tailland [UGA‌, until Oct 2025‌‌]

Interns and Apprentices

Theodore Batte [Polytech‌ Grenoble, Intern,‌ until Mar 2025]‌‌
Augustin Cablant [Criteo, Intern, from‌ May 2025 until Nov‌ 2025]
Romain Forestier‌‌ [UGA, Intern, from May 2025‌ until Jun 2025]‌
Manuela Giraldo Obando [‌‌INPG SA, Intern, from Jun 2025‌ until Jul 2025]‌
Quentin Goizet [Polytech‌‌ Grenoble, Intern]
Geraud Ilinca [Inria‌, Intern, from‌ Mar 2025 until Sep‌‌ 2025]
Lucas Montigon [Polytech Grenoble,‌ Intern, until Mar‌ 2025]
Carlos Inaki‌‌ Roman Martinez [UGA‌, Intern, from Feb 2025 until Jul‌ 2025]
Morgan Scalabrino [Inria, Intern‌, from Apr 2025 until Aug 2025]‌
Kenta Vert [Inria, Intern, from‌ Apr 2025 until Aug 2025]

Administrative Assistant‌

Nathalie Gillot [Inria]

Visiting Scientists

Nassim‌ Ait Ali Braham [DLR, until Oct‌ 2025]
Yusuf Mehmet Colak [UNIV PAVIE‌, from Oct 2025]
François Postic [‌INRAE, until Sep 2025]
Francesca Razzano‌ [Univ Padova, from Oct 2025]‌

External Collaborator

Olivier Flasseur [CNRS]

2‌ Overall objectives

Thoth is a computer vision and machine learning team. Our initial goal was to develop machine learning models for analyzing the massive amounts of visual data that are currently available on the web. Since then, the focus of the team has become more diverse. More precisely, we share a common objective of developing machine learning models that are robust and efficient (in terms of computational cost and data requirements).

Our main research directions are the following:

Visual understanding from limited annotations and data: Many state-of-the-art computer vision models are trained on a huge corpus of fully annotated data. We want to reduce this annotation cost by developing new algorithms for unsupervised, self-supervised, continual, or incremental learning.
Efficient deep learning models, from theory to applications: We want to invent a new generation of machine learning models (in particular deep learning) with theoretical guarantees, efficient algorithms, and a wide range of applications. For instance, we develop models for images, videos, graphs, or sequences.
Statistical machine learning and optimization: We are also developing efficient machine learning methods, with a focus on stochastic optimization for processing large-scale data, and online learning.
Pluri-disciplinary collaborations: Machine learning being at the crossroads of several disciplines, we have successfully conducted collaborations in scientific domains that are relatively far from our domains of expertise. These fields are producing massive amounts of data and are in dire need of efficient tools to make predictions or interpretations. For example, we have had the chance to collaborate with many colleagues from natural language processing, robotics, neuroimaging, computational biology, genomics, and astrophysics (for exoplanet detection), and we are currently involved in several remote sensing and hyperspectral imaging projects thanks to Jocelyn Chanussot (hosted by Thoth in the 2019 to 2022 period, now an Inria senior scientist on leave from Grenoble INP since September 2023).

3 Research program

3.1 Designing and learning structured‌ models

The task of understanding image and video content has been interpreted in several ways over the past few decades, namely image classification, detecting objects in a scene, recognizing objects and their spatial extents in an image, and recovering scene geometry. However, addressing all these problems individually provides us with a partial understanding of the scene at best, leaving much of the visual data unexplained.

One of the main goals of this research axis is to go beyond the initial attempts that consider only a subset of tasks jointly, by developing novel models for a more complete understanding of scenes to address all the component tasks. We propose to incorporate the structure in image and video data explicitly into the models. In other words, our models aim to satisfy the complex sets of constraints that exist in natural images and videos. Examples of such constraints include: (i) relations between objects, e.g., signs for shops indicate the presence of buildings, (ii) higher-level semantic relations involving the type of scene, geographic location, and the plausible actions as a global constraint, e.g., an image taken at a swimming pool is unlikely to contain cars, (iii) relations between objects occluded in some of the video frames and content in other frames, where they are more clearly visible as the camera or the object itself moves, with the use of long-term trajectories and video object proposals.

This research axis will focus on two topics. The first is developing deep features for video. This involves designing rich features available in the form of long-range temporal interactions among pixels in a video sequence to learn a representation that is truly spatio-temporal in nature. The second topic is aimed at learning models that capture the relationships among several objects and regions in a single image scene, and additionally, among scenes in the case of an image collection or a video. The main scientific challenges in this topic stem from learning the structure of the probabilistic graphical model as well as the parameters of the cost functions quantifying the relationships among its entities. In the following, we present work related to these two topics and then elaborate on our research directions.

Deep features‌ for vision. Deep learning‌ models provide a rich‌‌ representation of complex objects but in return have‌ a large number of‌ parameters. Thus, to work‌‌ well on difficult tasks, a large amount of‌ data is required. In‌ this context, video presents‌‌ several advantages: objects are observed from a large‌ range of viewpoints, motion‌ information allows the extraction‌‌ of moving objects and parts, and objects can‌ be differentiated by their‌ motion patterns. We initially‌‌ plan to develop deep features for videos that‌ incorporate temporal information at‌ multiple scales. We then‌‌ plan to further exploit the rich content in‌ video by incorporating additional‌ cues such as minimal‌‌ prior knowledge of the object of interest, with‌ the goal of learning‌ a representation that is‌‌ more appropriate for video understanding. In other words,‌ a representation that is‌ learned from video data‌‌ and targeted at specific applications.
Structured models. The interactions among various elements in a scene, such as the objects and regions in it, the motion of object parts or entire objects themselves, form a key element for understanding image or video content. These rich cues define the structure of visual data and how it evolves spatio-temporally. We plan to develop a novel graphical model to exploit this structure. The main components in this graphical model are spatio-temporal regions (in the case of video) or simply image regions, which can represent object parts or entire objects themselves, and the interactions among several entities. The dependencies among the scene entities are defined with a higher order or a global cost function. A higher order constraint is a generalization of the pairwise interaction term, and is a cost function involving more than two components in the scene, e.g., several regions, whereas a global constraint imposes a cost term over the entire image or video, such as prior knowledge of the number of people expected in the scene. The constraints we plan to include generalize several existing methods, which are limited to pairwise interactions or a small restrictive set of higher-order costs. In addition to learning the parameters of these novel functions, we will focus on learning the structure of the graph itself, a challenging problem that is seldom addressed in current approaches. This provides an elegant way to go beyond state-of-the-art deep learning methods, which are limited to learning the high-level interaction among parts of an object, by learning the relationships among objects.
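To make the distinction between pairwise, higher-order, and global terms concrete, the following is a minimal sketch of how the cost of one candidate labeling could be evaluated. The specific cost forms (a Potts pairwise penalty, a majority-consistency penalty over groups of regions, and a count prior) and all function names are illustrative assumptions, not the model developed by the team.

```python
from collections import Counter


def potts_pairwise(labels, edges, weight=1.0):
    """Pairwise term: pay `weight` for every edge whose endpoints disagree."""
    return weight * sum(labels[i] != labels[j] for i, j in edges)


def higher_order(labels, cliques, weight=1.0):
    """Higher-order term over groups of more than two regions.

    Each group pays `weight` times the number of regions that deviate
    from the group's majority label (an assumed consistency cost).
    """
    cost = 0.0
    for clique in cliques:
        counts = Counter(labels[i] for i in clique)
        cost += weight * (len(clique) - max(counts.values()))
    return cost


def global_count_prior(labels, target_label, expected_count, weight=1.0):
    """Global term: penalize deviation from a prior on how many regions
    carry `target_label` (e.g., the expected number of people)."""
    observed = sum(1 for lab in labels if lab == target_label)
    return weight * abs(observed - expected_count)


def energy(labels, unary, edges, cliques, target_label, expected_count):
    """Total cost of a labeling: unary + pairwise + higher-order + global."""
    data_cost = sum(unary[i][lab] for i, lab in enumerate(labels))
    return (data_cost
            + potts_pairwise(labels, edges)
            + higher_order(labels, cliques)
            + global_count_prior(labels, target_label, expected_count))


# Hypothetical toy scene with four regions and one prior: "one person expected".
unary = [
    {"person": 0.25, "background": 1.0},
    {"person": 0.5, "background": 0.5},
    {"person": 1.0, "background": 0.125},
    {"person": 1.0, "background": 0.125},
]
labels = ["person", "person", "background", "background"]
print(energy(labels, unary, edges=[(0, 1), (1, 2), (2, 3)],
             cliques=[(0, 1, 2)], target_label="person", expected_count=1))
# 4.0
```

Evaluating the cost is the easy part; minimizing it over all labelings is the combinatorial problem, which this sketch does not attempt.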

3.2 Learning of visual models from minimal supervision‌

Today's approaches to visual recognition learn models for a limited and fixed set of visual categories with fully supervised classification techniques. This paradigm was adopted in the early 2000s, and within it enormous progress has been made over the last decade.

The scale and diversity in today's large and growing image and video collections (e.g., broadcast archives and personal image/video collections) call for a departure from the current paradigm. This is because, to answer queries about such data, it is infeasible to learn the models of visual content by manually and precisely annotating every relevant concept, object, scene, or action category in a representative sample of everyday conditions. For one, it will be difficult, or even impossible, to decide a priori what the relevant categories and the proper granularity level are. Moreover, the cost of such annotations would be prohibitive in most application scenarios. One of the main goals of the Thoth project-team is to develop a new framework for learning visual recognition models by actively exploring large digital image and video sources (off-line archives as well as growing on-line content), and exploiting the weak supervisory signal provided by the accompanying metadata (such as captions, keywords, tags, subtitles, or scripts) and audio signal (from which we can for example extract speech transcripts, or exploit speaker recognition models).

Textual metadata has traditionally been used to index and search for visual content. The information in metadata is, however, typically sparse (e.g., the location and overall topic of newscasts in a video archive [1]) and noisy (e.g., a movie script may tell us that two persons kiss in some scene, but not when, and the kiss may occur off the screen or not have survived the final cut). For this reason, metadata search should be complemented by visual content-based search, where visual recognition models are used to localize content of interest that is not mentioned in the metadata, to increase the usability and value of image/video archives. The key insight that we build on in this research axis is that while the metadata for a single image or video is too sparse and noisy to rely on for search, the metadata associated with large video and image databases collectively provides an extremely versatile source of information to learn visual recognition models. This form of “embedded annotation” is rich, diverse and abundantly available. Mining these correspondences from the web, TV and film archives, and online consumer generated content sites such as Flickr, Facebook, or YouTube, guarantees that the learned models are representative for many different situations, unlike models learned from manually collected fully supervised training data sets which are often biased.

The approach we propose to address the limitations of the fully supervised learning paradigm aligns with “Big Data” approaches developed in other areas: we rely on the orders-of-magnitude-larger training sets that have recently become available with metadata to compensate for less explicit forms of supervision. This will form a sustainable approach to learn visual recognition models for a much larger set of categories with little or no manual intervention. Reducing and ultimately removing the dependency on manual annotations will dramatically reduce the cost of learning visual recognition models. This in turn will allow such models to be used in many more applications, and enable new applications based on visual recognition beyond a fixed set of categories, such as natural language based querying for visual content. This is an ambitious goal, given the sheer volume and intrinsic variability of the everyday visual content available on-line, and the lack of a universally accepted formalism for modeling it. Yet, the potential payoff is a breakthrough in visual object recognition and scene understanding capabilities.

This research axis is organized into‌ the following three sub-tasks:‌

Weakly supervised learning. For‌‌ object localization we will go beyond current methods‌ that learn one category‌ model at a time‌‌ and develop methods that learn models for different‌ categories concurrently. This allows‌ “explaining away” effects to‌‌ be leveraged, i.e., if a certain region in‌ an image has been‌ identified as an instance‌‌ of one category, it cannot be an instance‌ of another category at‌ the same time. For‌‌ weakly supervised detection in video we will consider‌ detection proposal methods. While‌ these are effective for‌‌ still images, recent approaches for the spatio-temporal domain‌ need further improvements to‌ be similarly effective. Furthermore,‌‌ we will exploit appearance and motion information jointly‌ over a set of‌ videos. In the video‌‌ domain we will also‌ continue to work on learning recognition models from‌ subtitle and script information. The basis of leveraging‌ the script data which does not have a‌ temporal alignment with the video is to use‌ matches in the narrative in the script and‌ the subtitles (which do have a temporal alignment‌ with the video). We will go beyond simple‌ correspondences between names and verbs relating to self-motion,‌ and match more complex sentences related to interaction‌ with objects and other people. To deal with‌ the limited number of occurrences of such actions‌ in a single movie, we will consider approaches‌ that learn action models across a collection of‌ movies.
Online learning of visual models. As a larger number of visual category models is being learned, online learning methods become important, since new training data and categories will arrive over time. We will develop online learning methods that can incorporate new examples for existing category models, and learn new category models from few examples by leveraging similarity to related categories using multi-task learning methods. Here we will develop new distance-based classifiers and attribute and label embedding techniques, and explore the use of NLP techniques such as skip-gram models to automatically determine between which classes transfer should occur. Moreover, NLP will be useful in the context of learning models for many categories to identify synonyms, and to determine cases of polysemy (e.g., jaguar car brand vs. jaguar animal), and merge or refine categories accordingly. Ultimately this will result in methods that are able to learn an “encyclopedia” of visual models.
Visual search from unstructured textual queries. We will build on recent approaches that learn recognition models on-the-fly (as the query is issued) from generic image search engines such as Google Images. While it is feasible to learn models in this manner in a matter of seconds, it is challenging to use the model to retrieve relevant content in real-time from large video archives of more than a few thousand hours. Achieving this requires feature compression techniques to store visual representations in memory, and cascaded search techniques to avoid exhaustive search. This approach, however, leaves untouched the core problem of how to associate visual material with the textual query in the first place. The second approach we will explore is based on image annotation models. In particular, we will go beyond image-text retrieval methods by using recurrent neural networks such as Elman networks or long short-term memory (LSTM) networks to generate natural language sentences to describe images.
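As an illustration of the distance-based classifiers mentioned under online learning, the sketch below implements a nearest-class-mean classifier whose class means are updated one example at a time, so that new examples and entirely new categories can be added without retraining. The choice of this particular classifier, the Euclidean distance, and the class and method names are assumptions made for the example; transfer between related categories and label embeddings are not covered.

```python
import math


class NearestClassMean:
    """Distance-based classifier that stores one running mean per class."""

    def __init__(self):
        self.means = {}   # label -> mean feature vector (list of floats)
        self.counts = {}  # label -> number of examples seen so far

    def partial_fit(self, x, label):
        """Incorporate one example; an unseen label creates a new class."""
        if label not in self.means:
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means[label]
        for k, value in enumerate(x):
            mean[k] += (value - mean[k]) / n  # running-mean update

    def predict(self, x):
        """Return the label whose class mean is closest in Euclidean distance."""
        return min(self.means, key=lambda lab: math.dist(x, self.means[lab]))


clf = NearestClassMean()
clf.partial_fit([0.0, 0.0], "cat")
clf.partial_fit([2.0, 0.0], "cat")
clf.partial_fit([10.0, 10.0], "car")
print(clf.predict([1.0, 1.0]))   # cat

# A new category arrives later and is learned from a single example.
clf.partial_fit([5.0, 5.0], "jaguar")
print(clf.predict([5.0, 4.0]))   # jaguar
```

Because each class is summarized by a mean and a count, adding a category costs one dictionary entry, which is what makes this family of classifiers attractive in the online setting.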

3.3 Large-scale learning and optimization

We‌ have entered an era of massive data acquisition,‌ leading to the revival of an old scientific‌ utopia: it should be possible to better understand‌ the world by automatically converting data into knowledge.‌ It is also leading to a new economic‌ paradigm, where data is a valuable asset and‌ a source of activity. Therefore, developing scalable technology‌ to make sense of massive data has become a strategic issue. Computer‌ vision has already started‌ to adapt to these‌‌ changes.

In particular, very high-dimensional models such as deep networks are becoming highly popular and successful for visual recognition. This change is closely related to the advent of big data. On the one hand, these models involve a huge number of parameters and are rich enough to represent complex objects such as natural images or text corpora well. On the other hand, they are prone to overfitting (fitting too closely to training data without being able to generalize to new unseen data) despite regularization; to work well on difficult tasks, they require a large amount of labeled data that has been available only recently. Other factors may explain their success: the deep learning community has made significant engineering efforts, making it possible to train, in a day on a GPU, large models that would have required weeks of computation on a traditional CPU, and it has accumulated enough empirical experience to find good hyper-parameters for its networks.

Learning the huge number of parameters of deep hierarchical models requires scalable optimization techniques and large amounts of data to prevent overfitting. This immediately raises two major challenges: How can we learn without large amounts of labeled data, or with weakly supervised annotations? How can we efficiently learn such huge-dimensional models? To answer these challenges, we will concentrate on the design and theoretical justifications of deep architectures including our recently proposed deep kernel machines, with a focus on weakly supervised and unsupervised learning, and develop continuous and discrete optimization techniques that push the state of the art in terms of speed and scalability.

This research axis will‌ be developed into three‌ sub-tasks:

Deep kernel machines‌‌ for structured data. Deep kernel machines combine advantages‌ of kernel methods and‌ deep learning. Both approaches‌‌ rely on high-dimensional models. Kernels implicitly operate in‌ a space of possibly‌ infinite dimension, whereas deep‌‌ networks explicitly construct high-dimensional nonlinear data representations. Yet,‌ these approaches are complementary:‌ Kernels can be built‌‌ with deep learning principles such as hierarchies and‌ convolutions, and approximated by‌ multilayer neural networks. Furthermore,‌‌ kernels work with structured data and have well‌ understood theoretical principles. Thus,‌ a goal of the‌‌ Thoth project-team is to design and optimize the‌ training of such deep‌ kernel machines.
Large-scale parallel optimization. Deep kernel machines produce nonlinear representations of input data points. After encoding these data points, a learning task is often formulated as a large-scale convex optimization problem; for example, this is the case for linear support vector machines, logistic regression classifiers, or more generally many empirical risk minimization formulations. We intend to pursue recent efforts for making convex optimization techniques that are dedicated to machine learning more scalable. Most existing approaches address scalability issues either in model size (meaning that the function to minimize is defined on a domain of very high dimension), or in the amount of training data (typically, the objective is a large sum of elementary functions). There is thus considerable room for improvement for techniques that jointly take these two criteria into account.
Large-scale graphical‌ models. To represent structured data, we will also‌ investigate graphical models and their optimization. The challenge‌ here is two-fold: designing an adequate cost function‌ and minimizing it. While several cost functions are‌ possible, their utility will be largely determined by‌ the efficiency and the effectiveness of the optimization‌ algorithms for solving them. It is a combinatorial‌ optimization problem involving billions of variables and is‌ NP-hard in general, requiring us to go beyond‌ the classical approximate inference techniques. The main challenges‌ in minimizing cost functions stem from the large‌ number of variables to be inferred, the inherent‌ structure of the graph induced by the interaction‌ terms (e.g., pairwise terms), and the high-arity terms‌ which constrain multiple entities in a graph.
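For reference, the empirical risk minimization formulations mentioned under large-scale parallel optimization are commonly written as follows. The notation is an assumption introduced here: $x_i$ denotes an encoded data point, $y_i$ its label, $\ell$ a convex loss (e.g., the logistic loss for logistic regression or the hinge loss for a linear support vector machine), and $\psi$ a convex regularizer weighted by $\lambda \ge 0$.

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, w^\top x_i\bigr) + \lambda \, \psi(w)
```

In this form, the two scalability regimes discussed above correspond to a large dimension $p$ (model size) and a large number of terms $n$ (amount of training data).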

4‌ Application domains

4.1 Visual applications

Any solution to‌ automatically understanding images and videos on a semantic‌ level will have an immediate impact on a‌ wide range of applications. For example:

Semantic-level image‌ and video access is highly relevant for visual‌ search on the Web, in professional archives and‌ personal collections.
Visual data organization is applicable to‌ organizing family photo and video albums as well‌ as to large-scale information retrieval.
Visual object recognition‌ has potential applications ranging from autonomous driving, to‌ service robotics for assistance in day-to-day activities as‌ well as the medical domain.
Real-time scene understanding is relevant for human interaction through devices such as HoloLens or Oculus Rift.

4.2 Pluri-disciplinary research

Machine‌ learning is intrinsically pluri-disciplinary. By developing large-scale machine‌ learning models and algorithms for processing data, the‌ Thoth team became naturally involved in pluri-disciplinary collaborations‌ that go beyond visual modelling. During the last‌ few years, Thoth has conducted several collaborations in‌ other fields such as neuroimaging, bioinformatics, ecology, natural‌ language processing, and remote sensing.

5 Social and‌ environmental responsibility

5.1 Footprint of research activities

Compute‌

A significant share of the team's computations is performed on the Jean Zay national cluster. According to the cluster's reporting platform, the team used 50k normalized GPU hours, which amounts to 1.2 tons eqCO2. Besides computations performed on this cluster, the team maintains its own cluster, on which part of the computations are done as well. Assuming 10 GPUs are used at all times (which is a rather generous estimate), this amounts to less than 100k GPU hours over the year. Most of these machines are hosted in the datacenter of the IMAG building, which is probably slightly less efficient than the GENCI infrastructure. Overall, we estimate our local consumption to be under 3 tons eqCO2.

In total, we estimate the emissions of the team's compute to be about 4 tons eqCO2. While we do not report our impact in terms of resources, the team made a special effort to keep local computing servers running for as long as possible, upgrading them when possible to avoid replacing them.
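The arithmetic behind these estimates can be checked with a few lines. This is a minimal sketch that only reuses the figures quoted above; the variable names are illustrative, and the per-hour intensity is merely what those figures imply, not a measured value. Normalized GPU hours on Jean Zay and raw GPU hours on the local cluster are not strictly comparable units.

```python
HOURS_PER_YEAR = 24 * 365

# Figures quoted in the text.
jean_zay_gpu_hours = 50_000
jean_zay_tons = 1.2
local_gpus_in_use = 10
local_tons_upper_bound = 3.0

# Upper bound on local usage if 10 GPUs run around the clock for a year.
local_gpu_hours = local_gpus_in_use * HOURS_PER_YEAR

# Emission intensity implied by the Jean Zay figures, in kg eqCO2 per GPU hour.
jean_zay_kg_per_hour = jean_zay_tons * 1000 / jean_zay_gpu_hours

# Upper bound on the total, consistent with "about 4 tons".
total_tons_upper_bound = jean_zay_tons + local_tons_upper_bound

print(local_gpu_hours)                   # 87600
print(round(jean_zay_kg_per_hour, 3))    # 0.024
print(round(total_tons_upper_bound, 1))  # 4.2
```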

This does not‌ count the Dino (V2‌ and V3) models, which‌‌ are significantly more expensive to train but are‌ also significantly more impactful‌ than an average research‌‌ paper, being used in more than 10K scientific‌ projects, and for which‌ full emissions data is‌‌ available.

Travel

The other main eqCO2 footprint is international flights. While we did not gather specific numbers, team members take special care to reduce their air travel (several permanent researchers have not traveled by plane for several years), to decline distant invitations, and to encourage less travel-hungry community practices. This has led to a drastic reduction of our travel impact over the years, which we will try to quantify for the next activity report.

5.2 Impact‌ of research results

A large part of the Thoth team's research contributes to advancing the field of machine learning as a whole. This improves and promotes Artificial Intelligence tools, which have a large, still growing, and controversial societal impact (automation, recommendation algorithms, mass surveillance...). Besides these impacts, machine learning has a substantial (and also growing) environmental footprint, and is especially prone to the rebound effect, which prevents efficiency improvements from reducing this impact.

Beyond methodological contributions, team‌ members make more targeted‌ applied contributions that leverage‌‌ Machine Learning for advancing other sciences (e.g., astrophysics,‌ earth science, physics simulations…).‌ Some of these projects‌‌ focus on reducing carbon footprints (e.g., by making‌ electricity management more efficient),‌ or preserving biodiversity (e.g.,‌‌ by better understanding ecosystem responses to human pressure‌ and global warming).

“Environmentally friendly” contributions do not offset the negative socio-environmental impacts of the current global AI race, which should be tackled at a larger scale. Hence, Thoth team members are involved at several levels (scientific policy, popularization of science, local socio-environmental initiatives) to support meaningful decision-making regarding these issues and future technological developments at a broader level.

6 Highlights of the year

6.1‌ Awards

Prix Jeunes Talents‌ L'Oréal-UNESCO 2025 for Bianca‌‌ Marin Moreno
Karteek Alahari received the Outstanding IJCV‌ Editorial Board Member Award.‌
Julien Mairal received a‌‌ top reviewer award at NeurIPS 2025.
Michael Arbel‌ received a top reviewer‌ award at AISTATS 2025.‌‌
The spin-off Enhance Lab received the i-Lab prize‌ from BPI.
J. Chanussot received the 2025 IEEE GRSS Highest Impact Paper Award (HIPA), selected from the 18670 papers published in the journals of the IEEE Geoscience and Remote Sensing Society in 2020-2024.
J. Chanussot was recognized as a Highly Cited Researcher (Clarivate Analytics).

7 Latest software developments, platforms,‌ open data

7.1 Latest‌ software developments

7.1.1 Cyanure‌‌

Name:
Cyanure: An Open-Source Toolbox for Empirical Risk‌ Minimization
Functional Description:
Cyanure is an open-source C++ software package with a Python interface. The goal of Cyanure is to provide state-of-the-art solvers for learning linear models, based on stochastic variance-reduced optimization with acceleration mechanisms and Quasi-Newton principles. Cyanure can handle a large variety of loss functions (logistic, square, squared hinge, multinomial logistic) and regularization functions (l2, l1, elastic-net, fused Lasso, multi-task group Lasso). It provides a simple Python API, which is very close to that of scikit-learn and should be extended to other languages such as R or Matlab in the near future.
Release Contributions:
packaging on conda and PyPI + various improvements
URL:
http://thoth.inrialpes.fr/people/mairal/arsenic/welcome.html
Contact:‌
Julien Mairal
Participant:
2 anonymous participants
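
To give an idea of what stochastic variance-reduced optimization looks like for a linear model, the following is a minimal pure-Python sketch of SVRG applied to l2-regularized logistic regression. It does not reproduce Cyanure's API or its solvers: SVRG is chosen here only as one well-known variance-reduced method, the function names and default parameters are assumptions, and acceleration and Quasi-Newton mechanisms are left out.

```python
import math
import random


def logistic_grad(w, x, y, lam):
    """Gradient of log(1 + exp(-y * <w, x>)) + (lam / 2) * ||w||^2 for y in {-1, +1}."""
    margin = y * sum(wk * xk for wk, xk in zip(w, x))
    # Numerically stable evaluation of sigmoid(-margin).
    if margin >= 0:
        e = math.exp(-margin)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(margin))
    return [-y * s * xk + lam * wk for wk, xk in zip(w, x)]


def full_grad(w, X, Y, lam):
    """Average of the per-example gradients over the whole training set."""
    total = [0.0] * len(w)
    for x, y in zip(X, Y):
        g = logistic_grad(w, x, y, lam)
        total = [t + gk for t, gk in zip(total, g)]
    return [t / len(X) for t in total]


def svrg(X, Y, lam=0.1, step=0.1, epochs=20, seed=0):
    """SVRG: each epoch computes one full gradient at an anchor point, then
    takes n cheap stochastic steps corrected by that full gradient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(epochs):
        anchor = list(w)
        mu = full_grad(anchor, X, Y, lam)
        for _ in range(n):
            i = rng.randrange(n)
            g_w = logistic_grad(w, X[i], Y[i], lam)
            g_a = logistic_grad(anchor, X[i], Y[i], lam)
            # Unbiased gradient estimate whose variance shrinks as w and the
            # anchor both approach the optimum.
            w = [wk - step * (gw - ga + m)
                 for wk, gw, ga, m in zip(w, g_w, g_a, mu)]
    return w


# Hypothetical toy data: two linearly separable classes in the plane.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]]
Y = [1, 1, -1, -1]
w = svrg(X, Y)
print([y * sum(wk * xk for wk, xk in zip(w, x)) > 0 for x, y in zip(X, Y)])
```

The correction term `g_w - g_a + mu` is what distinguishes this from plain stochastic gradient descent: it keeps the per-step cost of a stochastic method while allowing a constant step size.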

7.1.2 MLXP‌

Name:
Machine Learning eXperimentalist for Python
Keywords:
Reproducibility,‌ Replication and consistency, Machine learning
Functional Description:
MLXP‌ is an open-source, simple, and lightweight experiment management‌ tool based on Python. It streamlines the experimental‌ process with minimal practitioner overhead while ensuring a‌ high level of reproducibility. As an open-source package,‌ MLXP facilitates experiment launching, logging, and efficient result‌ exploitation. Key components include automated job launching and‌ hierarchical configuration files, logging of experiment outputs along‌ with metadata, automated code and job version management,‌ seamless multi-job submission to a HPC job scheduler,‌ and intuitive result exploitation capabilities including querying results,‌ grouping and aggregation operations.
URL:
https://inria-thoth.github.io/mlxp/pages/master/index.html
Contact:
Michael‌ Arbel

8 New results

8.1 Visual Recognition

Object-wise‌ Distance Estimation for Event Camera Data

Participants: Nan‌ Cai, Pia Bideau.

Event cameras provide a natural and data-efficient representation of visual information, motivating novel computational strategies for extracting it. Inspired by the biological vision system, in this work 26 we propose a behavior-driven approach for object-wise distance estimation from event camera data. This method mimics how biological systems, like the human eye, stabilize their view based on object distance: distant objects require minimal compensatory rotation to stay in focus, while nearby objects demand greater adjustments to maintain alignment. This adaptive strategy leverages natural stabilization behaviors to estimate relative distances effectively. Unlike traditional vision algorithms that estimate depth across the entire image, our approach targets local depth estimation within a specific region of interest. By aligning events within a small region, we estimate the angular velocity required to stabilize the image motion. We demonstrate that, under certain assumptions, the compensatory rotational flow is inversely proportional to the object's distance. The proposed approach achieves new state-of-the-art accuracy in distance estimation on the EVIMO2 dataset.
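The inverse relation between compensatory rotation and distance can be illustrated with a toy calculation. The sketch below is not the method of 26: it assumes an observer translating sideways at a known speed while fixating a static object viewed at a right angle to the direction of travel, and the function name and parameters are hypothetical.

```python
def distance_from_stabilizing_rotation(lateral_speed, angular_velocity):
    """Return d = v / omega for an observer translating sideways at speed v.

    Toy model (an assumption of this sketch): a static point at distance d,
    viewed at a right angle to the direction of travel, drifts across the
    field of view at angular rate omega = v / d.
    """
    if angular_velocity <= 0:
        raise ValueError("angular velocity must be positive")
    return lateral_speed / angular_velocity


# Same observer speed, two objects: the one needing more rotation is closer.
near = distance_from_stabilizing_rotation(lateral_speed=1.0, angular_velocity=0.5)
far = distance_from_stabilizing_rotation(lateral_speed=1.0, angular_velocity=0.05)
```

With the observer speed unknown, the same relation still yields relative distances between objects, which is the quantity the paragraph above refers to.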

Figure 1: Distance estimation from‌ event data.

Salience-SGG: Enhancing Unbiased Scene Graph Generation‌ with Iterative Salience Estimation

Participants: Runfeng Qu,‌ Ole Hall, Pia Bideau, Julie Ouerfelli-Ethier‌, Martin Rolfs, Klaus Obermayer, Olaf‌ Hellwich.

Scene Graph Generation (SGG) suffers from‌ a long-tailed distribution, where a few predicate classes‌ dominate while many others are underrepresented, leading to‌ biased models that underperform on rare relations. Unbiased-SGG‌ methods address this by implementing debiasing strategies, but‌ often at the cost of spatial understanding—resulting in‌ over-reliance on semantic priors. In 37, we‌ introduce Salience-SGG, a novel framework featuring an Iterative‌ Salience Decoder (ISD) that emphasizes triplets with salient‌ spatial structures. To support this, we propose semantic-agnostic salience labels guiding ISD.‌ Evaluations on Visual Genome,‌ Open Images V6, and‌‌ GQA-200 show that Salience-SGG achieves state-of-the-art performance and‌ improves existing Unbiased-SGG methods‌ in their spatial understanding‌‌ as demonstrated by the Pairwise Localization Average Precision.‌

Code is available on GitHub.

Figure 2: Salience-SGG. (a) Methods based on standard debiasing show over-reliance on semantic information, i.e., coat and hanging from (dashed lines). (b) Our salience-enhanced model favors spatially coherent triplets (bold).

Watching Swarm Dynamics from Above: A Framework for‌ Advanced Object Tracking in‌ Drone Videos

Participants: Pia‌‌ Bideau, Duc Pham, Félicie Dhellemmes,‌ Matthew Hansen, Jens‌ Krause.

Easily accessible technologies, such as drones equipped with diverse onboard sensors, have greatly expanded opportunities to study animal behavior in natural environments. However, analyzing large volumes of unlabeled video data, often spanning hours, remains a significant challenge for machine learning, particularly in computer vision. Existing approaches typically process only a small number of frames, and accurate georeferencing of tracked positions is still largely unresolved, particularly in dynamic environments where static landmarks cannot be established. In this work, we focus on long-term tracking of animal behavior in real-world geographic coordinates. To address this challenge, we utilize classical probabilistic methods for state estimation, such as particle filtering. Particle filters offer a useful algorithmic structure for recursively adding new incoming information and thus ensuring time consistency. By incorporating recent developments in semantic object segmentation, we enable continuous tracking of rapidly evolving object formations, even in scenarios with limited data availability. We propose a novel approach for tracking schools of fish in the open ocean from drone videos. Our framework not only performs classical object tracking in image coordinates, but also tracks the position and spatial expansion of the fish school in geographic coordinates by fusing video data and the drone's onboard sensor information (GPS and IMU). No landmarks with known geographic coordinates are required, making the proposed method adaptable to unstructured, dynamic environments like the open ocean, where static landmarks are unavailable. With this, the presented framework enables researchers to study the collective behavior of fish schools within their social and environmental context.
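To make the recursive structure of particle filtering concrete, here is a minimal bootstrap particle filter tracking a 2D position. It is a generic textbook sketch, not the tracker described above: the random-walk motion model, the Gaussian observation model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def particle_filter_step(particles, observation, rng, motion_std=1.0, obs_std=2.0):
    """One predict/update/resample cycle of a bootstrap particle filter in 2D."""
    # Predict: diffuse each particle with a random-walk motion model (assumed).
    predicted = [(x + rng.gauss(0.0, motion_std), y + rng.gauss(0.0, motion_std))
                 for x, y in particles]
    # Update: weight particles by a Gaussian likelihood of the observation (assumed).
    ox, oy = observation
    weights = [math.exp(-0.5 * ((x - ox) ** 2 + (y - oy) ** 2) / obs_std ** 2) + 1e-300
               for x, y in predicted]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(observations, n_particles=300, seed=0):
    """Return the posterior-mean position estimate after each observation."""
    rng = random.Random(seed)
    x0, y0 = observations[0]
    particles = [(x0 + rng.gauss(0.0, 5.0), y0 + rng.gauss(0.0, 5.0))
                 for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, rng)
        mean_x = sum(p[0] for p in particles) / n_particles
        mean_y = sum(p[1] for p in particles) / n_particles
        estimates.append((mean_x, mean_y))
    return estimates


# Synthetic target drifting along a straight line.
observations = [(0.5 * t, 0.25 * t) for t in range(20)]
estimates = track(observations)
```

Each new observation only reweights and resamples the current particle set, which is what makes the filter suitable for hours-long videos: the cost per frame does not grow with the length of the sequence.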

Code and‌ the newly introduced dataset‌‌ for tracking collective animal behavior over long time‌ horizons in marine environments‌ are available here.‌‌

Figure 3: An illustration of tracking‌ animal swarms in drone‌ videos using particle filters‌‌ and deep learning.

LUDVIG: Learning-free Uplifting of 2D‌ Visual features to Gaussian‌ Splatting scenes.

Participants: Juliette‌‌ Marrie, Romain Menegaux, Michael Arbel,‌ Diane Larlus, Julien‌ Mairal.

In 34‌‌, we address the problem of extending the‌ capabilities of vision foundation‌ models such as DINO,‌‌ SAM, and CLIP, to 3D tasks. Specifically, we‌ introduce a novel method‌ to uplift 2D image‌‌ features into 3D Gaussian Splatting scenes. Unlike traditional‌ approaches that rely on‌ minimizing a reconstruction loss,‌‌ our method employs a‌ simpler and more efficient feature aggregation technique, augmented‌ by a graph diffusion mechanism. Graph diffusion enriches‌ features from a given model, such as CLIP,‌ by leveraging 3D geometry and pairwise similarities induced‌ by another strong model such as DINOv2. Our‌ approach achieves performance comparable to the state of‌ the art on multiple downstream tasks while delivering‌ significant speed-ups. Notably, we obtain competitive segmentation results‌ using generic DINOv2 features, despite DINOv2 not being‌ trained on millions of annotated segmentation masks like‌ SAM. When applied to CLIP features, our method‌ demonstrates strong performance in open-vocabulary object detection tasks,‌ highlighting the versatility of our approach.
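The two ingredients named above, feature aggregation and graph diffusion, can be sketched generically as follows. This is not the LUDVIG implementation: the per-pixel weights stand in for whatever contribution each pixel makes to a given Gaussian, the similarity matrix is assumed to be given (for instance from 3D proximity and a second feature model), and all function names are hypothetical.

```python
def aggregate_features(pixel_features, weights):
    """Weighted average of the 2D features of the pixels seen by one 3D Gaussian."""
    total = sum(weights)
    dim = len(pixel_features[0])
    return [sum(w * f[d] for w, f in zip(weights, pixel_features)) / total
            for d in range(dim)]


def diffuse(features, similarity, n_steps=10, alpha=0.5):
    """Graph diffusion: repeatedly mix each node's feature with its neighbours'."""
    n, dim = len(features), len(features[0])
    # Row-normalise the similarity matrix into transition probabilities.
    transition = [[s / sum(row) for s in row] for row in similarity]
    current = [list(f) for f in features]
    for _ in range(n_steps):
        mixed = [[sum(transition[i][j] * current[j][d] for j in range(n))
                  for d in range(dim)] for i in range(n)]
        current = [[(1 - alpha) * current[i][d] + alpha * mixed[i][d]
                    for d in range(dim)] for i in range(n)]
    return current


# Two pixels contribute to one Gaussian with weights 3 and 1.
uplifted = aggregate_features([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])

# Nodes 0 and 1 are similar, node 2 is isolated: diffusion smooths 0 and 1 only.
smoothed = diffuse([[1.0], [0.0], [5.0]], [[1, 1, 0], [1, 1, 0], [0, 0, 1]])
```

Neither step involves gradient-based optimisation, which is consistent with the learning-free character and the speed-ups reported above.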

Figure‌ 4: Illustration of the LUDVIG approach.

Cluster‌ and Predict Latent Patches for Improved Masked Image‌ Modeling

Participants: Maxime Oquab, Federico Baldassarre,‌ Timothee Darcet, Julien Mairal, Piotr Bojanowski‌.

Masked Image Modeling (MIM) offers a promising approach to self-supervised representation learning; however, existing MIM models still lag behind the state of the art. In this paper 8, we systematically analyze target representations, loss functions, and architectures to introduce CAPI, a novel pure-MIM framework that relies on the prediction of latent clusterings. Our approach leverages a clustering-based loss, which is stable to train and exhibits promising scaling properties. Our ViT-L backbone, CAPI, achieves 83.8% accuracy on ImageNet and 32.1% mIoU on ADE20K with simple linear probes, substantially outperforming previous MIM methods and approaching the performance of the current state of the art, DINOv2. The approach is illustrated in Figure 5.

Figure 5:‌ Illustration of the CAPI approach.

Entropy Rectifying Guidance‌ for Diffusion and Flow Models

Participants: Tariq Berrada‌ Ifriqi, Adriana Romero-Soriano, Michal Drozdzal,‌ Jakob Verbeek, Karteek Alahari.

Guidance techniques are commonly used in diffusion and flow models to improve image quality and input consistency for conditional generative tasks such as class-conditional and text-to-image generation. In particular, classifier-free guidance (CFG) is the most widely adopted guidance technique. It results, however, in trade-offs across quality, diversity and consistency: improving some at the expense of others. While recent work has shown that it is possible to disentangle these factors to some extent, such methods come with the overhead of requiring an additional (weaker) model, or require more forward passes per sampling step. In this work 29, we propose Entropy Rectifying Guidance (ERG), a simple and effective guidance method based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures, which allows for simultaneous improvements in image quality, diversity and prompt consistency. ERG is more general than CFG and similar guidance techniques, as it extends to unconditional sampling. We show that ERG results in significant improvements in various tasks, including text-to-image, class-conditional and unconditional image generation (see examples in Figure 6). We also show that ERG can be seamlessly combined with other recent guidance methods such as CADS and APG, further improving generation results.
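For reference, the classifier-free guidance baseline discussed above combines a conditional and an unconditional model prediction by linear extrapolation. The sketch below shows this standard CFG rule only; ERG itself modifies the attention mechanism and is not reproduced here. Predictions are plain lists of floats for simplicity, and the function name is hypothetical.

```python
def classifier_free_guidance(cond_pred, uncond_pred, scale):
    """Standard CFG: extrapolate from the unconditional towards the conditional prediction.

    scale = 0 returns the unconditional prediction, scale = 1 the conditional one,
    and scale > 1 pushes further in the conditional direction.
    """
    return [u + scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]


guided = classifier_free_guidance(cond_pred=[1.0, 2.0], uncond_pred=[0.0, 0.0], scale=2.0)
```

The rule needs a conditional and an unconditional prediction at every sampling step, which is why it does not apply directly to unconditional sampling.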

Figure 6: Qualitative comparison of classifier-free‌ guidance (CFG) and our‌ Entropy Rectifying Guidance (ERG).‌‌

Boosting Latent Diffusion with Perceptual Objectives

Participants: Tariq‌ Berrada Ifriqi, Pietro‌ Astolfi, Melissa Hall‌‌, Marton Havasi, Yohann Benchetrit, Adriana‌ Romero-Soriano, Karteek Alahari‌, Michal Drozdzal,‌‌ Jakob Verbeek.

Latent diffusion models (LDMs) power state-of-the-art high-resolution generative image models. LDMs learn the data distribution in the latent space of an autoencoder (AE) and produce images by mapping the generated latents into RGB image space using the AE decoder. While this approach allows for efficient model training and sampling, it induces a disconnect between the training of the diffusion model and the decoder, resulting in a loss of detail in the generated images. To remediate this disconnect, we propose to leverage the internal features of the decoder to define a latent perceptual loss (LPL) 23. This loss encourages the models to create sharper and more realistic images. Our loss can be seamlessly integrated with common autoencoders used in latent diffusion models, and can be applied to different generative modeling paradigms such as DDPM with epsilon and velocity prediction, as well as flow matching. Extensive experiments with models trained on three datasets at 256 and 512 resolution show improved quantitative results (with boosts between $6\%$ and $20\%$ in FID) and qualitative results when using our perceptual loss (see examples in Figure 7).

Figure 7‌‌: Samples from models trained with and without‌ our latent perceptual loss‌ on CC12M.

Lightweight Structure-Aware‌‌ Attention for Visual Understanding

Participants: Heeseung Kwon,‌ Francisco M. Castro,‌ Manuel J. Marin-Jimenez,‌‌ Nicolas Guil, Karteek Alahari.

The attention operator has been widely used as a basic building block in visual understanding since it provides some flexibility through its adjustable kernels. However, this operator suffers from inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy, and (2) the complexity in computation and memory is quadratic in the sequence length. In this work 13, we propose a novel attention operator, called Lightweight Structure-aware Attention (LiSA), which has better representation power with log-linear complexity (see Figure 8). Our operator transforms the attention kernels to be more discriminative by learning structural patterns. These structural patterns are encoded by exploiting a set of relative position embeddings (RPEs) as multiplicative weights, thereby improving the representation power of the attention kernels. Additionally, the RPEs are approximated to obtain log-linear complexity. Our experiments and analyses demonstrate that the proposed operator outperforms self-attention and other existing operators, achieving state-of-the-art results on ImageNet-1K and other downstream tasks such as video action recognition on Kinetics-400, object detection and instance segmentation on COCO, and semantic segmentation on ADE-20K.

Figure‌ 8: Self-attention vs.‌ LiSA. (a) Process of‌‌ self-attention & LiSA: LiSA updates the attention to‌ the structure-aware attention via‌ RPEs. (b) Feature visualization‌‌ of self-attention & LiSA:‌ compared to self-attention, LiSA learns better features by‌ capturing geometric structural patterns.

Source-free video domain adaptation‌ by learning from noisy labels

Participants: Avijit Dasgupta‌, C. V. Jawahar, Karteek Alahari.‌

Despite the progress seen in classification methods, current approaches for handling videos with distribution shifts in source and target domains remain source-dependent as they require access to the source data during the adaptation stage. In this paper 9, we present a self-training based source-free video domain adaptation approach to address this challenge by bridging the gap between the source and the target domains. We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy. Thus, we treat the problem of source-free video domain adaptation as learning from noisy labels and argue that the samples with correct pseudo-labels can help us in adaptation. To this end, we leverage the cross-entropy loss as an indicator of the correctness of the pseudo-labels and use the resulting small-loss samples from the target domain for fine-tuning the model. We further enhance the adaptation performance by implementing a teacher–student (TS) framework, in which the teacher, which is updated gradually, produces reliable pseudo-labels. Meanwhile, the student undergoes fine-tuning on the target domain videos using these generated pseudo-labels to improve its performance. Extensive experimental evaluations show that our methods, termed CleanAdapt and CleanAdapt + TS, achieve state-of-the-art results, outperforming the existing approaches on various open datasets. Our source code is publicly available.
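The small-loss selection and the gradually updated teacher can be sketched as follows. This is an illustrative reading of the paragraph above rather than the released CleanAdapt code: the keep ratio, the exponential-moving-average form of the teacher update, and the function names are assumptions.

```python
import math


def select_small_loss(probabilities, pseudo_labels, keep_ratio=0.5):
    """Return indices of the samples whose cross-entropy w.r.t. their pseudo-label is smallest.

    probabilities: per-sample class-probability lists from the model being adapted.
    pseudo_labels: per-sample class indices produced by the source-pretrained model.
    """
    losses = [-math.log(max(p[y], 1e-12)) for p, y in zip(probabilities, pseudo_labels)]
    ranked = sorted(range(len(losses)), key=losses.__getitem__)
    n_keep = max(1, int(keep_ratio * len(losses)))
    return sorted(ranked[:n_keep])


def ema_update(teacher_params, student_params, momentum=0.99):
    """Gradual teacher update, here assumed to be an exponential moving average."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


probabilities = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.55, 0.45]]
pseudo_labels = [0, 0, 1, 1]
kept = select_small_loss(probabilities, pseudo_labels, keep_ratio=0.5)
```

In this toy batch, samples 1 and 3 disagree with their pseudo-labels, incur a larger loss, and are left out of fine-tuning.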

Figure 9‌: Existing approaches have a source-dependent adaptation stage‌ achieving marginal performance gain over the source-pretrained models.‌ On the other hand, our proposed methods CleanAdapt‌ and CleanAdapt + TS achieve significant performance improvements‌ over the source-only model while being source-free (i.e.,‌ the adaptation stage does not require videos from‌ the source domain).

Flowception: Temporally Expansive Flow Matching‌ for Video Generation

Participants: Tariq Berrada Ifriqi,‌ John Nguyen, Karteek Alahari, Jakob Verbeek‌, Ricky T. Q. Chen.

We present Flowception 46, a novel non-autoregressive and variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with continuous frame denoising. Compared to autoregressive methods, Flowception alleviates error accumulation/drift as the frame insertion mechanism during sampling serves as an efficient compression mechanism to handle long-term context (see examples in Figure 10). Compared to full-sequence flows, our method reduces FLOPs for training three-fold, while also being more amenable to local attention variants, and allowing the length of videos to be learned jointly with their content. Quantitative experimental results show improved FVD and VBench metrics over autoregressive and full-sequence baselines, which is further validated with qualitative results. Finally, by learning to insert and denoise frames in a sequence, Flowception seamlessly integrates different tasks such as image-to-video generation and video interpolation.

Figure 10:‌ Examples of image-to-video (I2V) generation and video interpolation‌ with Flowception. Input frames marked by dashed boundaries.

Online In-Context Distillation for‌ Low-Resource Vision Language Models‌

Participants: Zhiqi Kang,‌‌ Rahaf Aljundi, Vaggelis Dorovatas, Karteek Alahari‌.

As the field continues its push for ever more resources, this work turns the spotlight on a critical question: how can vision-language models (VLMs) be adapted to thrive in low-resource, budget-constrained settings? While large VLMs offer strong performance, they are impractical to deploy in such settings. Small VLMs, on the other hand, are efficient but typically require costly fine-tuning to close the performance gap with larger models in the deployment domain. Inspired by the in-context learning framework, we propose an online In-Context Distillation (ICD) method 48, in which a small VLM collaborates with a stronger teacher model at inference time, distilling its knowledge via sparse demonstrations to efficiently bridge the gap between them (see overview in Figure 11). Our method is built on an in-depth analysis that identifies the scale and the choice of models for which vision-language ICL is currently feasible, and demonstrates the advantage of ICL over fine-tuning under constrained compute budgets. We enhance our method with a novel cross-modal demonstration selection strategy, teacher test-time scaling to reduce noise, and student uncertainty conditioning to dynamically populate a demonstration pool and minimize teacher queries. Our ICD method significantly boosts the performance of small models (up to $33\%$) using scarce teacher annotations (as low as $4\%$), and competes with the teacher's zero-shot performance.

Figure 11:‌ Overview of our online‌ In-Context Distillation framework.

8.2‌‌ Statistical Machine Learning and Optimization

Counterfactual Learning of‌ Stochastic Policies with Continuous‌ Actions

Participants: Houssam Zenati‌‌, Pierre Gaillard, Julien Mairal.

Counterfactual reasoning from logged data has become increasingly important for many applications such as web advertising or healthcare. In 20, we address the problem of counterfactual learning of stochastic policies with continuous actions, which raises difficult challenges about (i) data modeling, (ii) optimization, and (iii) evaluation on real data. First, we introduce a modeling strategy based on a joint kernel embedding of contexts and actions, illustrated in Figure 12, which overcomes the shortcomings of previous discretization strategies as shown in 9. Second, we empirically show that the optimization aspect of counterfactual learning is more important than previously thought, and we demonstrate the benefits of proximal point algorithms and differentiable estimators. Finally, we propose an evaluation protocol for offline policies in real-world logged systems, which is challenging since policies cannot be replayed on test data, and we release a new large-scale dataset along with multiple synthetic, yet realistic, evaluation setups.

Figure 12‌‌: Illustration of the counterfactual modeling approach.

MAP‌ Estimation with Denoisers: Convergence‌ Rates and Guarantees

Participants:‌‌ Scott Pesme, Giacomo Meanti, Michael Arbel‌, Julien Mairal.‌

Denoiser models have become powerful tools for inverse problems, enabling the use of pretrained networks to approximate the score of a smoothed prior distribution. These models are often used in heuristic iterative schemes aimed at solving Maximum a Posteriori (MAP) optimisation problems, where the proximal operator of the negative log-prior plays a central role. In practice, this operator is intractable, and practitioners plug in a pretrained denoiser as a surrogate, despite the lack of general theoretical justification for this substitution. In 36, we show that a simple algorithm, closely related to several used in practice, provably converges to the proximal operator under a log-concavity assumption on the prior $p$. We show that this algorithm can be interpreted as a gradient descent on smoothed proximal objectives. Our analysis thus provides a theoretical foundation for a class of empirically successful but previously heuristic methods. This result is provided in Figure 13.
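For readers less familiar with the two objects involved, their standard definitions are recalled below. The notation is ours and may differ from that of 36: $\tau > 0$ is a proximal step size, $\sigma > 0$ a noise level, and $p_{\sigma}$ denotes the prior $p$ convolved with a centred Gaussian of covariance $\sigma^{2} I$.

```latex
% Proximal operator of the negative log-prior (the MAP denoising problem):
\operatorname{prox}_{-\tau \log p}(y)
  \;=\; \arg\min_{x} \Big\{ -\log p(x) + \frac{1}{2\tau}\, \| x - y \|^{2} \Big\}

% Ideal (MMSE) denoiser for y = x + \sigma \varepsilon, \varepsilon \sim \mathcal{N}(0, I),
% expressed through the score of the smoothed prior (Tweedie's formula):
D_{\sigma}(y) \;=\; \mathbb{E}[\, x \mid y \,] \;=\; y + \sigma^{2}\, \nabla \log p_{\sigma}(y)
```

The first object is a posterior mode and the second a posterior mean, so they do not coincide in general; this gap is why substituting a denoiser for the proximal operator calls for justification.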

Logarithmic Regret for‌ Unconstrained Submodular Maximization Stochastic Bandit

Participants: Julien Zhou‌, Pierre Gaillard, Thibaud Rahier, Julyan‌ Arbel.

In 40, we address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function, taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies an $O(d \log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as an $O(d\, T^{2/3} \log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. To that end, we introduce a notion of hardness for submodular functions, characterizing how difficult it is to maximize them with this type of strategy.
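As background, the offline randomized Double-Greedy procedure that DG-ETC builds on can be sketched as follows. This is the classical algorithm with an exact value oracle, not DG-ETC itself; the `averaged_oracle` helper is only a hypothetical illustration of how noisy bandit rewards could be averaged during an exploration phase.

```python
import random


def double_greedy(f, ground_set, rng):
    """Randomized Double-Greedy for unconstrained submodular maximization (exact oracle f)."""
    lower, upper = set(), set(ground_set)
    for element in ground_set:
        # Marginal gain of adding the element to the growing set ...
        gain_add = max(f(lower | {element}) - f(lower), 0.0)
        # ... versus the gain of removing it from the shrinking set.
        gain_remove = max(f(upper - {element}) - f(upper), 0.0)
        total = gain_add + gain_remove
        p_add = 1.0 if total == 0.0 else gain_add / total
        if rng.random() < p_add:
            lower.add(element)
        else:
            upper.discard(element)
    return lower  # lower and upper coincide once every element is processed


def averaged_oracle(noisy_f, n_samples):
    """Hypothetical helper: estimate f(S) by averaging n_samples noisy evaluations."""
    def f_hat(subset):
        return sum(noisy_f(subset) for _ in range(n_samples)) / n_samples
    return f_hat


# Modular (hence submodular) toy function: keep the positive-weight elements.
example_weights = {"a": 2.0, "b": -1.0, "c": 3.0}
best = double_greedy(lambda s: sum(example_weights[e] for e in s),
                     ["a", "b", "c"], random.Random(0))
```

The decision for each element depends only on the comparison of two marginal gains, so how well these gains can be told apart from noisy samples is a natural driver of the difficulty of the bandit problem.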

Figure 14‌: Illustration of our new notion of hardness‌ for submodular bandits. Logarithmic regret can be achieved‌ as soon as the problem parameters $α$ and‌ $β$ are different.

Locally Adaptive Online Nonparametric Regression‌

Participants: Paul Liautaud, Pierre Gaillard, Olivier‌ Wintenberger.

In 32 and 31, we study online adversarial regression with convex losses against a rich class of continuous yet highly irregular prediction rules, modeled by Besov spaces $B_{p,q}^{s}$ with general parameters $1 \leq p, q \leq \infty$ and smoothness $s > \frac{d}{p}$. We introduce an adaptive wavelet-based algorithm that performs sequential prediction without prior knowledge of $(s, p, q)$, and establish minimax-optimal regret bounds against any comparator in $B_{p,q}^{s}$. We further design a locally adaptive extension capable of dynamically tracking spatially inhomogeneous smoothness. This adaptive mechanism adjusts the resolution of the predictions over both time and space, yielding refined regret bounds in terms of local regularity. Consequently, in heterogeneous environments, our adaptive guarantees can significantly surpass those obtained by standard global methods.

Figure 15: Theoretical and‌ practical regrets achieved by‌ our two procedures on‌‌ simulated data.

Online Learning Approach for Survival Analysis‌

Participants: Camila Fernandez,‌ Pierre Gaillard, Olivier‌‌ Wintenberger.

In 10, we introduce an online mathematical framework for survival analysis, allowing real-time adaptation to dynamic environments and censored data. This framework enables the estimation of event time distributions through an optimal second-order online convex optimization algorithm, Online Newton Step (ONS). This approach, previously unexplored, presents substantial advantages, including explicit algorithms with non-asymptotic convergence guarantees. Moreover, we analyze the selection of ONS hyperparameters, which depends on the exp-concavity property and has a significant influence on the regret bound. We introduce an adaptive aggregation method that ensures robustness in hyperparameter selection while maintaining fast regret bounds. These findings can extend beyond the survival analysis field, and are relevant for any case characterized by poor exp-concavity and unstable ONS. Additionally, we propose a stochastic approach for ONS that guarantees logarithmic regret in the case of an exponential hazard model. Next, these assertions are illustrated by simulation experiments, followed by an application to a real dataset. Fernandez et al. 55 also provide some experimental comparisons of existing algorithms for survival analysis.
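A one-dimensional toy version of Online Newton Step on censored data may help fix ideas. The sketch below is not the algorithm of 10: it assumes a constant hazard rate with a fixed censoring time, uses the exponential negative log-likelihood as the loss, and picks the hyperparameter `gamma`, the initial value, and the feasible interval arbitrarily.

```python
import random


def online_newton_step_1d(observations, gamma=0.5, init=1.0, bounds=(0.1, 10.0)):
    """1D Online Newton Step on the censored exponential negative log-likelihood.

    Each observation is (u, delta): observed time u and event indicator delta
    (1 if the event was seen, 0 if censored). Loss: rate * u - delta * log(rate).
    """
    low, high = bounds
    rate = init
    curvature = 1.0  # running sum of squared gradients, with an initial regulariser
    for u, delta in observations:
        gradient = u - delta / rate
        curvature += gradient ** 2
        rate -= gradient / (gamma * curvature)
        rate = min(max(rate, low), high)  # in 1D the projection is a simple clipping
    return rate


# Simulated survival times with true hazard rate 2.0, censored at time 1.0.
rng = random.Random(0)
true_rate, censoring_time = 2.0, 1.0
data = []
for _ in range(5000):
    event_time = rng.expovariate(true_rate)
    data.append((min(event_time, censoring_time),
                 1 if event_time <= censoring_time else 0))
estimate = online_newton_step_1d(data)
```

The step size is governed by `gamma`, whose admissible range depends on the exp-concavity of the loss; this is the hyperparameter sensitivity that the aggregation method described above is meant to remove.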

Figure 16:‌‌ Estimation errors of our algorithms on simulated survival‌ data.

Efficient and Near-Optimal‌ Online Portfolio Selection

Participants:‌‌ Rémi Jézéquel, Dmitrii Ostrovski, Pierre Gaillard‌.

In 12, we study online portfolio selection as introduced by Cover (1991), where a trader allocates wealth over $d$ assets across $T$ rounds to maximize logarithmic return. Cover's Universal Portfolios achieve worst-case optimal $O(d \log T)$ regret but require costly $d$-dimensional integration, leading to a prohibitive $\tilde{O}(d^{4} (T + d)^{14})$ per-round runtime. We propose a new algorithm achieving essentially the same regret (up to constants and replacing $\log T$ with $\log(T + d)$) with a drastically improved runtime of $\tilde{O}(d^{2} (T + d))$ per round. Our method selects portfolios by minimizing logarithmic loss regularized by a log-determinant barrier, revealing connections between online portfolio selection and classical cutting-plane and interior-point methods.
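The regret referred to above is the standard one for this problem, recalled here in our own notation: $r_t \in \mathbb{R}_{+}^{d}$ is the vector of price relatives at round $t$, $x_t$ is the portfolio played, and $\Delta_d$ is the probability simplex.

```latex
R_T \;=\; \max_{x \in \Delta_d} \sum_{t=1}^{T} \log \langle x, r_t \rangle
      \;-\; \sum_{t=1}^{T} \log \langle x_t, r_t \rangle
```

The comparator is the best constantly rebalanced portfolio in hindsight, so a regret of order $d \log T$ means the per-round gap in logarithmic return vanishes at rate $(d \log T)/T$.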

Online‌‌ Convex Reinforcement Learning with applications to Demand-Side Management.‌

Participants: Bianca Marin Moreno‌, Khaled Eldowa,‌‌ Margaux Brégère, Pierre Gaillard, Nadia Oudjane‌.

To counter the challenge of integrating fluctuating renewables into the grid, devices like thermostatically controlled loads (water heaters, air conditioners, etc.) offer flexible demand. However, efficiently controlling a large population of these devices to track desired consumption signals remains a complex challenge. Existing methods lack convergence guarantees and computational efficiency, or resort to regularization techniques instead of tackling the target tracking problem directly. The work in 14 addresses these drawbacks. We propose to model the problem as a finite-horizon episodic Markov decision process, enabling us to adapt convex optimization algorithms with convergence guarantees and computational efficiency. This framework also extends to online learning scenarios, where daily control decisions are made without prior knowledge of consumer behavior and with daily-changing target profiles due to fluctuations of energy production and inflexible consumption. We introduce a new algorithm, called Online Target Tracker (OTT), the first online learning load control method, for which we prove sub-linear regret. We demonstrate our claims with realistic experiments. This combination of optimization and learning lays the groundwork for more dynamic and efficient load control methods. The work in 33 studies a generalization of episodic Reinforcement Learning to convex losses that could be applied for Demand-Side Management in an unknown environment. By introducing a reset-free framework called the periodic framework, 49 weakens the episodic assumption to avoid having to reset the population of the devices to the initial distribution at every episode.

Figure 17: Demand Side Management Problem.‌

Optimized projection-free algorithms for online learning: construction and‌ worst-case analysis

Participants: Julien Weibel, Pierre Gaillard‌, Wouter Koolen, Adrien Taylor.

In 53, we study projection-free algorithms for online learning with linear optimization oracles (Frank–Wolfe methods) to handle constrained decision sets. We propose an optimized variant of an online Frank–Wolfe algorithm with a simple potential-based analysis, and introduce a semidefinite programming framework to jointly design and analyze such algorithms. Our numerical results suggest that no pure online Frank–Wolfe method in this model class can achieve regret better than $O(T^{3/4})$ without additional assumptions. We further observe suboptimal constants in existing methods, anytime guarantees of order $O(t^{3/4})$, and limited benefits from multiple linear optimization steps per round.
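For context, a basic online Frank–Wolfe iteration over the probability simplex can be sketched as follows. It follows the classical template (a regularized cumulative surrogate, one linear-minimization call per round, and a convex-combination update) rather than the optimized variant of 53; the step-size schedule and the value of `eta` are illustrative assumptions.

```python
def lmo_simplex(gradient):
    """Linear minimization oracle over the simplex: the vertex with the smallest gradient entry."""
    best = min(range(len(gradient)), key=gradient.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(gradient))]


def online_frank_wolfe(loss_vectors, dim, eta=0.1):
    """Play x_t, observe a linear loss g_t, then take one Frank-Wolfe step (no projection)."""
    x_start = [1.0 / dim] * dim
    x = list(x_start)
    grad_sum = [0.0] * dim
    iterates = []
    for t, g in enumerate(loss_vectors, start=1):
        iterates.append(list(x))
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        # Gradient at x_t of the surrogate eta * <sum_s g_s, x> + ||x - x_start||^2.
        surrogate_grad = [eta * s + 2.0 * (xi - x0)
                          for s, xi, x0 in zip(grad_sum, x, x_start)]
        vertex = lmo_simplex(surrogate_grad)
        sigma = min(1.0, 2.0 / t ** 0.5)  # illustrative step-size schedule
        x = [(1 - sigma) * xi + sigma * vi for xi, vi in zip(x, vertex)]
    return iterates


iterates = online_frank_wolfe([[1.0, 0.0, 0.5]] * 50, dim=3)
```

Feasibility is maintained purely through convex combinations with oracle outputs, which is what makes the method projection-free.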

Figure 18: Comparison‌ of known regret upper bounds against tight numerical‌ bounds obtained from our analysis.

Optimal and Efficient‌ Algorithms for Multinomial Logistic Bandits

Participants: Pierre Boudart‌, Pierre Gaillard, Alessandro Rudi, Aadirupa‌ Saha.

In 38 and 44, we‌ study active online assortment optimization with preference feedback,‌ a framework for modeling user choice and subsetwise‌ utility maximization with applications in advertising, online retail,‌ recommendation systems, and language model fine-tuning. Existing approaches‌ often rely on unrealistic assumptions such as strong‌ reference items or repeated identical assortments. In 38‌, we design efficient regret-minimization algorithms that remove‌ both of these assumptions. In 44, we‌ improve the asymptotic regret by a constant that‌ may be exponentially large in some cases.
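The preference feedback in this setting is commonly modeled with a multinomial logit (MNL) choice rule. The sketch below shows the plain MNL probabilities over an offered assortment, without any reference or no-choice item; the utilities and function names are hypothetical and the code is not taken from 38 or 44.

```python
import math
import random


def mnl_choice_probabilities(utilities, assortment):
    """P(i | S) = exp(u_i) / sum_{j in S} exp(u_j) for each item i of the assortment S."""
    top = max(utilities[item] for item in assortment)  # subtract the max for stability
    weights = {item: math.exp(utilities[item] - top) for item in assortment}
    total = sum(weights.values())
    return {item: w / total for item, w in weights.items()}


def sample_choice(utilities, assortment, rng):
    """Draw one item from the assortment according to the MNL probabilities."""
    probs = mnl_choice_probabilities(utilities, assortment)
    items = list(probs)
    return rng.choices(items, weights=[probs[i] for i in items], k=1)[0]


utilities = {"a": 0.0, "b": 0.0, "c": 1.0}
equal_pair = mnl_choice_probabilities(utilities, ["a", "b"])
skewed_pair = mnl_choice_probabilities(utilities, ["a", "c"])
choice = sample_choice(utilities, ["a", "b", "c"], random.Random(0))
```

The learner only observes which item was chosen from each assortment, never the utilities themselves.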

Figure 19: Comparison of the error‌ obtained when varying the number of feedback in‌ MNL bandits.

Advancing Prompt-Based Methods for Replay-Independent General‌ Continual Learning

Participants: Zhiqi Kang, Liyuan Wang‌, Xingxing Zhang, Karteek Alahari.

General‌ continual learning (GCL) is a broad concept to‌ describe real-world continual learning (CL) problems, which are‌ often characterized by online data streams without distinct‌ transitions between tasks, i.e., blurry task boundaries. Such‌ requirements result in poor initial performance, limited generalizability, and severe catastrophic forgetting,‌ heavily impacting the effectiveness‌ of mainstream GCL models‌‌ trained from scratch (see illustration in Figure 20‌). While the use‌ of a frozen pretrained‌‌ backbone with appropriate prompt tuning can partially address‌ these challenges, such prompt-based‌ methods remain suboptimal for‌‌ CL of remaining tunable parameters on the fly.‌ In this regard, we‌ propose an innovative approach‌‌ named MISA (Mask and Initial Session Adaption) to‌ advance prompt-based methods in‌ GCL 30. It‌‌ includes a forgetting-aware initial session adaption that employs‌ pretraining data to initialize‌ prompt parameters and improve‌‌ generalizability, as well as a non-parametric logit mask‌ of the output layers‌ to mitigate catastrophic forgetting.‌‌ Empirical results demonstrate substantial performance gains of our‌ approach compared to recent‌ competitors, especially without a‌‌ replay buffer (e.g., up to 18.39, 22.06, and‌ 11.96 points performance lead‌ on CIFAR-100, Tiny-ImageNet, and‌‌ ImageNet-R, respectively). Moreover, our approach features the plug-in‌ nature for prompt-based methods,‌ independence of replay, ease‌‌ of implementation, and avoidance of CL-relevant hyperparameters, serving‌ as a strong baseline‌ for GCL research. Our‌‌ source code is publicly available.
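One simple way to realise a non-parametric logit mask is to restrict the softmax to a set of active classes, so that the outputs of all other classes receive no probability mass. The sketch below illustrates this idea only; how MISA chooses the active set is not specified here, and the function name is hypothetical.

```python
import math


def masked_softmax(logits, active_classes):
    """Softmax restricted to active_classes; all other classes get probability 0."""
    if not active_classes:
        raise ValueError("at least one class must be active")
    masked = [z if c in active_classes else float("-inf") for c, z in enumerate(logits)]
    top = max(masked)
    exps = [math.exp(z - top) for z in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Class 2 is masked out even though it has the largest logit.
probs = masked_softmax([2.0, 1.0, 3.0], active_classes={0, 1})
```

The mask introduces no trainable parameters and no extra hyperparameters.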

Figure‌ 20: Problem setup‌ and motivation. Left: illustration‌‌ of the GCL data stream. Mid: average prediction‌ accuracy at different timesteps‌ in GCL. Right: session‌‌ 1 accuracy, where we evaluate the retention of‌ knowledge acquired at session‌ 1 after each session.‌‌

Unified Breakdown Analysis for Byzantine Robust Gossip

Participants:‌ Renaud Gaucher, Aymeric‌ Dieuleveut, Hadrien Hendrikx‌‌.

Distributed approaches have many computational benefits, but‌ they are vulnerable to‌ attacks from a subset‌‌ of devices transmitting incorrect information. This work 28‌ investigates Byzantine-resilient algorithms in‌ a decentralized setting, where‌‌ devices communicate directly with one another. We investigate‌ the notion of breakdown‌ point, and show an‌‌ upper bound on the number of adversaries that‌ decentralized algorithms can tolerate.‌ This is done through‌‌ careful study of a specific graph topology, presented‌ in Figure 21.‌ We introduce CG +‌‌ , an algorithm at the intersection of ClippedGossip‌ and NNA, two popular‌ approaches for robust decentralized‌‌ learning. CG + meets our upper bound, and‌ thus obtains optimal robustness‌ guarantees, whereas neither of‌‌ the existing two does. We provide experimental evidence‌ for this gap by‌ presenting an attack tailored‌‌ to sparse graphs which breaks NNA but against‌ which CG + is‌ robust.
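The clipping idea behind ClippedGossip can be sketched as follows: each node moves towards its neighbours, but the influence of any single neighbour is bounded by a clipping radius. This is a simplified illustration of one ClippedGossip-style update for a single node, not CG + itself; the weights and the radius `tau` are placeholders.

```python
import math


def clip(vector, tau):
    """Rescale the vector so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(v * v for v in vector))
    scale = 1.0 if norm <= tau else tau / norm
    return [scale * v for v in vector]


def clipped_gossip_step(own, neighbours, weights, tau):
    """x_i <- x_i + sum_j w_ij * clip(x_j - x_i, tau) for one node i."""
    update = [0.0] * len(own)
    for x_j, w in zip(neighbours, weights):
        difference = clip([a - b for a, b in zip(x_j, own)], tau)
        update = [u + w * d for u, d in zip(update, difference)]
    return [o + u for o, u in zip(own, update)]


# One honest neighbour at 1.0 and one Byzantine neighbour sending 1000.0.
new_value = clipped_gossip_step(own=[0.0], neighbours=[[1.0], [1000.0]],
                                weights=[0.25, 0.25], tau=2.0)
```

Without clipping, the Byzantine message alone would move the node by 250; with clipping, its contribution is capped at `0.25 * tau`.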

Figure 21: This Figure shows the graph construction used for the upper bound on the maximum number of Byzantine nodes that can be tolerated.

Byzantine-Robust Gossip:‌ Insights from a Dual‌ Approach

Participants: Renaud Gaucher‌‌, Aymeric Dieuleveut, Hadrien Hendrikx.

Distributed‌ learning has many computational‌ benefits but is vulnerable‌‌ to attacks from a subset of devices transmitting‌ incorrect information. This paper‌ 45 investigates Byzantine-resilient algorithms‌‌ in a decentralized setting, where devices communicate directly‌ in a peer-to-peer manner‌ within a communication network.‌‌ We leverage the so-called dual approach for decentralized‌ optimization and propose a‌ Byzantine-robust algorithm. We provide‌‌ convergence guarantees in the‌ average consensus subcase, discuss the potential of the‌ dual approach beyond this subcase, and re-interpret existing‌ algorithms using the dual framework, under the general‌ update rule presented in Figure 22. Lastly,‌ we experimentally show the soundness of our method.‌

Figure 22: This Figure shows the‌ main update of the dual robust algorithm.

A‌ Theoretical Framework for Grokking: Interpolation followed by Riemannian‌ Norm Minimisation

Participants: Etienne Boursier, Scott Pesme, Radu-Alexandru Dragomir.

In 25, we study the dynamics of gradient flow with small weight decay on general training losses $F$. Under mild regularity assumptions and assuming convergence of the unregularised gradient flow, we show that the trajectory with weight decay $λ$ exhibits a two-phase behaviour as $λ \to 0$. During the initial fast phase, the trajectory follows the unregularised gradient flow and converges to a manifold of critical points of $F$. Then, at times of order $1/λ$, the trajectory enters a slow drift phase and follows a Riemannian gradient flow minimising the $ℓ_{2}$-norm of the parameters. This purely optimisation-based phenomenon offers a natural explanation for the grokking effect observed in deep learning, where the training loss rapidly reaches zero while the test loss plateaus for an extended period before suddenly improving. We argue that this generalisation jump can be attributed to the slow norm reduction induced by weight decay, as explained by our analysis. We validate this mechanism empirically on several synthetic regression tasks. This mechanism is illustrated in Figure 23.

Figure 23: This figure illustrates the grokking mechanism.
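The two-phase behaviour can be observed on a toy problem. The sketch below is an illustration of ours, not an experiment from 25: it integrates $\dot{w} = -\nabla F(w) - λ w$ with explicit Euler steps for the hypothetical loss $F(w) = \frac{1}{2}(w_1 w_2 - 1)^2$, whose zero set is the hyperbola $w_1 w_2 = 1$. The initial point, step size, and horizon are assumptions.

```python
def loss_and_sqnorm(w1, w2):
    """Return (F(w), ||w||^2) for the toy loss F(w) = 0.5 * (w1 * w2 - 1)^2."""
    r = w1 * w2 - 1.0
    return 0.5 * r * r, w1 * w1 + w2 * w2


def simulate(lam, w1=3.0, w2=0.5, dt=0.01, t_early=5.0, t_end=1000.0):
    """Explicit Euler integration of dw/dt = -grad F(w) - lam * w.

    Returns (F, ||w||^2) shortly after the fast phase (t_early) and at t_end.
    """
    n_early = int(round(t_early / dt))
    n_end = int(round(t_end / dt))
    early = None
    for k in range(n_end):
        if k == n_early:
            early = loss_and_sqnorm(w1, w2)
        r = w1 * w2 - 1.0
        # grad F(w) = r * (w2, w1); both coordinates are updated simultaneously.
        w1, w2 = w1 - dt * (r * w2 + lam * w1), w2 - dt * (r * w1 + lam * w2)
    return early, loss_and_sqnorm(w1, w2)


early, late = simulate(lam=0.01)
```

With $λ = 0.01$, the loss is already close to zero at time 5 while the squared norm is still large; only after a time of order $1/λ$ has the iterate drifted along the hyperbola towards its minimum-norm point near $(1, 1)$, where the squared norm is close to 2.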

Flow Matching for Robust Simulation-Based Inference under‌ Model Misspecification

Participants: Pierre-Louis Ruhlmann, Pedro Rodrigues‌, Michael Arbel, Florence Forbes.

Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification: simulators are only approximations of reality, and mismatches between simulated and real data can yield biased or overconfident posteriors. In 51, we address this issue by introducing Flow Matching Corrected Posterior Estimation (FMCPE), a framework that leverages the flow matching paradigm to refine simulation-trained posterior estimators using a small set of real calibration samples, as illustrated in Figure 24. Our approach proceeds in two stages: first, a posterior approximator is trained on abundant simulated data; second, flow matching transports its predictions toward the true posterior supported by real observations, without requiring explicit knowledge of the misspecification. This design enables FMCPE to combine the scalability of SBI with robustness to distributional shift. Across synthetic benchmarks and real-world datasets, we show that our proposal consistently mitigates the effects of misspecification, delivering improved inference accuracy and uncertainty calibration compared to standard SBI baselines, while remaining computationally efficient.

Figure 24: High-level description of‌ FMCPE algorithm.
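As background on the flow matching paradigm used in the second stage, the sketch below evaluates the standard linear-path flow matching regression loss for a candidate velocity field: a point $x_t = (1 - t) x_0 + t x_1$ is drawn on the segment between a source sample $x_0$ and a target sample $x_1$, and the field is regressed onto $x_1 - x_0$. It is a generic illustration, not the FMCPE objective: the pairing of samples, the absence of conditioning on observations, and all names are assumptions.

```python
import random


def flow_matching_loss(velocity, pairs, n_times=8, seed=0):
    """Monte Carlo estimate of E ||v(x_t, t) - (x1 - x0)||^2 on linear paths.

    `velocity(x, t)` is a candidate velocity field; `pairs` is a list of
    (x0, x1) samples, e.g. (initial prediction, calibration sample).
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for x0, x1 in pairs:
        target = [b - a for a, b in zip(x0, x1)]  # constant velocity of the path
        for _ in range(n_times):
            t = rng.random()
            x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
            pred = velocity(x_t, t)
            total += sum((p - q) ** 2 for p, q in zip(pred, target))
            count += 1
    return total / count


# Toy data: the target samples are the source samples shifted by a fixed vector.
shift = [0.5, -1.0]
sources = [(0.0, 0.0), (1.0, 2.0), (-1.0, 0.5)]
pairs = [([x, y], [x + shift[0], y + shift[1]]) for x, y in sources]
loss_exact = flow_matching_loss(lambda x, t: shift, pairs)
loss_zero = flow_matching_loss(lambda x, t: [0.0, 0.0], pairs)
```

In this toy setting the constant field equal to the shift transports every source sample onto its target, so its loss vanishes, whereas the zero field pays the squared norm of the shift.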

Simulation-based inference of yeast centromeres

Participants:‌ Eloïse Touron, Pedro Rodrigues, Julyan Arbel, Nelle Varoquaux,‌ Michael Arbel.

Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding can lead to malfunctions and, over time, diseases. For eukaryotes, centromeres are essential for proper chromosome segregation and folding. Despite extensive research using de novo sequencing of genomes and annotation analysis, centromere locations in yeasts remain difficult to infer and are still unknown in most species. Recently, genome-wide chromosome conformation capture coupled with next-generation sequencing (Hi-C) has become one of the leading methods to investigate chromosome structures. Some recent studies have used Hi-C data to give a point estimate of each centromere, but those approaches rely heavily on a good pre-localization. In 39, we present a novel approach that infers in a stochastic manner the locations of all centromeres in budding yeast based on both the experimental Hi-C map and simulated contact maps using a neural network model, as illustrated in Figure 25.

Figure 25: Architecture of the transformer-based model.

Dual Perspectives on‌ Non-Contrastive Self-Supervised Learning

Participants:‌ Jean Ponce, Basile‌‌ Terver, Martial Hebert, Michael Arbel.‌

The stop gradient and‌ exponential moving average iterative‌‌ procedures are commonly used in non-contrastive approaches to‌ self-supervised learning to avoid‌ representation collapse, with excellent‌‌ performance in downstream applications in practice. In 50‌, we investigate these‌ procedures from the dual‌‌ viewpoints of optimization and dynamical systems. We show‌ that, in general, although‌ they do not optimize‌‌ the original objective, or any other smooth function,‌ they do avoid collapse.‌ Following prior work, but‌‌ without any of the extra assumptions used in‌ their proofs, we then‌ show using a dynamical‌‌ system perspective that, in the linear case, minimizing‌ the original objective function‌ without the use of‌‌ a stop gradient or exponential moving average always‌ leads to collapse, as‌ shown in Figure 26‌‌. Conversely, we characterize explicitly the equilibria of‌ the dynamical systems associated‌ with these two procedures‌‌ in this linear setting as algebraic varieties in‌ their parameter space, and‌ show that they are,‌‌ in general, asymptotically stable. Our theoretical findings‌ are illustrated by empirical‌ experiments with real and‌‌ synthetic data.

Figure 26: Illustration of the optimization landscape for the objective function used in non-contrastive self-supervised learning.
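For reference, the exponential moving average procedure mentioned above maintains a slowly varying "target" copy of the "online" parameters. The sketch below shows only this update rule on plain lists of floats; the momentum value and variable names are assumptions, and no learning of the online parameters is simulated.

```python
def ema_update(target, online, tau):
    """Exponential moving average: target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


target = [0.0, 0.0]
online = [1.0, -2.0]  # kept fixed here; in practice it changes at every step
for _ in range(50):
    target = ema_update(target, online, tau=0.9)
```

With the online parameters held fixed, the target converges geometrically towards them: after $n$ steps from zero it equals $(1 - τ^n)$ times the online value.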

Learning Theory for Kernel‌ Bilevel Optimization

Participants: Fares‌ El Khoury, Edouard‌‌ Pauwels, Samuel Vaiter, Michael Arbel.‌

Bilevel optimization has emerged as a technique for addressing a wide range of machine learning problems that involve an outer objective implicitly determined by the minimizer of an inner problem. In 27, we investigate the generalization properties for kernel bilevel optimization problems where the inner objective is optimized over a Reproducing Kernel Hilbert Space. This setting enables rich function approximation while providing a foundation for rigorous theoretical analysis. In this context, we establish novel generalization error bounds for the bilevel problem under finite-sample approximation. Our approach adopts a functional perspective, inspired by (Petrulionyte et al., 2024), and leverages tools from empirical process theory and maximal inequalities for degenerate $U$-processes to derive uniform error bounds. The results rely on an equivalence we establish between the estimator implemented in practice and an abstract one derived using the functional perspective that is more amenable to a statistical analysis, as shown in Figure 27. These generalization error estimates allow us to characterize the statistical accuracy of gradient-based methods applied to the empirical discretization of the bilevel problem.

Figure 27: A commutative‌ diagram illustrating that plug-in statistical estimation and differentiation‌ can be interchanged.
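For concreteness, a kernel bilevel problem of the kind described above can be written schematically as follows. The notation is ours and only a sketch of the generic form: $ω$ denotes the outer variable, $\mathcal{H}$ the Reproducing Kernel Hilbert Space, and $L_{\mathrm{out}}$, $L_{\mathrm{in}}$ the outer and inner objectives; the precise assumptions of 27 are not reproduced here.

```latex
\min_{\omega \in \Omega} \; \mathcal{F}(\omega) := L_{\mathrm{out}}\bigl(\omega, h^{\star}_{\omega}\bigr)
\quad \text{subject to} \quad
h^{\star}_{\omega} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \; L_{\mathrm{in}}(\omega, h).
```

In practice both objectives are expectations replaced by finite-sample averages, which is the source of the generalization error studied in this work.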

EquiTabPFN: A Target-Permutation Equivariant Prior‌ Fitted Network

Participants: Michael Arbel, David Salinas‌, Frank Hutter.

Recent foundational models for‌ tabular data, such as TabPFN, have demonstrated remarkable‌ effectiveness in adapting to new tasks through in-context‌ learning. However, these models overlook a crucial equivariance‌ property: the arbitrary ordering of target dimensions should‌ not influence model predictions. In 22, we‌ identify this oversight as a source of incompressible‌ error, termed the equivariance gap, which introduces instability‌ in predictions. To mitigate these issues, we propose‌ a novel model designed to preserve equivariance across‌ output dimensions, as shown in Figure 28.‌ Our experimental results indicate that our proposed model‌ not only addresses these pitfalls effectively but also‌ achieves competitive benchmark performance.

Figure 28:‌ Overview of EquiTabPFN’s architecture.
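The equivariance property at stake can be checked numerically: permuting the target columns of the training set should permute the predictions in the same way. The sketch below tests this for two hypothetical predictors of ours (neither is TabPFN nor EquiTabPFN): a distance-weighted average of training targets, which is equivariant, and a variant with an index-dependent bias, which is not.

```python
def permute_columns(rows, perm):
    """Reorder the entries of every row according to `perm`."""
    return [[row[j] for j in perm] for row in rows]


def kernel_predictor(x_train, y_train, x_test):
    """Hypothetical predictor: distance-weighted average of training targets."""
    n_targets = len(y_train[0])
    preds = []
    for q in x_test:
        weights = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(q, x))) for x in x_train]
        z = sum(weights)
        preds.append([sum(w * y[k] for w, y in zip(weights, y_train)) / z
                      for k in range(n_targets)])
    return preds


def biased_predictor(x_train, y_train, x_test):
    """Same predictor plus a bias that depends on the target index (breaks equivariance)."""
    return [[p + 0.1 * k for k, p in enumerate(row)]
            for row in kernel_predictor(x_train, y_train, x_test)]


def is_target_equivariant(predict, x_train, y_train, x_test, perm, tol=1e-9):
    """Check predict(X, Y[:, perm]) == predict(X, Y)[:, perm] up to `tol`."""
    a = predict(x_train, permute_columns(y_train, perm), x_test)
    b = permute_columns(predict(x_train, y_train, x_test), perm)
    return all(abs(u - v) <= tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))


x_train = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
y_train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_test = [[0.5, 0.5], [1.5, 1.0]]
perm = [2, 0, 1]
```

Calling `is_target_equivariant` with `kernel_predictor` returns `True`, whereas `biased_predictor` fails the check, which is a simple instance of the kind of ordering dependence described above.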

8.3 Scientific Imaging and‌ Remote Sensing

A New Statistical Model of Star‌ Speckles for Learning to Detect and Characterize Exoplanets‌ in Direct Imaging Observations

Participants: Theo Bodrito,‌ Olivier Flasseur, Julien Mairal, Jean Ponce‌, Maud Langlois, Anne-Marie Lagrange.

The‌ search for exoplanets is an active field in‌ astronomy, with direct imaging as one of the‌ most challenging methods due to faint exoplanet signals‌ buried within stronger residual starlight. Successful detection requires‌ advanced image processing to separate the exoplanet signal‌ from this nuisance component. The paper 24 presents‌ a novel statistical model that captures nuisance fluctuations‌ using a multiscale approach, leveraging problem symmetries and‌ a joint spectral channel representation grounded in physical‌ principles. Our model integrates into an interpretable, end-to-end‌ learnable framework for simultaneous exoplanet detection and flux‌ estimation. The proposed algorithm is evaluated against the‌ state of the art using datasets from the‌ SPHERE instrument operating at the Very Large Telescope‌ (VLT). It significantly improves the precision-recall tradeoff, notably‌ on challenging datasets that are otherwise unusable by‌ astronomers. The proposed approach is computationally efficient, robust‌ to varying data quality, and well suited for‌ large-scale observational surveys. The model is illustrated in‌ Figure 29.

Figure 29: Illustration‌ of the ExoMild model

Unsupervised Imaging Inverse Problems‌ with Diffusion Distribution Matching

Participants: Giacomo Meanti,‌ Thomas Ryckeboer, Michael Arbel, Julien Mairal‌.

This work 35 addresses image restoration tasks through the lens of inverse problems using unpaired datasets. In contrast to traditional approaches, which typically assume full knowledge of the forward model or access to paired degraded and ground-truth images, the proposed method operates under minimal assumptions and relies only on small, unpaired datasets. This makes it particularly well-suited for real-world scenarios, where the forward model is often unknown or mis-specified, and collecting paired data is costly or infeasible. The method leverages conditional flow matching to model the distribution of degraded observations, while simultaneously learning the forward model via a distribution-matching loss that arises naturally from the framework. Empirically, it outperforms both single-image blind and unsupervised approaches on deblurring and non-uniform point spread function (PSF) calibration tasks. It also matches state-of-the-art performance on blind super-resolution. We also showcase the effectiveness of our method with a proof of concept for lens calibration: a real-world application traditionally requiring time-consuming experiments and specialized equipment. In contrast, our approach achieves this with minimal data acquisition effort. This approach is illustrated in Figure 30.

Figure 30‌‌: Illustration of our unsupervised learning approach for‌ inverse problems.

Optimal transport‌ unlocks end-to-end learning for‌‌ single-molecule localization

Participants: Romain Seailles, Jean-Baptiste Masson, Jean Ponce, Julien Mairal.

Single-molecule localization microscopy (SMLM) allows reconstructing biology-relevant structures beyond the diffraction limit by detecting and localizing individual fluorophores (fluorescent molecules stained onto the observed specimen) over time to reconstruct super-resolved images. Currently, efficient SMLM requires non-overlapping emitting fluorophores, leading to long acquisition times that hinder live-cell imaging. Recent deep-learning approaches can handle denser emissions, but they rely on variants of non-maximum suppression (NMS) layers, which are unfortunately non-differentiable and may discard true positives with their local fusion strategy. In this work 52, we reformulate the SMLM training objective as a set-matching problem, deriving an optimal-transport loss that eliminates the need for NMS during inference and enables end-to-end training. Additionally, we propose an iterative neural network that integrates knowledge of the microscope's optical system inside our model. Experiments on synthetic benchmarks and real biological data show that both our new loss function and architecture surpass the state of the art at moderate and high emitter densities. This approach is illustrated in Figure 31.

Figure 31‌: Illustration of our‌ SMLM approach.
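To illustrate what a set-matching objective looks like, the sketch below computes the minimum mean squared distance between a set of predicted emitter positions and a set of ground-truth positions over all one-to-one assignments. With equal-size sets and uniform masses, this is the simplest special case of an optimal-transport cost. It is not the loss of 52: the brute-force search over permutations only scales to a handful of points, and the function name and 2D setting are assumptions.

```python
import math
from itertools import permutations


def matching_loss(pred, truth):
    """Minimum mean squared distance over all one-to-one assignments.

    `pred` and `truth` are equal-size, non-empty lists of (x, y) positions.
    Brute force in O(n!): for illustration on tiny sets only.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("expected two non-empty sets of equal size")
    best = math.inf
    for perm in permutations(range(len(truth))):
        cost = sum((px - truth[j][0]) ** 2 + (py - truth[j][1]) ** 2
                   for (px, py), j in zip(pred, perm))
        best = min(best, cost)
    return best / len(pred)


truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
pred = [(2.0, 0.1), (0.0, 0.0), (1.0, 0.9)]  # same emitters, listed in another order
loss = matching_loss(pred, truth)
loss_reordered = matching_loss(pred[::-1], truth)
```

The value does not depend on the order in which predictions are listed, which is the property that makes such losses suitable for comparing unordered sets of detections.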

SpectralEarth: Training‌‌ Hyperspectral Foundation Models at Scale

Participants: Nassim Ait Ali Braham, C. Albrecht, Julien Mairal, Jocelyn Chanussot, Y. Wang, Xiao Xiang Zhu.

Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, in 4 we introduce SpectralEarth, a large-scale multitemporal dataset designed to pretrain hyperspectral foundation models leveraging data from the environmental mapping and analysis program (EnMAP). SpectralEarth comprises 538,974 image patches covering 415,153 unique locations from 11,636 globally distributed EnMAP scenes spanning two years of archive. In addition, 17.5% of these locations include multiple timestamps, enabling multitemporal HSI analysis. Utilizing state-of-the-art self-supervised learning algorithms, we pretrain a series of foundation models on SpectralEarth, integrating a spectral adapter into classical vision backbones to accommodate the unique characteristics of HSI. In tandem, we construct nine downstream datasets for land-cover, crop-type mapping, and tree-species classification, providing benchmarks for model evaluation. Experimental results support the versatility of our models and their generalizability across different tasks and sensors. We also highlight computational efficiency during model fine-tuning. In Figure 32, we compare the size of various datasets published for Earth observation.

Figure 32:‌ Comparison of dataset sizes for remote sensing

MicroFlow:‌ Domain-Specific Optical Flow for Ground Deformation Estimation in‌ Seismic Events

Participants: Juliette Bertrand, Sophie Giffard-Roisin‌, James Hollingsworth, Julien Mairal.

Dense‌ ground displacement measurements are crucial for geological studies‌ but are impractical to collect directly. Traditionally, displacement‌ fields are estimated using patch matching on optical‌ satellite images from different acquisition times. While deep‌ learning-based optical flow models are promising, their adoption‌ in ground deformation analysis is hindered by challenges‌ such as the absence of real ground truth,‌ the need for sub-pixel precision, and temporal variations‌ due to geological or anthropogenic changes. In particular,‌ we identify that deep learning models relying on‌ explicit correlation layers struggle at estimating small displacements‌ in real-world conditions. Instead, we propose a model‌ that employs iterative refinements with explicit warping layers‌ and a correlation-independent backbone, enabling sub-pixel precision. Additionally,‌ a non-convex variant of Total Variation regularization preserves‌ fault-line sharpness while maintaining smoothness elsewhere. Our model‌ significantly outperforms widely used geophysics methods on semi-synthetic‌ benchmarks and generalizes well to challenging real-world scenarios‌ captured by both medium- and high-resolution sensors. This‌ work is available in the paper 43 and‌ is illustrated in Figure 33.

Figure‌ 33: Illustration of the MicroFlow approach.
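The benefit of a non-convex variant of Total Variation for fault lines can be seen on a one-dimensional toy example. Standard TV assigns the same cost to a sharp step and to a smooth ramp of equal height, whereas a concave penalty on the finite differences makes the sharp step cheaper. The logarithmic penalty and the parameter `eps` below are assumptions chosen for illustration; the specific non-convex regularizer used in 43 is not described in this report.

```python
import math


def total_variation(signal):
    """Standard (convex) TV: sum of absolute finite differences."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


def log_total_variation(signal, eps=0.1):
    """A non-convex TV variant: sum of log(1 + |difference| / eps).

    The log penalty is a hypothetical choice; it is concave in |difference|.
    """
    return sum(math.log(1.0 + abs(b - a) / eps) for a, b in zip(signal, signal[1:]))


step = [0.0] * 5 + [1.0] * 5          # sharp discontinuity (fault-like)
ramp = [i / 9 for i in range(10)]     # same total jump, spread over 9 intervals
```

Both signals have a standard TV of 1, but the non-convex penalty is markedly lower for `step` than for `ramp`, so minimizing it favors concentrating the displacement jump on a sharp line while still penalizing small oscillations elsewhere.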

Leveraging very high resolution optical remote sensing data and deep learning to assess the potential for photovoltaic energy production in urban areas

Participants: Alessia Boccalatte‌, Jocelyn Chanussot.

Convolutional Neural Networks (CNNs) have shown remarkable success in remote sensing tasks. In urban contexts, recent research has utilized CNNs to generate rooftop segmentation masks and determine rooftop section orientation from aerial images. This cost-effective approach is especially valuable for large-scale rooftop solar potential estimations when detailed three-dimensional data is unavailable. This research, published in 3, introduces SolarMTNet, a novel multitask dense-prediction network designed for rooftop solar potential prediction using only aerial images. Unlike previous studies that focus on small manually labeled datasets (approximately 2,000 scenes) and only segment rooftop orientations while typically assuming constant slopes, SolarMTNet simultaneously segments both orientations and slopes, enhancing the accuracy of solar potential estimations by 40%. SolarMTNet leverages a large, automatically labeled dataset (up to 280,000 scenes) created from open-source Swiss geospatial and aerial data, significantly improving generalization. The model is trained on rooftop data from the Zurich and Geneva cantons and cross-validated on the Canton of Vaud, Switzerland. The results show a mean Intersection over Union (mIoU) of 0.67 for orientation segmentation and 0.40 for slope segmentation. The estimated irradiance exhibits an absolute mean percentage difference of only 5% compared to real solar cadaster data derived from detailed model-based calculations, primarily due to shading issues. Finally, SolarMTNet has also been tested in different geographical areas outside Switzerland (France and Germany), demonstrating consistent performance across diverse regions and pixel resolutions. The quantification of urban solar potential losses from rooftop superstructures via aerial imagery and Convolutional Neural Networks has also been considered 2.
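As a reminder of how the mIoU metric reported above is defined, the sketch below computes it on flattened label maps: for each class, the number of pixels where prediction and ground truth both equal that class is divided by the number of pixels where either does, and the ratios are averaged over classes. Skipping classes absent from both maps is one common convention and an assumption here; the evaluation protocol of 3 may differ.

```python
def mean_iou(pred, truth, n_classes):
    """Mean Intersection over Union on flat lists of integer class labels."""
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:  # ignore classes absent from both maps (assumed convention)
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, truth, n_classes=3)
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, giving an mIoU of 0.5.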

Hyperspectral Pansharpening‌

Participants: Jocelyn Chanussot.‌‌

Hyperspectral (HS) pansharpening consists of fusing a high-resolution‌ panchromatic (PAN) band and‌ a low-resolution HS image‌‌ to obtain a new image with high resolution‌ in both the spatial‌ and spectral domains. These‌‌ remote sensing products are valuable for a wide‌ range of applications, driving‌ ever-growing research efforts. Nonetheless,‌‌ results still do not meet application demands. In‌ part, this comes from‌ the technical complexity of‌‌ the task: compared to multispectral (MS) pansharpening, many‌ more bands are involved,‌ in a spectral range‌‌ only partially covered by the PAN component and‌ with overwhelming noise. However,‌ another major limiting factor‌‌ is the absence of a comprehensive framework for‌ the rapid development and‌ accurate evaluation of new‌‌ methods. This article attempts to address this issue.‌ We started by designing‌ a dataset large and‌‌ diverse enough to allow reliable training (for data-driven‌ methods) and testing of‌ new methods. Then, we‌‌ selected a set of state-of-the-art methods, following different‌ approaches characterized by promising‌ performance, and reimplemented them‌‌ in a single PyTorch framework. Finally, we carried‌ out a critical comparative‌ analysis of all methods,‌‌ using the most accredited quality indicators. The analysis‌ highlights the main limitations‌ of current solutions in‌‌ terms of spectral/spatial quality and computational efficiency, and‌ it suggests promising research‌ directions 7.

On‌‌ a related topic, another work presents a critical‌ survey of deep learning‌ in remote sensing image‌‌ fusion 16.

Probing Synergistic High-Order Interaction for‌ Multi-Modal Image Fusion

Participants:‌ Jocelyn Chanussot.

Multi-modal image fusion aims to generate a fused image by integrating and distinguishing the cross-modality complementary information from multiple source images. While the cross-attention mechanism with global spatial interactions appears promising, it only captures second-order spatial interactions, neglecting higher-order interactions in both spatial and channel dimensions. This limitation hampers the exploitation of synergies between modalities. To bridge this gap, we introduce in 21 a Synergistic High-order Interaction Paradigm (SHIP), designed to systematically investigate spatial fine-grained and global statistics collaborations between the multi-modal images across two fundamental dimensions: 1) Spatial dimension: we construct spatial fine-grained interactions through element-wise multiplication, mathematically equivalent to global interactions, and then foster high-order formats by iteratively aggregating and evolving complementary information, enhancing both efficiency and flexibility. 2) Channel dimension: expanding on channel interactions with first-order statistics (mean), we devise high-order channel interactions to facilitate the discernment of inter-dependencies between source images based on global statistics. We further introduce an enhanced version of the SHIP model, called SHIP++, that enhances the cross-modality information interaction representation through a cross-order attention evolving mechanism, cross-order information integration, and a residual information memorizing mechanism. Harnessing high-order interactions significantly enhances our model’s ability to exploit multi-modal synergies, leading to superior performance over state-of-the-art alternatives, as shown through comprehensive experiments across various benchmarks in two significant multi-modal image fusion tasks: pan-sharpening, and infrared and visible image fusion.

Fully-Connected Transformer for Multi-Source Image Fusion‌

Participants: Jocelyn Chanussot.

Multi-source image fusion combines the information coming from multiple images into a single image, thus improving imaging quality. This topic has aroused great interest in the community. How to integrate information from different sources is still a big challenge, although the existing self-attention based transformer methods can capture spatial and channel similarities. In this paper 19, we first discuss the mathematical concepts behind the proposed generalized self-attention mechanism, where the existing self-attentions are considered basic forms. The proposed mechanism employs multilinear algebra to drive the development of a novel fully-connected self-attention (FCSA) method to fully exploit local and non-local domain-specific correlations among multi-source images. Moreover, we propose a multi-source image representation, embedding it into the FCSA framework as a non-local prior within an optimization problem. Several fusion problems are unfolded into the proposed fully-connected transformer fusion network (FC-Former). More specifically, the concept of generalized self-attention can promote the potential development of self-attention. Hence, the FC-Former can be viewed as a network model unifying different fusion tasks. Compared with state-of-the-art methods, the proposed FC-Former method exhibits robust and superior performance, showing its capability of faithfully preserving information.

GeoFlowNet-SAR:‌ Earthquake Displacement Estimation from Synthetic Aperture Radar Images‌

Participants: Jocelyn Chanussot.

Displacement estimation using remote sensing images is an effective approach for assessing surface displacement caused by natural disasters like earthquakes and landslides. By employing pixel correlation algorithms, high-precision displacement maps can be generated from images taken before and after surface movement. However, traditional methods often rely on spatial regularization or frequency masking to reduce high-frequency noise, which can smooth spatial details and result in biased displacement estimates, especially near sharp discontinuities typical of earthquake surface ruptures. Moreover, subpixel displacement estimation using synthetic aperture radar (SAR) images remains a challenge compared to optical images, due to the strong impact of speckle noise. This article 18 presents GeoFlowNet-SAR, an innovative subpixel displacement estimation method leveraging SAR images. SAR offers advantages thanks to all-weather observation and high penetration, making it suitable for conditions typically challenging for optical systems in the visible light spectrum. This study uses Sentinel-1 SAR single look complex (SLC) images with dual-polarization (VV and VH modes) and interferometric wide (IW) swath mode to balance coverage and resolution. By training on simulated displacement datasets with realistic sharp discontinuities, GeoFlowNet-SAR directly predicts surface displacement fields, providing highly efficient, robust, and precise results while overcoming some limitations of traditional methods. The effectiveness of the proposed methodological contribution is first quantitatively demonstrated using synthetic simulated earthquake datasets, including comparisons with state-of-the-art correlation methods. The method is further validated using two real remote sensing datasets, from the 2019 Ridgecrest earthquake and from the 2023 Turkey–Syria earthquake. The observed results from these real datasets confirm the effectiveness of GeoFlowNet-SAR in practical applications.

Kolmogorov–Arnold‌‌ Network for Hyperspectral Change Detection

Participants: Jocelyn Chanussot‌.

Hyperspectral change detection (HCD) techniques to monitor Earth’s surface processes have advanced markedly in recent years. Seasonal variations and associated spectral signatures as well as nonlinear noise patterns emanating from sensors and atmospheric sources pose fundamental challenges in HCD. Advanced deep learning models, such as those that leverage convolutional neural networks (3D-Siamese) or transformers (MLP-Mixer), are increasingly employed to address these challenges. However, they often need substantial training data and computational resources. Here, we show that the Kolmogorov–Arnold network (KAN) can enhance HCD capabilities without the excessive training demand of deep networks. The Kolmogorov–Arnold theorem provides the theoretical foundation for our approach, which is particularly well-suited for hyperspectral data analysis by providing a rigorous basis for handling high-dimensional spectral signatures through dimensional reduction and feature extraction. Our architectural design employs this theoretical framework by incorporating specialized neural network layers that mirror the theorem’s compositional structure, thereby facilitating efficient processing of spectral bands. By replacing the linear weighting scheme with learnable nonlinear functions, the KAN provides a unique capability to capture intricate patterns and irregularities in high-dimensional data. Here, we compare five KAN-based architectures and deep learning models such as the MLP-Mixer, 3D-Siamese, dual-branch Siamese spatial–spectral Transformer attention network (DBS3TAN), and the Swin Transformer for HCD and show that the Chebyshev-KAN model, with an average overall accuracy of 97.35% over four real-world benchmark cases, outperforms other models while having a markedly lower complexity than the deep learning models. We also show that the choice of the nonlinear function to fit and of the model structure is more important than the number of parameters in KAN-based models 15.
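To illustrate what replacing linear weights with learnable nonlinear functions means, the sketch below implements the forward pass of a generic Chebyshev-based KAN layer: each edge from input $i$ to output $j$ carries a univariate function expressed as a combination of Chebyshev polynomials, and each output sums its incoming edge functions. This is a schematic of ours, not the architecture evaluated in 15; squashing the inputs into $[-1, 1]$ with `tanh`, the coefficient layout, and the function names are assumptions.

```python
import math


def chebyshev_basis(x, degree):
    """Values T_0(x), ..., T_degree(x) via T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x)."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def chebyshev_kan_layer(inputs, coeffs):
    """Forward pass of one layer.

    coeffs[j][i][k] is the (learnable) coefficient of T_k on the edge from
    input i to output j; output j is the sum over i of the edge functions.
    """
    degree = len(coeffs[0][0]) - 1
    # tanh maps each input into [-1, 1], the natural domain of Chebyshev polynomials.
    bases = [chebyshev_basis(math.tanh(x), degree) for x in inputs]
    return [sum(c * b for edge, basis in zip(row, bases) for c, b in zip(edge, basis))
            for row in coeffs]


# One output, two inputs, degree-2 edge functions (coefficients chosen arbitrarily).
coeffs = [[[1.0, 2.0, 3.0], [0.5, 0.0, -0.5]]]
out = chebyshev_kan_layer([0.0, 0.0], coeffs)
```

The number of parameters grows with the polynomial degree rather than with network depth, which is one way such layers can stay compact.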

ECSPLAIN: Explainability Constrained-claSsifier for Pairing the detection‌ and the Localization of‌ moving Areas from SAR‌‌ INterferograms

Participants: Jocelyn Chanussot.

Detecting slope instabilities‌ on synthetic aperture radar‌ (SAR) interferograms using deep‌‌ learning approaches presents several challenges. This detection task‌ suffers from the lack‌ of transparency of deep‌‌ networks, the complexity of the input data (i.e.,‌ complex values, sensitivity to‌ distortions, and presence of‌‌ counterfactuals), and the complexity of the target phenomena‌ (i.e., the variable velocities‌ and the complex underground‌‌ processes). In this article 5, we propose‌ a new framework called‌ explainability-constrained classifier for pairing‌‌ the detection and the‌ localization of moving areas on interferograms (ECSPLAIN), to‌ generate decision, localization, and segmentation maps from a‌ single but explainable classifier network. It consists of‌ training a classifier to detect whether an instability‌ is located in the patch or not, and‌ to explain its decision with a class activation‌ map (CAM) that matches the actual location of‌ the instability. Therefore, by using a single classifier‌ network, the framework can pair the detection and‌ the localization of moving areas. Four CAMs are‌ investigated for the training of the ECSPLAIN framework.‌ Experiments on the ISSLIDE dataset show that our‌ proposal achieves better explainability than standard a posteriori‌ CAMs with more than 0.20 points of improvement‌ in terms of Dice and IoU scores. It‌ also allows competitive performance with segmentation-only networks, with‌ only 0.04 points of difference in terms of‌ Dice and intersection over union (IoU) scores. Thus,‌ the proposed method is competitive with the most‌ efficient methods while being lighter, faster, and delivering‌ a decision based on a human-like reasoning process.‌ Finally, the ECSPLAIN framework is applied to enrich‌ the ISSLIDE dataset, discovering more than 470 manually‌ validated slope instabilities over the Alps.

8.4 Other‌ pluri-disciplinary projects

Challenges in Non-Polymeric Crystal Structure Prediction:‌ Why a Geometric, Permutation-Invariant Loss is Needed

Participants:‌ Emmanuel Jehanno, Romain Menegaux, Julien Mairal‌, Sergei Grudinin.

Crystalline structure prediction is‌ an essential prerequisite for designing materials with targeted‌ properties. Yet, it is still an open challenge‌ in materials design and drug discovery. Despite recent‌ advances in computational materials science, accurately predicting three-dimensional‌ non-polymeric crystal structures remains elusive. In this work‌ 47, we focus on the molecular assembly‌ problem, where a set S of identical rigid‌ molecules is packed to form a crystalline structure.‌ Such a simplified formulation provides a useful approximation‌ to the actual problem. However, while recent state-of-the-art‌ methods have increasingly adopted sophisticated techniques, the underlying‌ learning objective remains ill-posed. We propose a better‌ formulation that introduces a loss function, illustrated in‌ Figure 34, capturing key geometric molecular properties‌ while ensuring permutation invariance over S. Remarkably, we‌ demonstrate that within this framework, a simple regression‌ model already outperforms prior approaches, including flow matching‌ techniques, on the COD-Cluster17 benchmark, a curated non-polymeric‌ subset of the Crystallography Open Database (COD).

Figure 34: Illustration of the geometric loss.‌
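
To make the notion of permutation invariance over S concrete, here is a minimal sketch of a generic set-matching loss. It is not the geometric loss proposed in 47: it only compares hypothetical molecule positions (for instance centroids, an assumption of this example) and takes the minimum over all assignments of predicted to target molecules, so that relabelling the identical molecules does not change the value. The names `squared_distance` and `permutation_invariant_loss` are illustrative.

```python
from itertools import permutations


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def permutation_invariant_loss(predicted, target):
    """Mean squared distance under the best one-to-one matching.

    Brute force over all permutations: only suitable for small sets.
    """
    if len(predicted) != len(target):
        raise ValueError("both sets must contain the same number of molecules")
    n = len(target)
    if n == 0:
        return 0.0
    best = min(
        sum(squared_distance(predicted[i], target[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n


# Hypothetical centroids of three identical molecules.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# The same configuration with the molecules listed in a different order.
shuffled = [target[2], target[0], target[1]]

# An index-wise loss penalises the reordering although the structure is identical.
naive = sum(squared_distance(p, t) for p, t in zip(shuffled, target)) / len(target)
matched = permutation_invariant_loss(shuffled, target)
print(naive, matched)
```

The index-wise loss is strictly positive here, whereas the matched loss is zero, which is the kind of ill-posedness a permutation-invariant objective removes. For larger sets, the exhaustive search would typically be replaced by a linear assignment solver.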

9 Bilateral contracts and grants with industry

9.1‌ Bilateral contracts with industry

Participants: Julien Mairal,‌ Karteek Alahari, Pierre Gaillard.

In 2025,‌ we had:

four CIFRE PhD students with Meta:‌ Timothée Darcet (co-advised by J. Mairal), who defended‌ in June 2025, Eyal Benaroche, who started in‌ December 2025, Tariq Berrada Ifriqi (co-advised by K.‌ Alahari), and Francois Porcher (co-advised by K. Alahari),‌ who started in April 2025.
one CIFRE PhD student with Naver Labs Europe: Juliette Marrie (co-advised by J. Mairal and M. Arbel), who defended in June 2025.
one CIFRE PhD student with EDF R&D: Bianca Marin Moreno (co-advised by P. Gaillard), who defended in October 2025.
one CIFRE PhD student with Criteo:‌ Julien Zhou (co-advised by‌ P. Gaillard).
one CIFRE‌‌ PhD student with Ekimetrics: Yedidia Agnimo (co-advised by‌ K. Alahari), who started‌ in July 2025.
one‌‌ CIFRE PhD student with Enhance Lab: Vincent Herfeld‌ (co-advised by J. Mairal).‌
a collaboration led by‌‌ K. Alahari with Toyota Motor Europe.

10 Partnerships‌ and cooperations

10.1 International‌ initiatives

10.1.1 Participation in‌‌ other International Programs

Project EIFFEL

Participants: Karteek Alahari‌, Pia Bideau.‌

Title:
Efficient Distillation of‌‌ Foundation Models for Computer Vision
Duration:
2025 -‌ 2028
Summary:
This collaborative project with South Korea is supported by the Institute of Information Communications Technology Planning & Evaluation (IITP) grant funded by the Korean Government (MSIT) (No. RS-2024-00457882, National AI Research Lab Project). Its focus is on efficient foundation models. Foundation models, which have been trained on massive amounts of curated data by using huge resources, constitute one of the most recent advancements in machine learning for computer vision and other domains. They are typically produced by large corporations or as part of industrial/academic collaborations, which raises fundamental challenges for academia. One of the scientific objectives is to widen the reach of these models by proposing computationally efficient counterparts as well as variants that collectively leverage multiple modalities, e.g., text, image, video, and audio. In particular, we are interested in developing new models under challenging but realistic scenarios, such as limited data, data with a temporally evolving distribution, or low computational resources, which occur in many industrial and scientific applications.

10.2 European initiatives‌

10.2.1 Horizon Europe

APHELEIA‌‌

APHELEIA project on cordis.europa.eu

Title:
Reconciling Classical and‌ Modern (Deep) Machine Learning‌ for Real-World Applications
Duration:‌‌
From September 1, 2023 to August 31, 2028‌
Partners:
- INSTITUT NATIONAL DE‌ RECHERCHE EN INFORMATIQUE ET‌‌ AUTOMATIQUE (INRIA), France
Inria contact:
Mairal Julien
Summary:‌

Despite the undeniable success of machine learning in addressing a wide variety of technological and scientific challenges, the current trend of training predictive models with an ever-growing number of parameters from an ever-growing amount of data is not sustainable. These huge models, often engineered by large corporations benefiting from huge computational resources, typically require learning a billion or more parameters. They have proven to be very effective in solving prediction tasks in computer vision, natural language processing, and computational biology, for example, but they mostly remain black boxes that are hard to interpret, computationally demanding, and not robust to small data perturbations.

With a strong emphasis on visual modeling, the grand challenge of APHELEIA is to develop a new generation of machine learning models that are more robust, interpretable, and efficient, and do not require massive amounts of data to produce accurate predictions. To achieve this objective, we will foster new interactions between classical signal processing, statistics, optimization, and modern deep learning. Our goal is to reduce the need for massive data by enabling scientists and engineers to design trainable machine learning models that directly encode a priori knowledge of the task semantics and data formation process, while automatically preferring simple and stable solutions over complex ones. These models will be built on solid theoretical foundations with convergence and robustness guarantees, which are important to make real-life trustworthy predictions in the wild. We will implement these ideas in an open-source software toolbox readily applicable to visual recognition and inverse imaging problems, which will also handle other modalities. This will stimulate interdisciplinary collaborations, with the potential to be a game changer in the way scientists and engineers design machine learning problems.

10.2.2 Other European programs/initiatives

J. Chanussot is involved in a project funded by the European Space Agency (ESA), "ROSE-L in Harmony: EO Data Integration for Global Land Cover and Vegetation Mapping", led by the Canadian company C-Core (2025-2028).

10.3 National initiatives

10.3.1 ANR‌ Project BONSAI

Participants: Michael Arbel.

Project BONSAI is a multi-disciplinary project aiming at integrating knowledge produced by experts, in the form of simulators, into current machine learning frameworks through bilevel optimization for accurate and efficient inference. We address three challenges. The first one is to develop a deep learning-based approach to simulation-based inference that can adapt to data using bilevel optimization. A second challenge is to deploy the methods to real-world problems, which have their own specificities. A third challenge is to develop bilevel optimization methods that can handle the non-convexity and over-parameterization arising from using deep learning. The principal investigator is Michael Arbel, and the project involves participants from Toulouse School of Economics, the TIMC team at UGA, and other Inria teams (Statify). This project started in April 2024.

10.3.2 MIAI chair: Learning Visual Representations from‌ Interaction for Robot Manipulation Tasks

Participants: Pia Bideau‌, Karteek Alahari.

How to grasp an object has been studied in computer vision and robotics, and several approaches to this problem exist: either contact points leading to a stable hand-object configuration are determined from the 3D shape of an object, or, in another line of work, stable hand-object configurations are reconstructed by modelling hand pose and object pose jointly. In both cases many solutions are possible, although a majority might not be the natural approach that humans would choose, mainly because the intention behind the grasp is omitted. This project aims at learning visual representations from interaction that encode activity information. Encoding such contextual information appears not only relevant for synthesising feasible grasps; it is also likely to enhance generalisation, facilitating adaptation across different objects within the same activity: grasping a cup to pour something shares a similar motion pattern with grasping a bottle to pour something. Inspired by the effectiveness of human grasping, we aim at finding similarly adaptable representations that are capable of guiding complex manipulation skills. To this end we will fuse ideas relying on classical probabilistic modeling of distributions over possible motion trajectories and latent action representations from a conditional variational autoencoder (CVAE). These two directions come with complementary strengths and thus offer a promising way to modulate the degree of action abstraction at test time, enabling both coarse and fine-grained control for real-world robot manipulation tasks. The chair is taking place in collaboration with Karteek Alahari, Xavier Alameda-Pineda, and Pierre-Brice Wieber. We have recruited one PhD student and have an intern starting in February 2025.

10.3.3 MIAI Cluster‌ chair: MOnitoring natural Hazards‌ using AI and Remote‌‌ sensing (MOHAIR)

Participants: Jocelyn Chanussot.

J. Chanussot‌ is the co-chair, with‌ Sophie Giffard-Roisin (IRD junior‌‌ researcher, Laboratoire IsTerre) and Yajing Yang (Associate Professor,‌ LISTIC Univ. Savoie Mont-Blanc).‌ This project started in‌‌ September 2025. It gathers members from 7 different‌ teams of 6 laboratories‌ in Grenoble, Annecy and‌‌ Clermont-Ferrand.

Satellite-based remote sensing, using a variety of sensing modalities (optical, radar, hyperspectral, lidar), offers a unique source of information to monitor the environment, with fine spatial resolution, wide coverage and frequent revisit. This enables addressing the challenge of natural hazard monitoring and forecasting, which has a significant societal impact. To fully harness the potential of remote sensing data, advanced algorithms in machine learning, deep learning, or more broadly artificial intelligence, must be developed. Gathering an interdisciplinary team of experts from data science, environmental and Earth sciences, as well as social sciences, this chair will focus on three important topics: forest monitoring, Earth deformation estimation and volcanic inverse modeling. From a methodological point of view, research will be conducted on the analysis of multimodal time series, multimodal deep and graph learning, and foundation models.

10.3.4‌ MIAI chair: Fundamentals of‌ Reinforcement Learning

Participants: Pierre‌‌ Gaillard.

P. Gaillard is the co-chair, with Bruno Gaujal (LIG, UGA), of this MIAI chair, which focuses on developing advanced methodologies for Reinforcement Learning (RL). The project aims to develop new RL algorithms with strong theoretical foundations and practical effectiveness by exploiting the problem's inherent structure. The focus areas include online control of queueing networks, weakly coupled stochastic dynamic systems (sometimes associated with bandits) and parametric learning for adaptive policies. These three approaches to structured learning will be used for innovative applications in energy, cloud computing, and resource allocation.

10.3.5 Deep‌ Red

Participants: Jocelyn Chanussot‌‌.

J. Chanussot is the chair of the‌ Deep Red project from‌ the Foundation Grenoble INP‌‌ under the patronage of Lynred company (2022-2026). The‌ project aims at popularizing‌ the technology of infrared‌‌ imaging for new usages.

10.3.6 PEPR project Numpex‌

Participants: Hadrien Hendrikx.‌

The 'Numpex' programme's objectives‌‌ are to design and develop the software building‌ blocks required to equip‌ future 'exascale machines' and‌‌ to prepare the major application domains that aim‌ to fully exploit the‌ capabilities of such machines‌‌ for scientific research and‌ industry alike. This project is part of France's‌ response to the next EuroHPC call for expressions‌ of interest (Projet Exascale France) in hosting one‌ of the two major exascale machines planned in‌ Europe for 2024. In this way 'Numpex' will‌ contribute to the creation of a set of‌ tools, software, applications and training which will enable‌ France to remain one of the leaders in‌ the field of international competition through its national‌ Exascale ecosystem that is in step with European‌ strategy.

10.3.7 PEPR project Origins

Participants: Julien Mairal‌.

Thoth is involved in the axis “Direct‌ imaging and exoplanet characterization” of the PEPR Origins.‌ This is an on-going collaboration with astronomers from‌ Observatoire de Paris and Lyon and with the‌ Willow team.

11 Dissemination

Participants: Julien Mairal,‌ Karteek Alahari, Jocelyn Chanussot, Hadrien Hendrikx‌, Michael Arbel, Pierre Gaillard, Pia‌ Bideau, Scott Pesme.

11.1 Promoting scientific‌ activities

11.1.1 Scientific events: organisation

Member of the‌ organizing committees

M. Arbel, P. Gaillard, H. Hendrikx,‌ J. Mairal, G. Meanti, S. Pesme and N.‌ Gillot co-organized the PAISS summer school in Grenoble,‌ which attracted about 200 students.
P. Gaillard co-organized with EDF R&D a workshop on Meta-models that attracted around 50 participants.
J. Chanussot was the general co-chair (with Prof. Xiuping Jia, University of New South Wales, and Prof. Jeffrey Walker, Monash University) of the IEEE Geoscience and Remote Sensing Symposium (IGARSS), which attracted 3200 participants in Brisbane, Australia, August 3-8, 2025.
J. Chanussot was the co-chair of GeoCV (First Workshop on Computer Vision for Geospatial Image Analysis), a workshop of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, March 2025.
J. Chanussot was the co-chair of MORSE (Workshop on Foundation and Large Vision Models in Remote Sensing), a workshop of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, June 2025.

11.1.2 Scientific events:‌ selection

Chair of conference program committees

K. Alahari‌ will be a program co-chair for ECCV 2028‌ (Bucharest, Romania).
J. Chanussot will be the Technical Program Committee chair for the IEEE Geoscience and Remote Sensing Symposium (IGARSS) to be held in Reykjavik, Iceland, in 2027.

Member of the conference‌ program committees

K. Alahari was an area chair for CVPR 2025, ICCV 2025, and NeurIPS 2025, and will be an area chair for the upcoming ICML 2026.
P. Gaillard was an area chair for ICML 2025, and will be an area chair for the upcoming ICML 2026.
M. Arbel was an area chair for NeurIPS 2025, and will be an area chair for the upcoming ICML 2026.
J. Mairal will be an area chair for the upcoming ICML 2026.

Reviewer

J. Mairal was a reviewer for ICCV 2025, ICLR 2026 and NeurIPS 2025 (where he received a top reviewer award).
K. Alahari was a reviewer for CVPR 2026 and BMVC 2025.
P. Gaillard was a reviewer for NeurIPS 2025.
H. Hendrikx was a reviewer for ICML 2025.
M. Arbel was a reviewer for AISTATS 2025 (where he received a top reviewer award), ICCV 2025, and ICLR 2026.

11.1.3 Journal

Member of the editorial‌ boards

J. Mairal is an editor for the Journal of Machine Learning Research (JMLR).
K. Alahari is an associate editor of the International Journal of Computer Vision (IJCV).
J. Chanussot is an associate editor for the IEEE Transactions on Geoscience and Remote Sensing.

Reviewer - reviewing‌‌ activities

P. Gaillard was a reviewer for JMLR.
H. Hendrikx was a reviewer for JMLR and SIOPT.
M. Arbel was a reviewer for JMLR.

11.1.4 Invited talks‌

J. Mairal was an‌ invited speaker at the‌‌ BASP workshop, Villars sur Ollon. Feb. 2025.
J.‌ Mairal was an invited‌ speaker at the OSKI‌‌ workshop, Aussois. March 2025.
J. Mairal was an‌ invited speaker at the‌ GDR-IASIS workshop, Lyon. March‌‌ 2025.
J. Mairal was an invited speaker at‌ Academie des Sciences (inter-section‌ meeting). June 2025.
J.‌‌ Mairal gave an invited seminar at the ELLIS‌ Stuttgart unit. June 2025.‌
J. Mairal was an‌‌ invited speaker at the Non-convex optimization: landscapes, dynamics‌ and learning workshop, EPFL,‌ Aug. 2025.
J. Mairal‌‌ was an invited speaker at the GDR-IASIS workshop,‌ Paris. Sept. 2025.
K. Alahari was a keynote speaker at the Inria-Waterloo workshop at Univ. Waterloo, Canada. May 2025.
K.‌ Alahari was an invited‌‌ speaker at Journées de statistique de la SFdS,‌ Marseille. June 2025.
K.‌ Alahari was an invited‌‌ speaker at the Global AI Frontiers Symposium, Seoul,‌ South Korea. Oct. 2025.‌
K. Alahari was an‌‌ invited speaker at the Open Science Days@UGA, Grenoble.‌ Nov. 2025.
K. Alahari was a keynote speaker at the Sfen workshop "Apport de l'IA à la science des matériaux pour l'industrie nucléaire", Paris. Dec. 2025.
S. Pesme was an invited‌ speaker at Journée scientifique‌ du groupe SMAI-SIGMA, Dec.‌‌ 2025, Paris.
S. Pesme gave an invited seminar‌ at Centrale Supelec. Nov.‌ 2025.
S. Pesme gave a talk at the Oberwolfach Mini-Workshop on Probabilistic Perspectives in Neural Network-Based Machine Learning, Oct. 2025, Oberwolfach, Germany.
S. Pesme gave an invited talk‌ at the Workshop sur‌ les modèles génératifs :‌‌ diffusion, flow matching et leurs applications, Oct. 2025,‌ Lyon.
S. Pesme gave an invited seminar at Eindhoven University of Technology, Netherlands, Oct. 2025.
S.‌ Pesme gave an invited‌ talk at the Workshop‌‌ on the Statistical Theory of Neural Networks, May‌ 2025, University of Twente,‌ Netherlands.
J. Marrie gave‌‌ an invited seminar at Ecole des Ponts, Marne‌ la Vallée, March 2025.‌
T. Darcet gave an‌‌ invited talk at the BLISS summer school, TU‌ Berlin. May 2025.
T.‌ Bodrito gave a talk‌‌ at the COBREX seminar. Feb. 2025.
T. Bodrito‌ gave a talk at‌ the Journées de la‌‌ Société Française d'Astronomie (SF2A). July 2025.
P. Gaillard‌ was an invited speaker‌ at a scientific seminar‌‌ organized by the LabEx EnergyAlps. May 2025.
H.‌ Hendrikx gave an invited‌ seminar at Inria Montpellier,‌‌ May 2025.
H. Hendrikx gave a talk at‌ project Redeem (PEPR IA)‌ annual meeting, October 2025.‌‌
M. Arbel was an‌ invited speaker at the RKHS Seminars, METU, February‌ 2025.

11.1.5 Scientific expertise

J. Mairal was a member of the Helmholtz panel on scientific imaging.
J. Mairal was a member of the Prairie panel for junior chairs.
J. Mairal was a panel member for the Research Council of Norway.
K. Alahari was a member of the CRCN/ISFP 2025 recruitment committee at Grenoble.
P. Gaillard was a reviewer for the JCJC call from ANR.
H. Hendrikx was a panel member for the TSIA call from ANR, subcommittee "IA & Environnements, écosystèmes, ressources biologiques".

11.1.6 Research administration

J. Mairal‌ is a member of the scientific committee (COS)‌ of Inria Grenoble's research center, and also a‌ member of the scientific committee of MIAI.
K.‌ Alahari is the deputy scientific director in charge‌ of AI at Inria.
K. Alahari is one‌ of the scientific directors of the PEPR IA‌ national research programme.
K. Alahari is responsible for‌ the Mathematics and Computer Science specialist field at‌ the MSTII doctoral school.
K. Alahari is a‌ member of commission prospection postes at LJK.
H.‌ Hendrikx is Chargé de mission Science Environnement Société‌ (SEnS) for Inria Grenoble.
H. Hendrikx is the‌ Inria Transformation Ecologique (TREC) representative at UGA.
H.‌ Hendrikx is a leading member of the Inria‌ Grenoble socio-environmental roadmap.
J. Chanussot is a member‌ of the Commission Recherche, University Grenoble Alpes.‌

11.2 Teaching - Supervision - Juries - Educational‌ and pedagogical outreach

11.2.1 Supervision

Théo Bodrito defended his PhD in June 2025. He was co-advised by Olivier Flasseur, Jean Ponce and Julien Mairal. See the manuscript 41.
Timothée Darcet defended his PhD in June 2025. He was co-advised by Piotr Bojanowski, Maxim Oquab, and Julien Mairal. See the manuscript 42.
Juliette Marrie defended her‌ PhD in June 2025. She was co-advised by‌ Michael Arbel, Diane Larlus and Julien Mairal.
Bianca‌ Marin Moreno defended her PhD in October 2025.‌ She was co-advised by P. Gaillard.
Zhiqi Kang‌ defended his PhD in November 2025. He was‌ advised by Karteek Alahari.
Anandaramane Candassamy defended his PhD in September 2025. He was co-advised by J. Chanussot.
Colin Prieur defended his PhD in November 2025. He was co-advised by J. Chanussot.

11.2.2‌ Juries

J. Mairal was reviewer for the PhD‌ thesis of Samuel Gruffaz, Univ. Paris Saclay. 2025.‌
J. Mairal was reviewer for the HdR of‌ Thomas Moreau, Univ. Paris Saclay. 2025.
J. Mairal‌ was a member of the PhD committee of‌ Gaspard Dussert, Univ. Lyon 1. 2025.
J. Mairal was a member of the HdR committee of Maxime Sangnier, PSL Sorbonne Université. 2025.
K. Alahari was a member of the PhD jury of Mohammed-Yasser Benigmim, IP Paris. 2025.
K. Alahari was‌ a reviewer for the PhD thesis of Corentin‌ Sautier, Ecole des Ponts ParisTech. 2025.
K. Alahari‌ was a reviewer for the PhD thesis of‌ Tanay Agrawal, Université Côte d'Azur. 2025.
K. Alahari‌ was the president of the PhD jury of Guillaume Déau, Univ. Poitiers.‌ 2025.
P. Gaillard was‌ reviewer for the PhD‌‌ thesis of Lukas Zierahn, Politecnico di Torino, Italy.‌ 2025.
P. Gaillard was‌ a member of the‌‌ PhD committee of Antoine Picard, Univ. Lille. 2025.‌
J. Chanussot was a‌ reviewer for the PhD‌‌ of Liang Zhao, University of South Australia (Australia)‌ 2025.
J. Chanussot was‌ a reviewer for the‌‌ PhD of Kimmo Riihiaho, University of Jyväskylä (Finland)‌ 2025.
J. Chanussot was‌ a reviewer for the‌‌ PhD of Dan Pineau, Université Paris-Saclay, 2025.
J.‌ Chanussot was a reviewer‌ for the PhD of‌‌ Yi Wang, TU Munich (Germany) 2025.
J. Chanussot‌ was a reviewer for‌ the PhD of Sai‌‌ Reddy B., GITAM - Deemed to be University‌ (India) 2025.
J. Chanussot‌ was a reviewer for‌‌ the PhD of Triem Pham, Université Paris-Saclay, 2025.‌
J. Chanussot was the‌ president of the PhD‌‌ jury of Astrid Tazzioli, Université PSL Paris, 2025.‌
J. Chanussot was a reviewer for the PhD of Ritu Yadav, KTH (Sweden), 2025.
J. Chanussot was a reviewer and the president of the committee for the PhD of Vadim Becquet, Université Paris PSL - Mines de Paris, 2025.
J. Chanussot was a reviewer for the HdR of Minh-Tan Pham, Université de Bretagne Sud, 2025.
M.‌‌ Arbel was a member of the PhD committee‌ of Alessandro Pasqui, PSL‌ Université de Paris, 2025.‌‌

11.2.3 Educational and pedagogical outreach

Master: M. Arbel‌ and J. Mairal, Kernel‌ methods for statistical learning,‌‌ 36h eqTD, M2, ENS Paris-Saclay/PSL, France.
Master: M.‌ Arbel, J. Mairal and‌ S. Pesme, From Basic‌‌ Machine Learning models to Advanced Kernel Learning, 54h‌ eqTD, M2, UGA, Grenoble.‌
Master: P. Gaillard, Sequential‌‌ Learning, 12h eqTD, M2, MVA, ENS Paris-Saclay, France.‌
Master: H. Hendrikx, Numerical Optimization, 40h eqTD, M1, UGA, Grenoble.
Master: J. Chanussot, Hyperspectral imaging, 25h eqTD, M2, Grenoble INP.

11.3 Popularization

11.3.1 Productions‌‌ (articles, videos, podcasts, serious games, ...)

K. Alahari‌ participated in a podcast‌ interview for Interstices 54‌‌.

11.3.2 Participation in Live events

K. Alahari co-animated the "Café IA" event at Inria Grenoble.
S. Pesme participated in the school workshops held on October 9 and 10 as part of the "Éclats de sciences" programme on the Université Grenoble Alpes campus in Saint-Martin-d'Hères.
S. Pesme participated in two "Café IA" events: one at Inria (September 30, 2025) and another with Digital League (December 2, 2025).
S. Pesme gave a talk at the Math Olympiad Ceremony (June 4, 2025, Université Grenoble Alpes).
S. Pesme participated in classroom sessions for the "Semaine des Maths" (March 10–19, 2025, schools in the Grenoble academy).

11.3.3 Other relevant science outreach activities

S. Pesme was interviewed for the Fête de la Science 2025.
J.‌‌ Chanussot is a member of the scientific advisory‌ board of the établissement‌ public de coopération culturelle‌‌ « Territoire de Sciences » with its two‌ components: Cosmocité Museum and‌ Grenoble Casemate.
J. Chanussot organized a half-day event about thermal imaging for the 10th grade students doing their internship at Inria.

12 Scientific production‌

12.1 Publications of the year

International journals

1 Pia Bideau, Duc Pham, Félicie Dhellemmes, Matthew Hansen and Jens Krause. Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos. International Journal of Computer Vision, 2026. In press.
2 Alessia Boccalatte and Jocelyn Chanussot. Quantifying urban solar potential losses from rooftop superstructures via aerial imagery and Convolutional Neural Networks. Renewable Energy 249, August 2025, 123088.
3 Alessia Boccalatte, Ankit Jha and Jocelyn Chanussot. Leveraging large-scale aerial data for accurate urban rooftop solar potential estimation via multitask learning. Solar Energy 290, April 2025, 113336.
4 Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Wang Yi and Xiao Xiang Zhu. SpectralEarth: Training Hyperspectral Foundation Models at Scale. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18, June 2025, 16780-16797.
5 Antoine Bralet, Abdourrahmane Atto, Jocelyn Chanussot and Emmanuel Trouvé. ECSPLAIN: Explainability Constrained-claSsifier for Pairing the detection and the Localization of moving Areas from SAR INterferograms. IEEE Transactions on Geoscience and Remote Sensing 63, 2025, 5217618:1-18.
6 Antoine Bralet, Abdourrahmane M Atto, Jocelyn Chanussot and Emmanuel Trouvé. Translation-classification loss for SAR image understanding with deep learning. Computer Vision and Image Understanding 257, 2025, 104374.
7 Matteo Ciotola, Giuseppe Guarino, Gemine Vivone, Giovanni Poggi, Jocelyn Chanussot, Antonio Plaza and Giuseppe Scarpa. Hyperspectral Pansharpening: Critical review, tools, and future perspectives. IEEE geoscience and remote sensing magazine 13(1), March 2025, 311-338.
8 Timothée Darcet, Frederico Baldassarre, Maxime Oquab, Julien Mairal and Piotr Bojanowski. Cluster and Predict Latent Patches for Improved Masked Image Modeling. Transactions on Machine Learning Research Journal, June 2025, 1-26.
9 Avijit Dasgupta, C.V. Jawahar and Karteek Alahari. Source-free video domain adaptation by learning from noisy labels. Pattern Recognition 161, May 2025, 111328.
10 Camila Fernandez, Pierre Gaillard, Joseph de Vilmarest and Olivier Wintenberger. Online Convex Optimization for Survival Analysis: An Adaptive and Stochastic Approach. Statistical Papers 66(4), May 2025, 86.
11 Elham Kordi Ghasrodashti, Peyman Adibi, Hossein Karshenas, Hamidreza Baradaran Kashani and Jocelyn Chanussot. Multimodal Image Classification Based on Convolutional Network and Attention-Based Hidden Markov Random Field. IEEE Transactions on Geoscience and Remote Sensing 63, 2025, 1-14.
12 Rémi Jézéquel, Dmitrii M. Ostrovskii and Pierre Gaillard. Efficient and Near-Optimal Online Portfolio Selection. Mathematics of Operations Research, May 2025.
13 Heeseung Kwon, Francisco M. Castro, Manuel J. Marin-Jimenez, Nicolas Guil and Karteek Alahari. Lightweight Structure-Aware Attention for Visual Understanding. International Journal of Computer Vision 133, June 2025, 6129–6144.
14 Bianca Moreno, Margaux Brégère, Pierre Gaillard and Nadia Oudjane. (Online) Convex Optimization for Demand-Side Management: Application to Thermostatically Controlled Loads. Journal of Optimization Theory and Applications 205(3), April 2025, 43.
15 Seyd Teymoor Seydi, Mojtaba Sadegh and Jocelyn Chanussot. Kolmogorov–Arnold Network for Hyperspectral Change Detection. IEEE Transactions on Geoscience and Remote Sensing 63, 2025, 1-15.
16 Gemine Vivone, Liang-Jian Deng, Shangqi Deng, Danfeng Hong, Menghui Jiang, Chenyu Li, Wei Li, Huanfeng Shen, Xiao Wu, Jin-Liang Xiao, Jing Yao, Mengmeng Zhang, Jocelyn Chanussot, Salvador García and Antonio Plaza. Deep Learning in Remote Sensing Image Fusion: Methods, protocols, data, and future perspectives. IEEE geoscience and remote sensing magazine 13(1), March 2025, 269-310.
17 Peng Wang, Zhongchen He, Bo Huang, Mauro Dalla Mura, Henry Leung and Jocelyn Chanussot. VOGTNet: Variational Optimization-Guided Two-Stage Network for Multispectral and Panchromatic Image Fusion. IEEE Transactions on Neural Networks and Learning Systems 36(5), May 2025, 9268-9282.
18 Junjie Wang, James Hollingsworth, Erwan Pathier, Tristan Montagnon, Wei Li, Mengmeng Zhang, Ran Tao, Jocelyn Chanussot and Sophie Giffard-Roisin. GeoFlowNet-SAR: Earthquake Displacement Estimation from Synthetic Aperture Radar Images. IEEE Transactions on Geoscience and Remote Sensing, November 2025, 1-12.
19 Xiao Wu, Zi-Han Cao, Ting-Zhu Huang, Liang-Jian Deng, Jocelyn Chanussot and Gemine Vivone. Fully-Connected Transformer for Multi-Source Image Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(3), March 2025, 2071-2088.
20 Houssam Zenati, Alberto Bietti, Matthieu Martin, Eustache Diemert, Pierre Gaillard and Julien Mairal. Counterfactual Learning of Stochastic Policies with Continuous Actions. Transactions on Machine Learning Research Journal, March 2025.
21 Man Zhou, Naishan Zheng, Xuanhua He, Danfeng Hong and Jocelyn Chanussot. Probing Synergistic High-Order Interaction for Multi-Modal Image Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(2), February 2025, 840-857.

International peer-reviewed‌ conferences

22 Michael Arbel, David Salinas and Frank Hutter. EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Network. Proceedings in Advances in Neural Information Processing Systems 38, NeurIPS 2025, vol. 38, San Diego (California), United States, 2025.
23 Tariq Berrada, Pietro Astolfi, Melissa Hall, Marton Havasi, Yohann Benchetrit, Adriana Romero-Soriano, Karteek Alahari, Michal Drozdzal and Jakob Verbeek. Boosting Latent Diffusion with Perceptual Objectives. ICLR 2025 - 13th International Conference on Learning Representations, Singapore, Singapore, April 2025, 1-28.
24 Théo Bodrito, Olivier Flasseur, Julien Mairal, Jean Ponce, Maud Langlois and Anne-Marie Lagrange. A New Statistical Model of Star Speckles for Learning to Detect and Characterize Exoplanets in Direct Imaging Observations. CVPR 2025 - IEEE / CVF Conference on Computer Vision and Pattern Recognition, Nashville, United States, IEEE, 2025, 1-15.
25 Etienne Boursier, Scott Pesme and Radu-Alexandru Dragomir. A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation. Advances in Neural Information Processing Systems, NeurIPS 2025 - Neural Information Processing Systems, vol. 38, San Diego, United States, December 2025.
26 Nan Cai and Pia Bideau. Active Event Alignment for Monocular Distance Estimation. WACV 2024 - IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, Arizona, United States, IEEE, 2025, 2464-2473.
27 Fares El Khoury, Edouard Pauwels, Samuel Vaiter and Michael Arbel. Learning Theory for Kernel Bilevel Optimization. NeurIPS 2025 – Advances in Neural Information Processing Systems, NeurIPS 2025 - 39th Annual Conference on Neural Information Processing Systems, San Diego, United States, December 2025, pp. 1-47.
28 Renaud Gaucher, Aymeric Dieuleveut and Hadrien Hendrikx. Unified Breakdown Analysis for Byzantine Robust Gossip. Proceedings of Machine Learning Research, International Conference on Machine Learning, vol. 267, Vancouver, Canada, July 2025, 18868-18896.
29 Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal, Jakob Verbeek and Karteek Alahari. Entropy Rectifying Guidance for Diffusion and Flow Models. NeurIPS 2025 - Thirty-ninth Conference on Neural Information Processing Systems, San Diego (CA), United States, December 2025, 1-20.
30 Zhiqi Kang, Liyuan Wang, Xingxing Zhang and Karteek Alahari. Advancing Prompt-Based Methods for Replay-Independent General Continual Learning. ICLR 2025 - International Conference on Learning Representations, Singapore, Singapore, 2025, 1-19.
31 Paul Liautaud, Pierre Gaillard and Olivier Wintenberger. Minimax Adaptive Online Nonparametric Regression over Besov Spaces. NeurIPS 2025 - 39th Annual Conference on Neural Information Processing Systems, San Diego, United States, 2025.
32 Paul Liautaud, Pierre Gaillard and Olivier Wintenberger. Minimax-optimal and Locally-adaptive Online Nonparametric Regression. Proceedings of Machine Learning Research, ALT 2025 - 36th International Conference on Algorithmic Learning Theory, vol. 272, Milan, Italy, February 2025, 702-735.
33 Bianca Marin Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère and Nadia Oudjane. Online Episodic Convex Reinforcement Learning. ICML 2025 - Proceedings of the 42nd International Conference on Machine Learning, vol. 267, Vancouver, Canada, 2025, 44775-44824.
34 Juliette Marrie, Romain Ménégaux, Michael Arbel, Diane Larlus and Julien Mairal. LUDVIG: Learning-free uplifting of 2D visual features to Gaussian splatting scenes. ICCV 2025 - International Conference on Computer Vision, Honolulu, Hawai'i, United States, October 2025, 1-24.
35 Giacomo Meanti, Thomas Ryckeboer, Michael Arbel and Julien Mairal. Unsupervised Imaging Inverse Problems with Diffusion Distribution Matching. International Conference on Computer Vision (ICCV), Honolulu, United States, June 2025.
36 Scott Pesme, Giacomo Meanti, Michael Arbel and Julien Mairal. MAP Estimation with Denoisers: Convergence Rates and Guarantees. NeurIPS 2025 - 39th Annual Conference on Neural Information Processing Systems, San Diego, United States, August 2025, 1-30.
37 Runfeng Qu, Ole Hall, Pia K. Bideau, Julie Ouerfelli-Ethier, Martin Rolfs, Klaus Obermayer and Olaf Hellwich. Salience-SGG: Enhancing Unbiased Scene Graph Generation with Iterative Salience Estimation. WACV 2026 - IEEE/CVF Winter Conference on Applications of Computer Vision, Tucson, Arizona, United States, January 2026.
38 Aadirupa Saha and Pierre Gaillard. Finally Rank-Breaking Conquers MNL Bandits: Optimal and Efficient Algorithms for MNL Assortment. ICLR 2025 - The Thirteenth International Conference on Learning Representations, Singapore, Singapore, April 2025.
39 Eloïse Touron, Pedro Luiz Coelho Rodrigues, Julyan Arbel, Nelle Varoquaux and Michael Arbel. Simulation-based inference of yeast centromeres. NeurIPS 2025 - 39th Conference on Neural Information Processing Systems, Workshop: The 3rd Workshop on Imageomics: Discovering Biological Knowledge from Images Using AI, Copenhagen, Denmark, November 2025, 1-13.
40 Julien Zhou, Pierre Gaillard, Thibaud Rahier and Julyan Arbel. Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit. Proceedings of Machine Learning Research, ALT 2025 - 36th International Conference on Algorithmic Learning Theory, vol. 272, Milan, Italy, PMLR, 2025, 1-25.

Doctoral dissertations and habilitation theses

41 Théo Bodrito. Deep learning for exoplanet detection in high contrast imaging. École normale supérieure - PSL, June 2025.
42 Timothée Darcet. Discovering Complex Structures in Images using Large-Scale Self-Supervised Learning. Université Grenoble Alpes [2020-....], July 2025.

Reports & preprints

43 Juliette Bertrand, Sophie Giffard-Roisin, James Hollingsworth and Julien Mairal. MicroFlow: Domain-Specific Optical Flow for Ground Deformation Estimation in Seismic Events. 2025.
44 Pierre Boudart, Pierre Gaillard and Alessandro Rudi. Enjoying Non-linearity in Multinomial Logistic Bandits. July 2025.
45 Renaud Gaucher, Aymeric Dieuleveut and Hadrien Hendrikx. Byzantine-Robust Gossip: Insights from a Dual Approach. July 2025.
46 Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek and Ricky T. Q. Chen. Flowception: Temporally Expansive Flow Matching for Video Generation. December 2025.
47 Emmanuel Jehanno, Romain Menegaux, Julien Mairal and Sergei Grudinin. Challenges in Non-Polymeric Crystal Structure Prediction: Why a Geometric, Permutation-Invariant Loss is Needed. 2025.
48 Zhiqi Kang, Rahaf Aljundi, Vaggelis Dorovatas and Karteek Alahari. Online In-Context Distillation for Low-Resource Vision Language Models. October 2025.
49 Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard and Nadia Oudjane. Online Markov Decision Processes with Terminal Law Constraints. January 2026.
50 Jean Ponce, Basile Terver, Martial Hebert and Michael Arbel. Dual Perspectives on Non-Contrastive Self-Supervised Learning. October 2025.
51 Pierre-Louis Ruhlmann, Pedro Luiz Coelho Rodrigues, Michael Arbel and Florence Forbes. Flow Matching for Robust Simulation-Based Inference under Model Misspecification. December 2025.
52 Romain Seailles, Jean-Baptiste Masson, Jean Ponce and Julien Mairal. Optimal transport unlocks end-to-end learning for single-molecule localization. 2025.
53 Julien Weibel, Pierre Gaillard, Wouter M. Koolen and Adrien Taylor. Optimized projection-free algorithms for online learning: construction and worst-case analysis. June 2025.

Scientific popularization

54 Karteek Alahari and Lorenzo Jacques. Des réseaux de neurones artificiels pour aider les robots à comprendre leur environnement [podcast]. Interstices, February 2025.

12.2 Cited publications‌

55 Camila Fernandez, Chung Shue Chen, Pierre Gaillard and Alonso Silva. Experimental Comparison of Ensemble Methods and Time-to-Event Analysis Models Through Integrated Brier Score and Concordance Index. 2024, working paper or preprint.
