
2025 Activity Report: Project-Team DATAVERS

RNSR:​​​‌ 202524718N
  • Research center Inria​ Centre at the University​‌ of Lille
  • In partnership with: Université de Lille, Centre Hospitalier Universitaire de Lille
  • Team name: From​‌ health Data universe to​​ advances in statistical learning​​​‌
  • In collaboration with: Laboratoire Paul Painlevé (LPP), Evaluation des technologies de santé et des pratiques médicales

Creation of the Project-Team:​ 2025 August 01


Keywords

Computer Science and​‌ Digital Science

  • A3.1.4. Uncertain​​ data
  • A3.1.10. Heterogeneous data​​​‌
  • A3.2.3. Inference
  • A3.3.2. Data​ mining
  • A3.3.3. Big data​‌ analysis
  • A5.2. Data visualization​​
  • A5.9.2. Estimation, modeling
  • A6.2.3.​​​‌ Probabilistic methods
  • A6.2.4. Statistical​ methods
  • A6.3.3. Data processing​‌
  • A9.2. Machine learning
  • A9.2.1.​​ Supervised learning
  • A9.2.2. Unsupervised​​​‌ learning
  • A9.2.5. Bayesian methods​
  • A9.2.7. Kernel methods

Other​‌ Research Topics and Application​​ Domains

  • B2.2.3. Cancer
  • B9.5.6.​​​‌ Data science
  • B9.6.3. Economy,​ Finance
  • B9.6.5. Sociology

1​‌ Team members, visitors, external​​ collaborators

Research Scientist

  • Christophe​​​‌ Biernacki [INRIA,​ Senior Researcher, from​‌ Aug 2025, HDR​​]

Faculty Members

  • Cristian​​​‌ Preda [Team leader​, UNIV LILLE,​‌ Professor, from Aug​​ 2025, HDR]​​​‌
  • Evgéniya Babykina [UNIV​ LILLE, Associate Professor​‌, from Aug 2025​​]
  • Emmanuel Chazard [​​​‌UNIV LILLE, Professor​, from Aug 2025​‌]
  • Sophie Dabo [​​UNIV LILLE, Professor​​​‌, from Aug 2025​]
  • Guillemette Marot [​‌UNIV LILLE, Professor​​, from Aug 2025​​]

Post-Doctoral Fellows

  • Gaurav​​​‌ Dhar [INRIA,‌ Post-Doctoral Fellow, from‌​‌ Aug 2025 until Aug​​ 2025]
  • Komlan Midodzi​​​‌ Noukpoape [UNIV LILLE‌, Post-Doctoral Fellow,‌​‌ from Aug 2025 until​​ Nov 2025]

PhD​​​‌ Students

  • Mustapha Atmani [‌CEREMA]
  • François Bassac‌​‌ [DECATHLON, CIFRE​​, from Aug 2025​​​‌]
  • Clarisse Boinay [‌DAUPHINE PSL, from‌​‌ Aug 2025 until Aug​​ 2025]
  • Hugo Cannafarina​​​‌ [INRIA, from‌ Aug 2025 until Oct‌​‌ 2025]
  • Violaine Courrier​​ [WITHINGS, CIFRE​​​‌, from Aug 2025‌]
  • Clara Dubois [‌​‌UNIV LILLE, CIFRE​​, from Aug 2025​​​‌, UCCS]
  • Cécile‌ Verrier [UNIV LILLE‌​‌, from Aug 2025​​, LPP]

Technical​​​‌ Staff

  • Paul Faye [‌INRIA, Engineer,‌​‌ from Nov 2025]​​
  • Nicolas Jankovsky [SATT​​​‌ NORD, from Aug‌ 2025]

Interns and‌​‌ Apprentices

  • Marin Bahut [​​INRIA, Intern,​​​‌ from Aug 2025 until‌ Aug 2025]
  • Theo‌​‌ Dufresne [INRIA,​​ Intern, from Aug​​​‌ 2025 until Aug 2025‌]

Administrative Assistant

  • Anne‌​‌ Rejl [INRIA]​​

Visiting Scientist

  • Rahul Bordoloi​​​‌ [UNIV ROSTOCK,‌ from Oct 2025]‌​‌

2 Overall objectives

The overall objective of Datavers is to provide a framework in which theoretical and applied developments in statistical learning with complex and heterogeneous data meet the expectations of users of clinical decision support systems. The aim is to orient methodological developments according to the needs of clinical decision support and, reciprocally, to find applications in clinical decision support for those methodological developments.

2.1‌​‌ Context

Health data are the main ingredient for building clinical decision systems that help doctors and health policy makers in their decisions. The way health data are collected (volume) and organized (complexity of data types) is constantly evolving. Such data are becoming increasingly available and accessible, and they clearly represent a considerable source of inspiration for new research, primarily medical but also applicable to other domains such as computer science, physics and chemistry. In France, the universe of health data is undergoing a continuous structuring process: EHR (Electronic Health Record), also known as EDS (Entrepôt de données de santé), SNDS (Système National des Données de Santé), Health Data Hub, IoT health data, etc. As a consequence, the complexity of the health data to be analyzed has increased significantly. This complexity can be understood in terms of size (number of individuals and variables), structural complexity (relational schema, number of levels of qualitative variables), missingness, and relation to time (almost all variables are time-dependent and are collected in real life on a schedule that is not set by a protocol) or to space or space-time (in a large number of real problems, data have a complex spatial or spatio-temporal dependence structure). This new generation of data is a challenge for current statistical learning methods, which need to be adapted or conceptually reshaped to deal with the curse of dimensionality, variable selection, visualisation and interpretation, new algorithms for new data types, etc. Nevertheless, the objectives of data analysis remain the same: description (visualisation, clustering), prediction (supervised learning, regression) or any combination of those.

2.2 Goals

The new generation of data is a growing challenge for current statistical learning methods. Datavers addresses it by developing data-driven research in statistics that results from strong ongoing collaborations with partners from the health domain. The main goal of Datavers is to develop statistical learning methods in order to build a clinical decision support system based on heterogeneous, high-dimensional, time- and space-dependent data such as public health, clinical and multi-omics data.

Datavers intends to address clinical questions related to patient health trajectories (pattern definition, rehospitalisation prediction, disease dynamics, etc.) and precision medicine (definition of patient subgroups, biomarker selection, notably with application to adverse drug event prevention, etc.). In this applied framework, theoretical developments in statistical learning are carried out in several directions, such as variable selection and prediction with heterogeneous and censored data, clustering of massive data, and space-time and functional data analysis.

3 Research​‌ program

The research program​​ is structured within three​​​‌ main interacting axes:​

  • Axis 1 provides a​‌ framework for building a​​ clinical decision system based​​​‌ on heterogeneous data coming​ from EHR or other​‌ databases (public health, clinical,​​ biological and multi-omics data).​​​‌
  • Axis 2 deals with​ modeling time-space dependent data.​‌ It includes functional data​​ analysis, point processes, recurrent​​​‌ events and competing risks.​
  • Axis 3 develops methods for other data types, namely complex, high-dimensional and tall data. It includes networks, multivariate censored data, missing data and frugal methods for massive data.

3.1​​​‌ Axis 1: Clinical decision​ system based on complex​‌ health data

This axis provides a framework for building a clinical decision system based on heterogeneous data coming from EHR and other complementary databases (public health, clinical, biological and multi-omics data). It is applied research based on collaborations with researchers from CHU Lille and on medical data from EDS Include (CHU Lille) and SNDS. The main topic developed in this axis is understanding the complexity of natively structured healthcare data and translating clinical questions and raw EHR data into statistical learning statements. Developing models and precision medicine methods for the health trajectories of elderly and diabetic/obese patients is a first example of application to real data. Because of the breadth of its scope, this axis involves all the permanent team members, in particular the members from the Metrics laboratory, who interact closely with clinicians and hospital structures. PhD students and some engineers are also involved.

3.2 Axis 2:​ Space-time dependent data analysis​‌

One main concern of the Datavers team is to develop statistical methods that take into account the temporal and spatial dimensions of data. The clinical decision support systems of interest in this project concern both patient health trajectories and event occurrences over time (death, hospitalisation, adverse drug events). In both cases, time and space are essential factors in the statistical analysis. As a general observation, EHR systems massively record the spatio-temporal features of clinical data. Datavers develops methodologies for functional data analysis, spatial statistics and multistate models, with a focus on categorical functional data, spatial dependency of observations in supervised learning, and survival analysis with competing risks and recurrent events.

3.3 Axis 3:‌ Complex, high dimensional and‌​‌ tall data analysis

The originality of this axis is to properly account for the fact that EHR records are increasingly numerous (in both the number of patients and the number of features) and that the recorded features are increasingly heterogeneous. As a more precise example concerning the number of individuals, the national PMSI database (Programme de médicalisation des systèmes d'information, the programme for the medicalization of information systems) contains roughly 24 to 27 million MCO (medicine, surgery, obstetrics) stays per year. Concerning the number of features, in genomics one can collect up to 5 billion base pairs per individual. In addition, IoT and wearable devices are a new source of data that inflates databases significantly and durably. This axis addresses generic clustering challenges fueled by EHR, such as dealing with an avalanche of small clusters within massive and heterogeneous data sets, selecting among a priori "irreconcilable" clustering methods, and jointly modeling longitudinal high-dimensional data and multivariate censored data.

4​​ Application domains

The application areas of statistical modeling for complex data are extensive. The Datavers team focuses mainly on biology and health applications, where high-throughput technologies and clinical decision systems open new challenges. Secondary application areas are considered in industry, retail, finance, marketing and cybersecurity.

4.1​​ Health applications

Beyond specific applications arising from individual collaborations between team members and researchers from the health domain, Datavers contributes to the statistical modeling of the patient pathway in hospital and to precision medicine, in particular for elderly, diabetic and oncology patients. As a general remark, data coming from hospitals will continue to grow in volume and complexity, especially because of the recent national policy of building EHR systems, which make it possible to structure medical data optimally and record them over time. Consequently, it is essential to rely on the fundamental advances provided in Axis 2 and Axis 3, in particular to anticipate the increasing complexity of future EHR-like databases.

4.2 Economic and‌​‌ other field applications

Collaborations with companies such as Decathlon (sport), ADEO (marketing), Seckiot (cybersecurity), Worldline (finance) and Withings (wearable devices) are a source of applied research, which we carry out by supervising CIFRE PhD theses with a strong impact on the development plans of these companies. This research covers dynamical clustering and predictive clustering of time series data, unsupervised learning models for IoT graph data, functional data analysis, etc. Even though health applications are the core of Datavers, the members of the team continue to develop such collaborations to ensure a broad transfer of our research beyond health applications.

5‌​‌ Social and environmental responsibility​​​‌

5.1 Footprint of research​ activities

Datavers develops innovative learning methods as an added value to medical decision support based on real-life health data. Precision medicine, patient health trajectory modeling and multistate analysis are at the core of Datavers research activity.

5.2 Impact​‌ of research results

The development of statistical learning methods to be integrated into a clinical decision system is a foundational axis of Datavers. Besides the integration of new methodologies, as dedicated software, within the medical information systems of specific hospital units (for instance, the geriatric unit), internships of students from the Master Biologie et Santé (Faculty of Medicine) and postdoctoral positions in public health can be shared between the Datavers team and research units of the University of Lille and CHU Lille. The objective of Datavers is to publish these methods and disseminate them to researchers wishing to develop clinical decision support systems.

Establishing partnerships with companies, for instance through CIFRE PhD theses, is a powerful tool for transferring our research to industry and private companies. We intend to continue this process, as with the ongoing collaborations with Withings, Decathlon, Worldline, Seckiot, Alicante, Horiba and ADEO.

6 Highlights of​​ the year

6.1 Awards​​​‌

The NUMETAB project, coordinated by Guillemette Marot (Datavers) and Francois Pattou (Inserm U1190), was selected as a 2025 laureate of the Cross Disciplinarity Projects call of the "Initiative d'excellence Université de Lille et France 2030" (2026-2029, 3.2M euros).

Sophie Dabo received the first paper award of the ICMSEM 2025 conference.

6.2 Nomination

Sophie Dabo has been elected Vice-president of CIMPA (International Center of Pure and Applied Mathematics), a UNESCO Category 2 centre.

Sophie Dabo has​‌ been designated by CNRS​​ INSMI to lead the​​​‌ IRL (International Research Laboratory)​ project in mathematics based​‌ in Africa.

6.3 Interview​​

Sophie Dabo gave an interview on October 15th on Radio France Internationale for the programme "Around the Question, the magazine for all the sciences: How to do mathematics differently and on every continent".

7 Latest software​​​‌ developments, platforms, open data​

7.1 Latest software developments​‌

7.1.1 cfda

  • Name:
    Categorical​​ functional data analysis
  • Keyword:​​​‌
    Functional data
  • Functional Description:​

    The R package cfda​‌ performs:

    - descriptive statistics​​ for categorical functional data​​​‌

    - dimension reduction and optimal encoding of states (extension of multiple correspondence analysis to functional data)

    - approximation​​​‌ for multivariate categorical functional​ data analysis.

  • Release Contributions:​‌
    - approximation for multivariate​​ categorical functional data analysis.​​​‌
  • URL:
  • Contact:
    Cristian​ Preda
  • Participants:
    Cristian Preda,​‌ Quentin Grimonprez, Vincent Vandewalle​​
  • Partner:
    Université de Lille​​​‌

7.1.2 MixtComp.V4

  • Keyword:
    Clustering​
  • Functional Description:
    MixtComp (Mixture Computation) is a model-based clustering package for mixed data from the Modal team (Inria Lille). It has been engineered around the idea of easy and quick integration of new univariate models under the conditional independence assumption. New models will eventually become available from research carried out by the Modal team or by other teams. The central architecture of MixtComp is currently built, and its functionality has been field-tested through industry partnerships. Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented, as well as two advanced models (Functional and Rank). MixtComp can natively manage missing data (completely missing or known up to an interval). MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation.
  • Release Contributions:
    - New I/O system

    - Replacement of regex library

    - Improvement of initialization

    - Criteria for stopping the algorithm

    - Added management of partially missing data for several models

    - User documentation

    - Adding user features in R
  • URL:
  • Contact:
    Christophe​​ Biernacki
  • Participants:
    Christophe Biernacki,​​​‌ Vincent Kubicki, Matthieu Marbac-Lourdelle,‌ Serge Iovleff, Quentin Grimonprez,‌​‌ Etienne Goffinet

7.1.3 HDSpatialScan​​

  • Name:
    Multivariate and Functional​​​‌ Spatial Scan Statistics
  • Keywords:‌
    Functional data, Clustering, Spatial‌​‌ information, Multivariate data
  • Scientific​​ Description:
    Scan statistics in​​​‌ high dimensional spaces
  • Functional‌ Description:
    Detects spatial clusters of abnormal values in multivariate or functional data
  • URL:
  • Contact:
    Sophie Dabo
  • Participants:‌​‌
    Sophie Dabo, Michael Genin,​​ Camille Frevent

7.1.4 visCorVar​​​‌

  • Name:
    visualization of correlated‌ variables in the context‌​‌ of statistical integration of​​ omics data
  • Keywords:
    Data​​​‌ integration, Visualization
  • Functional Description:‌
    The R package visCorVar visualizes results from data integration performed with the function block.splsda (Bioconductor mixOmics package). The data integration is performed on different types of omic datasets (transcriptomics, metabolomics, metagenomics) in order to select the variables of an omic dataset that are correlated with the variables of the other omic datasets and with the response variables, and to predict the class membership of a new sample. These correlated variables can be visualized with correlation circles and networks.
  • URL:
  • Contact:
    Guillemette‌ Marot
  • Participants:
    Maxime Brunin,‌​‌ Guillemette Marot, Pierre Pericard​​
  • Partner:
    Université de Lille​​​‌

7.1.5 MLGL

  • Name:
    Multi-Layer‌ Group Lasso
  • Keywords:
    Variable‌​‌ selection, Statistical learning
  • Functional​​ Description:
    The MLGL R package, standing for Multi-Layer Group-Lasso, implements a variable selection procedure for settings with redundancy between explanatory variables, which is typical of high-dimensional data. The MLGL approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides at each level a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to group-Lasso, with weights adapted to the structure of the hierarchy. At this step, group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter. The flexibility offered by MLGL to choose groups at different levels of the hierarchy a priori induces a high computational complexity. MLGL, however, exploits the structure of the hierarchy and the weights used in group-Lasso to greatly reduce the final time cost. The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.
  • URL:
  • Contact:
    Guillemette​​ Marot
  • Participants:
    Guillemette Marot,​​​‌ Quentin Grimonprez

8 New‌ results

8.1 Axis 1‌​‌ and Axis 3: From​​​‌ Unsupervised to Guided Clustering:​ A Variational Implementation

Participants:​‌ Christophe Biernacki, Violaine​​ Courrier, Cristian Preda​​​‌.

Clustering is usually viewed as an unsupervised technique, but in practice it requires guidance to uncover meaningful structures. We formalize this with guided clustering, a paradigm that uses a guiding variable to steer the discovery process, and introduce the Guided Clustering Variational Autoencoder (GCVAE) as its deep generative realization. GCVAE learns a latent space structured as a Gaussian Mixture Model by optimizing a variational objective that forces the representation to be maximally informative about the guiding variable. This framework allows the resulting clustering to be dynamically reoriented by altering the guiding variable, yielding clusters that are both interpretable and meaningful for the specified context. Experiments on public data (MNIST-SVHN) and on proprietary data from connected health devices demonstrate GCVAE's ability to discover coherent and task-relevant clusters in complex, high-dimensional settings.

This work has been presented at the team seminar 49, at a national conference 35, 31 and at an international conference 29. A submission to an international journal is in preparation.

It is joint work with Benjamin Vittrant from the Withings company.

8.2 Axis 1​​​‌ and Axis 3: Levels​ Merging in the Latent​‌ Class Model

Participants: Christophe​​ Biernacki, Emmanuel Chazard​​​‌, Johan Lyrvall.​

The latent class model (LCM), dedicated to clustering categorical variables, suffers from the curse of dimensionality when the number of levels is large, a situation frequently encountered in practice. We propose to extend LCM to a natural model that limits the number of levels by merging them, a process that is equivalent to a specific clustering of the levels. The related estimation and model selection procedures are also presented and discussed.

Submissions to a national conference, an international conference and an international journal are in preparation.

This is joint work with Christine Keribin from University Paris-Saclay.
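To make the dimensionality issue concrete, the sketch below counts the free parameters of a standard latent class model under conditional independence, (K - 1) + K × Σ_j (m_j - 1), before and after levels are merged. The function name and the level counts are hypothetical and chosen only for illustration; the count also ignores any extra parameters that the merged model described above may require.

```python
def lcm_free_parameters(n_classes, n_levels_per_variable):
    """Free parameters of a latent class model with conditionally independent
    categorical variables: (K - 1) mixing proportions plus, for each class,
    (m_j - 1) level probabilities for each variable j."""
    if n_classes < 1 or any(m < 1 for m in n_levels_per_variable):
        raise ValueError("need at least one class and one level per variable")
    mixing = n_classes - 1
    per_class = sum(m - 1 for m in n_levels_per_variable)
    return mixing + n_classes * per_class


# Hypothetical example: 3 classes, two variables with 50 and 200 levels.
before = lcm_free_parameters(3, [50, 200])
# The same variables after merging their levels into 5 and 8 groups.
after = lcm_free_parameters(3, [5, 8])
print(before, after)
```

The count grows linearly with the number of levels of each variable, which is why merging levels can reduce the number of parameters sharply when variables have hundreds of levels.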

8.3 Axis 1 and​ Axis 2: Joint Latent​‌ Class Models: A Tutorial​​ on Practical Applications in​​​‌ Clinical Research

Participant: Genia​ Babykina.

The joint​‌ latent class model is​​ a statistical approach that​​​‌ allows the simultaneous analysis​ of two outcomes related​‌ to disease progression—a longitudinal​​ outcome and a time-to-event​​​‌ outcome—in the presence of​ population heterogeneity. The theoretical​‌ properties of the model​​ have been established, and​​​‌ it has been implemented​ in dedicated software. However,​‌ due to its complexity,​​ the model remains challenging​​​‌ for clinicians to specify​ and use in practice.​‌ This work, published in​​ article 18, provides​​​‌ a detailed tutorial aimed​ at clinicians and applied​‌ statisticians. It explains how​​ to specify joint latent​​​‌ class models in the​ R software to address​‌ concrete clinical questions, and​​ how to explore, manipulate,​​​‌ and interpret the resulting​ outputs. The tutorial is​‌ based on a real​​ clinical dataset; for each​​​‌ clinical question, the corresponding​ mathematical model specification and​‌ R implementation are presented,​​ along with a detailed​​​‌ interpretation of estimation results​ and goodness-of-fit measures. This​‌ work was carried out​​ within the framework of​​ the PhD thesis of​​​‌ M. Kyheng, co-supervised by‌ G. Babykina, and in‌​‌ collaboration with A. Duhamel​​ (University of Lille, CHU​​​‌ Lille).

8.4 Axis 1:‌ Metabolite biomarker discovery for‌​‌ pancreatic neuroendocrine tumors using​​ metabolomic approach

Participant: Sophie​​​‌ Dabo.

Metabolic flexibility, a key hallmark of cancer, reflects aberrant tumour changes associated with metabolites. The metabolic plasticity of pancreatic neuroendocrine tumours (pNETs) remains largely unexplored. Notably, the heterogeneity of pNETs complicates their diagnosis, prognosis, and therapeutic management. In this paper, we compared the plasma metabolomic profiles of patients with pNETs and of non-cancerous individuals to understand metabolic dysregulation. This study highlights the distinct plasma metabolic signatures of pNETs, including the critical role of FAO and elevated glutamate levels in metastasis, supporting the energy and biosynthetic needs of rapidly proliferating tumour cells. Mapping these dysregulated metabolites may facilitate the identification of new therapeutic targets for pNET management. The paper 17 is a collaboration with Dr Arnaud Jannin of Lille CHU and colleagues from Oncolille and Lille CHU.

8.5 Axis 1: Long-term‌ outcome of oesophageal atresia‌​‌ in adolescence (TransEAsome): a​​ national French cohort study​​​‌ protocol

Participant: Guillemette Marot‌.

TransEAsome is a national multicentre population-based cohort study recruiting participants from all qualified French centres for oesophageal atresia (OA) surgery at birth. The primary objective is to assess the prevalence of gastro-oesophageal reflux disease in adolescence among patients with OA, with several secondary objectives including the identification of risk factors and multiomic profiles from oesophageal biopsies and blood samples collected between 13 and 14 years of age, compared with a control group. This comprehensive characterisation of phenotype and omic profiles aims to enhance the understanding of disease evolution in patients with OA and inform tailored care management strategies. This work has been published in BMJ journal 55.

8.6 Axis​​​‌ 1: Analysis of Dependency‌ Levels in Psychiatric Hospitalizations‌​‌ by Psychiatric Nurses: A​​ Retrospective Study in France​​​‌

Participants: Emmanuel Chazard,‌ Alexis Dias, Antoine‌​‌ Lamer.

This study aims to evaluate and describe the dependency levels among adults hospitalised for psychiatric care in France between 2013 and 2022, leveraging medico-administrative data from the French National Health Data System (SNDS). We conducted a retrospective cohort study using SNDS data, analysing activities of daily living (ADL) scores collected during psychiatric admissions. Dependency levels were categorized into six levels based on established criteria, with a specific focus on physical (e.g., mobility, continence) and relational (e.g., behavioural interactions) dimensions. See 16 for more details.

8.7 Axis​​ 1: Comparison of youth​​​‌ psychiatric hospitalizations by type‌ of facility in 2022‌​‌

Participants: Emmanuel Chazard,​​ Antoine Lamer, Antoine​​​‌ Teston.

This work was conducted using the French national insurance database (SNDS). Patients under 18 years of age who were discharged from psychiatric hospitals in 2022 were included. Characteristics of stays were described according to the type of facility: public, private not-for-profit, or private for-profit hospitals. In 2022, 20,598 patients were hospitalized in psychiatric facilities in France, totaling 46,222 stays. Of these stays, 76.92% were in public facilities, 13.39% in non-profit facilities, and 9.70% in for-profit facilities. In public and non-profit facilities, patients were more frequently male, younger, and had shorter lengths of stay than those in for-profit facilities. Public facilities take care of the majority of patients. Characteristics of patients and stays differ according to the type of facility. A significant share of patients is common to the public and private sectors. See 23 for more details.

8.8 Axis 1: Risk​ factors for severe morbidity​‌ and mortality

Participants: Emmanuel​​ Chazard, Antoine Lamer​​​‌, Océane Pécheux.​

Using the French national discharge summary database, we retrospectively analysed all hospital stays that included pelvic organ prolapse (POP) surgery in public- or private-sector healthcare facilities between January 1st, 2015, and September 1st, 2024. A total of 375,705 surgical procedures were included. In a multivariate analysis, the risk of death was higher for laparotomy, transanal and multiple approaches than for vaginal surgery. The composite outcome rate (death or admission to an intensive care unit during the hospital stay for POP surgery) was 0.57% (n = 2,124). The patient-related and surgery-related risk factors were age, heart failure, respiratory insufficiency, diabetes, obesity, and laparoscopic, laparotomy, transanal and multiple approaches. See 19 for more details.

8.9​ Axis 1: Mortality and​‌ fracture risk in children​​ with osteogenesis imperfecta

Participants:​​​‌ Emmanuel Chazard, Antoine​ Lamer, Cécile Philippoteaux​‌.

In this work, we used data from the French nationwide hospital discharge database (2014-2022). Based on age at index stay, patients were classified as newborns (< 1 month), infants (> 1 and < 24 months), or children (> 2 years). Immediate mortality (during the index stay or after same-day transfer) and long-term mortality were analyzed along with fracture risk using descriptive statistics, Kaplan-Meier estimates, and Cox models. See 20 for more details.
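As a reminder of what a Kaplan-Meier estimate computes, here is a minimal standard-library sketch of the product-limit estimator. The function name and the toy follow-up data are hypothetical; this is not the code or the data used in the study above.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up duration of each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: 7 patients, 3 of them censored.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored patients contribute to the risk set up to their censoring time but never trigger a drop in the curve, which is what distinguishes this estimator from a naive proportion of survivors.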

8.10 Axis 1: Evaluation​ of a score for​‌ identifying hospital stays that​​ trigger a pharmacist intervention​​​‌

Participants: Emmanuel Chazard,​ Laurine Robert.

The study was retrospective and observational, and was conducted within the clinical pharmacy team. The patient risk score was adapted from a Canadian score and integrated into the clinical decision support system (CDSS). For each hospital stay, the score was calculated at the beginning of hospitalization, and we retrospectively recorded whether a medication review and a pharmacist intervention (PI) were conducted. The optimal patient risk score threshold was then determined to help pharmacists optimize medication review. See 21 for more details.
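The summary above does not state which criterion was used to choose the threshold. As one common possibility, the sketch below picks the cut-off that maximizes Youden's J (sensitivity + specificity - 1). The criterion, the function name and the toy scores are assumptions made for illustration only.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing sensitivity + specificity - 1.

    A stay is flagged when its score is >= threshold.
    labels: 1 if a pharmacist intervention occurred, 0 otherwise.
    Ties are resolved in favour of the lowest threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes must be present")
    best = None
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
        j = tp / positives + tn / negatives - 1.0
        if best is None or j > best[1]:
            best = (thr, j)
    return best


# Hypothetical toy data: risk scores and whether an intervention occurred.
scores = [1, 2, 2, 3, 4, 5, 6, 7]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, j = youden_threshold(scores, labels)
print(threshold, j)
```

Other criteria, for example a cost-weighted trade-off between missed interventions and unnecessary reviews, would lead to a different threshold on the same data.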

8.11​‌ Axis 2: Clustering of​​ recurrent events data

Participants:​​​‌ Genia Babykina, Vincent​ Vandewalle.

A novel statistical methodology was developed for the analysis of recurrent event data, which commonly arise in fields such as healthcare, epidemiology, and reliability studies. Specifically, we proposed a mixture model for recurrent events that accounts for unobserved heterogeneity through latent classes. This framework enables the unsupervised clustering of individuals into homogeneous subgroups, while modeling recurrent event intensities within each cluster and adjusting for covariates. Model parameters are estimated by maximum likelihood using the EM algorithm. The feasibility and performance of the method were assessed through simulation studies and illustrated using real-world hospital readmission data, providing improved insight into heterogeneous recurrent event dynamics. The associated methodological article is published in 15.

This methodology was subsequently applied in a large prospective multicentre cohort study aimed at identifying subgroups of older patients at risk of repeated hospital readmissions and death following discharge from acute geriatric units. Using the proposed approach, two distinct patient subgroups with markedly different post-discharge outcomes were identified. Further analyses revealed that only a limited number of clinical characteristics were weakly associated with membership in the high-risk subgroup, highlighting the difficulty of predicting adverse outcomes based solely on standard clinical variables. This applied work demonstrates the practical relevance of the proposed methodological framework and underscores the need for improved predictive tools to better target high-risk older patients. The results are published in 24.

This work was carried out in collaboration with V. Vandewalle (Inria Modal) and J. Bravo (University of Cádiz, Spain). The clinical application was conducted in close collaboration with clinicians from CHU Lille, notably F. Visade and J.-B. Beuscart.
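
To make the estimation principle concrete, the sketch below runs EM on a deliberately simplified version of such a latent-class model: each class has a constant event rate, there are no covariates, and each individual contributes an event count and a follow-up time. The function names, toy data and starting values are assumptions made for this illustration only; the published model is richer.

```python
import math

def poisson_logpmf(n, mu):
    """Log-probability of n events under a Poisson law with mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def em_poisson_mixture(counts, exposures, rates, weights, n_iter=200):
    """EM for a mixture of homogeneous Poisson processes (hypothetical helper).

    counts[i]    : number of recurrent events observed for individual i
    exposures[i] : follow-up time of individual i
    rates, weights : starting values for class rates and class proportions
    """
    rates, weights = list(rates), list(weights)
    n_classes = len(rates)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each individual.
        resp = []
        for n, t in zip(counts, exposures):
            logp = [math.log(weights[k]) + poisson_logpmf(n, rates[k] * t)
                    for k in range(n_classes)]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: weighted proportions and weighted events per unit of time.
        for k in range(n_classes):
            mass = sum(r[k] for r in resp)
            events = sum(r[k] * n for r, n in zip(resp, counts))
            time_at_risk = sum(r[k] * t for r, t in zip(resp, exposures))
            weights[k] = max(mass / len(counts), 1e-12)
            if time_at_risk > 0:
                rates[k] = max(events / time_at_risk, 1e-12)
    return rates, weights, resp

# Toy data: eight individuals followed for 2 time units with few events,
# six individuals followed for 1 time unit with many events.
counts = [1, 2, 0, 3, 2, 1, 2, 1, 5, 6, 4, 7, 5, 6]
exposures = [2.0] * 8 + [1.0] * 6
rates, weights, resp = em_poisson_mixture(counts, exposures, [0.5, 3.0], [0.5, 0.5])
print("estimated rates:", rates)
print("estimated proportions:", weights)
```

The posterior probabilities in `resp` play the role of the soft cluster memberships; assigning each individual to its most probable class gives the unsupervised partition.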

8.12 Axis​​ 2: Variable selection with​​​‌ FDR control for high‌ dimensional competing risk data‌​‌

Participants: Guillemette Marot,​​ Genia Babykina, Hugo​​​‌ Cannafarina.

In biomedical research, high-dimensional data are increasingly common, particularly in fields such as genomics, transcriptomics, proteomics, and metabolomics, while clinical outcomes are often subject to competing risks. In this context, the penalized Fine–Gray model is widely used to identify covariates associated with the outcome of interest. However, variable selection remains challenging, as penalized approaches may retain a large number of irrelevant variables, thereby complicating the identification of meaningful biomarkers. To address this issue, we proposed the use of Integrated Path Stability Selection (IPSS) to enhance variable selection in high-dimensional competing risks settings. This approach controls the False Discovery Rate (FDR) while maintaining the ability to detect truly influential variables. Simulation studies show that IPSS substantially reduces the number of false positives compared with existing methods, while preserving strong performance in terms of true positive detection and predictive accuracy. The practical relevance of the method is further illustrated through a real-world biomedical case study. This work has been disseminated in a conference paper 37. This research was conducted as part of the PhD thesis of H. Cannafarina, co-supervised by C. Preda, G. Marot and G. Babykina.
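
The general idea behind stability-based selection can be sketched as follows. This is not the IPSS criterion and does not fit a Fine–Gray model: as a stand-in base selector, it keeps the variables whose absolute correlation with a continuous outcome exceeds a threshold, repeats this on random half-samples, and averages the selection frequencies over a small grid of thresholds. All names, the simulated data and the threshold grid are assumptions made for the illustration.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def selection_scores(X, y, thresholds, n_subsamples=100, seed=0):
    """Average selection frequency of each variable over half-samples.

    The base selector (|correlation| > threshold) is a placeholder for a
    penalized model; the average over thresholds mimics, very loosely,
    the aggregation of selection frequencies along a regularization path.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    hits = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        y_sub = [y[i] for i in idx]
        for j in range(p):
            r = abs(pearson([X[i][j] for i in idx], y_sub))
            hits[j] += sum(1 for thr in thresholds if r > thr)
    return [h / (n_subsamples * len(thresholds)) for h in hits]

# Simulated data: 10 candidate variables, only the first one is relevant.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(80)]
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]
scores = selection_scores(X, y, thresholds=[0.3, 0.4, 0.5])
print([round(s, 2) for s in scores])
```

Variables that are selected on almost every half-sample receive a score close to 1, whereas noise variables are selected only occasionally; thresholding these scores is what limits false positives.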

8.13 Axis‌​‌ 2: Longitudinal Data: A​​ Lever for Precision Medicine​​​‌

Participant: Genia Babykina.‌

A keynote presentation was‌​‌ delivered by a member​​ of the team (G.​​​‌ Babykina) during the Journées‌ PEPR Santé Numérique held‌​‌ in Lille (October 2025).​​ The keynote provided a​​​‌ comprehensive overview of statistical‌ methodologies enabling the use‌​‌ of longitudinal data from​​​‌ multiple sources as a​ lever for precision medicine.​‌ The presentation highlighted methodological​​ challenges and recent advances​​​‌ in modeling complex longitudinal​ trajectories to better support​‌ individualized clinical decision-making. The​​ presentation slides are available​​​‌ in  27.

8.14​ Axis 2: Fusion regression​‌ methods with repeated functional​​ data

Participants: Sophie Dabo​​​‌, Cristian Preda,​ Issam Moindjie.

Linear regression and classification methods with repeated functional data are considered in this work. For each statistical unit in the sample, a real-valued parameter is observed over time under different conditions related by some neighborhood structure (spatial, group, etc.). Two regression methods based on fusion penalties are proposed to account for the dependence induced by this structure. These methods aim to obtain parsimonious regression coefficient functions by determining whether close conditions are associated with common regression coefficient functions. The first method is a generalization to functional data of the variable fusion methodology based on the 1-nearest neighbor. The second one relies on the group fusion lasso penalty, which assumes some grouping structure of conditions and allows for homogeneity among the regression coefficient functions within groups. Numerical simulations and an application to electroencephalography data are presented in 10.

8.15​​​‌ Axis 2: Principal component​ analysis of multivariate spatial​‌ functional data

Participant: Sophie​​ Dabo.

This paper is devoted to the study of dimension reduction techniques for multivariate, spatially indexed functional data defined on different domains. We present a method called Spatial Multivariate Functional Principal Component Analysis (SMFPCA), which performs principal component analysis for multivariate spatial functional data. In contrast to the multivariate Karhunen-Loève approach for independent data, SMFPCA captures spatial dependencies among multiple functions. SMFPCA applies spectral functional component analysis to multivariate functional spatial data, focusing on data points arranged on a regular grid. The methodological framework and algorithm of SMFPCA were developed to address the lack of appropriate methods for this type of data. The finite-sample performance of the proposed method was assessed on simulated datasets and on a sea-surface temperature dataset. Additionally, we compared SMFPCA with some existing methods, which provided insights into the finite-sample properties of multivariate spatial functional data. The paper 22 is a collaboration with Idris Siahmed, Leila Hamdad (Algeria), Christelle Judith Agonkoui and Yoba Kande (University of Lille).

8.16 Axis 2:​ Forecasting mortality rates with​‌ functional signatures

Participant: Sophie​​ Dabo.

This study introduces an innovative methodology for mortality forecasting, which integrates signature-based methods within the functional data framework of the Hyndman-Ullah (HU) model. This new approach, termed the Hyndman-Ullah with truncated signatures (HUts) model, aims to enhance the accuracy and robustness of mortality predictions. By using signature regression, the HUts model is able to capture complex, nonlinear dependencies in mortality data, which enhances forecasting accuracy across various demographic conditions. The model is applied to mortality data from 12 countries, and its forecasting performance is compared against variants of the HU model across multiple forecast horizons. Our findings indicate that, overall, the HUts model not only provides more precise point forecasts but also shows robustness against data irregularities, such as those observed in countries with historical outliers. The integration of signature-based methods enables the HUts model to capture complex patterns in mortality data, making it a powerful tool for actuaries and demographers. Prediction intervals are also constructed with bootstrapping methods. This paper 25 is a collaboration with Zhong Jing Yap and Dharini Pathmanathan from Malaya University (Kuala Lumpur, Malaysia).
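
For readers unfamiliar with path signatures, the sketch below computes the signature of a piecewise-linear path truncated at level 2, using Chen's identity to add one linear segment at a time. It only illustrates the kind of features that signature regression works with; the function name and the toy path are assumptions, and nothing here reproduces the HUts model itself.

```python
def truncated_signature(path):
    """Signature of a piecewise-linear path, truncated at level 2.

    path : list of points, each a tuple of d coordinates.
    Returns (level1, level2) where level1[i] is the total increment of
    coordinate i and level2[i][j] is the iterated integral of
    coordinate i against coordinate j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending a straight segment: the cross term
        # uses the level-1 signature accumulated before this segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            level1[i] += inc[i]
    return level1, level2

# Toy two-dimensional path: one unit to the right, then one unit up.
level1, level2 = truncated_signature([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print(level1)  # [1.0, 1.0]
print(level2)  # [[0.5, 1.0], [0.0, 0.5]]
```

The antisymmetric part of the level-2 terms, `level2[0][1] - level2[1][0]`, equals twice the signed area enclosed by the path and its chord, which is one way such features encode the order in which coordinates move.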

8.17 Axis 2:​​ Predictive model for running​​​‌ performance using scalar and‌ qualitative functional data.

Participants:‌​‌ François Bassac, Cristian​​ Preda, Cédric Morio​​​‌.

Predicting the performance of runners using the training programs offered by Decathlon is a major challenge for both the company and the runners themselves. Data measuring the runners' physical capacities, the training themes followed and the results of training sessions are available at Decathlon. Their longitudinal nature (time dimension) and heterogeneity (type, length of observation) require effective pre-processing before they can be used in a predictive model. The extension of principal component analysis and multiple correspondence analysis to scalar and categorical functional data is used to reduce the dimension, visualize the data and fit a linear regression model. The coefficient functions of the regression model allow interpretation and prediction with new data. See 36 for more details.

8.18 Axis 3: Detection of anomalies in dynamic graphs with application in cybersecurity for OT

Participants: Christophe‌​‌ Biernacki, Cristian Preda​​.

The increasing number of cyber attacks on industrial networks puts human life and economies at risk. Firms usually implement fixed rules rather than anomaly detection to prevent such attacks. However, anomaly detection methods would allow for a more flexible grasp of deviations from normal behaviour. For instance, anomaly detection in graphs modeling industrial networks can sense changes in the behaviour of machines. In this work, we seek to establish whether the number of messages sent from one or more machines to one or more machines is normal or not. To this end, we first model interactions between IP addresses with dynamic graphs. Then, we construct a test statistic based on the likelihood of a graph, computed using generative models such as the stochastic block model and kernel estimators. Finally, we evaluate the power of the test in realistic and generic attack scenarios.
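
As a minimal sketch of a likelihood-based score, the code below evaluates the log-likelihood of one snapshot of message counts under a Poisson stochastic block model whose block memberships and rates are assumed known. The machine names, blocks, rates and counts are invented for the example, and the actual test statistic and its calibration in the thesis may differ.

```python
import math

def poisson_logpmf(k, mu):
    """Log-probability of k messages under a Poisson law with mean mu > 0."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def sbm_loglik(counts, blocks, rates):
    """Log-likelihood of a directed count graph under a Poisson SBM.

    counts : dict mapping (source, destination) to a message count;
             missing pairs are treated as zero messages.
    blocks : dict mapping each machine to its block index.
    rates  : rates[a][b] is the expected count from block a to block b.
    """
    total = 0.0
    for src in blocks:
        for dst in blocks:
            if src != dst:
                mu = rates[blocks[src]][blocks[dst]]
                total += poisson_logpmf(counts.get((src, dst), 0), mu)
    return total

# Hypothetical network: two controllers (block 0) and one supervisor (block 1).
blocks = {"plc1": 0, "plc2": 0, "hmi": 1}
rates = [[0.2, 3.0], [3.0, 0.1]]
normal = {("plc1", "hmi"): 3, ("plc2", "hmi"): 2,
          ("hmi", "plc1"): 3, ("hmi", "plc2"): 4}
# Same traffic plus an unusual burst between the two controllers.
attack = dict(normal)
attack[("plc1", "plc2")] = 25

print(sbm_loglik(normal, blocks, rates))
print(sbm_loglik(attack, blocks, rates))
```

A snapshot would be flagged when its log-likelihood falls below a threshold, for instance a low empirical quantile of the values observed on traffic known to be normal.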

Clarisse Boinay defended her​​​‌ PhD thesis this year‌  47. She presented‌​‌ this work at Ecole​​ d’été de Saint-Flour (July​​​‌ 2025, Clermont-Ferrand) and a‌ paper for an international‌​‌ journal is in preparation.​​

8.19 Axis 3: Model-based​​​‌ co-clustering: high dimension and‌ estimation challenges

Participant: Christophe‌​‌ Biernacki.

Model-based co-clustering can be seen as a particularly important extension of model-based clustering. It allows for a significant reduction of both the number of rows (individuals) and columns (variables) of a data set in a parsimonious manner, and also allows interpretability of the resulting reduced data set since the meaning of the initial individuals and features is preserved. Moreover, it benefits from the rich statistical theory for both estimation and model selection. Many works have produced new advances on this topic in recent years, and we offer a general update of the related literature. This is an opportunity to advocate two main messages, supported by specific research material: (1) co-clustering requires further research to fix some well-identified estimation issues, and (2) co-clustering is one of the most promising approaches for clustering in the (very) high-dimensional setting, which corresponds to the global trend in modern data sets.

A presentation on this topic was given at an international online seminar ("DaSSWeb - Data Science and Statistics Webinar") 48.

This is joint work with Julien Jacques from University Lyon 2 and Christine Keribin from University Paris-Saclay.

8.20 Axis 3: An​​ EM Stopping Rule for​​​‌ Avoiding Degeneracy in Gaussian-Based​ Clustering with Missing Data​‌

Participant: Christophe Biernacki.​​

The frequency of missing data increases with the growing size of modern multivariate datasets. In Gaussian model-based clustering, the EM algorithm easily takes such data into account, but the degeneracy problem is dramatically aggravated during the EM runs: parameter degeneracy is quite slow and also more frequent than with complete data. Consequently, degenerate parameter solutions may be confused with valuable ones and, in addition, computing time may be wasted on such runs. In this work, a simple, low-information condition on the latent partition leads to a very simple partition-based stopping rule for EM, which behaves well in numerical experiments.

This work was presented at Journée PS-MAASAI 2025 28, and an article for an international journal is in preparation.

This is joint work with Vincent Vandewalle from University Côte d’Azur.

8.21 Axis​ 3: Probabilistic estimation of​‌ fatigue damage based on​​ binned data from passive​​​‌ sensors

Participants: Mustapha Atmani​, Christophe Biernacki.​‌

The monitoring of the structural integrity of civil engineering structures, particularly bridges subjected to variable loads due to traffic, is crucial for safety and predictive maintenance. Conventional strain gauges are active sensors that measure continuous amplitudes, but they lack robustness over time and need to be powered. To overcome these constraints, the company SilMach has developed a passive mechanical sensor that requires no power supply and is designed to detect strain amplitudes at the installation location. This sensor provides aggregated data in the form of counts (binned data): it indicates the number of cycles whose amplitude falls within predefined, often wide, intervals, without reproducing the exact values of the fluctuations experienced over time.

This​​​‌ study proposes an estimation​ methodology using an Expectation–Maximization​‌ (EM) algorithm adapted to​​ binned data, in order​​​‌ to efficiently identify the​ parameters of the chosen​‌ distribution (or mixture) based​​ solely on the interval​​ counts from the passive​​​‌ sensor. Once the parameters‌ are estimated, the damage‌​‌ is calculated via its​​ integral expression, and its​​​‌ uncertainty is then quantified.‌ Looking ahead, we propose‌​‌ to study the effect​​ of the counting interval​​​‌ bounds (size, position) on‌ the estimation accuracy and‌​‌ the width of the​​ confidence intervals, in order​​​‌ to jointly optimize the‌ design of passive sensors‌​‌ and the aggregation strategy.​​
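
The principle of EM on binned data can be sketched on the simplest possible case. The code below assumes, purely for illustration, that amplitudes follow an exponential distribution and that only the number of cycles per interval is known: the E-step replaces each unobserved amplitude by its conditional mean within its interval, and the M-step updates the rate as if those values had been observed. The distribution, interval bounds and counts are assumptions, and the damage computation itself is not covered.

```python
import math

def conditional_mean(a, b, rate):
    """E[X | a <= X < b] for an exponential law with the given rate."""
    if math.isinf(b):
        return a + 1.0 / rate  # memoryless property on the last, open bin
    pa, pb = math.exp(-rate * a), math.exp(-rate * b)
    return (a * pa - b * pb) / (pa - pb) + 1.0 / rate

def em_binned_exponential(edges, counts, rate=1.0, n_iter=500):
    """Estimate an exponential rate from interval counts only.

    edges  : increasing bin bounds, the last one may be math.inf
    counts : number of cycles observed in each bin (len(edges) - 1 values)
    """
    total = sum(counts)
    bins = list(zip(edges, edges[1:]))
    for _ in range(n_iter):
        # E-step: expected sum of the unobserved amplitudes given the bins.
        expected_sum = sum(n * conditional_mean(a, b, rate)
                           for n, (a, b) in zip(counts, bins))
        # M-step: complete-data maximum likelihood update of the rate.
        rate = total / expected_sum
    return rate

# Idealized counts: expected number of cycles per bin for a true rate of 0.5.
edges = [0.0, 1.0, 2.0, 4.0, 8.0, math.inf]
true_rate = 0.5
counts = [1000 * (math.exp(-true_rate * a) - math.exp(-true_rate * b))
          for a, b in zip(edges, edges[1:])]
print(em_binned_exponential(edges, counts))
```

Changing the bounds in `edges` and rerunning the estimation is a crude way to explore how the width and position of the counting intervals affect accuracy, which is the design question raised above.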

This work has been​​​‌ submitted to EWSHM 2026‌ (12th European Workshop on‌​‌ Structural Health Monitoring).

This is joint work with André Orcesi from Cerema. See 14 for more details.

9 Bilateral​​ contracts and grants with​​​‌ industry

9.1 Bilateral Grants‌ with Industry

9.1.1 Withings‌​‌

Participants: Christophe Biernacki,​​ Violaine Courrier, Cristian​​​‌ Preda.

Withings is‌ a French consumer electronics‌​‌ company which designs and​​ innovates in connected devices,​​​‌ such as the first‌ Wi-Fi scale on the‌​‌ market (introduced in 2009),​​ an FDA-cleared blood pressure​​​‌ monitor, a smart sleep‌ system, and a line‌​‌ of automatic activity tracking​​ watches. It also provides​​​‌ B2B services for healthcare‌ providers and researchers.

The PhD thesis of Violaine Courrier began in September 2023 on the topic of analysis of multivariate, sparse longitudinal data, with mixed covariates, from connected medical objects.

9.1.2 Seckiot‌

Participants: Christophe Biernacki,‌​‌ Clarisse Boinay, Cristian​​ Preda.

Seckiot is a publisher of cybersecurity software that protects industrial systems and IoT. In December 2021, Clarisse Boinay began her Cifre PhD thesis (with AID, Agence de l'Innovation de Défense) with Seckiot on the topic of "anomaly detection and change point detection in contextual dynamic asynchronous graphs with applications in OT cybersecurity" under the co-supervision of Thomas Anglade (Seckiot), Christophe Biernacki and Cristian Preda.

Clarisse​​​‌ Boinay defended her PhD‌ thesis on December 16‌​‌ 2025 47.

9.1.3​​ SilMach

Participants: Mustapha Atmani​​​‌, Christophe Biernacki.‌

Through their joint ROAD-AI‌​‌ project, Inria and Cerema​​ are jointly studying digital​​​‌ tools allowing these phenomena‌ to be modeled using‌​‌ structural instrumentation. This initiative​​ is complemented and reinforced​​​‌ by the SIRCAPASS project‌ coordinated by the company‌​‌ SilMach and which aims​​ to use new passive​​​‌ MEMS (Micro Electro-Mechanical Systems)‌ sensor technology for this‌​‌ instrumentation.

In this context, Mustapha Atmani began his PhD thesis, entitled “Statistical processing of ‘low data’ from passive sensors: application to the monitoring of engineering structures”, on December 1, 2024. The thesis is co-supervised by André Orcesi from Cerema.

9.1.4​​​‌ Décathlon

Participants: François Bassac, Cristian Preda.

Decathlon is a brand specializing in the mass retail of sports equipment and materials. In September 2022, François Bassac began his PhD thesis within the Inria-Decathlon partnership on the topic of predicting performance and injuries from training data, under the supervision of Cristian Preda.

9.1.5‌ Horiba

Participants: Komlan Noukpoape‌​‌, Sophie Dabo,​​ Cristian Preda.

Horiba is a company specialized in optical spectrometry. Datavers is working with this company and Centrale Lille on Raman spectroscopy and artificial intelligence dedicated to chemical synthesis.

10 Partnerships and cooperations​

10.1 International initiatives

10.1.1​‌ Participation in other International​​ Programs

IRN AFRIMath

Participant:​​​‌ Sophie Dabo.

  • Partner​ Institution(s): CNRS
    • AFRIMath is​‌ an International Research Network​​ of the CNRS bringing​​​‌ together mathematicians located mainly​ in sub-Saharan Africa and​‌ in France.
  • Date/Duration: 2021-2028​​
  • Additional info/keywords: Numerical Analysis, Probability and Statistics
IRL​ CRM-CNRS

Participant: Sophie Dabo​‌.

  • Partner Institution(s): CNRS,​​ University of Montreal
    • IRL CRM-CNRS is a joint International Research Laboratory between CNRS and the University of Montreal, based in Montreal. Sophie Dabo was on CNRS delegation at this IRL from September 2024 to February 2025.
  • Date/Duration: 2024-2025​
PHC Tournesol

Participant: Sophie​‌ Dabo.

  • Partner Institution(s):​​ University of Lille, ULB​​​‌ (Brussels)
    • Sophie Dabo is co-PI of the project with Prof. Thomas Verdebout.
  • Date/Duration: 2024-2025
  • Additional info/keywords: Functional Data, Statistical test, PCA

10.1.2 Visits​‌ to international teams

Sabbatical​​ programme

Participant: Sophie Dabo​​​‌.

Sophie Dabo was on a CNRS sabbatical program at the University of Montreal, IRL CRM-CNRS, from September 2024 to February 2025. She holds a CNRS Chair JRP FANE-MATH-PE with a three-month sabbatical program each year (2024-2027) at AIMS South Africa, AIMS Senegal and North-West University in South Africa.

10.1.3 Other European programs/initiatives

Participant: Sophie Dabo.​‌

Sophie Dabo is part​​ of the Mathematics for​​​‌ Humanity Scientific team of​ ICMS of London Mathematical​‌ Society

10.2 National initiatives​​

10.2.1 CDP - Cross​​​‌ Disciplinary Project

Participants: Guillemette​ Marot, Genia Babykina​‌, Cristian Preda,​​ Emmanuel Chazard.

  • Consortium:​​​‌
    Inria, University of Lille, Inserm, CHU Lille.
  • Coordinators:​‌
    François Pattou (INSERM 1190​​ (EGID)) and Guillemette Marot​​​‌ (Inria, Datavers)
  • Project title:​
    Molecular signatures of esophageal​‌ atresia: towards the identification​​ of the molecular causes​​​‌ of the different forms​ of esophageal atresia and​‌ prenatal diagnosis
  • Objective:
    Characterize inter-individual heterogeneity in weight-loss trajectories; identify the biological mechanisms underlying variability in response to interventions; link weight-loss trajectories to major long-term clinical outcomes; translate these insights into predictive, clinically actionable decision-support tools.
  • Funding:
    3.2M euros.
  • Duration:
    2026-2029.

10.2.2 ”Inria Challenge”​‌ ROAD-AI with Cerema

Participant:​​ Christophe Biernacki.

Cerema (Centre d'études et d'expertise sur les risques, l'environnement, la mobilité et l'aménagement - Centre for Studies on Risks, the Environment, Mobility and Urban Planning) is a public institution dedicated to supporting public policies, under the dual supervision of the ministry for ecological transition and the ministry for regional cohesion and local authority relations. Datavers is involved in the ROAD-AI (Routes et Ouvrages d'Art Diversiformes, Augmentés & Intégrés) “Inria Challenge”, with six other Inria teams (ACENTAURI, COATI, FUN, I4S, STATIFY, TITANE) covering statistics, robotics, telecommunications, sensor networks and 3D modeling. This four-year project (starting in 2021) aims at more sustainable, safer and more resilient transport infrastructures.

10.2.3 ANR​​

Oesomics

  • Participants:
    Guillemette Marot​​​‌ .
  • Type:
    ANR AAP​ Recherche translationnelle en santé​‌
  • Acronym:
    Oesomics
  • Project​​ title:
    Molecular signatures of​​​‌ esophageal atresia: towards the​ identification of the molecular​‌ causes of the different​​ forms of esophageal atresia​​ and prenatal diagnosis
  • Coordinator:​​​‌
    Frédéric Gottrand (Univ. Lille,‌ CHU Lille, Infinite)
  • Duration:‌​‌
    36 months (2022–2027)
  • Funding:​​
    233k euros
  • Partners:
    CHU​​​‌ Lille, PRISM, PLBS-Goal, PLBS-bilille‌
  • Contribution:
    Statistical analysis of‌​‌ multi-omics (mainly transcriptomics and​​ proteomics) data

TransEAsome

  • Participants:​​​‌
    Guillemette Marot .
  • Type:‌
    AMI Maladies rares
  • Acronym:
    TransEAsome
  • Project title:​​
    Long term outcome of​​​‌ esophageal atresia: transomics profiles‌ in adolescence
  • Coordinator:
    Frédéric‌​‌ Gottrand (Univ. Lille, CHU​​ Lille, Infinite)
  • Duration:
    72​​​‌ months (2022–2027)
  • Funding:
    1.4M‌ euros
  • Partners:
    CHU Lille,‌​‌ Univ. Lille, Inserm NO,​​ Inserm ADR - GO,​​​‌ CRACMO, FIMATHO
  • Contribution:
    Statistical‌ analysis of multi-omics (mainly‌​‌ transcriptomics and proteomics) data​​

10.2.4 FHU

An FHU is a federative project and a label required to apply for an RHU.

  • Acronym:
    PRECISE
  • Project​​​‌ title:
    PREcision health in‌ Complex Immune-mediated inflammatory diseaSEs‌​‌
  • Coordinator:
    David Launay (U.​​ Lille, CHU Lille)
  • Duration:​​​‌
    5 years (2021–2025)
  • Partners:‌
    CHU Lille, CHU Amiens,‌​‌ CHU Rouen, CHU Caen,​​ Université de Lille, Université​​​‌ de Picardie, Université de‌ Rouen, Inserm
  • Contribution:
    The objective of FHU PRECISE is to structure care, research and teaching related to the care of patients who suffer from complex IMID (immune-mediated inflammatory diseases) with an interdisciplinary approach. Guillemette Marot is the co-head, with Vincent Sobanski, of the WP2 work package, which aims at creating a «virtual patient» and clustering patients based on their clinical and omic profiles. In this WP, she is involved both in the analysis task with the Bilille platform and in the research task led by Christophe Biernacki, involving the Datavers team. This research task aims at combining complex data and integrating temporal structure in order to identify patients' care pathways. Guillemette Marot is also participating, with the Bilille platform, in WP3 for the search for a molecular signature predictive of the treatment response (resistance and complication).

Participants: Christophe Biernacki​​, Sophie Dabo,​​​‌ Genia Babykina, Cristian‌ Preda.

11 Dissemination‌​‌

11.1 Promoting scientific activities​​

11.1.1 Scientific events: organisation​​​‌

Sophie Dabo organized:

  1. Four‌ Research schools:

    African‌​‌ Mathematical School on Quantitative​​ Biology: Applications in Epidemiology,​​​‌ Ecology and Cancer,‌ 19-27 Feb 2024, NWU,‌​‌ SA

    3MC-PIMS-ICMS school​​ on Multiscale Modeling: Infectious​​​‌ Diseases, Cancer and Treatments‌, 2 - 13‌​‌ Dec 2024, Edinburgh, UK​​

    CIMPA School on​​​‌ Mathematical and Statistical Modeling‌ in Oncology, 3‌​‌ - 14 Feb 2025,​​ North-West University, South Africa​​​‌

    3MC-PIMS-IDMS-ICMS school on‌ Quantitative molecular and cellular‌​‌ biology, 16-27 Jun​​ 2025, University of Manitoba,​​​‌ Canada

  2. Five conferences:

    International Conference on Mathematical Modeling in Biology and Life Sciences, 28 Feb-1 Mar 2024, North-West University, SA

    Colloque‌​‌ Francophone International de Statistique,​​ Probabilités et Interactions,​​​‌ 8-10 Jul 2025, AIMS-Senegal‌ & Université Gaston Berger‌​‌

    Conférence Internationale Annuelle:​​ Femmes Mathématiques et Interactions​​​‌, 12 Jun 2025,‌ Association des Femmes Scientifiques‌​‌ Africaines du Québec, Canada​​

    Women in SAGE-Tunisia​​​‌, 30 Sep -‌ 4 Oct 2025, Université‌​‌ de Tunis

    African​​ Women in Mathematics: Challenging​​​‌ Questions, 6-8 Oct‌ 2025, Tunisia

  3. 1 Exhibition‌​‌ (to promote mathematics, inspire​​​‌ young people, and highlight​ African heritage), 8-10 May​‌ 2025, Dakar-Senegal
General chair,​​ scientific chair
  • Sophie Dabo was the general chair of the African Mathematical Biology Society conference, December 5th, 2025.
Member of the conference​​​‌ program committees
  • Christophe Biernacki​ was a program committee​‌ member of the 29th​​ International Conference on Knowledge-Based​​​‌ and Intelligent Information &​ Engineering Systems (KES'25​‌) for the session​​ on "Detection of Complex​​​‌ Attacks"
  • Sophie Dabo was a member of the scientific program committee of the EcoSta2025 conference, 21-23 August 2025.

11.1.2​​​‌ Journal

Member of the​ editorial boards
  • Christophe Biernacki​‌ is an Associate Editor​​ for the international journal​​​‌ Advances in Data Analysis​ and Classification (ADAC​‌).
  • Sophie Dabo is an Associate Editor of the Journal of Statistical Modeling and Analytics, the Journal of Nonparametric Statistics and Afrika Matematika.
  • Cristian Preda is an​​​‌ Associate Editor of Methodology​ and Computing in Applied​‌ Probability.
Reviewer - reviewing​​ activities
  • Christophe Biernacki acted​​​‌ as a reviewer for​ different journals (Journal of​‌ Classification, Methodology and Computing​​ in Applied Probability, Communications​​​‌ in Statistics - Theory​ and Methods, Computational Statistics,​‌ Statistics and Computing, Austrian​​ Journal of Statistics) and​​​‌ two conferences (AISTATS 2025,​ CAp 2025).
  • Genia Babykina​‌ acted as a reviewer​​ for Brazilian Journal of​​​‌ Biometrics.
  • Sophie Dabo acted as a reviewer for different journals (JRSS B, C, JASA, Electronic Journal of Statistics, Bernoulli, Journal of Nonparametric Statistics, ADAC, ...) and several conferences worldwide.
  • Cristian​​ Preda acted as a​​​‌ reviewer for BMJ, CSDA,​ JMVA journals.

11.1.3 Invited​‌ talks

  • Genia Babykina gave​​ an invited talk at​​​‌ Journées PEPR Santé Numérique​ held in France 27​‌.
  • Sophie Dabo has​​ been invited to several​​​‌ conferences and seminars:
    • Seminar,​ North West University, South​‌ Africa, 13th January 2025.​​
    • Journées de Statistique et Optimisation, Perpignan, 2nd-4th April 2025.
    • Conference Afrimath, Abidjan, Ivory Coast, 31st March 2025.
    • Conference SAMPTA 2025, July 28th-August 1st, Vienna, Austria.
    • First virtual symposium of the African Society for Biomathematics, 26th-27th June 2025, virtual.
    • ICMSEM 2025, Lille, July 24th-25th, 2025.
    • Non-stationarity and Statistics for EEG, Paris, 4th September 2025.
    • SSADS25, Oujda, Morocco, 15th October 2025.
    • JSMDS 2025, Tunisia, November 13th-15th, 2025.
    • SASA 2025, Riverside, South Africa, November 26th, 2025.
    • Mathematics in Africa, London Mathematical Society, UK/Africa Partnerships, May 14th, Edinburgh.
    • CIMAD 2025, November 26th-27th, 2025, N'Djamena, Chad.
    • Seminar IBENS, ENS Paris,​​ 8th December 2025
  • Cristian Preda was invited to give a plenary talk at the StatMod Conference, September 2025, University of Piraeus, Greece.

11.1.4 Leadership​ within the scientific community​‌

  • Christophe Biernacki has been President of the SFdS (Société Française de Statistique) since July 2024. The SFdS is the French society specialized in statistics, whose mission is to promote the use and understanding of statistics and to foster its methodological developments.
  • Sophie Dabo is:
    • vice-president of CIMPA since January 2025,
    • member of the Diversity Committee of the IMU (International Mathematical Union), 2020-2025,
    • member of the Scientific Committee of ICMS (International Centre for Mathematical Sciences) Mathematics for Humanity, 2023-2025,
    • member of the scientific committee of the Ibni Oumar Mahamat Saleh Award (Association pour la promotion scientifique de l'Afrique), 2019-2025,
    • associate member of the European Mathematical Society Committee for Developing Countries since 2022, after chairing the committee.

11.1.5 Scientific expertise‌​‌

  • Christophe Biernacki was a​​ member of the ANR​​​‌ scientific evaluation committee "AI‌ for scientific discovery".
  • Sophie Dabo was a member of the ANR scientific evaluation committee 40 "Mathematics", of INSERM CSS7 and of INRAE CSS Misti.

11.1.6 Research administration​​

  • Since January 2020, Christophe Biernacki acts as a deputy scientific director of Inria at the national level in charge of the domain “Applied mathematics, computation and simulation”.
  • Since‌​‌ January 2024, Sophie Dabo​​ acts as chair of​​​‌ the CNRS Network FANE‌ MATH PE.

11.2 Teaching‌​‌

  • Sophie Dabo, as a professor at the University of Lille, teaches Data Mining (Master 1), Spatial Statistics (Master 2), Data Analysis (Master 1) and Probability (Bachelor).
  • Cristian Preda, as a professor at the University of Lille, teaches statistics and probability (192 hours per year) to engineering students at Polytech'Lille.
  • Genia Babykina, as an associate professor at the University of Lille, teaches statistics at the ILIS institute (192 hours per year).
  • Emmanuel Chazard, as a professor at the University of Lille, teaches statistics and probability to students of the Faculty of Medicine.

11.3 Supervision

  • Clarisse Boinay worked on anomaly detection and change point detection in contextual dynamic asynchronous graphs with applications in OT cybersecurity, under the supervision of Christophe Biernacki and Cristian Preda. She defended her PhD thesis on December 16, 2025 47.
  • Violaine Courrier works on the analysis of multivariate, sparse longitudinal data, with mixed covariates, from connected medical objects. Her PhD started in September 2023 under the supervision of Christophe Biernacki and Cristian Preda.
  • Mustapha Atmani began his PhD thesis, entitled “Statistical processing of ‘low data’ from passive sensors: application to the monitoring of engineering structures”, on December 1, 2024. The thesis is co-supervised by André Orcesi from Cerema.
  • Hugo Cannafarina (PhD) worked on variable selection for high-dimensional data with competing risks outcomes from November 2024 to November 2025, under the co-supervision of C. Preda, G. Marot and G. Babykina. In November 2025, Hugo decided not to continue the PhD project.
  • Cécile Verrier works on mathematical oncology to model senescence 51. She started her PhD in February 2025 and is supervised by Sophie Dabo, Alexandre Poulain and Vanessa Dehault (Oncolille, Canther).
  • Clara Dubois works on Raman spectroscopy and AI for chemical synthesis. Her Cifre PhD started in 2023 with Horiba and Centrale Lille. She is supervised by Sophie Dabo, Christophe Dujardin (Centrale Lille) and Sebastien Legendre (Horiba).
  • François Bassac works on functional data for predicting the sport performance of runners. His PhD started in 2022 under the supervision of Cristian Preda.

11.4 Juries

  • Christophe‌​‌ Biernacki acted as a​​ reviewer for 2 PhD​​​‌ theses and was president‌ of the jury for‌​‌ two other PhD theses.​​
  • Sophie Dabo acted as​​​‌ a reviewer for 6‌ PhD theses and was‌​‌ president of the jury​​​‌ for one of these​ PhD theses.

11.4.1 Educational​‌ and pedagogical outreach

  • Sophie Dabo is involved in several activities worldwide to train a new generation of scientists from the Global South via the CIMPA, EMS-CDC, ICMS and IMM-ICTP programs.

11.5​ Popularization

11.5.1 Specific official​‌ responsibilities in science outreach​​ structures

11.5.2​​ Productions (articles, videos, podcasts,​​​‌ serious games, ...)

  • Christophe Biernacki gave several talks related to his involvement in the ROAD-AI project 52, in the ORAP organization 53, in scientific direction at Inria 54 and in SFdS 26. He also wrote an article related to the ROAD-AI project 14.

11.5.3​​​‌ Participation in Live events​

Sophie Dabo participated in the Radio France International live event "Around the Question, the magazine for all the sciences: How to do mathematics differently and on every continent" on 10 October 2025.

12 Scientific production

12.1​‌ Major publications

12.2 Publications of​​ the year

International journals​​​‌

Invited conferences​​​‌

  • 26 Karteek Alahari and Christophe Biernacki. AI as a Scientific Pilot? JDS 2025 - 56e Journées de Statistique, Marseille, France, June 2025.
  • 27 Génia Babykina. Longitudinal Data: A Lever for Precision Medicine. Journées Annuelles du PEPR Santé Numérique, Lille, France, October 2025.
  • 28 Christophe Biernacki and Vincent Vandewalle. An EM Stopping Rule for Avoiding Degeneracy in Gaussian-Based Clustering with Missing Data. Journée PS-MAASAI 2025 - Séminaire de l'équipe de Probabilités et Statistique du LJAD, Nice, France, April 2025.
  • 29 Violaine Courrier and Christophe Biernacki. Guided Clustering Variational Autoencoder (GCVA): A Predictive Clustering Approach with a VAE. 15-th Scientific Meeting Classification and Data Analysis Group (CLADAG), Naples, Italy, September 2025.

International peer-reviewed‌ conferences

  • 30 Christelle Agonkoui, Hua Cao, Sophie Dabo-Niang, Gaurav Dhar, Nicolas Jankovsky and Karin Sahmer. A signature-based model for learning lung cancer stage from multiplex immunofluorescence image data with spatial summary functions. 2025 International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Antalya, Turkey, IEEE, August 2025, 1-6.
  • 31 Violaine Courrier and Christophe Biernacki. Clustering guidé dans des autoencodeurs variationnels. JdS'2025 - 56es Journées de Statistique, Marseille, France, June 2025.
  • 32 Sophie Dabo-Niang. Bridging Nonparametric Statistics and Spatial Data Modeling. JSMDS 2025, Hammamet, Tunisia, November 2025.
  • 33 Sophie Dabo-Niang. Heteroskedastic Choice-Based Sampling Spatial Choice Models: Application to cancer modeling. ICMSEM 2025, Lille, France, June 2025.
  • 34 Sophie Dabo-Niang. Learning of Non-stationarity multivariate functional processes: Application to EEG. Non-stationarity and Statistics for EEG, Nanterre, France, September 2025.
  • 35 Violaine Courrier and Christophe Biernacki. Guided Clustering Variational Autoencoder. JdS'2025 - 56es Journées de Statistique, Marseille, France, June 2025.

Conferences without‌​‌ proceedings

  • 36 François Bassac, Cristian Preda and Cédric Morio. Predictive model for running performance using scalar and qualitative functional data. JDS 2025 - 56iemes Journées de la Statistique de la SFdS, Marseille, France, June 2025.
  • 37 Hugo Cannafarina, Guillemette Marot and Génia Babykina. Variable selection with FDR control for high dimensional competing risk data. JdS2025: 56es Journées de Statistique de la SFdS, Marseille, France, June 2025.
  • 38 Sophie Dabo-Niang. Dimension reduction properties and supervised learning of complex Functional Data. CIMAD 2025 - Conférence Internationale sur les Mathématiques et leurs Applications dans les pays en voie de Développement, Ndjamena, Chad, November 2025.
  • 39 Sophie Dabo-Niang. Dimension reduction properties and supervised learning of complex Functional Data. SAMPTA 2025, Vienna, Austria, July 2025.
  • 40 Sophie Dabo-Niang. Functional Data Analysis: A PCA Approach for Learning Models and Applications to Biology. First virtual symposium of the African Society for Biomathematic, Online, France, June 2025.
  • 41 Sophie Dabo-Niang. Mathematics in Africa. ICMS UK/Africa Partnerships, Edinburgh, Scotland, United Kingdom, May 2025.
  • 42 Sophie Dabo-Niang. Principal Component Analysis for Dependent Functional Data: Incorporating Spatial and Temporal Structures. AFRIMath 2025 - Afrimath Conference, Abidjan, Côte d’Ivoire, March 2025.

Scientific books

  • 43 Emmanuel Chazard. Biostatistiques pour les études de santé: Comprendre les biostatistiques pour le 1er et le 2ème cycle des études de santé (médecine, odontologie, maïeutique, orthophonie, kinésithérapie, soins infirmiers, etc.). Editions Chazard, November 2025.
  • 44 Emmanuel Chazard. Objectif Thèse niveau 1 : Poussin pressé: Votre mémoire académique quantitatif en Santé, de A à Z : mémoire de fin d'études, M1, M2, thèse d’exercice, thèse d’université. Version percutante, simplifiée et condensée du livre "Objectif Thèse niveau 2 : Poulet consciencieux". Editions Chazard, April 2025.
  • 45 Emmanuel Chazard. Objectif Thèse niveau 2 : Poulet consciencieux: Votre mémoire académique quantitatif en Santé, de A à Z : mémoire de fin d'études, M1, M2, thèse d’exercice, thèse d’université. Editions Chazard, April 2025.

Edition (books, proceedings, special​ issue of a journal)​‌

  • 46 Emmanuel Chazard and Nicolas Jay, eds. Congrès ÉMOIS – Nancy, 20 et 21 mars 2025. EMOIS, 73, Nancy, France, Elsevier Masson SAS, March 2025, 202968.

Doctoral dissertations​​ and habilitation theses

Other scientific publications​

Scientific​​​‌ popularization

  • 52 Christophe Biernacki. IA : recherche et applications en prise directe. Conférences Techniques Territoriales, Metz, France, 2025.
  • 53 Christophe Biernacki. IA générative pour le HPC : une nouvelle rupture ?: Conclusion du forum 55. 55e forum ORAP, Paris, France, November 2025.
  • 54 Christophe Biernacki. frugalité@IA@CS. Journée thématique frugalité pour le calcul scientifique, Bordeaux, France, October 2025.

12.3 Cited publications

  • 55 Mélanie Leroy. Long-term outcome of oesophageal atresia in adolescence (TransEAsome): a national French cohort study protocol. BMJ Open, 15(1), 2025.