SEMAGRAMME

SEMAGRAMME - 2025

2025 Activity Report, Team SEMAGRAMME

RNSR: 201120979K

Research center Inria‌ Centre at Université de‌ Lorraine
In partnership with: CNRS, Université de Lorraine
Team name: Semantic Analysis‌ of Natural Language
In collaboration with: Laboratoire lorrain de recherche en informatique et ses applications (LORIA)

Creation of the Team:‌ 2013 July 01

Keywords

Computer Science and Digital Science

A5.8. Natural‌ language processing
A7.2. Logic in Computer Science
A9.4.‌ Natural language processing

1 Team members, visitors, external collaborators

Research Scientists‌

Philippe de Groote [Team leader, INRIA‌, Senior Researcher]
Bruno Guillaume [INRIA‌, Researcher]
Vincent Martin [INRIA,‌ Researcher]
Sylvain Pogodalla [INRIA, Researcher‌]

Faculty Members

Maxime Amblard [UL,‌ Professor, HDR]
Karën Fort [UL‌, Professor, HDR]
Jacques Jayez [‌ENS DE LYON, Emeritus]
Michel Musiol‌ [UL, Professor Delegation, until Aug‌ 2025, HDR]

PhD Students

Clémentine Bleuze‌ [UL]
Hee-Soo Choi [UL,‌ ATER]
Marie Cousin [UL]
Amandine‌ Decker [UL]
Fanny Ducel [UNIV‌ PARIS SACLAY]
Maxime Guillaume [YSEOP,‌ CIFRE]
Amandine Lecomte [UL, until‌ Mar 2025]
Siyana Pavlova [UL,‌ ATER]
Valentin Richard [Univ Amsterdam]‌
Vincent Tourneur [UL]
Rémi de Vergnette‌ de Lamotte [UL, from Nov 2025‌]

Technical Staff

Khensa Amani Daoudi [INRIA‌, Engineer, until Jan 2025]
Amandine‌ Lecomte [UL, from Mar 2025]‌
Iglika Zlatkova Nikolova-Stoupak [UL]

Interns and‌ Apprentices

Mohammad Al Takach [UL, from‌ Mar 2025 until Aug 2025]
Jeffrey Andre‌ [UL, Intern, from Apr 2025‌ until May 2025]
Apolline Bastien [UL‌, Intern, from Jun 2025 until Jun‌ 2025]
Ahana Chattopadhyay [UL, Intern‌, from Mar 2025 until Aug 2025]‌
Luc Cheng [INRIA, Intern, from‌ Mar 2025 until Jul 2025]
Florian Cuny‌ [UL, Intern, from Jun 2025‌ until Aug 2025]
Lucie Digoin-Caparros [UL‌, Intern, from Jun 2025 until Aug‌ 2025]
Mae Dugoua Jacques [CNRS, Intern, from Apr‌ 2025 until Jun 2025‌]
Samba Fall [‌‌INRIA, Intern, from Jun 2025 until‌ Aug 2025]
Zsofia‌ Flora Hauk [UL‌‌, Intern, from Jun 2025 until Aug‌ 2025]
Jules Holder‌ [CNRS, Intern‌‌, from Apr 2025 until Jun 2025]‌
Vidit Khazanchi [UL‌, Intern, from‌‌ May 2025 until Jul 2025]
Owen Le‌ Ray [UL,‌ Intern, from Nov‌‌ 2025]
Loic Leclere [UL, Intern‌, from Jun 2025‌ until Aug 2025]‌‌
Tadzhat Marharian [UL, Intern, from‌ Jun 2025 until Aug‌ 2025]
Ivaylo Mitov‌‌ [UL, Intern, from Jun 2025‌ until Aug 2025]‌
Wassila Oudinache [UL‌‌, Intern, from Apr 2025 until Jun‌ 2025]
Arthur Pedrini‌ [UL, Intern‌‌, from Jun 2025 until Aug 2025]‌
Shayan Ahmed Sharriff [‌UL, Intern,‌‌ from Jun 2025 until Aug 2025]
Austin‌ Tangban [UL,‌ from Jul 2025 until‌‌ Aug 2025]
Enola Thomas [UL,‌ Intern, from Jun‌ 2025 until Jun 2025‌‌]
Celine Zyna Rahme [UL, from‌ Jul 2025 until Aug‌ 2025]
Rémi de‌‌ Vergnette de Lamotte [UL, Intern,‌ from Mar 2025 until‌ Aug 2025]

Administrative‌‌ Assistants

Véronique Constant [INRIA]
Sophie Drouot‌ [INRIA]
Anne-Marie‌ Messaoudi [LORIA,‌‌ from Sep 2025]
Anne-Marie Messaoudi [UL‌, until Aug 2025‌]
Gallown Nizard [‌‌UL]
Cecilia Olivier [INRIA]

External‌ Collaborators

Mathieu Constant [‌UL]
Khensa Amani‌‌ Daoudi [UNICAEN, from Feb 2025 until‌ Aug 2025]
Roberto‌ Diaz Hernandez [Univ‌‌ Jaén, from Apr 2025]
Michel Musiol‌ [UL, from‌ Sep 2025, HDR‌‌]

2 Overall objectives

2.1 Scientific Context

Computational‌ linguistics is a discipline‌ at the intersection of‌‌ computer science and linguistics. On the theoretical side,‌ it aims to provide‌ computational models of the‌‌ human language faculty. On the applied side, it‌ is concerned with natural‌ language processing and its‌‌ practical applications.

From a structural point of view,‌ linguistics is traditionally organized‌ into the following sub-fields:‌‌

Phonology, the study of the abstract sound systems of languages.
Morphology, the study of‌ word structure.
Syntax, the‌‌ study of language structure, i.e., the way words‌ combine into grammatical phrases‌ and sentences.
Semantics, the‌‌ study of meaning at the levels of words,‌ phrases, and sentences.
Pragmatics,‌ the study of the‌‌ ways in which the meaning of an utterance‌ is affected by its‌ context.

Computational linguistics is concerned with all these fields. Consequently, various computational models, whose application domains range from phonology to pragmatics, have been developed. Among these, logic-based models play an important part, especially at the “highest” levels.

At the level of syntax, generative grammars‌ may be seen as‌ basic inference systems, while‌‌ categorial grammars are based‌ on substructural logics specified by Gentzen sequent calculi.‌ Finally, model-theoretic grammars amount to sets of logical‌ constraints to be satisfied.

At the level of semantics, the most common approaches derive from Montague grammars, which are based on the simply typed $λ$-calculus and Church's simple theory of types. In addition, various logics (modal, hybrid, intensional, higher-order...) are used to express logical semantic representations.

At the level of pragmatics, the situation is less clear. The word pragmatics was introduced by Morris to designate the branch of philosophy of language that studies, besides linguistic signs, their relation to their users and the possible contexts of use. This definition was not very precise and, for a long time, several authors have considered (and some still consider) pragmatics as the wastebasket of syntax and semantics. Nevertheless, as far as discourse processing is concerned (which includes pragmatic problems such as pronominal anaphora resolution), logic-based approaches have also been successful. In particular, Kamp's Discourse Representation Theory gave rise to sophisticated “dynamic” logics. The situation, however, is less satisfactory than it is at the semantic level. On the one hand, we are facing a kind of logical “tower of Babel”. The various pragmatic logic-based models that have been developed, while sharing underlying mathematical concepts, differ in several respects and are too often based on ad hoc features. As a consequence, they are difficult to compare and appear more as competitors than as collaborative theories that could be integrated. On the other hand, several phenomena related to discourse dynamics (e.g., context updating, presupposition projection and accommodation, contextual reference resolution...) still lack deep logical explanations. We strongly believe, however, that this situation can be improved by applying to pragmatics the same approach Montague applied to semantics, using the standard tools of mathematical logic.

Accordingly:

The overall‌ objective of the Sémagramme project is to design‌ and develop new unifying logic-based models, methods, and‌ tools for the semantic analysis of natural language‌ utterances and discourses. This includes the logical modeling‌ of pragmatic phenomena related to discourse dynamics. Typically,‌ these models and methods will be based on‌ standard logical concepts (stemming from formal language theory,‌ mathematical logic, and type theory), which should make‌ them easy to integrate.

The project is organized‌ along three research directions (i.e., syntax-semantics interface,‌ discourse dynamics, and common basic resources),‌ which interact as explained below.

Moreover, a transversal‌ and transdisciplinary theme has been developed in the‌ team in the past years: ethics in NLP‌ and more generally in AI.

2.2 Syntax-Semantics Interface‌

The Sémagramme project intends to focus on the semantics of natural languages (in a wider sense than usual, including some pragmatics). Nevertheless, the semantic construction process is syntactically guided, that is, the constructions of logical representations of meaning are based on the analysis of the syntactic structures. We do not want, however, to commit ourselves to any specific theory of syntax. Consequently, our approach should be based on an abstract generic model of the syntax-semantics interface.

Here, an‌ important idea of Montague‌‌ comes into play, namely, the “homomorphism requirement”: semantics‌ must appear as a‌ homomorphic image of syntax.‌‌ While this idea is almost a truism in‌ the context of mathematical‌ logic, it remains challenged‌‌ in the context of natural languages. Nevertheless, Montague's‌ idea has been quite‌ fruitful, especially in the‌‌ field of categorial grammars, where van Benthem showed‌ how syntax and semantics‌ could be connected using‌‌ the Curry-Howard isomorphism. This correspondence is the keystone‌ of the syntax-semantics interface‌ of modern type-logical grammars.‌‌ It also motivated the definition of our own‌ Abstract Categorial Grammars 77‌.

Technically, an Abstract‌‌ Categorial Grammar simply consists of a (linear) homomorphism‌ between two higher-order signatures.‌ Extensive studies have shown‌‌ that this simple model allows several grammatical formalisms‌ to be expressed, providing‌ them with a syntax-semantics‌‌ interface for free 75, 8.
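To make this definition more concrete, here is a minimal Python sketch of the idea. It is not ACGtk code or syntax, and it ignores types and linearity: abstract terms are plain nested tuples, and a lexicon is a dictionary that is extended homomorphically to whole terms. The constants `LOVES`, `MARY`, `JOHN` and the function `realize` are invented for illustration.

```python
# Toy illustration of an ACG-style lexicon (hypothetical names throughout).
# An abstract term is either a constant (a string) or a tuple
# (constant, arg1, ..., argn). A lexicon maps each abstract constant to an
# object-level value or function, and is extended homomorphically to terms.

def realize(lexicon, term):
    """Homomorphic extension of a lexicon to an abstract term."""
    if isinstance(term, str):            # a constant with no argument
        return lexicon[term]
    head, *args = term
    return lexicon[head](*(realize(lexicon, a) for a in args))

# Abstract term standing for "Mary loves John".
abstract_term = ("LOVES", "MARY", "JOHN")

# Object lexicon 1: surface strings.
string_lexicon = {
    "MARY": "Mary",
    "JOHN": "John",
    "LOVES": lambda subj, obj: f"{subj} loves {obj}",
}

# Object lexicon 2: logical forms, rendered here as strings.
logic_lexicon = {
    "MARY": "m",
    "JOHN": "j",
    "LOVES": lambda subj, obj: f"love({subj}, {obj})",
}

surface = realize(string_lexicon, abstract_term)
meaning = realize(logic_lexicon, abstract_term)
```

Interpreting the same abstract term through two lexicons, one producing surface forms and one producing logical forms, is what relates syntax and semantics in this picture.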

We‌ intend to carry on‌ with the development of‌‌ the Abstract Categorial Grammar framework. At the foundational‌ level, we will define‌ and study possible type‌‌ theoretic extensions of the formalism, in order to‌ increase its expressive power‌ and its flexibility. At‌‌ the implementation level, we will continue the development‌ of an Abstract Categorial‌ Grammar support system.

As‌‌ said above, considering the syntax-semantics interface as the‌ starting point of our‌ investigations allows us not‌‌ to be committed to some specific syntactic theory.‌ The Montagovian syntax-semantics interface,‌ however, cannot be considered‌‌ to be universal. In particular, it does not‌ seem to be well‌ adapted to dependency and‌‌ model-theoretic grammars. Consequently, in order to be as‌ generic as possible, we‌ intend to explore alternative‌‌ models of the syntax-semantics interface. In particular, we‌ will explore relational models‌ where several distinct semantic‌‌ representations can correspond to the same syntactic structure.‌

2.3 Discourse Dynamics

It‌ is well known that‌‌ the interpretation of a discourse is a dynamic‌ process. Take a sentence‌ occurring in a discourse.‌‌ On the one hand, it must be interpreted‌ according to its context.‌ On the other hand,‌‌ its interpretation affects this context, and must therefore‌ result in an updating‌ of the current context.‌‌ For this reason, discourse interpretation is traditionally considered‌ to belong to pragmatics.‌ The cut between pragmatics‌‌ and semantics, however, is not that clear.

As‌ we mentioned above, we‌ intend to apply to‌‌ some aspects of pragmatics (mainly, discourse dynamics) the‌ same methodological tools Montague‌ applied to semantics. The‌‌ challenge here is to obtain a completely compositional‌ theory of discourse interpretation,‌ by respecting Montague's homomorphism‌‌ requirement. We think that this is possible by‌ using techniques coming from‌ programming language theory, in‌‌ particular, continuation semantics, and the related theories of‌ functional control operators.

We‌ have indeed successfully applied‌‌ such techniques in order to model the way‌ quantifiers in natural languages‌ may dynamically extend their‌‌ scope 76. We‌ intend to tackle, in a similar way, other‌ dynamic phenomena (typically, anaphora and referential expressions, presupposition,‌ modal subordination...).
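The continuation-based view of discourse can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the team's formalization: a sentence meaning takes a context and a continuation standing for the rest of the discourse, conditions are collected in a flat list instead of being built as logical formulas, and the pronoun is naively resolved to the most recent referent. All function names are invented for the example.

```python
# Toy continuation-style discourse semantics (illustrative assumptions only).
# A sentence meaning takes a context (tuple of discourse referents) and a
# continuation (the rest of the discourse, awaiting the updated context),
# and returns the list of conditions contributed by the whole discourse.

def a_man_walks(ctx, k):
    x = f"x{len(ctx)}"                       # fresh discourse referent
    return [f"man({x})", f"walk({x})"] + k(ctx + (x,))

def he_whistles(ctx, k):
    x = ctx[-1]                              # naive resolution: most recent referent
    return [f"whistle({x})"] + k(ctx)

def seq(s1, s2):
    """Discourse composition: s2 is interpreted in the context updated by s1."""
    return lambda ctx, k: s1(ctx, lambda ctx1: s2(ctx1, k))

# "A man walks. He whistles."
discourse = seq(a_man_walks, he_whistles)
conditions = discourse((), lambda ctx: [])   # empty context, trivial continuation
```

Because the second sentence is evaluated inside the continuation of the first, the referent introduced by the indefinite remains accessible to the pronoun, while composition itself stays a plain function.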

What characterizes these different dynamic phenomena‌ is that their interpretations need information to be‌ retrieved from a current context. This raises the‌ question of the modeling of the context itself.‌ At a foundational level, we have to answer‌ questions such as the following. What is the‌ nature of the information to be stored in‌ the context? What are the processes that allow‌ implicit information to be inferred from the context?‌ What are the primitives that allow a context‌ to be updated? How does the structure of‌ the discourse and the discourse relations affect the‌ structure of the context? These questions also raise‌ implementation issues. What are the appropriate data types?‌ How can we keep the complexity of the‌ inference algorithms sufficiently low?

2.4 Common Basic Resources‌

Even if our research primarily focuses on semantics and pragmatics, we nevertheless need syntax. More precisely, we need syntactic trees to start with. We consequently need grammars, lexicons, and parsing algorithms to produce such trees. In recent years, we have developed the notions of interaction grammar 78 and graph rewriting 3, 4 as models of natural language syntax. This includes the development of grammars for French 91, together with morphosyntactic lexicons. We intend to continue this line of research and development. In particular, we want to increase the coverage of our grammars for French, and provide our parsers with more robust algorithms.

Further primary resources are needed in order to put a computational semantic analysis of utterances and discourses to work. As we want our approach to be as compositional as possible, we must develop lexicons annotated with semantic information. This opens the quite wide research area of lexical semantics.

Finally, when dealing with logical representations of‌ utterance interpretations, the need for inference facilities is‌ ubiquitous. Inference is needed in the course of‌ the interpretation process, but also to exploit the‌ result of the interpretation. Indeed, an advantage of‌ using formal logic for semantic representations is the‌ possibility of using logical inference to derive new‌ information. From a computational point of view, however,‌ logical inference may be highly complex. Consequently, we‌ need to investigate which logical fragments can be‌ used efficiently for natural language oriented inference.

3‌ Research program

3.1 Overview

The research program of‌ Sémagramme aims to develop models based on well-established‌ mathematics. We seek two main advantages from this‌ approach. On the one hand, by relying on‌ mature theories, we have at our disposal sets‌ of mathematical tools that we can use to‌ study our models. On the other hand, developing‌ various models on a common mathematical background will‌ make them easier to integrate, and will ease‌ the search for unifying principles.

The main mathematical‌ domains on which we rely are formal language‌ theory, symbolic logic, and type theory.

3.2 Formal Language Theory

Formal language‌ theory studies the purely‌ syntactic and combinatorial aspects‌‌ of languages, seen as sets of strings (or‌ possibly trees or graphs).‌ Formal language theory has‌‌ been especially fruitful for the development of parsing‌ algorithms for context-free languages.‌ We use it, in‌‌ a similar way, to develop parsing algorithms for‌ formalisms that go beyond‌ context-freeness. Language theory also‌‌ appears to be very useful in formally studying‌ the expressive power and‌ the complexity of the‌‌ models we develop.

3.3 Symbolic Logic

Symbolic logic‌ (and, more particularly, proof‌ theory) is concerned with‌‌ the study of the expressive and deductive power‌ of formal systems. In‌ a rule-based approach to‌‌ computational linguistics, the use of symbolic logic is‌ ubiquitous. As we previously‌ said, at the level‌‌ of syntax, several kinds of grammars (generative, categorial...)‌ may be seen as‌ basic deductive systems. At‌‌ the level of semantics, the meaning of an‌ utterance is captured by‌ computing (intermediate) semantic representations‌‌ that are expressed as logical forms. Finally, using‌ symbolic logics allows one‌ to formalize notions of‌‌ inference and entailment that are needed at the‌ level of pragmatics.

3.4‌ Type Theory and Typed‌‌ Lambda-Calculus

Among the various possible logics that may be used, Church's simply typed $λ$-calculus and simple theory of types (also known as higher-order logic) play a central part. On the one hand, Montague semantics is based on the simply typed $λ$-calculus, and so is our syntax-semantics interface model. On the other hand, as shown by Gallin, the target logic used by Montague for expressing meanings (i.e., his intensional logic) is essentially a variant of higher-order logic featuring three atomic types (the third atomic type standing for the set of possible worlds).
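As a small illustration of how typed $λ$-terms carry meanings, the following Python sketch evaluates two sentences in a toy extensional model with only two atomic types (entities and truth values). Possible worlds are left out, and the model, the predicates and the variable names are invented for the example.

```python
# A tiny extensional model: entities (type e) and truth values (type t).
ENTITIES = {"alice", "bob", "carol"}

# One-place predicates have type e -> t.
student = lambda x: x in {"alice", "bob"}
sleeps = lambda x: x in {"alice", "bob", "carol"}
snores = lambda x: x in {"carol"}

# Determiners have type (e -> t) -> (e -> t) -> t.
every = lambda p: lambda q: all(q(x) for x in ENTITIES if p(x))
some = lambda p: lambda q: any(p(x) and q(x) for x in ENTITIES)

# "Every student sleeps" and "Some student snores" as function applications.
every_student_sleeps = every(student)(sleeps)
some_student_snores = some(student)(snores)
```

Each sentence meaning is obtained purely by function application, which is the compositional behaviour that the simply typed $λ$-calculus provides.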

4 Application domains‌‌

4.1 Deep Semantic Analysis

Our application domains concern natural language processing applications that rely on a deep semantic analysis. For instance, one may cite the following:

textual‌ entailment and inference,
dialogue‌‌ systems,
semantic-oriented query systems,
content analysis of unstructured‌ documents,
(semi-)automatic knowledge acquisition,
discourse structure analysis‌‌ (argumentative relations, discourse markers),
lexical resources.

4.2 Text‌ Transformation

Text transformation is‌ an application domain featuring‌‌ two important sub-fields of computational linguistics:

parsing, from‌ surface form to abstract‌ representation,
generation, from abstract‌‌ representation to surface form.

Text simplification and automatic summarization belong to that domain.

We aim to use the Abstract Categorial Grammar framework that we develop to this end. It is indeed a reversible framework that allows both parsing and generation. Its underlying mathematical structure, the $λ$-calculus, makes it fit with our type-theoretic approach to discourse dynamics modeling.

4.3 Types‌ for discourse markers

While there is a rich descriptive literature on Discourse Markers (DM), for instance words/expressions like so or yet in English, the question of their representation in type systems is understudied. In addition to basic types such as individuals or events, or simple functional types (properties, etc.), DM are known to operate on domains like states of affairs, beliefs or speech acts. The entities inhabiting these domains are themselves complex. For instance, speech acts involve discourse planning in the form of a network of intentions and actions. Moreover, DM can combine with one another, forming clusters whose meaning is not always apparent from the meanings of the component DM. Within the context of the ANR CODIM, we aim to develop a type system that (i) takes into account the array of types denoted by DM and (ii) addresses the question of the semantic nature of their combinations.

5 Social‌ and environmental responsibility

5.1 Footprint of research activities‌

ANR InExtenso:

WP4 of the project is dedicated to the evaluation of the environmental impact of LLMs. More precisely, it aims at proposing a method for measuring the environmental impact of digital health and at using it in the project evaluations and beyond.

6 Latest software developments, platforms,‌ open data

6.1 Latest software developments

6.1.1 ACGtk‌

Name:
Abstract Categorial Grammar Development Toolkit
Keywords:
Natural language processing, Functional programming, Logic programming, Lambda-calculus, OCaml
Scientific Description:

Abstract Categorial Grammars (ACG) are a‌ grammatical formalism in which grammars are based on‌ typed lambda-calculus. A grammar generates two languages: the‌ abstract language (the language of parse structures), and‌ the object language (the language of the surface‌ forms, e.g., strings, or higher-order logical formulas), which‌ is the realization of the abstract language.

ACGtk‌ provides two software tools to develop and to‌ use ACGs: acgc, which is a grammar compiler,‌ and acg, which is an interpreter of a‌ command language that allows one, in particular, to‌ parse and realize terms.
Functional Description:
ACGtk provides‌ a piece of software for developing and using‌ Abstract Categorial Grammars (ACG).
Release Contributions:
This new version of the software provides two important functionalities. On the one hand, it provides support for parsing with almost linear grammars. On the other hand, it generates a JavaScript program that can be loaded and run by web browsers, in order to help demonstrate the software (a demo version is available online from the public GitLab pages of the project).
URL:
https://gitlab.inria.fr/ACG/dev/ACGtk
Publications:
hal-01242154,‌ hal-01328702, tel-01412765, inria-00112956, inria-00100529,‌ hal-04479621
Contact:
Sylvain Pogodalla
Participants:
Philippe De Groote,‌ Pierre Ludmann, Jiri Marsik, Sylvain Pogodalla, Vincent Tourneur‌

6.1.2 Grew

Name:
Graph Rewriting
Keywords:
Semantics, Syntactic‌ analysis, NLP, Graph rewriting
Functional Description:
Grew is‌ a Graph Rewriting tool dedicated to applications in‌ NLP. Grew takes into account confluent and non-confluent‌ graph rewriting and it includes several mechanisms that‌ help to use graph rewriting in the context‌ of NLP applications (built-in notion of feature structures,‌ parametrization of rules with lexical information).
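For readers unfamiliar with graph rewriting, the following Python sketch shows the general principle on a dependency graph: a rule is applied to one match at a time until no match remains. It does not use Grew's rule language or API; the graph encoding, the function names and the `dobj`-to-`obj` relabelling rule are assumptions made for the example.

```python
# Toy graph rewriting on dependency graphs (not Grew syntax or API).
# A graph is a set of labelled edges (source, label, target).

def rewrite_once(edges, old_label, new_label):
    """Apply the relabelling rule to one matching edge; None if no match."""
    for (src, label, tgt) in sorted(edges):
        if label == old_label:
            return (edges - {(src, label, tgt)}) | {(src, new_label, tgt)}
    return None

def normalize(edges, old_label, new_label):
    """Rewrite until no edge matches the rule."""
    current = frozenset(edges)
    if old_label == new_label:           # the rule would never stop matching
        return current
    # Each step removes one occurrence of old_label, so the loop terminates.
    while (nxt := rewrite_once(current, old_label, new_label)) is not None:
        current = nxt
    return current

graph = {
    ("eats", "nsubj", "cat"),
    ("eats", "dobj", "mouse"),
    ("sees", "dobj", "dog"),
}
converted = normalize(graph, "dobj", "obj")
```

This particular rule is confluent: whatever order the matches are processed in, the result is the same. Rules whose applications interfere with one another may not have this property, which is why the distinction between confluent and non-confluent rewriting matters.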
News of‌ the Year:
In 2025, three new versions (1.17, 1.18, and 1.19) were released, together with several bug fixes. Version 1.17 added handling of multi-treebank requests in Grew-match; version 1.18 improved the handling of metadata and global constraints; version 1.19 introduced tuples of clustering keys and improved the corpusbank manager.
URL:
https://grew.fr/
Publications:
hal-03724068, hal-03177701, hal-01930591, hal-01814386, hal-03021720, hal-04387830, hal-04387852, hal-03724129, hal-03846825
Contact:
Bruno‌ Guillaume
Participants:
Bruno Guillaume,‌ Guillaume Bonfante, an anonymous‌‌ participant

6.1.3 HostoMytho

Keywords:
Game with a purpose,‌ Natural language processing
Functional‌ Description:
HostoMytho is a GWAP, or "game with a purpose", developed within the framework of the CODEINE ANR project. The aim of the game is to allow users to annotate automatically generated medical files, in order to evaluate their plausibility (quality of the language and medical semantics) and to add different layers of information (negation, hypothesis, time, etc.). HostoMytho is multiplatform.
URL:
https://hostomytho.atilf.fr
Publication:‌
hal-04555052v1
Contact:
Karën Fort‌‌
Partners:
LISN, CEA-List

6.1.4 Arborator-Grew

Name:
Arborator's Collaborative‌ Annotation
Keywords:
Annotation tool,‌ Syntactic analysis
Functional Description:‌‌
The online interface allows managing collaborative annotation projects‌ in dependency syntax. It‌ is possible to use‌‌ Grew queries and also to directly rewrite graphs‌ in the annotation tool.‌
News of the Year:‌‌
During 2025, we continued to refactor the code‌ base for both frontend‌ and backend. In addition,‌‌ we worked on improving existing functionalities and adding‌ new ones based on‌ user requests.
URL:
https://arborator.grew.fr/‌‌
Publication:
hal-03021720
Contact:
Bruno Guillaume
Participant:
5 anonymous‌ participants
Partners:
Université Paris‌ Nanterre, LIMSI, LISN

6.2‌‌ Open data

7 New results

7.1 Syntax-Semantics Interface‌

Participants: Maxime Amblard,‌ Marie Cousin, Philippe‌‌ de Groote, Amandine Decker, Bruno Guillaume‌, Maxime Guillaume,‌ Sylvain Pogodalla, Siyana‌‌ Pavlova, Valentin Richard, Zhengjian Li.‌

7.1.1 Abstract Categorial Grammars‌

Feature Structure

ACG has‌‌ proven to be a powerful framework with well-defined‌ theoretical properties. It was,‌ however, lacking a facility‌‌ which is useful and widely used for grammar‌ engineering: feature structures. The‌ latter are often used‌‌ to express in a concise way some combinatorial‌ properties related to morphosyntactic‌ properties of expressions, for‌‌ instance subject-verb agreement.
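As a rough illustration of why feature structures are convenient, the Python sketch below checks subject-verb agreement by unifying two flat feature structures. This is plain unification over dictionaries, shown only to convey the intuition; it is not the ACG type-system extension itself, and the feature names `num` and `pers` are invented for the example.

```python
# Toy agreement check by unification of flat feature structures
# (illustrative only; feature names are hypothetical).

def unify(fs1, fs2):
    """Unify two flat feature structures (dicts); return None on a clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None                  # incompatible values: agreement fails
        result[feature] = value
    return result

subject = {"num": "sg", "pers": 3}       # e.g. "she"
verb_sg = {"num": "sg"}                  # e.g. "sleeps"
verb_pl = {"num": "pl"}                  # e.g. "sleep"

agreeing = unify(subject, verb_sg)
clashing = unify(subject, verb_pl)
```

A single rule stated over feature structures replaces one rule per combination of feature values, which is where the conciseness comes from.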

We worked on extending the‌ ACG type system to‌ provide a generic feature‌‌ structure framework. This extension relies on a restricted‌ addition of the product‌ (records) and dependent types‌‌ and still allows for the reduction of grammars‌ to Datalog programs (which‌ is used to implement‌‌ ACG parsing in ACGtk, see Sec.‌ 6). In his‌ thesis, Maxime Guillaume introduced‌‌ Affix Abstract Categorial Grammars (AACGs), an extension of‌ ACGs enriched by the‌ integration of feature structures.‌‌

First, he defined an enriched λ-calculus that extends the simply typed linear λ-calculus with enumerations, records, and dependent products. On this basis, he defined AACGs and demonstrated their strong equivalence with classical ACGs through a series of formal transformations. The algorithmic implications of this equivalence for parsing were then studied. An adaptation of Kanazawa’s reduction was presented. This adaptation guarantees polynomial-time complexity while preserving the factorization benefits specific to AACGs. Finally, to validate the industrial applicability of this approach, a dedicated compiler for AACGs was designed, implemented, and integrated into a text generation engine. Experiments conducted on a large-coverage French grammar highlight a significant reduction in grammar size as well as a notable improvement in parsing and generation performance.

Encoding of Meaning-Text Theory Into‌ ACGs

Meaning-Text Theory (MTT) is a linguistic theory‌ geared towards generating natural language expressions from semantic‌ representations 87. It relies on seven representation‌ levels (e.g., semantics, deep syntax, surface syntax, etc.).‌ Representations at each level are related to representations‌ at the adjacent levels by rewriting devices. Each‌ representation is made of several structures, among which‌ the predicative and the communicative ones. MTT uses‌ the key concept of paraphrase, especially in these‌ rewriting devices. ACGs come with several composition modes,‌ one of which in particular corresponds to transduction‌ of (tree or graph) structures.

We have therefore‌ been studying the ability of ACGs to model‌ MTT structure transformations between adjacent levels, focusing on‌ the structures and levels of semantics, deep syntax,‌ and surface syntax.

In previous work 68,‌ 67 we proposed an encoding of MTT into‌ ACGs where the predicative structure of the semantic‌ level in MTT was used. However, MTT rewriting‌ processes also make use of communicative structure information,‌ decorating the predicate structures (at the semantic level)‌ with theme and rheme information.

Indeed, the expressions "Charlie is Taylor's son" and "Charlie, the son of Taylor" share the same predicative structure, yet they are not paraphrases of each other. While the second one is a nominal expression, the first one is a verbal expression about Charlie, stating that he is Taylor's son. The difference between the two expressions, which share the same semantic predicative graph, is made by the communicative structure.

This shows the crucial role that the communicative structure plays in MTT, since it determines, from a given semantic graph (i.e., predicative and communicative structures), which deep-syntactic graph is to be obtained. We have therefore proposed to also take into account this communicative structure, using suitable types and grammatical composition as offered by the ACG framework 50, 59.

We also proposed an alternative‌ approach to representing deep and surface syntactic trees‌ 42, 29. This alternative approach, based‌ on 74, 73, allows for a‌ more flexible and generic representation of the syntactic‌ structures, and for a better account of modifiers‌ (adverbs, adjectives).

7.1.2 Formal semantics of adnominal modification‌

We have proposed a treatment of adnominal modification that parallels the treatment of adverbial modification in neo-Davidsonian event semantics 32. To this end, we introduced a notion of perspective that allows nouns to be interpreted as sets of sets of perspectives. The resulting theory provides a unified compositional treatment of intersective, subsective, modal, and privative adjectives, and avoids the intensional paradoxes caused by an extensional treatment of subsective adjectives. Building on this work, we have advocated for unifying the concepts of events, states, and perspectives. We then defined possible worlds as sets of such event-like concepts. This approach allows different semantic treatments proposed in the literature to be reconstructed within a unified framework. In particular, it provides a formal treatment of the ambiguity between the intersective and the subsective interpretation that some adjectives present 52. Finally, with the aim of giving an account of hyperintensional phenomena related to the interpretation of proper names that refer to the same individual but cannot be substituted one for the other, we came up with the radical idea of interpreting individuals as sets of perspectives 28.

7.1.3 Semantic treatment of plurals in textual mathematics

We reviewed issues related‌‌ to the semantics of plurals in natural language‌ and demonstrated how these‌ issues arise in the‌‌ case of mathematical texts. In particular, we focused‌ on the distinction between‌ collective and distributive predicates‌‌ 26. We also studied the conditions under‌ which adjectives that denote‌ binary relations can be‌‌ used as collective predicates. This led us to‌ propose a fine-grained semantic‌ interpretation of grammatical numbers‌‌ and to introduce distributivity operators that enable a‌ compositional semantic treatment of‌ plurals in natural mathematics‌‌ 27.
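The contrast between distributive and collective predication in mathematical text can be sketched as follows in Python. This is an illustration under stated assumptions, not the operators defined in the cited work: pluralities are modelled as lists of integers, and a relational adjective used collectively is given a pairwise reading, which is only one of the possible readings. All function names are invented.

```python
from itertools import combinations
from math import gcd

def is_prime(n):
    """Trial-division primality test, sufficient for small examples."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def distributive(pred):
    """Lift a predicate on individuals to pluralities: it holds of each member."""
    return lambda plurality: all(pred(x) for x in plurality)

def collective_from_relation(rel):
    """Read a binary relation collectively: it holds of every pair of members."""
    return lambda plurality: all(rel(a, b) for a, b in combinations(plurality, 2))

coprime = lambda a, b: gcd(a, b) == 1

are_prime = distributive(is_prime)               # "2 and 3 are prime"
are_coprime = collective_from_relation(coprime)  # "4, 9 and 25 are coprime"

primes_2_3 = are_prime([2, 3])
primes_4_9_25 = are_prime([4, 9, 25])
coprime_4_9_25 = are_coprime([4, 9, 25])
coprime_4_6 = are_coprime([4, 6])
```

The plurality 4, 9, 25 satisfies the collective predicate although none of its members satisfies the distributive one, which shows that the two modes of predication cannot be reduced to a single operator.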

7.1.4 Semantic Representation

Siyana Pavlova defended her PhD thesis in June 2025 58, in which she presented YARN, a new semantic representation formalism that aims to combine the benefits of logic-based formalisms with direct interpretability, making it widely usable. YARN is rooted in the encoding of different semantic phenomena as separate layers. The thesis presents a formal definition of the mathematical structure that constitutes YARN and illustrates with concrete examples how this structure can be used in the context of semantic representation for encoding multiple phenomena (such as modality, negation and quantification) as layers built on top of a central predicate-argument structure. The benefit of YARN is that it allows for the independent annotation and analysis of different phenomena, as they are easy to “switch off”. Furthermore, the ability of YARN to encode simple interactions between phenomena is explored. The thesis concludes with a discussion of some of the interesting observations made during the development of YARN so far and outlines future plans for this formalism.

In 40, Rémi De Vergnette, Maxime Amblard and Bruno Guillaume present modular evaluation metrics for Layered Meaning Representation, defined as YARN, a semantic formalism encoded using rich structures that generalize AMR graphs. While existing metrics like SMATCH evaluate graph-based semantic representations such as AMR, they cannot directly handle YARN's more complex structures. The modular nature of YARN is exploited to propose two families of metrics, depending on the linguistic features and type of semantic phenomenon targeted. The first one, SMATCHY, extends the AMR SMATCH metric. The second, YARNBLEU, is based on the SEMBLEU metric for AMR. Both families are evaluated on a small dataset of human-annotated YARN structures to which random modifications simulating annotation mistakes are added. The results show that SMATCHY provides a more consistent and reliable approach with respect to the type of modifications considered.
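To make the idea of a SMATCH-style score concrete, the following minimal sketch computes precision, recall and F1 over the overlap between two sets of graph triples. It is not SMATCHY or SMATCH itself: real SMATCH also searches for the best mapping between variable names, whereas this sketch assumes the two graphs already share node names. The function name `triple_f1` and the toy triples are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set, Tuple

# A triple is (source node, relation, target node or value).
Triple = Tuple[str, str, str]


def triple_f1(gold: Iterable[Triple], predicted: Iterable[Triple]) -> float:
    """F1 over exactly matching triples (assumes node names are already aligned)."""
    gold_set: Set[Triple] = set(gold)
    pred_set: Set[Triple] = set(predicted)
    if not gold_set and not pred_set:
        return 1.0  # two empty graphs are considered identical
    matched = len(gold_set & pred_set)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)
    recall = matched / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy AMR-like graph and a copy with one simulated annotation mistake.
gold = [("w", "instance", "want-01"), ("b", "instance", "boy"), ("w", "ARG0", "b")]
noisy = [("w", "instance", "want-01"), ("b", "instance", "girl"), ("w", "ARG0", "b")]
score = triple_f1(gold, noisy)
```

A layered formalism could in principle apply such a score layer by layer, which is one way to read the modularity mentioned above, although the exact definition used in 40 is not given here.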

Ivaylo Mitov and Tadzhat Marharian each conducted an M1 internship under the supervision of Bruno Guillaume and Maxime Amblard. Ivaylo Mitov worked on the development of annotation for AMR and YARN in other languages and on the production of YARN from AMR, leveraging Universal Dependencies annotations. Tadzhat Marharian started the development of a new graphical user interface for managing YARN annotations.

In 43, Amandine Decker and Maxime Amblard discuss the limits of semantic representation formalisms, in particular when it comes to representing meaning in context and interaction. Detailed representations can be used as a basis for natural language understanding or generation. While these formalisms produce thorough analyses, they do not cover some crucial aspects of real language use. Most semantic representation formalisms, like AMR, DRS or UMR, operate out of context, which means they ignore a significant part of the content of the utterances they analyse. In this work, the authors discuss various aspects of language use left out by semantic representation formalisms and argue that future work in this field should include extending these formalisms so that they cover the interactive aspect of language.

7.1.5 Syntax and semantics of‌ questions

Natural language statements are composed not only‌ of declarative sentences but also of interrogative ones.‌ Moreover, sentences cannot be categorized into purely declarative‌ or purely interrogative sentences. Typically, a declarative statement‌ may contain a subordinated interrogative clause:

(a)
I‌ don't know where Mary is.

We observe that‌ noun phrases and declarative clauses can sometimes raise‌ alternatives like hidden questions. For example, in a‌ dependence statement like (b), several scenarios are considered‌ (sunny, rainy,..., going to the beach, not going‌ to the beach) and are related to each‌ other implicitly. In 55, a compositional way‌ to derive and link these alternatives is laid‌ out.

(b)
Depending on the weather, we might go to the beach.
(c)
Ça dépend (de) quel‌ temps il fait.

In French, similar sentences using‌ the verb dépendre can embed an interrogative clause.‌ However, it is unclear what is more standard‌ between keeping the de preposition or removing it‌ in cases like (c). The contribution 54 investigates‌ this grammatical issue by establishing corpus statistics on‌ the frequency of a preposition between a verb‌ and its embedded interrogative clause.

Like indefinites, interrogative‌ words can be referred to by other expressions.‌ For example, she $_{i}$ in (d) refers to‌ the person who was sitting there. This kind‌ of anaphora has not been fully considered in‌ anaphora-annotated corpora. The study 53 tries to evaluate‌ this by making an inventory of the (missing)‌ annotations of anaphora with a wh-word in the‌ French corpus ANCOR.

(d)
Who $^{i}$ was sitting‌ there? She $_{i}$ forgot her bag.

7.1.6 Use‌ of semantics

Before the invention of the printing‌ press, texts could only be reproduced through manual copying, a process prone‌ to errors, accidents, and‌ intentional modifications. These changes‌‌ altered each manuscript and were subsequently propagated by‌ other scribes. For philologists‌ reconstructing text history and‌‌ genealogical relationships (stemma codicum), analyzing these variants is‌ crucial. Stemmatology methods aim‌ to objectively construct genealogical‌‌ trees of textual transmission.

At the University of Lorraine, the Écritures laboratory and MSH have focused on uncovering the genealogical lineage of Hebrew manuscripts. A joint project with Maxime Amblard seeks to improve the manual work involved in critical editions of the Hebrew Bible by applying advanced methods from applied mathematics and natural language processing to reconstruct stemmas. With Iglika Zlatkova Nikolova-Stoupak, they design, train and test learning models to automatically tag scribal variants in manuscripts.

The current project 36 falls within the field of stemmatology, that is, the study and/or reconstruction of textual transmission based on the relationships between the available witnesses of given texts. In particular, it addresses word-level variants (differences) in manuscripts written in Biblical Hebrew. A dataset based on the Book of Ben Sira is manually annotated for the following variant categories: ‘plus/minus’, ‘inversion’, ‘morphological’, ‘lexical’ or ‘unclassifiable’. A strong classifier (F1 value of 0.80) is then trained to predict these categories in collated (aligned) pairs of witnesses. The classifier is non-neural and makes use of the two words themselves as well as part-of-speech (POS) tags, hand-crafted rules per category, and additional synthetically derived data. Other models experimented with include neural ones based on the state-of-the-art model for Modern Hebrew, DictaBERT. Other features whose relevance is tested are different types of morphological information pertaining to the word pairs and the Levenshtein distance between the words within a pair. The strongest classifier and the data used are made publicly available. In addition, the correlation between two sets of morphological labels is investigated: those professionally established as per the QumranDigital online library and those automatically derived with the sub-model DictaBERT-morph.
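As an illustration of one of the features mentioned above, the Levenshtein distance between the two words of a collated pair can be computed with the standard dynamic-programming recurrence. The sketch below is a generic implementation and is not taken from the project's code. The function name `levenshtein` and the Latin-script example strings are placeholders chosen for readability.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,        # deletion
                    current[j - 1] + 1,     # insertion
                    previous[j - 1] + cost  # substitution or match
                )
            )
        previous = current
    return previous[-1]


# Placeholder word pair standing in for two aligned witness readings.
distance = levenshtein("kitten", "sitting")
```

For a variant classifier, such a distance would typically be one numeric feature among others (POS tags, rule outputs), with small distances plausibly pointing towards morphological rather than lexical variants. That reading is an assumption, not a result reported above.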

Maxime Amblard is pursuing a collaboration with the French company Namkin. With Georgios Zervakis, they developed “BEE: A First Assessment of Language Models for Business Event Extraction”. Event Extraction (EE) is the task of automatically extracting relevant information about events in text. Business events in particular, such as corporate investments or product launches, can provide enterprises with insight into how to better position themselves in the market with respect to the competition. We benchmark existing EE systems in the business domain. To this end, we introduce BEE (Business Event Extraction), a manually curated corpus for end-to-end business event extraction. Empirical results for four different system architectures demonstrate the challenging nature of BEE, with Large Language Models (LLMs) underperforming compared to smaller models. Finally, we employ complementary evaluation metrics to understand the types of errors and reveal significant performance gains.

While modern semantic representations may contain vast quantities of information, they do not always (or necessarily) contain the information that is useful for a concrete application. For instance, significant challenges still persist in dealing with temporal relations and fine-grained negation interpretation.

Recent research has looked into the benefits of exploiting semantic representations, and in particular Abstract Meaning Representation (AMR), for low-resource scenarios and document-level event argument extraction. However, it appears that AMR has to be adapted in order to optimally support event extraction tasks 95. One major limitation of AMR for document-level event extraction is that AMR works at the sentence level, and thus requires the aggregation of sentence-level representations. AMR is also limited in its expressive power for negation and universal quantification.

7.2 Distributional Semantics and‌ Lexical Structures

Participants: Sylvain Pogodalla.

Numerical and continuous representations of word semantics, in particular vector representations, and neural learning techniques have produced impressive results on a large number of natural language processing tasks. These representations, or embeddings, rely on the distributional hypothesis 79, 71: the meaning of a word is provided by the linguistic context in which it occurs, and semantically related words should be represented by related embeddings.

However, the very nature of semantic relatedness‌ encoded in embeddings remains somewhat unspecified, and can‌ express different relations as classified by linguists (e.g.,‌ synonymy, hyponymy, etc.) 90, 81, and‌ may even depend on the chosen methods to‌ compute the vector similarity 88, the size‌ of the context or its type 94,‌ 89.

We have been studying the vector representations provided by transformer and attention models 93 and comparing them with linguistic knowledge as expressed by linguists. More precisely, we rely on the theory of combinatorial explanatory lexicology, the lexicological part of the Meaning-Text Theory (Mel'čuk and Polguère 2016, 2021), which hinges upon collocations to structure lexical knowledge as graphs. This theory provides a fine-grained description of lexical relations against which numerical models can be compared, as well as lexical resources (a lexicon for French 85, 65 and annotated examples 64). We focus on lexical structure, whereas previous work focused rather on morphosyntactic information and syntactic structures 82, 86. Data construction and statistical analysis are under way and a publication is in preparation.

7.3 Discourse Dynamics

Participants: Maxime‌ Amblard, Philippe de Groote, Amandine Decker‌, Jacques Jayez, Michel Musiol, Emeric‌ Licorni, Ines Hernandez.

7.3.1 Dialogue Modeling‌

Dialogue encompasses a vast diversity of interactional forms, which grows with technological and societal evolutions, such as the generalisation of video-mediated communication following the COVID-19 pandemic. As dialogue data becomes increasingly heterogeneous, modelling dialogue requires not only algorithmic advances but also a precise characterisation of the data on which these models are developed and evaluated. In order to better understand current practices in the field, Amandine Decker, Maxime Amblard and Ellen Breitholtz (Gothenburg University, Sweden) conducted a meta-review of papers on dialogue published in the ACL Anthology in 2024 30. This analysis provides an empirical overview of how dialogue data is actually described and used by the community. One of the main observations is that dialogue data is increasingly treated primarily as a resource for model training, rather than as an object of analysis in its own right. As a consequence, research overwhelmingly focuses on English-language datasets, with a strong preference for clean, high-quality dialogues, and often overlooks distinctions between task-oriented and open-domain interactions. These practices make it difficult to establish a principled framework for selecting appropriate dialogue resources for a given task and limit our ability to assess the scope and generalisability of reported results. This line of work aims to contribute to a more explicit reflection on dialogue data and its role in dialogue modelling research.

This‌‌ work is complemented by ongoing research by Amandine‌ Decker, Maxime Amblard and‌ Ellen Breitholtz on topical‌‌ structure analysis through the collection of a corpus‌ of chat-based interactions in‌ both English and French.‌‌ The objective is to develop a resource specifically‌ designed to support the‌ study of topical organisation‌‌ in dialogue, with a particular focus on how‌ participants interpret and accommodate‌ potentially incoherent contributions during‌‌ interaction.

7.3.2 Discourse Markers

Jacques Jayez continues working with Mathilde Dargnat (ATILF), Paola Herreño (Ph.D. candidate ATILF-LLF) and Maeva Sillaire (Ph.D. candidate ATILF) on the semantic representation of D(iscourse) M(arkers). DMs are words or expressions like so or well in English which help structure discourse or communicate speakers' internal epistemic or affective states as well as interactional moves. The discourse structuring function is the hallmark of connective DMs, which correspond to a large variety of discourse relations (causal, explanatory, concessive, temporal, etc.). Other functions are realized by discourse particles, which can express for instance surprise, attention modification or various interactional moves (backchannels, calls to attention, etc.) 80.

Investigating the semantic profile of DMs is developed through three distinct but not quite independent subtasks. (1) Characterizing what DMs index (refer to, denote, etc.). The domain-based approach initiated in the 90s consists in defining different types (aka domains) of semantic objects, like states of affairs, beliefs or speech acts. Domains are instrumental in teasing apart subclasses of connective DMs 84. Discourse particles index internal states of speakers or interactional operations 69. (2) The second subtask consists in determining what the semantic contribution of a DM is (propositional content, presupposition, conventional implicature). The semantic contribution aspect interacts with the indexing behaviour of DMs for connectives 84 and, moreover, in the case of particles, raises the question of the semantic analysis of ‘side effects’ in terms of monads, as exemplified by Asudeh and Giorgolo 66 among others. (3) The intuitions about the lexical meaning of DMs are notoriously difficult to substantiate, in particular for particles. We are currently studying how different types of intuitions can be coded in the declarative format of Dialogue Game Boards of 72. Points (2) and (3) converge toward the problem of defining an ontology which extends that of Ginzburg by including commitment, intentions and side-effects, in order to take into account the distinctions introduced in 92.

In the context of the CODIM ANR project, we have designed a workflow for annotating the DMs in our set of French spoken and written corpora and analysing the statistical properties of DM sequences. Given the overall poor performance of LLMs, we have kept the finite automata approach previously developed in CODIM, constructing a final cascade of 622 automata with the help of the Unitex-Gramlab software. The cascade extracts 900 DM types from the corpora for a total of 8,195,046 DM occurrences. The annotation results are normalized and passed to a set of 10 association measure functions, which estimate the strength of association between any two juxtaposed DMs in the corpora. The resulting vectors are scaled and compared by various distance estimators, in order to create a hierarchy of association for any two DMs sharing a common associate, for instance alors and bon with respect to mais in the pairs mais alors and mais bon.
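To illustrate what an association measure between juxtaposed DMs can look like, here is a minimal sketch using pointwise mutual information (PMI) over adjacent pairs. The text above does not name the 10 measures actually used, so PMI is only an assumed example of the family. The function `pmi_scores` and the toy sequences are hypothetical and do not come from the CODIM corpora.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_scores(sequences: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """PMI (base 2) for every adjacent pair of markers observed in the sequences."""
    unigram: Counter = Counter()
    bigram: Counter = Counter()
    for seq in sequences:
        unigram.update(seq)
        bigram.update(zip(seq, seq[1:]))  # juxtaposed (left, right) pairs
    n_uni = sum(unigram.values())
    n_bi = sum(bigram.values())
    scores: Dict[Tuple[str, str], float] = {}
    for (left, right), count in bigram.items():
        p_pair = count / n_bi
        p_left = unigram[left] / n_uni
        p_right = unigram[right] / n_uni
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Invented toy data: each inner list is one sequence of juxtaposed DMs.
toy = [["mais", "alors"], ["mais", "bon"], ["mais", "alors"], ["bon", "alors"]]
scores = pmi_scores(toy)
# Compare the two associates of "mais" on this single measure.
ranking = sorted(["alors", "bon"], key=lambda dm: scores[("mais", dm)], reverse=True)
```

With several measures, each pair would receive a vector of scores rather than a single number, and the scaling and distance comparison described above would operate on those vectors.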

Jacques Jayez has refined his work on‌ the argumentative dimension of discourse, and the last‌ version of his submission for a book on‌ implicit manipulation has been accepted by de Gruyter‌ 83.

7.3.3 Pathological Discourse Modeling

Also based on interviews between psychologists and patients with schizophrenia, we began a study on the alignment between discourse descriptors and speech characteristics, in order to uncover potential links between what is said (discourse) and how it is said (speech characteristics). To do so, Vincent Martin supervised two M1 students (speech pathologists) who worked on pause characteristics across the different discourse structures; he then supervised two other interns (Zsofia Hauwk, M1 and Maé Dugoua-Jacques, L3) to work on the automation of diarization (speaker separation) and text transcription of these interviews. The low audio quality has been a significant challenge, which we are still working to resolve at the time of writing this report.

Vincent Martin also proposed a new framework for analysing speech acoustic quality using network analyses of acoustic descriptors 18. This framework obtained relevant results on the SpeechWellness challenge, which addresses suicidality in adolescents using only speech; the results were presented at Interspeech 2025 37.

In parallel‌ with this work, Vincent Martin pursued his work‌ about refining sleep 13, 11, 12‌, 14 and psychiatric semiology 15, in‌ order to improve the accuracy of digital psychiatry‌ devices by refining their targets.

Michel Musiol has conducted theoretical, formal and empirical research in semantics and conversation analysis in order to relate the linguistic, cognitive and psycholinguistic aspects of semantic representations as they appear in discourse. For instance, with Maxime Amblard, we are building a formal, computational and dynamic model likely to reveal the properties of pathological discourse, based on the modeling of violations of coherence. Empirical studies were accordingly based on clinical interviews between psychologists or psychiatrists and schizophrenic patients 19 or between psychologists or psychiatrists and bipolar patients 10. In the first paper, our dialogue analysis model supplements existing methods, which often suffer from being ad hoc, lacking compatibility with manual analysis, or failing to produce variables that align with computational or algebraic analysis. In the second paper, we show that cognitive and conversational properties measured with clinical assessment or discourse analysis have led to the formulation of a hypothesis suggesting that the two pathologies might be situated on a continuum. We examined the hypothesis of such a continuum in the context of the pragmatic discontinuities that occur in dialogue between a psychologist and either a schizophrenic or a bipolar patient. Furthermore, the aim is to delineate the cognitive and psycholinguistic impairments observed in the schizophrenic group in comparison to the bipolar group.

This program is intended to subsequently propose computerized tools for diagnosis assistance, screening of people at risk, as well as psychotherapeutic and therapeutic evaluation and follow-up 56. For instance, we have investigated the socio-behavioral dynamics of Shwachman-Diamond Syndrome, focusing on how children with the condition navigate cooperative interactions. Using computational pragmatics, we aimed to identify the underlying principles guiding their social behavior 20.

In line with last year's project, Michel Musiol and Maxime Amblard continued to work on the characterisation of pathological discourse. With Arthur Trognon, they published a book chapter.

As part of the PhD work of Vincent-Thomas Barrouillet, the authors of 10 compare two matched clinical interview corpora, conducted with bipolar patients and, under the same conditions, with schizophrenic patients. The interview is non-directive, which encourages the patient to speak freely. Both corpora contain the same number of words. They conduct an exhaustive search for "breaks" using an investigative model of discursive disorganization that is sensitive to the linguistic and illocutionary properties of speech acts. These "breaks" are then formally analyzed using hierarchical modeling, which reveals the defective relationships between speech acts in the dynamic structuring of conversational sequences. They conclude that hierarchical and dynamic discourse analysis methodology is a valuable tool for identifying certain bipolar disorders as well as for recognizing schizophrenic symptoms. It also makes it possible to clarify the psycholinguistic processes associated with the expression of bipolar and schizophrenic disorders in verbal interaction. Finally, it contributes to the hypothesis of a continuum between schizophrenia and bipolar disorder, supporting the high-level cognitive processes that underpin discursive competence.

7.4 Common Basic Resources‌‌

Participants: Maxime Amblard, Hee-Soo Choi, Philippe‌ de Groote, Bruno‌ Guillaume, Sylvain Pogodalla‌‌, Karën Fort.

7.4.1 Universal Dependencies and‌ Surface Syntactic Universal Dependencies‌

The Universal Dependencies (UD)‌‌ project aims to build‌ a syntactic dependency scheme that enables similar analyses‌ of several different languages. Bruno Guillaume is an‌ active member of the UD community and contributes‌ to the development and the improvement of the‌ French data within this international initiative.

In 2025, he continued to work, in collaboration with Sylvain Kahane, Kim Gerdes and their teams, to promote the Surface Syntactic Universal Dependencies (SUD) framework. SUD is an annotation scheme for syntactic dependency treebanks that is almost isomorphic to UD. Unlike UD, it is based on syntactic criteria (favouring functional heads), and its relations are defined on distributional and functional bases.

This work is mainly conducted in the ANR project Autogramm (Induction of descriptive grammar from annotated corpora), which started in January 2022. The project aims to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora for linguistic and typological studies. The project also promotes the development of treebanks for low-resourced languages, in order to extract quantitative descriptive grammars for these languages.

In 38, the authors present‌ a new format of the Rhapsodie Treebank, which‌ contains both syntactic and prosodic annotations. This provides‌ a comprehensive dataset for the study of spoken‌ French. This integrated format enables complex, multilevel queries‌ and paves the way for intonosyntactic studies.

In 34, the authors proposed a study of the different statuses of the morphosyntactic features used in UD treebanks. While most of these features correspond to values of inflectional morphemes, some describe lexical subclasses or are just conventional names of (polysemic) morphemes. Syncretism is also a challenge, because exact values can only be deduced from contextual information. An attempt at clarification and an implementation in written and spoken French treebanks is then proposed.

Bruno Guillaume, in collaboration with Santiago Herrera, Ioana-Madalina Silai, Caio Corro and Sylvain Kahane 33, has developed a data-driven contrastive framework to extract common and distinctive linguistic descriptions from syntactic treebanks. The extracted contrastive rules are defined by a statistically significant difference in frequency and precision, and classified as common or distinctive rules across the set of treebanks. The method is illustrated by working on object word order using Universal Dependencies (UD) treebanks in 6 Romance languages: Brazilian Portuguese, Catalan, French, Italian, Romanian and Spanish. The paper discusses the limitations faced due to inconsistent annotation and the feasibility of conducting contrastive studies using the UD collection.
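The notion of a statistically significant difference in frequency can be illustrated with a two-proportion z-test that compares how often a pattern occurs in two treebanks. The text above does not say which statistical test the framework relies on, so this test is an assumption chosen for simplicity. The function name `two_proportion_z` and the counts are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(hits_a: int, total_a: int, hits_b: int, total_b: int) -> Tuple[float, float]:
    """Return (z statistic, two-sided p-value) for the difference of two proportions."""
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 0.0, 1.0  # degenerate case: the pattern never or always occurs
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: occurrences of one word-order pattern among all
# candidate constructions in treebank A and in treebank B.
z, p_value = two_proportion_z(hits_a=120, total_a=1000, hits_b=60, total_b=1000)
is_distinctive = p_value < 0.05
```

Under this reading, a pattern whose frequency differs significantly between treebanks would be a candidate distinctive rule, and one without a significant difference a candidate common rule. The precision criterion mentioned above is not covered by this sketch.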

During his M2 internship, Luc Cheng applied the methodology used for contrastive studies to corpus correction. This study was conducted using written and spoken French corpora, as well as two English corpora.

In 2025, two‌ new versions of Universal Dependencies were released. Bruno‌ Guillaume collaborated with field linguists to produce or‌ improve Surface Syntactic Universal Dependencies treebanks and to‌ convert them to Universal Dependencies:

Version 2.16 in May:
- new treebank for Bokota (with Marie Benzerrak‌ and Natalia Cáceres Arandia)
- new treebank for Ika (with Jana Bajorat and‌ Natalia Cáceres Arandia)
- new‌ treebank for Nenets (with‌‌ Nikolett Mus)
- enhanced mSUD treebank for Gbaya (with‌ Paulette Roulon)
Version 2.17 in November:
- enhanced UD‌‌ treebank for Old Egyptian (with Roberto Antonio Díaz‌ Hernández)
- new treebank for‌ Western Hausa (with Bernard‌‌ Caron)

In April and May 2025, Roberto Antonio Díaz Hernández made a three-week visit to LORIA, funded by a Short-Term Scientific Mission of the UniDive COST action. He collaborated with Bruno Guillaume to build a Grew-match instance dedicated to the annotations of the Ancient Egyptian hieroglyphic text from the pyramids: GrewPT.

In May 2025, Bruno Guillaume made a two-week visit to the University of Bologna (funded by a Short-Term Scientific Mission of the UniDive COST action). He collaborated with Ludovica Pannitto on a survey of the annotation of spoken data in the Universal Dependencies project.

In 35, Nikolett Mus, in collaboration with Bruno Guillaume, Sylvain Kahane and Daniel Zeman, presents the development of the Tundra Nenets Universal Dependencies (UD) Treebank, the first syntactically annotated resource for the Samoyedic branch of the Uralic family. The treebank integrates spoken-language data and adopts the morphologically enhanced Surface-Syntactic UD (mSUD) framework to capture inflectional morphology and morphology-based syntactic relations. It further incorporates Information Structure annotation. The methodological workflow includes data selection, transcription conventions, sentence and lexeme segmentation, annotation of spoken-language features, lemmatization, treatment of morpheme status, part-of-speech and morphological tagging, and syntactic annotation based on the functional and distributional properties of syntactic elements. The paper also outlines the principles guiding multilevel annotation and justifies the theoretical choices underlying the integration of prosodic, morphological, and syntactic information.

The work on the Gbaya treebank was published in 39. The paper presents the first treebank for Gbaya, a language from the under-resourced Niger-Congo family. The language has a rich system of tonal morphemes and virtually no affixes. The dependency analysis is based on a morpheme-based tokenisation and the treebank is also distributed in a word-based Universal Dependencies version. Several constructions are discussed in the paper: genitive construction, clause coordination, sentence particles, adverbial and relative clauses, serial verb constructions, reported speech, topicalization, and focalization.

7.4.2 Citizen‌ Science

Karën Fort worked‌ with colleagues from Sorbonne‌‌ on guidelines to develop citizen science projects. These‌ guidelines were finally published‌ in a journal article‌‌ 21 and at a TALN workshop 48.‌

7.4.3 Synthetic clinical texts‌ generation

In the context‌‌ of the CODEINE ANR project and more specifically‌ of Nicolas Hiebel's PhD‌ thesis, Karën Fort worked‌‌ with Aurélie Névéol (LISN-CNRS) and Olivier Ferret (CEA)‌ on the generation of‌ synthetic clinical texts.

The key idea of the project is to use confidential corpora to automatically generate anonymous synthetic texts capable of emulating real documents from the perspective of their linguistic characteristics. Nicolas Hiebel worked on a survey of the state of the art in clinical text generation, which has been published in a journal 16.

Another part of the project consists in using a Game With A Purpose to validate and then annotate the synthesized clinical texts. This game, developed by Bertrand Remy, is called HostoMytho (see Section 6.1.3), and includes various mini-games for different annotation layers, such as negation, error typing, or plausibility rating. The game is multi-platform, and is therefore intended to be used on the web (see: online HostoMytho), on Android and on iOS.

7.5 Ethics‌ and biases

Participants: Karën Fort, Maxime Amblard‌, Michel Musiol, Marc Anderson, Fanny‌ Ducel, Clémentine Bleuze.

7.5.1 Ethics dissemination‌ in scientific communities

Karën Fort and Fanny Ducel,‌ together with other members of the ACL Ethics‌ committee and student volunteers to the committee, participated‌ in the creation, organization, and presentation of a‌ tutorial on ethical challenges in NLP, which took‌ place at the ACL conference in July 2025‌ 61 and attracted around 40 attendees.

Fanny Ducel,‌ under the supervision of Karën Fort and Aurélie‌ Névéol, authored a long abstract on the role‌ that applied linguistics could play to aim at‌ ethical NLP research, calling for more interdisciplinarity. This‌ work was presented in French at NÉALA, a‌ national applied linguistics conference 51.

7.5.2 Evaluating‌ stereotypes in autoregressive language models

Fanny Ducel, under‌ the supervision of Karën Fort and Aurélie Névéol,‌ and in collaboration with Nicolas Hiebel, measured gender‌ stereotypical biases in LLM-generated clinical cases, in French.‌ This work has been presented and published at‌ NAACL in English 31, and its translated‌ French version at TALN 45.

Jeffrey André,‌ under the supervision of Fanny Ducel, Karën Fort‌ and Aurélie Névéol, designed a web interface (‌Masculead) that allows users to contribute to‌ an interactive leaderboard, which is based on the‌ previously published framework for gender bias detection 70‌. This interface, as well as arguments on‌ the notion and flaws of leaderboards for language‌ models, were presented at the "Ethic and Alignment‌ of (large) Language Models" workshop, at TALN 44‌.

7.5.3 Biases in the biomedical domain

Karën Fort is PI of a 4-year ANR project (2023-2027), InExtenso (Intrinsic and Extrinsic evaluation of biases in large language models), in collaboration with Rouen's hospital (CHU) and LISN-CNRS. The project aims at better identifying stereotyped biases in LLMs in French and, when possible, mitigating them. Within the framework of this project, Clémentine Bleuze supervised the internship of M2 student Hawawou Oumarou-Tchapchet, along with partners from Rouen's hospital. This internship aimed at evaluating socio-demographic biases of a French LLM in a medical classification task.

Under the supervision of‌ Karën Fort and Aurélie Névéol, and in collaboration‌ with Vincent Martin, Clémentine Bleuze conducted a literature‌ review on the subject of LLM-assisted mental health‌ prediction tasks, which has been submitted to the‌ Journal of Medical Internet Research (JMIR).

7.5.4 NLP‌ for NLP and Ethics

Clémentine Bleuze continued the‌ work initiated during her M2 internship in collaboration with Fanny Ducel and‌ under the supervision of‌ Karën Fort and Maxime‌‌ Amblard. This work explored the notion of scientific‌ overclaiming (when researchers inadequately‌ interpret or present elements‌‌ of their research) in NLP papers. It also‌ led to the definition‌ of a taxonomy of‌‌ relevant research claims, the constitution of a corpus‌ of NLP claims originating‌ from ArXiv and ACL‌‌ papers (a subpart of which has been human-annotated),‌ and the training of‌ BERT-based models to predict‌‌ claim types. This research, along with new results‌ about typical claim patterns‌ used in research papers,‌‌ was presented at TALN 2025 as a poster‌ 41.

Karën Fort and Vincent Martin conducted two automatic lexical analyses of the words censored by the Trump administration in the scientific literature, related respectively to mental health 47 and sleep health 17. The results of these studies, which combine lexical networks and temporal analyses, demonstrate that it is impossible to produce scientific data (and consequently to produce global health policies based on these missing data) without the vocabulary censored by the Trump administration.

8 Bilateral contracts and‌ grants with industry

8.1‌ Bilateral contracts with industry‌‌

Maxime Amblard is pursuing a collaboration with the French company Namkin. The industry faces numerous challenges that necessitate the evolution of BtoB marketing tools, in order to develop a valuable offer and provide an enhanced customer experience. Namkin's BrainLab develops industrial marketing tools for digitalizing customer relations, evolving business models, and exploiting business and economic data for business development. One of the key challenges of marketing intelligence is to identify risks and opportunities so as to guide marketing strategies. Among the sources of information useful to detect risks and opportunities, Namkin has identified Business Events, that is, “textually reported real-world occurrences, actions, relations, and situations involving companies and firms”. A postdoctoral researcher at Namkin, Georgios Zervakis, and an engineer, Sullivan Benard, took part in the collaboration.

9 Partnerships and cooperations‌‌

9.1 International research visitors

9.1.1 Visits of international‌ scientists

Casey Kennington

Status‌
Researcher
Institution of origin:‌‌
Boise State University
Country:
USA
Dates:
25-29 March 2025
Context of the‌ visit:
invitation to give‌‌ a seminar
Mobility program/type of mobility:
research stay‌

Aarne Ranta

Status
Professor‌
Institution of origin:
University‌‌ of Gothenburg
Country:
Sweden
Dates:
22-25 July 2025
Context of the visit:‌
Collaboration in the context‌‌ of the Malinca project
Mobility program/type of mobility:‌
Invitation

Díaz Hernández Roberto‌ Antonio

Status
Researcher
Institution‌‌ of origin:
Universidad de Jaén
Country:
Spain
Dates:‌
28 April - 16 May 2025
Context of‌‌ the visit:
development of NLP tools for Old‌ Egyptian
Mobility program/type of‌ mobility:
Short Term Scientific‌‌ Mission (STSM) funded by UniDive

9.2 European initiatives‌

9.2.1 Horizon Europe

MALINCA‌

Participants: Philippe de Groote‌‌.

MALINCA project on cordis.europa.eu

Title:
Mathematicae Lingua‌ Franca: Bridging the Linguistic‌ Gap Between the Mathematician‌‌ and the Machine
Duration:
From March 1, 2025‌ to February 28, 2031‌
Partners:
- Institut National de‌‌ Recherche en Informatique et‌ Automatique (Inria), France
- Universidad Pontificia Comillas (Comillas), Spain‌
- Université Paris Cité (UPCité), France
- Centre National de‌ la Recherche Scientifique (CNRS), France
Inria contact:
Hugo‌ Herbelin
Summary:
In recent years, proof assistants have shown their astounding ability to tackle the complete formalisation of large pieces of mathematics, with the celebrated certifications of the Feit-Thompson theorem, of the Kepler conjecture, and more recently, the resolution of Scholze's liquid tensor challenge. We believe that the time is ripe to demonstrate that they can tackle mathematics in the flexible and semi-formal way it is created and exchanged by mathematicians. To that end, we aim to develop proof assistant technologies of an entirely new nature, including a formal language and a foundational approach to mathematical meaning, with the versatility necessary to represent the dynamic linguistic structures found in the daily practice of mathematics. The result will be a linguistic front-end that will allow mathematicians, and scientists in general, to express their proofs and computations in proof assistants in the semi-formal way they think of them. Three research tracks stand out: the mathematical and linguistic foundations; formalisation of real-world vernacular mathematics into a high-level language of representation (Godement challenge); new techniques and software tools, based on natural language processing, to automate the formalisation process. Translating semi-formal mathematics into the machine needs to go beyond the traditional view that reduces reasoning to logic, and requires understanding the dynamics of the discursive linguistic process which underlies mathematics. Building on advances in linguistics, mathematical logic, programming language semantics and machine learning, we will contribute significantly to the rise of a new generation of proof assistants, integrating at their heart a linguistic layer and automated guidance tools for mathematical proofs, theorems and definitions.
The resulting high-level manipulation of concepts will‌ lead to novel research outcomes supporting the daily‌ activity of mathematical scientists.

9.2.2 Other European programs/initiatives

Bruno Guillaume is a member of the core‌ group of the COST action: CA21167 - Universality,‌ diversity and idiosyncrasy in language technology (UniDive).‌ He is the leader of the working group‌ named "Corpus Annotation".

9.3 National initiatives

9.3.1 ANR‌ Project: InExtenso

Participants: Karën Fort, Maxime Amblard‌, Michel Musiol, Fanny Ducel.

Title:‌
Intrinsic and Extrinsic evaluation of biases in large‌ language models
Duration:
10 2023–09 2027
Coordinator:
Karën‌ Fort
Partners:
CHU Rouen, LISN, LORIA
Participants:
Maxime‌ Amblard, Fanny Ducel, Karën Fort (coordinator), Michel Musiol,‌ Miguel Couceiro
Abstract:
Large Language Models (LLMs) are the Swiss Army knife of today’s Natural Language Processing (NLP). They often outperform the state-of-the-art on benchmarks commonly used in the field for tasks such as part-of-speech tagging, text classification and named-entity recognition, thus paving the way to a myriad of end-user applications. However, it has been shown that LLMs exhibit major ethical issues including significant environmental impact, mirroring and amplification of stereotyped biases, which in turn have a disproportionate impact on historically disadvantaged social groups. It is urgent to address the social impact of NLP as the applications we develop, such as ChatGPT, are now directly made available to end users. The detection and mitigation of biases have therefore become an active area of research in the past few years, focusing mainly on Masked Language Models (MLM) such as BERT in English and the North American social context. Several sources of bias were identified in the NLP pipeline. However, the interconnection between sources and the overall impact of each source on downstream applications remain unclear. In this project, we want to observe the entire pipeline, from the intrinsic point of view (within the model itself), to the pre-training task point of view (in the case of autoregressive LLMs, text generation), on to some real-world downstream applications. We chose to focus on two types of medical applications: mental illness diagnosis support and information extraction from clinical records for public health purposes such as patient enrollment into clinical trials. The project will provide corpora and methods for a global evaluation of bias in LLMs in French as well as studies to further the understanding of biases in clinical NLP pipelines and the environmental impact of the integration of these models in digital health.

9.3.2 ANR Project:‌ CoDeinE

Participants: Karën Fort‌, Bruno Guillaume,‌‌ Bertrand Remy.

Title:
artificial text COrpus DEsIgNed‌ Ethically automatic synthesis of‌ clinical documents
Duration:
03‌‌ 2021–02 2026
Coordinator:
Aurélie Névéol (Limsi)
Partners:
CRC,‌ CEA List, LISN, LORIA‌
Participants:
Bruno Guillaume, Karën‌‌ Fort (local coordinator), Bertrand Remy
Abstract:
Machine learning‌ methods have become prevalent‌ in language technologies. They‌‌ rely on annotated corpora to train models and‌ evaluate algorithms. The CoDeinE‌ project proposes to address‌‌ the lack of shareable corpora in sensitive domains‌ such as health or‌ banking. The key idea‌‌ of the project is to use confidential corpora‌ to automatically generate synthetic‌ texts that mimic the‌‌ linguistic properties of real documents while preserving confidentiality.‌ We will use clinical‌ documents in electronic patient‌‌ records as a case study. Furthermore, the project‌ will rely on Games‌ With A Purpose and‌‌ crowd sourcing to validate and annotate the synthesized‌ texts.

9.3.3 ANR Project:‌ Autogramm

Participants: Bruno Guillaume‌‌, Karën Fort, Khensa Amani Daoudi.‌

Title:
Induction of descriptive‌ grammar from annotated corpora‌‌
Duration:
01 2022–12 2025
Coordinator:
Sylvain Kahane (Université‌ Paris Nanterre)
Partners:
MoDyCo,‌ LACITO, LISN, Inria Nancy‌‌ – Grand Est
Participants:
Bruno Guillaume (local coordinator),‌ Karën Fort
Abstract:
The‌ goal of this project‌‌ is to automate, as far as possible, the‌ extraction of descriptive grammars‌ and grammatical descriptions from‌‌ annotated corpora for linguistic and typological studies. The‌ project also promotes the‌ development of treebanks for‌‌ under-endowed languages, in order to extract quantitative descriptive‌ grammars for these languages.‌ The project uses the‌‌ annotation scheme SUD (Surface-syntactic Universal Dependencies), the‌ query tool Grew-match and‌ the annotation tool ArboratorGrew‌‌.

9.3.4 ANR Project:‌ CODIM

Participants: Maxime Amblard, Jacques Jayez.‌

Title:
Compositionality and discourse markers
Duration:
01 2023–12‌ 2026
Coordinator:
Mathilde Dargnat (Université de Lorraine and‌ ATILF)
Partners:
ATILF, LLF, LORIA
Participants:
Maxime Amblard,‌ Jacques Jayez
Abstract:
The CODIM project focuses on the two main linguistic resources for organizing monologues or conversations in human languages: D(iscourse) M(arkers) (therefore/donc, well/ben, bon, etc. in English/French) and prosody (in particular, intonation). It will evaluate their status with respect to two major views on communication: compositionality (the possibility of combining meaningful expressions into more complex meaningful expressions) and pattern or construction-based approaches (the idea that language users exploit partly "frozen" strings of words). We will compare the semantic and prosodic properties of simple and complex French DMs (e.g. ah + bon) found in corpora of written and spoken French, using a variety of technical tools for DM identification (category-driven text mining), clustering (statistics and Machine Learning) and research in prosody (duration and intensity measures, contour representation). The project fosters a number of collaborations between linguists and computer scientists.

9.3.5 PEPR Project Digital Health: Autonom Health

Participants: Maxime Amblard, Michel Musiol, Vincent Martin.

Title:
Autonom Health
Duration:
06 2023–12 2030‌
Coordinator:
Pierre Philip (Université de Bordeaux)
Partners:
LABRI,‌ Sanpsy, LORIA, ISIR, CES, LIRIS
Participants:
Maxime Amblard,‌ Michel Musiol, Vincent Martin
Abstract:
Western populations face an increase in longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, and e-health can optimize the management and costs of our health care systems. Healthy behaviors contribute to the prevention and optimization of chronic disease management, but their implementation is still a major challenge. Digital technologies could help their implementation through digital behavioral medicine programs developed as a complement to (and not a substitute for) existing care, in order to focus human interventions on the most severe cases demanding medical interventions.

10 Dissemination

10.1 Promoting scientific activities

10.1.1‌ Scientific events: organisation

Vincent Martin moderated a session of the Société Médico-Psychologique entitled “La psychiatrie à ses frontières”, 09 2025, Bordeaux, France.
Vincent Tourneur organized the Loria PhD seminar (8 presentations during the year).

General chair,‌ scientific chair

Sylvain Pogodalla: scientific co-chair of the‌ 16th International Conference on Computational Semantics, 09‌ 22–23, 2025, Düsseldorf, Germany, 57.
Karën Fort:‌ Ethics co-chair of the ACL 2025 conference.

10.1.2‌ Scientific events: selection

Chair of conference program committees‌

Maxime Amblard: chair of the workshop 4AS (Atelier sur les Avancées en AMR et en Analyse Sémantiques), co-located with TALN 2025.
Sylvain Pogodalla: co-chair of the program committee for the journées Impact de la science ouverte sur la recherche et les pratiques scientifiques, 01 27–29, 2026, Nancy, France.

Member of the conference program committees

Vincent Martin: member of the program committee for the Journée d’étude sur les technologies linguistiques pour les langues peu dotées (AFIA/AFCP), 12 2025, Paris, France.

Reviewer

Philippe de‌ Groote: reviewer for SCiL‌ 2025, MOL 2025‌‌, IWCS 2025.
Iglika Zlatkova Nikolova-Stoupak: reviewer‌ for 2nd UniDive training‌ school 01 2026, Yerevan,‌‌ Armenia.

10.1.3 Journal

Member of the editorial boards‌

Maxime Amblard: editor in‌ chief of the Revue‌‌ Traitement Automatique des Langues.
Sylvain Pogodalla: Member‌ of the editorial board‌ of the journal Traitement‌‌ Automatique des Langues, in charge of the‌ Résumés de thèses section.‌
Philippe de Groote: Area‌‌ editor of the FoLLI-LNCS series.

Reviewer -‌ reviewing activities

Maxime Amblard: reviewer for the conferences ACL, COLM, ECAI, IWCS, LREC, TAL; reviewer for the workshops ISA-21, LARP, Lexique; and reviewer for the journal Mathematical Structures in Computer Science.
Philippe de Groote: reviewer for the journal Logical Methods in Computer Science.
Karën Fort: reviewer for ACL 2025 and ACM FAccT 2025.
Vincent Martin: reviewer for Interspeech, ICASSP and the Journal of Medical Internet Research.
Sylvain Pogodalla: reviewing for‌ the Journal of Language‌ Modelling.
Amandine Decker:‌‌ reviewer for SemDial and sub-reviewer for ECAI.‌

10.1.4 Invited talks

Philippe de Groote gave an invited talk at the Conference on Mathematical and Computational Linguistics for Proofs 26.

Karën Fort‌‌ was invited to give a keynote speech at‌ the Italian NLP conference‌ CLiC-it in Sept. 2025‌‌ on the subject of "Large Language Models: the‌ challenge of evaluation" 22‌.

Karën Fort was‌‌ invited to give a keynote speech at the‌ Association française de linguistique‌ appliquée (AFLA) conference: Naturel‌‌ et Artificiel en Linguistique Appliquée : une époque‌ de paradoxes – Neala‌25, in Nancy,‌‌ in July 2025, on the subject of "Les‌ grands modèles de langue‌ : des outils situés".‌‌

Karën Fort was invited to give a speech‌ at the Conseil Scientifique‌ of the Institut CNRS‌‌ in Computer Science, in Paris, in March 2025,‌ on the subject of‌ "Les grands modèles de‌‌ langue : les défis de l'évaluation." 23.‌

Fanny Ducel was invited‌ to give a presentation‌‌ about her research on stereotypical biases in LLMs‌ to the work group‌ "Intelligence Artificielle Soutenable, Intelligible‌‌ et Vérifiable" of Université Paris-Saclay.

Vincent Martin was invited to give a talk at the French National Sleep Medicine Congress (Congrès du Sommeil), Strasbourg, 11 2025: "Enjeux des modélisations pour aborder la sémiologie du sommeil".

10.1.5 Leadership within the scientific‌ community

Maxime Amblard is PI of the INSIGHT project (Initiative d'Excellence Lorraine - PIA).
Vincent Martin has been a member of the steering committee of the Collège Technologies du Langage Humain (TLH) of the Association française pour l’Intelligence Artificielle (AfIA) since 09 2024.
Karën Fort is PI of the GDR LIFT‌ 2.

10.1.6 Scientific expertise‌

Vincent Martin: member of the evaluation committee “IA, Health and Biology” of the French National Research Agency (ANR, TSIA call for projects).
Sylvain Pogodalla: evaluation for the Inria Quadrant Programme‌, evaluation for the ANR generic call for‌ proposals 2025.

10.1.7 Research administration

Maxime Amblard:‌
- Member of CNU 27 (Computer Science)
- Head of‌ the master in Natural Language Processing
Karën Fort:‌
- Elected member of the Conseil de Pôle AM2I‌
- Chair of the Ethics committee of the ENACT‌ AI cluster
- Member of the Steering Committee of‌ the INSIGHT project
Sylvain Pogodalla:
- Elected member of‌ the comité de centre Inria Nancy – Grand‌ Est.
- In charge of the local commission IES‌ (information et édition scientifique) of the‌ Inria Nancy – Grand Est and LORIA.
- Member‌ of the national commission IES of Inria.

10.2‌ Teaching - Supervision - Juries - Educational and‌ pedagogical outreach

10.2.1 Teaching

Licence:
- Maxime Amblard, AI‌ Introduction, 14h, L1, Université de Lorraine, France.
- Maxime‌ Amblard, Ethical aspects of NLP, 10h, L3, Université‌ de Lorraine, France.
- Maxime Amblard, Human in the‌ loop, 10h, L3, Université de Lorraine, France.
- Karën‌ Fort, De l'écrit à l'information, 20h, L1 MIASHS,IDMC,‌ Université de Lorraine, France.
- Karën Fort, Outils pour‌ l'analyse linguistique, 25h, L3 MIASHS,IDMC, Université de Lorraine,‌ France.
- Hee-Soo Choi and Fanny Ducel, De l'écrit‌ à l'information, 5h, L1 MIASHS,IDMC, Université de Lorraine,‌ France.
- Hee-Soo Choi, Langages de Script, 20h, L1‌ MIASHS, IDMC, Université de Lorraine, France.
- Hee-Soo Choi,‌ Initiation aux Bases de Données, 24h, L1 MIASHS,‌ IDMC, Université de Lorraine, France.
- Hee-Soo Choi, Bases‌ de Données Avancées, 28h, L2 MIASHS, IDMC, Université‌ de Lorraine, France.
- Hee-Soo Choi, Suivi de stages,‌ 4h, L3, IDMC, Université de Lorraine, France.
- Hee-Soo‌ Choi, Algorithmique et Programmation Impérative, 30h, L1 Informatique,‌ FST, Université de Lorraine, France.
- Hee-Soo Choi, Algorithmique‌ et Programmation, 20h, L1 Informatique, FST, Université de‌ Lorraine, France.
- Hee-Soo Choi, Programmation, 36.7h, L1 Mathématiques, FST, Université de Lorraine, France.
- Hee-Soo Choi, Algorithmique et Programmation, 36.4h, L1 SPI, FST, Université de Lorraine, France.
- Vincent Tourneur, Administration UNIX, 24h, L2,‌ IUT Charlemagne, Université de Lorraine, France.
- Vincent Tourneur,‌ Compilation, 40h, L3, IUT Charlemagne, Université de Lorraine,‌ France.
- Marie Cousin, Recherche Opérationnelle, 4h, L3, École‌ des Mines de Nancy, Université de Lorraine, France.‌
- Clémentine Bleuze, Ingénierie de la langue, 15h, L3‌ MIASHS option TAL, IDMC, Université de Lorraine, France.‌
- Maxime Amblard and Clémentine Bleuze, Découverte du traitement‌ des données langagières, 30h, L3 MIASHS option TAL,‌ IDMC, Université de Lorraine, France.
- Clémentine Bleuze, Découverte‌ du traitement des données langagières, 15h, L2 MIASHS‌ option TAL, IDMC, Université de Lorraine, France.
- Iglika‌ Zlatkova Nikolova-Stoupak, Découverte du traitement des données langagières,‌ 15h, L2 MIASHS option TAL, IDMC, Université de‌ Lorraine, France.
Master:
- Maxime Amblard and Amandine Decker,‌ Methods for NLP, 20h, M1 NLP (IDMC), Université‌ de Lorraine, France.
- Maxime Amblard, NLP project, 30h,‌ M1 NLP (IDMC), Université de Lorraine, France.
- Maxime‌ Amblard and Amandine Decker, Dialogue ChatBot and Question‌ Answering, 28h, M2 NLP (IDMC), Université de Lorraine,‌ France.
- Karën Fort, Written Corpora (English), 37.5h, M1‌ NLP (IDMC), Université de Lorraine, France.
- Clémentine Bleuze, Written corpora (English), 16h,‌ Master M1 NLP (IDMC),‌ Université de Lorraine, France.‌‌
- Karën Fort, Software Projects (English), 25h, M2 NLP‌ (IDMC), Université de Lorraine,‌ France.
- Karën Fort, Python‌‌ Programming (English), 37.5h, M1 NLP (IDMC), Université de‌ Lorraine, France.
- Philippe de‌ Groote, Formal Logic, 22h,‌‌ M1 NLP (IDMC), Université de Lorraine, France.
- Philippe‌ de Groote, Formal languages,‌ 22h, M1 NLP (IDMC),‌‌ Université de Lorraine, France.
- Philippe de Groote, Semantics,‌ 22h, M2 NLP (IDMC),‌ Université de Lorraine, France.‌‌
- Karën Fort, Clémentine Bleuze, Ethics and NLP (English),‌ 19h, M1 NLP (IDMC),‌ Université de Lorraine, France.‌‌
- Karën Fort, Ethics (English), 25h, M2 NLP (IDMC),‌ Université de Lorraine, France.‌
- Karën Fort, Génie logiciel,‌‌ 56.25h, M1 MIAGE,IDMC, Université de Lorraine, France.
- Bruno‌ Guillaume, Lexical Resources (English),‌ 15h, M2 NLP (IDMC),‌‌ Université de Lorraine, France.
- Vincent Martin, Speech processing‌ (English), 14h, M2 NLP‌ (IDMC), Université de Lorraine,‌‌ France
- Vincent Martin, Signal processing (English), 12h, M2‌ NLP (IDMC), Université de‌ Lorraine, France
- Vincent Martin,‌‌ NLP projects (English), 3h, M1 NLP (IDMC), Université‌ de Lorraine, France
- Vincent‌ Martin, Critical analysis of‌‌ artificial intelligence for health (English), 6h, Master 2‌ Health Engineering, Université Grenoble‌ Alpes, France
- Vincent Martin,‌‌ Back to the big wide world: how to‌ integrate digital tools into‌ clinical practice? (English), 6h,‌‌ Master 2 Health Engineering, Université Grenoble Alpes, France‌
- Vincent Martin, Quelques éléments‌ de STS, 2h, Licence-Master‌‌ Science de la Santé, Université de Bordeaux, France‌
- Sylvain Pogodalla, Semantics, 10h,‌ M1 NLP (IDMC), Université‌‌ de Lorraine, France
- Sylvain Pogodalla and Amandine Decker,‌ Syntactic Models, 20h, M2‌ NLP (IDMC), Université de‌‌ Lorraine, France
- Fanny Ducel, Software Projects (English), 10h,‌ M2 NLP (IDMC), Université‌ de Lorraine, France.
- Fanny‌‌ Ducel, Python Programming (English), 14h, M1 NLP (IDMC),‌ Université de Lorraine, France‌
- Fanny Ducel, Project Management‌‌ Tools (English), 8h, M1 NLP (IDMC), Université de‌ Lorraine, France.
- Clémentine Bleuze,‌ NLP for low-resource language‌‌ (English), 8h, Master M2 NLP (IDMC), Université de‌ Lorraine, France.
- Maxime Amblard‌ and Amandine Decker, Introduction‌‌ to NLP, M1 NLP (IDMC), Université de Lorraine,‌ France.
- Maxime Amblard and‌ Amandine Decker, Dialogue Engineering,‌‌ 14h, M2 NLP (IDMC) LI, Université de Lorraine,‌ France.
- Maxime Amblard and‌ Amandine Decker, Discourse, 14h,‌‌ M2 NLP (IDMC), Université de Lorraine, France.
- Amandine‌ Decker and Maxime Amblard,‌ Dialogue Engineering, 14h, M2‌‌ NLP (IDMC), Université de Lorraine, France.
- Marie Cousin,‌ Foundation of Computing, 14h,‌ M1, École des Mines‌‌ de Nancy, Université de Lorraine, France.
Doctorate:
- Maxime Amblard, Introduction to AI, 2 x 7h, Doctoral School SLTC, Université de Lorraine.
Tutorials:
- Karën Fort,‌ Fanny Ducel, Navigating Ethical‌ Challenges in NLP: Hands-on‌‌ strategies for students and researchers 61
International Summer‌ School:

10.2.2 Supervision

PhD‌ defended in 2025

Maxime‌‌ Guillaume, Structures de traits pour les Grammaires Catégorielles‌ Abstraites, since 07‌ 2021. Supervision: Philippe de‌‌ Groote and Raphaël Salmon (Yseop).
Santiago Herrera, Extraction‌ de grammaires descriptives à‌ partir de corpus annotés‌‌ en syntaxe, since 09 2022. Supervision: Sylvain‌ Kahane (MoDyCo, Université Paris‌ Nanterre) and Bruno Guillaume.‌‌
Nicolas Hiebel, Création éthique‌ de données textuelles artificielles : application au domaine‌ biomédical, since 10 2021. Supervision: Aurélie Névéol‌ (LISN-CNRS), Karën Fort and Olivier Ferret (CEA).
Siyana‌ Pavlova, Tools and Methods for Semantic Annotation,‌ since 11 2020. Supervision: Maxime Amblard and Bruno‌ Guillaume.

PhD in progress

Vincent-Thomas Barrouillet, Le discours‌ pathologique du sujet schizophrène, caractérisation psycholinguistique et computationnelle‌ des déviations décisives à la logicité dialogique en‌ étude de corpus, since 10 2019. Supervision:‌ Michel Musiol and Maxime Amblard.
Clémentine Bleuze, Perception‌ et évaluation des biais dans les applications des‌ LLM au domaine biomédical, since 10 2024.‌ Supervision: Karën Fort and Aurélie Névéol (LISN-CNRS).
Colleen‌ Beaumard, Biomarqueurs vocaux collectés par des agents conversationnels‌ pour l'aide au diagnostic et le suivi des‌ troubles du sommeil et des troubles mentaux,‌ since 10 2022. Supervision: Jean-Luc Rouas (Université de‌ Bordeaux, LaBRI), Pierre Philip (Université de Bordeaux, SANPSY)‌ and Vincent Martin.
Elio Stasica, Diagnostic différentiel d'infarctus‌ à partir de la parole, since 9‌ 2025. Supervision: Emmanuel Vincent (Multispeech), Romain Serizel (Multispeech),‌ and Vincent Martin.
Hee-Soo Choi, Lier des ressources‌ lexicales du français en vue d'une interopérabilité entre‌ niveaux linguistiques, since 10 2021. Supervision: Karën‌ Fort and Mathieu Constant.
Marie Cousin, Modélisation de‌ paraphrase dans les grammaires catégorielles abstraites, since‌ 10 2022. Supervision: Philippe de Groote and Sylvain‌ Pogodalla.
Amandine Decker, Modelling Topic-level Interaction in Pathological‌ Conversations, since 10 2022. Supervision: Maxime Amblard‌ and Ellen Breitholtz (University of Gothenburg, Sweden).
Fanny‌ Ducel, Evaluating stereotyped biases in auto-regressive language models‌, since 10 2023. Supervision: Karën Fort and‌ Aurélie Névéol (LISN-CNRS).
Amandine Lecomte, Analyse longitudinale de‌ prise en charge psychothérapeutique de patients psychiatriques et‌ de patients atteints de maladies neurodégénératives : informatisation‌ et modélisation dialogique des indices comportementaux associés à‌ l’efficacité (vs échec) des stratégies de prise en‌ charge tentées par les thérapeutes, since 10‌ 2019. Supervision: Michel Musiol and Alexandra König.
Valentin‌ Richard, Aspects dynamiques et présuppositionnels des questions,‌ since 09 2021. Supervision: Philippe de Groote, Floris‌ Roelofsen and Reinhard Muskens (Universiteit van Amsterdam, ILLC).‌
Vincent Tourneur, Algorithmes d’analyse syntaxique pour les grammaires‌ catégorielles abstraites, since 10 2024. Supervision: Philippe‌ de Groote.

10.2.3 Other supervisions

Karën Fort and‌ Fanny Ducel supervised six M1 students during their‌ 2-month internship at LORIA. Four of these students‌ worked on the stereotypes present in benchmarks used‌ for LLMs, while the two others developed a‌ method to measure racist biases in reaction to‌ the presence of code-switching in LLM prompts. Karën‌ Fort and Fanny Ducel also supervised two L3‌ interns, one of whom worked on the code-switching‌ project, and the second one developed an interface‌ based on previous work on biases by Karën‌ Fort and Fanny Ducel. This work was published‌ in a TALN workshop 44.

10.2.4 Juries‌

Karën Fort, Maxime Amblard, Bruno Guillaume: NLP Master‌ 1 and 2 juries (IDMC)
Maxime Amblard was reviewer, president and member of the PhD jury of Zacchary Sadeddine, Meaning Representation Frameworks and Reasoning in the Era of LLMs, under the supervision of Fabian Suchanek (Telecom Paris), Institut polytechnique de Paris, 10 October 2025.
Maxime Amblard was reviewer of Jaromír Salamon, Influencing text generation by biological signal, Roman Mouček, University of West Bohemia, August 2025.
Maxime Amblard was president and member of the PhD jury of Aman Sinha, Evaluation of Medical Language Models, under the supervision of Marianne Clausel and Mathieu Constant, Université de Lorraine, 12 December 2025.
Maxime Amblard was president and member of the PhD jury of William Eduardo Soto Martinez, Multilingual Graph-to-Text Generation and Evaluation, under the supervision of Claire Gardent (DR CNRS) and Yannick Parmentier (Université de Lorraine), 7 October 2025.

10.2.5 Educational‌ and pedagogical outreach

Marie Cousin and Amandine Decker: ran a MATh.en.JEANS workshop at the Edmond de Goncourt secondary school in Pulnoy.
Karën Fort presented her work on the ethics of AI to CPGE students from Lycée Poincaré (LORIA, Nancy, January 2025): "Ethics of AI from an NLP point of view: the good, the bad and the evaluation".

10.3 Popularization‌

10.3.1 Productions (articles, videos,‌ podcasts, serious games, ...)‌‌

Karën Fort was interviewed for La Recherche Magazine‌ (January–March 2026) on LLM‌ agents.
Karën Fort was‌‌ interviewed for Chut!, Imaginons des systèmes plus‌ petits et plus ciblés‌, January 2025
Maxime‌‌ Amblard was interviewed by cortex.com for the kick-off‌ event of UNYS
Maxime‌ Amblard was interviewed by‌‌ newstank.com for the INSIGHT project

10.3.2 Participation in‌ Live events

Maxime Amblard participated in the event le Procès du Robot, at Lycée Loritz, 2025-02-28.
Fanny Ducel gave a‌ presentation about her projects‌‌ on stereotypical biases in LLMs at the Université‌ Champagne-Ardenne, in the context‌ of its AI Week‌‌ and of "Fête de la Science".
Amandine Decker: 2025-02-07, participation in FIRST (Femmes Ingénieures, Réussir en Sciences et Technologies), presentation of research to female students in seconde (Lycée Fabert, Metz, France).
Hee-Soo Choi and Fanny Ducel: 2025-02-27, participation in the Elles Bougent : Filles - Maths et Science day to promote scientific studies and careers to 150 female students from 40 middle schools.
Marie Cousin: 2025-02-27, participation in the Grand-Est edition of "Sciences, un métier de femmes", presentation of what research in computer science is, interactions with female high school students (FST, Nancy).
Marie Cousin: 2025-01-31, presentation to high school students in the context of the "Chiche !" initiative (Lycée des métiers du tertiaire Jean-Victor Poncelet, Saint-Avold, France).
Marie Cousin: 2025-09-20, participation‌ in “Journées européennes du‌‌ Matrimoine” (Féru des Sciences, Nancy, France).

11 Scientific‌ production

11.1 Major publications‌

1 Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad and Karën Fort. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, Association for Computational Linguistics, 2023, 13141-13160.
2 Marc Anderson and Karën Fort. Human Where? A New Scale Defining Human Involvement in Technology Communities from an Ethical Standpoint. International Review of Information Ethics, August 2022.
3 Guillaume Bonfante and Bruno Guillaume. Non-size increasing Graph Rewriting for Natural Language Processing. Mathematical Structures in Computer Science 28(8), 2018, 1451-1484.
4 Guillaume Bonfante, Bruno Guillaume and Guy Perrier. Application of Graph Rewriting to Natural Language Processing. Logic, Linguistics and Computer Science Set, vol. 1, ISTE Wiley, 2018, 272.
5 Fanny Ducel, Aurélie Névéol and Karën Fort. "You'll be a nurse, my son!" Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, October 2024.
6 Philippe de Groote and Makoto Kanazawa. A Note on Intensionalization. Journal of Logic, Language and Information 22(2), 2013, 173-194.
7 Aurélie Névéol, Yoann Dupont, Julien Bezançon and Karën Fort. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, May 2022.
8 Sylvain Pogodalla. A syntax-semantics interface for Tree-Adjoining Grammars through Abstract Categorial Grammars. Journal of Language Modelling 5(3), 2017, 527-605.
9 Robert Reinecke, Tatjana A Nazir, Sarah Carvallo and Jacques Jayez. Factives at hand: When presupposition mode affects motor response. Journal of Experimental Psychology, 2022.

11.2 Publications of the year‌

International journals

10 article Vincent-Thomas Barrouillet, Maxime Amblard, Sadeq Haouzir, Thibault Delage and Michel Musiol. The bipolarity-schizophrenia continuum hypothesis assessed from the psycholinguistic perspective of discourse discontinuities. Annales Médico-Psychologiques, Revue Psychiatrique, January 2025. HAL DOI
11 article Julien Coelho, Jean-Arthur Micoulaud-Franchi, Vincent P. Martin, Pierre-Alexis Geoffroy, Patrice Bourgin, Pierre Philip and Jacques Taillard. La santé circadienne à la croisée de la physiologie et des comportements. Biologie Aujourd'hui, 219(1-2), July 2025, 1-13. HAL DOI
12 article Julien Coelho, Jean-Arthur Micoulaud-Franchi, Vincent P. Martin, Pierre-Alexis Geoffroy, Patrice Bourgin, Pierre Philip and Jacques Taillard. Republication de : La santé circadienne à la croisée de la physiologie et des comportements. Médecine du sommeil, July 2025. HAL DOI
13 article Julien Coelho, Vincent P. Martin, Christophe Gauld, Emmanuel d'Incau, Pierre-Alexis Geoffroy, Patrice Bourgin, Pierre Philip, Jacques Taillard and Jean-Arthur Micoulaud-Franchi. Clinical physiology of circadian rhythms: A systematic and hierarchized content analysis of circadian questionnaires. International Journal of Clinical and Health Psychology, 25(2), April 2025, 100563. HAL DOI
14 article Paul Galvez, Emmanuel d'Incau, Jacques Taillard, Vincent P. Martin, Maria Clotilde Carra, Mathilde Fenelon, Virginie Chuy, Julien Coelho, Pierre Philip and Jean-Arthur Micoulaud-Franchi. Efficacy of advancement treatments of the stomatognathic system on objective sleepiness in OSA: a systematic review. Journal of Clinical Sleep Medicine, April 2025. HAL DOI
15 article Christophe Gauld, Vincent P. Martin, Clélia Quilès, Pierre-Alexis Geoffroy, Julien Coelho, Pierre Philip, Régis Lopez and Jean-Arthur Micoulaud-Franchi. Clinical significance criteria in the ICSD and DSM sleep disorder classifications: a content overlap analysis using the Jaccard index. Journal of Clinical Sleep Medicine, January 2025. HAL DOI
16 article Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. Clinical text generation: Are we there yet? Annual Review of Biomedical Data Science, 8, 2025, 173-198. HAL DOI
17 article Vincent P. Martin, Karën Fort, Julien Coelho, François Alla and Jean-Arthur Micoulaud-Franchi. The lexicon of sleep health: A natural language processing bibliometric analysis of the DEI-related terms. Sleep Health, November 2025. HAL DOI
18 article Vincent P. Martin, Christophe Gauld and Sébastien Bailly. Gentle Introduction to Network Analysis for Clinical Research: a model centered on relations. Chest, 168(3), September 2025, 574-577. HAL DOI
19 article Arthur Trognon, Camille Humeau, Loann Mahdar-Recorbet, Frédéric Verhaegen and Michel Musiol. A physical framework to harmonize human interaction analysis across disciplines. Current Psychology, 44(5), January 2025, 3519-3531. HAL DOI
20 article Arthur Trognon, Natacha Stortini, Coralie Duman, Nami Koïdé, Ewa Skupinska, Hamza Altakroury, Alizée Poli, Loann Mahdar-Recorbet, Blandine Beaupain, Jean Donadieu and Michel Musiol. Self-beneficial transactional social dynamics for cooperation in Shwachman-Diamond syndrome: a mixed-subject analysis using computational pragmatics. Frontiers in Psychology, 15, January 2025. HAL DOI
21 article Laure Turcati, Alice Millour, Renaud Debailly, Karën Fort, Asma Steinhausser, Corentin Biets and Anne Dozières. Citizen Science in Practice: How (not) to Fail? Citizen Science: Theory and Practice, 10(1), April 2025, 14. HAL DOI

Invited conferences

22 inproceedings Karën Fort. Large Language Models: the challenge of evaluation. CLiC-it 2025 - Eleventh Conference on Computational Linguistics, Cagliari, Italy, September 2025. HAL
23 inproceedings Karën Fort. Large Language Models: the challenge of evaluation. 2025 - Séminaire IA Génératives: Promesses et Défis, Paris, France, March 2025. HAL
24 inproceedings Karën Fort. Les enjeux éthiques de l’IA vus depuis le traitement automatique des langues. Journée de lancement du projet Insight, Nancy, France, December 2025. HAL
25 inproceedings Karën Fort. Les grands modèles de langue : des outils situés. NéALA 2025 - Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes, Nancy, France, July 2025. HAL
26 inproceedings Philippe de Groote. Some observations about plurals in textual mathematics. MCLP 2025 - International Conference on Mathematical and Computational Linguistics for Proofs, Orsay, France, September 2025. HAL

International‌ peer-reviewed conferences

27 inproceedings Santiago Arambillete and Philippe de Groote. On the use of binary relations as collective predicates in natural mathematics: Extended Abstract. Proceedings of the 21st International Workshop of Logic and Engineering of Natural Language Semantics, LENLS21 - 21st International Workshop of Logic and Engineering of Natural Language Semantics, Nagoya, Japan, November 2025. HAL
28 inproceedings Timothée Bernard and Philippe de Groote. Individuals as sets of perspectives. Proceedings of the 21st International Workshop of Logic and Engineering of Natural Language Semantics, LENLS21 - 21st International Workshop of Logic and Engineering of Natural Language Semantics, Nagoya, Japan, November 2025. HAL
29 inproceedings Marie Cousin. Dependency Structures Representation: Meaning Text Theory's Deep-Syntax Encoding With Abstract Categorial Grammars. 18th Meeting on the Mathematics of Language - MOL 2025, Mathematics of Language 2025, Stony Brook (NY), United States, August 2025. HAL
30 inproceedings Amandine Decker, Maxime Amblard and Ellen Breitholtz. Mapping the Landscape of Dialogue Research: A Meta-Analysis of ACL Anthology 2024. Proceedings of the 29th Workshop on the Semantics and Pragmatics of Dialogue - Poster Abstracts, SEMDIAL 2025 - 29th Workshop on the Semantics and Pragmatics of Dialogue, Bielefeld, Germany, September 2025, 255-257. HAL
31 inproceedings Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. "Women do not have heart attacks!" Gender Biases in Automatically Generated Clinical Cases in French. Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 - Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, Albuquerque, United States, April 2025, 7145-7159. HAL DOI
32 inproceedings Philippe de Groote and Timothée Bernard. Perspective on individuals. Proceedings of Sinn und Bedeutung, Sinn und Bedeutung 29, vol. 29, Noto, Italy, 2025, 352-369. HAL DOI
33 inproceedings Santiago Herrera, Ioana-Madalina Silai, Caio Corro, Bruno Guillaume and Sylvain Kahane. Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages. QUASY 2025 - Third Workshop on Quantitative Syntax, Ljubljana, Slovenia, August 2025, 26-38. HAL
34 inproceedings Sylvain Kahane, Bruno Guillaume, Léna Brun and Simeng Song. Status of morphosyntactic features Illustration with written and spoken French UD treebanks. Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), TLT, SyntaxFest 2025 - 23rd International Workshop on Treebanks and Linguistic Theories, Ljubljana, Slovenia, ACL Anthology, August 2025, 154-159. HAL
35 inproceedings Nikolett Mus, Bruno Guillaume, Sylvain Kahane and Daniel Zeman. Creating a multi-layer Treebank for Tundra Nenets. IWCLUL 2025 - 10th International Workshop on Computational Linguistics for Uralic Languages, Joensuu, Finland, December 2025. HAL
36 inproceedings Iglika Nikolova-Stoupak, Maxime Amblard, Sophie Robert-Hayek and Frédérique Rey. A Classifier of Word-Level Variants in Witnesses of Biblical Hebrew Manuscripts. 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, Association for Computational Linguistics, 2025, 21313-21329. HAL DOI
37 inproceedings Vincent P. Martin, Charles Brazier, Maxime Amblard, Michel Musiol and Jean-Luc Rouas. Network of acoustic characteristics for the automatic detection of suicide risk from speech. Contribution to the 2025 SpeechWellness challenge by the Semawave team. Interspeech 2025: proceedings, Interspeech 2025, Rotterdam (NL), Netherlands, ISCA, 2025, 424-428. HAL DOI
38 inproceedings María Paz Botero-Garcia, Sylvain Kahane, Emmett Strickland, Bruno Guillaume and Anne Lacheret-Dujour. An intonosyntactic treebank for spoken French: What is new with Rhapsodie? Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025), TLT, SyntaxFest 2025 - 23rd International Workshop on Treebanks and Linguistic Theories, Ljubljana, Slovenia, August 2025, 111-118. HAL
39 inproceedings Paulette Roulon-Doko, Sylvain Kahane and Bruno Guillaume. A morpheme-based treebank for Gbaya, an Ubanguian language of Central Africa. Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025), Depling, SyntaxFest 2025 - Eighth International Conference on Dependency Linguistics, Ljubljana, Slovenia, ACL Anthology, 2025, 93-102. HAL
40 inproceedings Rémi de Vergnette, Maxime Amblard and Bruno Guillaume. Evaluation Framework for Layered Meaning Representation. Proceedings of The Sixth International Workshop in Designing Meaning Representation, DMR 2025 - 6th International Workshop on Designing Meaning Representations, Prague, Czech Republic, August 2025. HAL

National peer-reviewed‌ Conferences

41 inproceedings Clémentine Bleuze, Fanny Ducel, Maxime Amblard and Karën Fort. "Nowadays, the focus is on results": creation and exploratory investigation of a corpus of claims from NLP articles. Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), TALN 2025 - 32ème Conférence sur le Traitement Automatique des Langues Naturelles, vol. 1, Marseille, France, 2025. HAL
42 inproceedings Marie Cousin. Syntaxe en dépendance avec les grammaires catégorielles abstraites : une application à la théorie sens-texte. Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux, 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 715-728. HAL
43 inproceedings Amandine Decker and Maxime Amblard. L'essentiel est invisible pour les représentations sémantiques. Actes de l'atelier Avancement de l’AMR et de l’Analyse Sémantique 2025 (4AS), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 1-8. HAL
44 inproceedings Fanny Ducel, Jeffrey André, Aurélie Névéol and Karën Fort. Introducing MascuLead: the First Gender Bias Leaderboard. Actes de l’atelier Ethic and Alignment of (Large) Language Models 2025 (EALM), EALM 2025 - Ethic and Alignment of (Large) Language Models, Marseille, France, June 2025, 12-19. HAL
45 inproceedings Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. "Women do not have heart attacks!" Gender Biases in Automatically Generated Clinical Cases in French. TALN 2025 - Actes de la 32ème Conférence sur le Traitement Automatique des Langues Naturelles, 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN 2025), vol. 2, Marseille, France, July 2025, 1. HAL
46 inproceedings Abdelhak Kelious, Mathieu Constant and Christophe Coeur. Exploration de stratégies de prédiction de la complexité lexicale en contexte multilingue à l'aide de modèles de langage génératifs et d'approches supervisées. Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), CORIA-TALN2025, Marseille, France, ATALA & ARIA, 2025, 202-203. HAL
47 inproceedings Vincent P. Martin, Karën Fort and Jean-Arthur Micoulaud-Franchi. La trumplang, instrument de destruction de la pensée : analyse de l'impact de la censure trumpiste sur la recherche en santé mentale. Actes de TALN, TALN 2025 - 32ème Conférence sur le Traitement Automatique des Langues Naturelles, vol. 1, Marseille, France, July 2025, 478-488. HAL
48 inproceedings Laure Turcati, Alice Millour, Renaud Debailly, Karën Fort, Asma Steinhausser, Corentin Biets and Anne Dozières. Citizen Science in Practice: How (not) to Fail? Actes de l'atelier Science Participative pour les Données et Corpus Linguistiques 2025 (ParCol), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 1-2. HAL

Conferences without proceedings‌‌

49 inproceedings Colleen Beaumard, Vincent P. Martin, Charles Brazier, Yaru Wu and Jean-Luc Rouas. Détection de séquences de phonèmes en parole spontanée pour la caractérisation de la somnolence diurne excessive. 10e Journées de phonétique clinique (JPC), Sète, France, June 2025. HAL
50 inproceedings Marie Cousin. Adding Communicative Structure to the MTT into ACG Encoding. Congreso Internacional sobre Estudios Teóricos y Aplicados de Léxico, 2025 (CIETAL 2025), Madrid, Spain, May 2025. HAL
51 inproceedings Fanny Ducel, Karën Fort and Aurélie Névéol. La linguistique appliquée pour une IA plus éthique. NéALA 2025 - Colloque sur Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes, Nancy, France, July 2025. HAL
52 inproceedings Philippe de Groote and Timothée Bernard. Worlds, events and perspectives. CSSP 2025 - 16ème Colloque de Syntaxe et Sémantique de Paris, Paris, France, November 2025. HAL
53 inproceedings Valentin D. Richard. Evaluating Chains Containing an Interrogative Word in an Anaphorically Annotated Corpus. Journées scientifiques du réseau thématique LIFT2 - linguistique informatique, formelle et de terrain (Lift2-2025), Paris, France, CNRS, October 2025. HAL
54 inproceedings Valentin D. Richard. How to explain the divergence between normative discourses? The case of the French construction "verb (+ preposition) + interrogative". Nouveaux regards sur la norme, Fribourg, Switzerland, 2025, 75-77. HAL
55 inproceedings Valentin D. Richard. Raising Alternatives to Express Dependence: a compositional issue. The 16th Syntax and Semantics Conference in Paris (CSSP 2025), Paris, France, November 2025. HAL

Scientific book chapters‌

56 inbook Michel Musiol, Arthur Trognon and Maxime Amblard. L'analyse de l'interaction verbale « patient » - « thérapeute » par la modélisation formelle : perspectives diagnostiques et informatisation. Psychiatrie et Psychologie du Futur (Yann Auxéméry & Jasmina Mallet (eds)), June 2025, 83-106. HAL

Edition (books,‌ proceedings, special issue of a journal)

57 proceedings Proceedings of the 16th International Conference on Computational Semantics. International Conference on Computational Semantics (IWCS), Düsseldorf, Germany, Association for Computational Linguistics, 2025. HAL

Doctoral dissertations and habilitation theses‌

58 thesis Siyana Pavlova. Toward Scalable Semantic Annotation: Bridging Readability and a Wide Range of Phenomena into a Layered Meaning Representation. Université de Lorraine, June 2025. HAL

Reports & preprints

59 misc Marie Cousin. Adding Communicative Structure to the MTT into ACG Encoding. September 2025. HAL
60 misc Gabriel Sauger, Jean-Yves Marion, Sazzadur Rahaman, Vincent Tourneur, Muaz Ali and Victor Matrat. Attacking the First-Principle: A Black-Box, Query-Free Targeted Mimicry Attack on Binary Function Classifiers. January 2026. HAL

Other scientific‌ publications

61 misc Luciana Benotti, Fanny Ducel, Karën Fort, Guido Ivetta, Zhijing Jin, Min-Yen Kan, Seunghun J. Lee, Minzhi Li, Margot Mieskes and Adriana Pagano. Navigating Ethical Challenges in NLP: Hands-on strategies for students and researchers. July 2025. HAL DOI
62 inproceedings Valentin D. Richard. Les chaines anaphoriques avec un mot interrogatif ne sont pas toutes (bien) annotées dans ANCOR. Journées scientifiques du réseau thématique LIFT2 - linguistique informatique, formelle et de terrain (Lift2-2025), Paris, France, CNRS, October 2025. HAL

Scientific‌ popularization

63 article Maxime Amblard and Nolwenn Le Jannic. Traitement automatique des langues : d’une lente progression à des bouleversements fulgurants. Interstices, January 2025. HAL

11.3 Cited publications‌

64 misc ATILF. BEL-RL-fr. ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr, 2025. URL: https://hdl.handle.net/11403/examples-ls-fr/
65 misc ATILF. Réseau Lexical du Français (RL-fr). ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr, 2025. URL: https://hdl.handle.net/11403/lexical-system-fr/
66 book Ash Asudeh and Gianluca Giorgolo. Enriched Meanings. Natural Language Semantics with Category Theory. Vol. 1, Oxford Studies in Semantics and Pragmatics, 13, Oxford, Oxford University Press, 2020.
67 inproceedings Marie Cousin. Meaning-Text Theory within Abstract Categorial Grammars: Towards Paraphrase and Lexical Function Modeling for Text Generation. Proceedings of the 15th International Conference on Computational Semantics (IWCS), Nancy, France, Association for Computational Linguistics, June 2023. HAL
68 inproceedings Marie Cousin. Vers une implémentation de la théorie sens-texte avec les grammaires catégorielles abstraites. Actes de CORIA-TALN 2023. Actes des 16e Rencontres Jeunes Chercheurs en RI (RJCRI) et 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), Paris, France, ATALA, June 2023, 72-86. HAL
69 misc Mathilde Dargnat. Les particules énonciatives. September 2024. HAL DOI
70 article Fanny Ducel, Aurélie Névéol and Karën Fort. "You'll be a nurse, my son!" Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, October 2024, 1495-1523. HAL DOI
71 inbook John Rupert Firth. Studies in Linguistic Analysis. Special volume of the Philological Society. Reprinted in: Palmer, F. R. (ed.) (1968). Selected Papers of J. R. Firth 1952-59, pages 168-205. Longmans, London. Oxford, Blackwell, 1957, A Synopsis of Linguistic Theory, 1930-1955, 1-32.
72 book Jonathan Ginzburg. The Interactive Stance. Oxford, Oxford University Press, 2012.
73 inproceedings Philippe de Groote. Deriving Formal Semantic Representations from Dependency Structures. Logic and Engineering of Natural Language Semantics: 19th International Conference, LENLS19, Tokyo, Japan, November 19-21, 2022, Revised Selected Papers, Lecture Notes in Computer Science, 14213, Tokyo (JP), Japan, Springer, November 2022, 157-172. HAL DOI
74 inproceedings Philippe de Groote. On the semantics of dependencies: relative clauses and open clausal complements - extended abstract -. Logic and Engineering of Natural Language Semantics 20 (LENLS20), Osaka, Japan, November 2023. HAL
75 article Philippe de Groote and Sylvain Pogodalla. On the expressive power of Abstract Categorial Grammars: Representing context-free formalisms. 13(4), http://www.springerlink.com/content/1572-9583/, 2004, 421-438. HAL DOI
76 inproceedings Philippe de Groote. Towards a Montagovian account of dynamics. Proceedings of the 16th Semantics and Linguistic Theory Conference (SALT 16), 2006. DOI
77 inproceedings Philippe de Groote. Towards abstract categorial grammars. Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter (international peer-reviewed conference with proceedings), Toulouse, France, July 2001, 148-155. HAL
78 article Bruno Guillaume and Guy Perrier. Interaction Grammars. 7(2-4), 2009, 171-208. HAL DOI
79 article Zellig S. Harris. Distributional Structure. Word, 10(2-3), 1954, 146-162. DOI
80 inproceedings Paola Herreño Castañeda, Jonathan Ginzburg and Mathilde Dargnat. Discourse Markers for Topic Change. TrentoLogue: SemDial workshop, Università Di Trento, Rovereto, Italy, SEMDIAL, September 2024, 1-3. HAL
81 inproceedings Kris Heylen, Yves Peirsman, Dirk Geeraerts and Dirk Speelman. Modelling Word Similarity: an Evaluation of Automatic Synonymy Extraction Algorithms. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, European Language Resources Association (ELRA), May 2008. URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/818_paper.pdf
82 inproceedings Ganesh Jawahar, Benoît Sagot and Djamé Seddah. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 2019. HAL
83 unpublished Jacques Jayez. (Innocent ?) Bias in argumentation. The view from language. January 2026, working paper or preprint. HAL
84 inproceedings Jacques Jayez. Discourse markers are not special (but they can be complicated). Empirical Issues in Syntax and Semantics. Selected papers from CSSP 2023, Paris, France, 2025. HAL
85 inproceedings Veronika Lux-Pogodalla and Alain Polguère. Construction of a French Lexical Network: Methodological Issues. Proceedings of the First International Workshop on Lexical Resources, WoLeR 2011. An ESSLLI 2011 Workshop, Ljubljana, Slovenia, August 2011, 54-61. URL: https://hal.inria.fr/hal-00686467
86 article Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48), 2020, 30046-30054. DOI
87 book Igor Mel'čuk. Semantics: From Meaning to Text. Vol. 1, Studies in Language Companion Series, 129, Amsterdam/Philadelphia, John Benjamins Publishing Company, 2012.
88 article Sebastian Padó and Mirella Lapata. Dependency-Based Construction of Semantic Space Models. Computational Linguistics, 33(2), 2007, 161-199. URL: https://www.aclweb.org/anthology/J07-2002 DOI
89 inproceedings Muntsa Padró, Marco Idiart, Aline Villavicencio and Carlos Ramisch. Comparing Similarity Measures for Distributional Thesauri. Proceedings of LREC 2014, 2014. URL: https://www.aclweb.org/anthology/L14-1496/
90 inproceedings Yves Peirsman, Kris Heylen and Dirk Speelman. Finding semantically related words in Dutch: co-occurrences versus syntactic contexts. Proceedings of the 2007 Workshop on Contextual Information in Semantic Space Models: Beyond Words and Documents, 2007, 9-16. URL: https://bibliotek.dk/eng/moreinfo/netarchive/870970-basis:28214510
91 inproceedings Guy Perrier. A French Interaction Grammar. RANLP 2007 - International Conference on Recent Advances in Natural Language Processing, IPP & BAS & ACL-Bulgaria, Borovets, Bulgaria, INCOMA Ltd, Shoumen, Bulgaria, September 2007, 463-467. HAL
92 article Thom Scott-Phillips and Christophe Heintz. Great ape interaction: Ladyginian but not Gricean. 120(42), 2023. DOI
93 inproceedings Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin. Attention is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Red Hook, NY, USA, Long Beach, California, USA, Curran Associates Inc., 2017, 6000-6010. URL: https://dl.acm.org/doi/pdf/10.5555/3295222.3295349
94 inproceedings Julie Weeds, David Weir and Diana McCarthy. Characterising Measures of Lexical Distributional Similarity. COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, COLING, 2004, 1015-1021. URL: https://www.aclweb.org/anthology/C04-1146
95 inproceedings Yuqing Yang, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu and Zheng Zhang. An AMR-based Link Prediction Approach for Document-level Event Argument Extraction. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, Association for Computational Linguistics, July 2023, 12876-12889. URL: https://aclanthology.org/2023.acl-long.720/ DOI

