2025 Activity report, Team SEMAGRAMME

RNSR:​​ 201120979K
  • Research center Inria​​​‌ Centre at Université de‌ Lorraine
  • In partnership with: CNRS, Université de Lorraine
  • Team name: Semantic Analysis​​​‌ of Natural Language
  • In collaboration with: Laboratoire lorrain de recherche en informatique et ses applications (LORIA)

Creation of the Team:‌ 2013 July 01

Each‌​‌ year, Inria research teams​​ publish an Activity Report​​​‌ presenting their work and‌ results over the reporting‌​‌ period. These reports follow​​ a common structure, with​​​‌ some optional sections depending‌ on the specific team.‌​‌ They typically begin by​​​‌ outlining the overall objectives​ and research programme, including​‌ the main research themes,​​ goals, and methodological approaches.​​​‌ They also describe the​ application domains targeted by​‌ the team, highlighting the​​ scientific or societal contexts​​​‌ in which their work​ is situated.

The reports​‌ then present the highlights​​ of the year, covering​​​‌ major scientific achievements, software​ developments, or teaching contributions.​‌ When relevant, they include​​ sections on software, platforms,​​​‌ and open data, detailing​ the tools developed and​‌ how they are shared.​​ A substantial part is​​​‌ dedicated to new results,​ where scientific contributions are​‌ described in detail, often​​ with subsections specifying participants​​​‌ and associated keywords.

Finally,​ the Activity Report addresses​‌ funding, contracts, partnerships, and​​ collaborations at various levels,​​​‌ from industrial agreements to​ international cooperations. It also​‌ covers dissemination and teaching​​ activities, such as participation​​​‌ in scientific events, outreach,​ and supervision. The document​‌ concludes with a presentation​​ of scientific production, including​​​‌ major publications and those​ produced during the year.​‌

Keywords

Computer Science and​​ Digital Science

  • A5.8. Natural​​​‌ language processing
  • A7.2. Logic​ in Computer Science
  • A9.4.​‌ Natural language processing

Other​​ Research Topics and Application​​​‌ Domains

  • B2. Digital health​
  • B9.6.8. Linguistics
  • B9.9. Ethics​‌

1 Team members, visitors,​​ external collaborators

Research Scientists​​​‌

  • Philippe de Groote [​Team leader, INRIA​‌, Senior Researcher]​​
  • Bruno Guillaume [INRIA​​​‌, Researcher]
  • Vincent​ Martin [INRIA,​‌ Researcher]
  • Sylvain Pogodalla​​ [INRIA, Researcher​​​‌]

Faculty Members

  • Maxime​ Amblard [UL,​‌ Professor, HDR]​​
  • Karën Fort [UL​​​‌, Professor, HDR​]
  • Jacques Jayez [​‌ENS DE LYON,​​ Emeritus]
  • Michel Musiol​​​‌ [UL, Professor​ Delegation, until Aug​‌ 2025, HDR]​​

PhD Students

  • Clémentine Bleuze​​​‌ [UL]
  • Hee-Soo​ Choi [UL,​‌ ATER]
  • Marie Cousin​​ [UL]
  • Amandine​​​‌ Decker [UL]​
  • Fanny Ducel [UNIV​‌ PARIS SACLAY]
  • Maxime​​ Guillaume [YSEOP,​​​‌ CIFRE]
  • Amandine Lecomte​ [UL, until​‌ Mar 2025]
  • Siyana​​ Pavlova [UL,​​​‌ ATER]
  • Valentin Richard​ [Univ Amsterdam]​‌
  • Vincent Tourneur [UL​​]
  • Rémi de Vergnette​​​‌ de Lamotte [UL​, from Nov 2025​‌]

Technical Staff

  • Khensa​​ Amani Daoudi [INRIA​​​‌, Engineer, until​ Jan 2025]
  • Amandine​‌ Lecomte [UL,​​ from Mar 2025]​​​‌
  • Iglika Zlatkova Nikolova-Stoupak [​UL]

Interns and​‌ Apprentices

  • Mohammad Al Takach​​ [UL, from​​​‌ Mar 2025 until Aug​ 2025]
  • Jeffrey Andre​‌ [UL, Intern​​, from Apr 2025​​​‌ until May 2025]​
  • Apolline Bastien [UL​‌, Intern, from​​ Jun 2025 until Jun​​​‌ 2025]
  • Ahana Chattopadhyay​ [UL, Intern​‌, from Mar 2025​​ until Aug 2025]​​​‌
  • Luc Cheng [INRIA​, Intern, from​‌ Mar 2025 until Jul​​ 2025]
  • Florian Cuny​​​‌ [UL, Intern​, from Jun 2025​‌ until Aug 2025]​​
  • Lucie Digoin-Caparros [UL​​​‌, Intern, from​ Jun 2025 until Aug​‌ 2025]
  • Mae Dugoua​​ Jacques [CNRS,​​ Intern, from Apr​​​‌ 2025 until Jun 2025‌]
  • Samba Fall [‌​‌INRIA, Intern,​​ from Jun 2025 until​​​‌ Aug 2025]
  • Zsofia‌ Flora Hauk [UL‌​‌, Intern, from​​ Jun 2025 until Aug​​​‌ 2025]
  • Jules Holder‌ [CNRS, Intern‌​‌, from Apr 2025​​ until Jun 2025]​​​‌
  • Vidit Khazanchi [UL‌, Intern, from‌​‌ May 2025 until Jul​​ 2025]
  • Owen Le​​​‌ Ray [UL,‌ Intern, from Nov‌​‌ 2025]
  • Loic Leclere​​ [UL, Intern​​​‌, from Jun 2025‌ until Aug 2025]‌​‌
  • Tadzhat Marharian [UL​​, Intern, from​​​‌ Jun 2025 until Aug‌ 2025]
  • Ivaylo Mitov‌​‌ [UL, Intern​​, from Jun 2025​​​‌ until Aug 2025]‌
  • Wassila Oudinache [UL‌​‌, Intern, from​​ Apr 2025 until Jun​​​‌ 2025]
  • Arthur Pedrini‌ [UL, Intern‌​‌, from Jun 2025​​ until Aug 2025]​​​‌
  • Shayan Ahmed Sharriff [‌UL, Intern,‌​‌ from Jun 2025 until​​ Aug 2025]
  • Austin​​​‌ Tangban [UL,‌ from Jul 2025 until‌​‌ Aug 2025]
  • Enola​​ Thomas [UL,​​​‌ Intern, from Jun‌ 2025 until Jun 2025‌​‌]
  • Celine Zyna Rahme​​ [UL, from​​​‌ Jul 2025 until Aug‌ 2025]
  • Rémi de‌​‌ Vergnette de Lamotte [​​UL, Intern,​​​‌ from Mar 2025 until‌ Aug 2025]

Administrative‌​‌ Assistants

  • Véronique Constant [​​INRIA]
  • Sophie Drouot​​​‌ [INRIA]
  • Anne-Marie‌ Messaoudi [LORIA,‌​‌ from Sep 2025]​​
  • Anne-Marie Messaoudi [UL​​​‌, until Aug 2025‌]
  • Gallown Nizard [‌​‌UL]
  • Cecilia Olivier​​ [INRIA]

External​​​‌ Collaborators

  • Mathieu Constant [‌UL]
  • Khensa Amani‌​‌ Daoudi [UNICAEN,​​ from Feb 2025 until​​​‌ Aug 2025]
  • Roberto‌ Diaz Hernandez [Univ‌​‌ Jaén, from Apr​​ 2025]
  • Michel Musiol​​​‌ [UL, from‌ Sep 2025, HDR‌​‌]

2 Overall objectives​​

2.1 Scientific Context

Computational​​​‌ linguistics is a discipline‌ at the intersection of‌​‌ computer science and linguistics.​​ On the theoretical side,​​​‌ it aims to provide‌ computational models of the‌​‌ human language faculty. On​​ the applied side, it​​​‌ is concerned with natural‌ language processing and its‌​‌ practical applications.

From a​​ structural point of view,​​​‌ linguistics is traditionally organized‌ into the following sub-fields:‌​‌

  • Phonology, the study of​​ language abstract sound systems.​​​‌
  • Morphology, the study of‌ word structure.
  • Syntax, the‌​‌ study of language structure,​​ i.e., the way words​​​‌ combine into grammatical phrases‌ and sentences.
  • Semantics, the‌​‌ study of meaning at​​ the levels of words,​​​‌ phrases, and sentences.
  • Pragmatics,‌ the study of the‌​‌ ways in which the​​ meaning of an utterance​​​‌ is affected by its‌ context.

Computational linguistics is concerned with all these fields. Consequently, various computational models, whose application domains range from phonology to pragmatics, have been developed. Among these, logic-based models play an important part, especially at the “highest” levels.

At the level​​ of syntax, generative grammars​​​‌ may be seen as‌ basic inference systems, while‌​‌ categorial grammars are based​​​‌ on substructural logics specified​ by Gentzen sequent calculi.​‌ Finally, model-theoretic grammars amount​​ to sets of logical​​​‌ constraints to be satisfied.​

At the level of​‌ semantics, the most common​​ approaches derive from Montague​​​‌ grammars, which are based​ on the simply typed​‌ λ-calculus and Church's​​ simple theory of types.​​​‌ In addition, various logics​ (modal, hybrid, intensional, higher​‌ order...) are used to​​ express logical semantic representations.​​​‌

At the level of pragmatics, the situation is less clear. The word pragmatics was introduced by Morris to designate the branch of philosophy of language that studies, besides linguistic signs, their relation to their users and the possible contexts of use. This definition was not very precise and, for a long time, several authors have considered (and some still consider) pragmatics as the wastebasket of syntax and semantics. Nevertheless, as far as discourse processing is concerned (which includes pragmatic problems such as pronominal anaphora resolution), logic-based approaches have also been successful. In particular, Kamp's Discourse Representation Theory gave rise to sophisticated “dynamic” logics. The situation, however, is less satisfactory than it is at the semantic level. On the one hand, we are facing a kind of logical “tower of Babel”. The various pragmatic logic-based models that have been developed, while sharing underlying mathematical concepts, differ in several respects and are too often based on ad hoc features. As a consequence, they are difficult to compare and appear more as competitors than as collaborative theories that could be integrated. On the other hand, several phenomena related to discourse dynamics (e.g., context updating, presupposition projection and accommodation, contextual reference resolution...) are still lacking deep logical explanations. We strongly believe, however, that this situation can be improved by applying to pragmatics the same approach Montague applied to semantics, using the standard tools of mathematical logic.

Accordingly:

The overall​​​‌ objective of the Sémagramme​ project is to design​‌ and develop new unifying​​ logic-based models, methods, and​​​‌ tools for the semantic​ analysis of natural language​‌ utterances and discourses. This​​ includes the logical modeling​​​‌ of pragmatic phenomena related​ to discourse dynamics. Typically,​‌ these models and methods​​ will be based on​​​‌ standard logical concepts (stemming​ from formal language theory,​‌ mathematical logic, and type​​ theory), which should make​​​‌ them easy to integrate.​

The project is organized​‌ along three research directions​​ (i.e., syntax-semantics interface,​​​‌ discourse dynamics, and​ common basic resources),​‌ which interact as explained​​ below.

Moreover, a transversal and transdisciplinary theme has been developed in the team over the past years: ethics in NLP and, more generally, in AI.

2.2 Syntax-Semantics Interface​‌

The Sémagramme project intends to focus on the semantics of natural languages (in a wider sense than usual, including some pragmatics). Nevertheless, the semantic construction process is syntactically guided, that is, the constructions of logical representations of meaning are based on the analysis of the syntactic structures. We do not want, however, to commit ourselves to any specific theory of syntax. Consequently, our approach should be based on an abstract generic model of the syntax-semantics interface.

Here, an‌ important idea of Montague‌​‌ comes into play, namely,​​ the “homomorphism requirement”: semantics​​​‌ must appear as a‌ homomorphic image of syntax.‌​‌ While this idea is​​ almost a truism in​​​‌ the context of mathematical‌ logic, it remains challenged‌​‌ in the context of​​ natural languages. Nevertheless, Montague's​​​‌ idea has been quite‌ fruitful, especially in the‌​‌ field of categorial grammars,​​ where van Benthem showed​​​‌ how syntax and semantics‌ could be connected using‌​‌ the Curry-Howard isomorphism. This​​ correspondence is the keystone​​​‌ of the syntax-semantics interface‌ of modern type-logical grammars.‌​‌ It also motivated the​​ definition of our own​​​‌ Abstract Categorial Grammars  77‌.

Technically, an Abstract‌​‌ Categorial Grammar simply consists​​ of a (linear) homomorphism​​​‌ between two higher-order signatures.‌ Extensive studies have shown‌​‌ that this simple model​​ allows several grammatical formalisms​​​‌ to be expressed, providing‌ them with a syntax-semantics‌​‌ interface for free  75​​, 8.
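
As a rough illustration of this idea, the following Python sketch treats an abstract term as a nested tuple and a lexicon as a mapping from abstract constants to object-level terms. It is not ACGtk code: the term encoding, the two lexicons, and the function `realize` are hypothetical names chosen for the example, and types and linearity checks are left out. The same abstract structure is interpreted homomorphically into a surface string and into a logical formula.

```python
# Toy illustration of an ACG-style lexicon: a homomorphism from abstract
# terms to object terms. All names are hypothetical, not ACGtk's API.

# Abstract term: a constant (str) or an application (head, arg1, arg2, ...).
ABSTRACT = ("LOVES", "JOHN", "MARY")  # abstract structure of "John loves Mary"

# Surface lexicon: each abstract constant is mapped to a string or to a
# string-building function that uses each argument exactly once.
surface = {
    "JOHN": "John",
    "MARY": "Mary",
    "LOVES": lambda subj, obj: f"{subj} loves {obj}",
}

# Semantic lexicon: the same abstract constants, mapped to logical forms.
semantics = {
    "JOHN": "j",
    "MARY": "m",
    "LOVES": lambda subj, obj: f"love({subj}, {obj})",
}


def realize(term, lexicon):
    """Interpret an abstract term homomorphically under a lexicon."""
    if isinstance(term, str):
        return lexicon[term]
    head, *args = term
    return lexicon[head](*(realize(arg, lexicon) for arg in args))


surface_form = realize(ABSTRACT, surface)    # "John loves Mary"
logical_form = realize(ABSTRACT, semantics)  # "love(j, m)"
```

Because both lexicons share the abstract level, the abstract term acts as a pivot between the surface form and the logical form.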

We​​​‌ intend to carry on‌ with the development of‌​‌ the Abstract Categorial Grammar​​ framework. At the foundational​​​‌ level, we will define‌ and study possible type‌​‌ theoretic extensions of the​​ formalism, in order to​​​‌ increase its expressive power‌ and its flexibility. At‌​‌ the implementation level, we​​ will continue the development​​​‌ of an Abstract Categorial‌ Grammar support system.

As‌​‌ said above, considering the​​ syntax-semantics interface as the​​​‌ starting point of our‌ investigations allows us not‌​‌ to be committed to​​ some specific syntactic theory.​​​‌ The Montagovian syntax-semantics interface,‌ however, cannot be considered‌​‌ to be universal. In​​ particular, it does not​​​‌ seem to be well‌ adapted to dependency and‌​‌ model-theoretic grammars. Consequently, in​​ order to be as​​​‌ generic as possible, we‌ intend to explore alternative‌​‌ models of the syntax-semantics​​ interface. In particular, we​​​‌ will explore relational models‌ where several distinct semantic‌​‌ representations can correspond to​​ the same syntactic structure.​​​‌

2.3 Discourse Dynamics

It‌ is well known that‌​‌ the interpretation of a​​ discourse is a dynamic​​​‌ process. Take a sentence‌ occurring in a discourse.‌​‌ On the one hand,​​ it must be interpreted​​​‌ according to its context.‌ On the other hand,‌​‌ its interpretation affects this​​ context, and must therefore​​​‌ result in an updating‌ of the current context.‌​‌ For this reason, discourse​​ interpretation is traditionally considered​​​‌ to belong to pragmatics.‌ The cut between pragmatics‌​‌ and semantics, however, is​​ not that clear.

As​​​‌ we mentioned above, we‌ intend to apply to‌​‌ some aspects of pragmatics​​ (mainly, discourse dynamics) the​​​‌ same methodological tools Montague‌ applied to semantics. The‌​‌ challenge here is to​​ obtain a completely compositional​​​‌ theory of discourse interpretation,‌ by respecting Montague's homomorphism‌​‌ requirement. We think that​​ this is possible by​​​‌ using techniques coming from‌ programming language theory, in‌​‌ particular, continuation semantics, and​​ the related theories of​​​‌ functional control operators.

We‌ have indeed successfully applied‌​‌ such techniques in order​​ to model the way​​​‌ quantifiers in natural languages‌ may dynamically extend their‌​‌ scope  76. We​​​‌ intend to tackle, in​ a similar way, other​‌ dynamic phenomena (typically, anaphora​​ and referential expressions, presupposition,​​​‌ modal subordination...).
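
The continuation-based view of discourse can be sketched over a small finite model. In the Python sketch below, a sentence meaning takes a left context (here simply a list of referents) and a continuation standing for the rest of the discourse. The model, the context representation, and the pronoun resolution strategy (pick the most recent referent) are simplifying assumptions made for the example only.

```python
# Sketch of continuation-based discourse interpretation over a finite model.
# A sentence meaning maps a left context and a continuation to a truth value.
# The model and the pronoun-resolution strategy are illustrative assumptions.

ENTITIES = ["al", "bo", "cy"]
MAN = {"al", "bo"}
WALKS = {"bo", "cy"}
SMILES = {"bo"}


def a_man_walks(context, k):
    # The existential introduces a referent and passes the updated context on.
    return any(x in MAN and x in WALKS and k(context + [x]) for x in ENTITIES)


def he_smiles(context, k):
    # Assumed strategy: the pronoun selects the most recent referent.
    return bool(context) and context[-1] in SMILES and k(context)


def seq(s1, s2):
    """Discourse composition: s2 is interpreted in the context updated by s1."""
    return lambda context, k: s1(context, lambda c: s2(c, k))


def true_in_empty_context(sentence):
    """Close off a discourse with the empty context and a trivial continuation."""
    return sentence([], lambda c: True)


# "A man walks. He smiles."
discourse = seq(a_man_walks, he_smiles)
result = true_in_empty_context(discourse)
```

In this sketch the existential quantifier of the first sentence takes scope over the second one through the continuation, which is why the pronoun can be resolved; the pronoun sentence alone, evaluated in the empty context, has no antecedent available.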

What characterizes​ these different dynamic phenomena​‌ is that their interpretations​​ need information to be​​​‌ retrieved from a current​ context. This raises the​‌ question of the modeling​​ of the context itself.​​​‌ At a foundational level,​ we have to answer​‌ questions such as the​​ following. What is the​​​‌ nature of the information​ to be stored in​‌ the context? What are​​ the processes that allow​​​‌ implicit information to be​ inferred from the context?​‌ What are the primitives​​ that allow a context​​​‌ to be updated? How​ does the structure of​‌ the discourse and the​​ discourse relations affect the​​​‌ structure of the context?​ These questions also raise​‌ implementation issues. What are​​ the appropriate data types?​​​‌ How can we keep​ the complexity of the​‌ inference algorithms sufficiently low?​​

2.4 Common Basic Resources​​​‌

Even if our research primarily focuses on semantics and pragmatics, we nevertheless need syntax. More precisely, we need syntactic trees to start with. We consequently need grammars, lexicons, and parsing algorithms to produce such trees. Over the last years, we have developed the notions of interaction grammar 78 and graph rewriting 3, 4 as models of natural language syntax. This includes the development of grammars for French 91, together with morphosyntactic lexicons. We intend to continue this line of research and development. In particular, we want to increase the coverage of our grammars for French, and provide our parsers with more robust algorithms.

Further primary resources are needed in order to put to work a computational semantic analysis of utterances and discourses. As we want our approach to be as compositional as possible, we must develop lexicons annotated with semantic information. This opens the quite wide research area of lexical semantics.

Finally, when dealing​​ with logical representations of​​​‌ utterance interpretations, the need​ for inference facilities is​‌ ubiquitous. Inference is needed​​ in the course of​​​‌ the interpretation process, but​ also to exploit the​‌ result of the interpretation.​​ Indeed, an advantage of​​​‌ using formal logic for​ semantic representations is the​‌ possibility of using logical​​ inference to derive new​​​‌ information. From a computational​ point of view, however,​‌ logical inference may be​​ highly complex. Consequently, we​​​‌ need to investigate which​ logical fragments can be​‌ used efficiently for natural​​ language oriented inference.

3​​​‌ Research program

3.1 Overview​

The research program of​‌ Sémagramme aims to develop​​ models based on well-established​​​‌ mathematics. We seek two​ main advantages from this​‌ approach. On the one​​ hand, by relying on​​​‌ mature theories, we have​ at our disposal sets​‌ of mathematical tools that​​ we can use to​​​‌ study our models. On​ the other hand, developing​‌ various models on a​​ common mathematical background will​​​‌ make them easier to​ integrate, and will ease​‌ the search for unifying​​ principles.

The main mathematical​​​‌ domains on which we​ rely are formal language​‌ theory, symbolic logic, and​​ type theory.

3.2 Formal​​ Language Theory

Formal language​​​‌ theory studies the purely‌ syntactic and combinatorial aspects‌​‌ of languages, seen as​​ sets of strings (or​​​‌ possibly trees or graphs).‌ Formal language theory has‌​‌ been especially fruitful for​​ the development of parsing​​​‌ algorithms for context-free languages.‌ We use it, in‌​‌ a similar way, to​​ develop parsing algorithms for​​​‌ formalisms that go beyond‌ context-freeness. Language theory also‌​‌ appears to be very​​ useful in formally studying​​​‌ the expressive power and‌ the complexity of the‌​‌ models we develop.

3.3​​ Symbolic Logic

Symbolic logic​​​‌ (and, more particularly, proof‌ theory) is concerned with‌​‌ the study of the​​ expressive and deductive power​​​‌ of formal systems. In‌ a rule-based approach to‌​‌ computational linguistics, the use​​ of symbolic logic is​​​‌ ubiquitous. As we previously‌ said, at the level‌​‌ of syntax, several kinds​​ of grammars (generative, categorial...)​​​‌ may be seen as‌ basic deductive systems. At‌​‌ the level of semantics,​​ the meaning of an​​​‌ utterance is captured by‌ computing (intermediate) semantic representations‌​‌ that are expressed as​​ logical forms. Finally, using​​​‌ symbolic logics allows one‌ to formalize notions of‌​‌ inference and entailment that​​ are needed at the​​​‌ level of pragmatics.

3.4‌ Type Theory and Typed‌​‌ Lambda-Calculus

Among the various​​ possible logics that may​​​‌ be used, Church's simply‌ typed λ-calculus and‌​‌ simple theory of types​​ (also known as higher-order​​​‌ logic) play a central‌ part. On the one‌​‌ hand, Montague semantics is​​ based on the simply​​​‌ typed λ-calculus, and‌ so is our syntax-semantics‌​‌ interface model. On the​​ other hand, as shown​​​‌ by Gallin, the target‌ logic used by Montague‌​‌ for expressing meanings (i.e.,​​ his intensional logic) is​​​‌ essentially a variant of‌ higher-order logic featuring three‌​‌ atomic types (the third​​ atomic type standing for​​​‌ the set of possible‌ worlds).
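
To make the role of the typed λ-calculus concrete, here is a minimal sketch that uses Python functions as stand-ins for simply typed λ-terms and evaluates them in a small finite model. The domain, the predicates, and the informal type annotations in the comments are assumptions made for the example; Python itself does not check the types.

```python
# Montague-style composition over a small finite model, with Python functions
# standing in for simply typed lambda-terms.
# Informal types: e = entity, t = truth value.

DOMAIN = {"al", "bo", "cy"}

# Predicates of type e -> t.
man = lambda x: x in {"al", "bo"}
walks = lambda x: x in {"al", "bo", "cy"}
sleeps = lambda x: x in {"al"}

# Determiners of type (e -> t) -> (e -> t) -> t.
every = lambda p: lambda q: all(q(x) for x in DOMAIN if p(x))
some = lambda p: lambda q: any(p(x) and q(x) for x in DOMAIN)

# Sentence meanings are obtained by function application alone.
every_man_walks = every(man)(walks)    # True: al and bo both walk
every_man_sleeps = every(man)(sleeps)  # False: bo does not sleep
some_man_sleeps = some(man)(sleeps)    # True: al sleeps
```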

4 Application domains‌​‌

4.1 Deep Semantic Analysis​​

Our application domains concern natural language processing applications that rely on a deep semantic analysis, for instance:

  • textual‌ entailment and inference,
  • dialogue‌​‌ systems,
  • semantic-oriented query systems,​​
  • content analysis of unstructured​​​‌ documents,
  • (semi) automatic knowledge‌ acquisition,
  • discourse structure analysis‌​‌ (argumentative relations, discourse markers),​​
  • lexical resources.

4.2 Text​​​‌ Transformation

Text transformation is‌ an application domain featuring‌​‌ two important sub-fields of​​ computational linguistics:

  • parsing, from​​​‌ surface form to abstract‌ representation,
  • generation, from abstract‌​‌ representation to surface form.​​

Text simplification and automatic summarization belong to that domain.

We aim at‌​‌ using the framework of​​ Abstract Categorial Grammars we​​​‌ develop to this end.‌ It is indeed a‌​‌ reversible framework that allows​​ both parsing and generation.​​​‌ Its underlying mathematical structure‌ of λ-calculus makes‌​‌ it fit with our​​ type-theoretic approach to discourse​​​‌ dynamics modeling.

4.3 Types‌ for discourse markers

While there is a rich descriptive literature on Discourse Markers (DM), for instance words or expressions like so or yet in English, the question of their representation in type systems is understudied. In addition to basic types such as individuals or events, or simple functional types (properties, etc.), DM are known to operate on domains like states of affairs, beliefs or speech acts. The entities inhabiting these domains are themselves complex. For instance, speech acts involve discourse planning in the form of a network of intentions and actions. Moreover, DM can combine with one another, forming clusters whose meaning is not always apparent from the meanings of the component DM. Within the context of the ANR CODIM project, we aim at developing a typing system that (i) takes into account the array of types denoted by DM and (ii) addresses the question of the semantic nature of their combinations.

5 Social​​​‌ and environmental responsibility

5.1​ Footprint of research activities​‌

ANR InExtenso:

WP4 of the project is dedicated to the evaluation of the environmental impact of LLMs. More precisely, it aims at proposing a method for measuring the environmental impact of digital health and at using it in the project evaluations and beyond.

6​ Latest software developments, platforms,​‌ open data

6.1 Latest​​ software developments

6.1.1 ACGtk​​​‌

  • Name:
    Abstract Categorial Grammar​ Development Toolkit
  • Keywords:
    Natural language processing, Functional programming, Logic programming, Lambda-calculus, OCaml
  • Scientific Description:

    Abstract Categorial​ Grammars (ACG) are a​‌ grammatical formalism in which​​ grammars are based on​​​‌ typed lambda-calculus. A grammar​ generates two languages: the​‌ abstract language (the language​​ of parse structures), and​​​‌ the object language (the​ language of the surface​‌ forms, e.g., strings, or​​ higher-order logical formulas), which​​​‌ is the realization of​ the abstract language.

    ACGtk​‌ provides two software tools​​ to develop and to​​​‌ use ACGs: acgc, which​ is a grammar compiler,​‌ and acg, which is​​ an interpreter of a​​​‌ command language that allows​ one, in particular, to​‌ parse and realize terms.​​

  • Functional Description:
    ACGtk provides​​​‌ a piece of software​ for developing and using​‌ Abstract Categorial Grammars (ACG).​​
  • Release Contributions:
    This new version of the software provides two important functionalities. On the one hand, it provides support for parsing with almost linear grammars. On the other hand, it generates a JavaScript program that can be loaded by web browsers, in order to help demonstrate the software (a demo version is available online from the public GitLab pages of the project).
  • Contact:
    Sylvain Pogodalla​​
  • Participants:
    Philippe De Groote,​​​‌ Pierre Ludmann, Jiri Marsik,​ Sylvain Pogodalla, Vincent Tourneur​‌

6.1.2 Grew

  • Name:
    Graph​​ Rewriting
  • Keywords:
    Semantics, Syntactic​​​‌ analysis, NLP, Graph rewriting​
  • Functional Description:
    Grew is​‌ a Graph Rewriting tool​​ dedicated to applications in​​​‌ NLP. Grew takes into​ account confluent and non-confluent​‌ graph rewriting and it​​ includes several mechanisms that​​​‌ help to use graph​ rewriting in the context​‌ of NLP applications (built-in​​ notion of feature structures,​​​‌ parametrization of rules with​ lexical information).
  • News of​‌ the Year:
    In 2025, three new versions (1.17, 1.18 and 1.19) were released, together with several bug fixes. Version 1.17 added the handling of multi-treebank requests in Grew-match; version 1.18 improved the handling of metadata and global constraints; version 1.19 introduced tuples of clustering keys and improved the corpusbank manager.
  • Contact:
    Bruno​​​‌ Guillaume
  • Participants:
    Bruno Guillaume,‌ Guillaume Bonfante, an anonymous‌​‌ participant
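
The general idea of rule-based graph rewriting on linguistic structures can be sketched in a few lines of Python. The sketch below does not use Grew's rule language or its API; the graph encoding (a dictionary of node features and a set of labelled edges) and both functions are hypothetical. The rule relabels the subject of a passive verb, following Universal Dependencies conventions, and is applied until a normal form is reached.

```python
# Minimal graph-rewriting sketch on a dependency graph. Plain Python for
# illustration only; this is neither Grew's rule language nor its API.

nodes = {
    1: {"form": "Mary", "upos": "PROPN"},
    2: {"form": "was", "upos": "AUX"},
    3: {"form": "seen", "upos": "VERB", "Voice": "Pass"},
}
# Edges are (head, label, dependent) triples.
edges = {(3, "nsubj", 1), (3, "aux", 2)}


def rewrite_passive_subject(nodes, edges):
    """Rule: relabel nsubj as nsubj:pass when the head verb is passive."""
    rewritten = set()
    for head, label, dep in edges:
        if label == "nsubj" and nodes[head].get("Voice") == "Pass":
            label = "nsubj:pass"
        rewritten.add((head, label, dep))
    return rewritten


def normal_form(nodes, edges, rule):
    """Apply a rule repeatedly until the graph no longer changes."""
    while True:
        new_edges = rule(nodes, edges)
        if new_edges == edges:
            return edges
        edges = new_edges


result = normal_form(nodes, edges, rewrite_passive_subject)
```

This single rule is deterministic, so every application order leads to the same result; with several interacting rules, different orders may produce different graphs, which is the non-confluent case mentioned in the description.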

6.1.3 HostoMytho

  • Keywords:​​
    Game with a purpose,​​​‌ Natural language processing
  • Functional‌ Description:
    HostoMytho is a GWAP, or "game with a purpose", developed within the framework of the CODEINE ANR project. The aim of the game is to allow users to annotate automatically generated medical files, in order to evaluate their plausibility (quality of the language and medical semantics) and to add different layers of information (negation, hypothesis, time, etc.). HostoMytho is multiplatform.
  • Contact:
    Karën Fort‌​‌
  • Partners:
    LISN, CEA-List

6.1.4​​ Arborator-Grew

  • Name:
    Arborator's Collaborative​​​‌ Annotation
  • Keywords:
    Annotation tool,‌ Syntactic analysis
  • Functional Description:‌​‌
    The online interface allows​​ managing collaborative annotation projects​​​‌ in dependency syntax. It‌ is possible to use‌​‌ Grew queries and also​​ to directly rewrite graphs​​​‌ in the annotation tool.‌
  • News of the Year:‌​‌
    During 2025, we continued​​ to refactor the code​​​‌ base for both frontend‌ and backend. In addition,‌​‌ we worked on improving​​ existing functionalities and adding​​​‌ new ones based on‌ user requests.
  • Contact:
    Bruno​​ Guillaume
  • Participant:
    5 anonymous​​​‌ participants
  • Partners:
    Université Paris‌ Nanterre, LIMSI, LISN

6.2‌​‌ Open data

7 New​​ results

7.1 Syntax-Semantics Interface​​​‌

Participants: Maxime Amblard, Marie Cousin, Philippe de Groote, Amandine Decker, Bruno Guillaume, Maxime Guillaume, Sylvain Pogodalla, Siyana Pavlova, Valentin Richard, Zhengjian Li.

7.1.1 Abstract Categorial Grammars‌

Feature Structure

ACG has‌​‌ proven to be a​​ powerful framework with well-defined​​​‌ theoretical properties. It was,‌ however, lacking a facility‌​‌ which is useful and​​ widely used for grammar​​​‌ engineering: feature structures. The‌ latter are often used‌​‌ to express in a​​ concise way some combinatorial​​​‌ properties related to morphosyntactic‌ properties of expressions, for‌​‌ instance subject-verb agreement.
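
As a reminder of what feature structures buy in grammar engineering, the following Python sketch implements unification of flat feature structures and uses it to check subject-verb agreement. It only illustrates the general mechanism, not the typed extension discussed in this section; the dictionary representation and the function `unify` are assumptions made for the example.

```python
# Flat feature-structure unification, illustrating subject-verb agreement.
# Assumed representation: a feature structure is a dict; a missing feature
# is unspecified and therefore compatible with any value.


def unify(fs1, fs2):
    """Return the unified structure, or None if two values clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature in result and result[feature] != value:
            return None
        result[feature] = value
    return result


she = {"person": 3, "number": "sg"}
they = {"person": 3, "number": "pl"}
sleeps = {"person": 3, "number": "sg"}  # agreement required by the verb form
slept = {}                              # past tense: no agreement constraint

ok = unify(she, sleeps)               # {"person": 3, "number": "sg"}
clash = unify(they, sleeps)           # None: number values differ
underspecified = unify(they, slept)   # {"person": 3, "number": "pl"}
```

A single rule constrained by unification covers all agreement combinations, whereas a grammar without features needs one rule (or one atomic category) per combination.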

We worked on extending the ACG type system to provide a generic feature structure framework. This extension relies on a restricted addition of product (record) and dependent types, and still allows for the reduction of grammars to Datalog programs (which is used to implement ACG parsing in ACGtk, see Sec. 6). In his thesis, Maxime Guillaume introduced Affix Abstract Categorial Grammars (AACGs), an extension of ACGs enriched with feature structures.

First, he defined an enriched λ-calculus that extends the simply typed linear λ-calculus with enumerations, records, and dependent products. On this basis, he defined AACGs and demonstrated their strong equivalence with classical ACGs through a series of formal transformations. The algorithmic implications of this equivalence for parsing were then studied. An adaptation of Kanazawa’s reduction was presented. This adaptation guarantees polynomial-time complexity while preserving the factorization benefits specific to AACGs. Finally, to validate the industrial applicability of this approach, a dedicated compiler for AACGs was designed, implemented, and integrated into a text generation engine. Experiments conducted on a large-coverage French grammar highlight a significant reduction in grammar size as well as a notable improvement in parsing and generation performance.

Encoding​ of Meaning-Text Theory Into​‌ ACGs

Meaning-Text Theory (MTT)​​ is a linguistic theory​​​‌ geared towards generating natural​ language expressions from semantic​‌ representations 87. It​​ relies on seven representation​​​‌ levels (e.g., semantics, deep​ syntax, surface syntax, etc.).​‌ Representations at each level​​ are related to representations​​​‌ at the adjacent levels​ by rewriting devices. Each​‌ representation is made of​​ several structures, among which​​​‌ the predicative and the​ communicative ones. MTT uses​‌ the key concept of​​ paraphrase, especially in these​​​‌ rewriting devices. ACGs come​ with several composition modes,​‌ one of which in​​ particular corresponds to transduction​​​‌ of (tree or graph)​ structures.

We have therefore​‌ been studying the ability​​ of ACGs to model​​​‌ MTT structure transformations between​ adjacent levels, focusing on​‌ the structures and levels​​ of semantics, deep syntax,​​​‌ and surface syntax.

In previous work 68, 67, we proposed an encoding of MTT into ACGs where the predicative structure of the semantic level in MTT was used. However, MTT rewriting processes also make use of communicative structure information, decorating the predicative structures (at the semantic level) with theme and rheme information.

For instance, the expressions "Charlie is Taylor's son" and "Charlie, the son of Taylor" share the same predicative structure but are not paraphrases of each other. While the second one is a nominal expression, the first one is a verbal expression about Charlie, which states that he is Taylor's son. The difference between the two expressions, which share the same semantic predicative graph, is made by the communicative structure.

This shows the crucial role that the communicative structure plays in MTT, since it determines, for a given semantic graph (i.e., predicative and communicative structures), which deep-syntactic graph is to be obtained. We have therefore proposed to also take this communicative structure into account, using suitable types and grammatical composition as offered by the ACG framework 50, 59.

We​ also proposed an alternative​‌ approach to representing deep​​ and surface syntactic trees​​​‌ 42, 29.​ This alternative approach, based​‌ on 74, 73​​, allows for a​​​‌ more flexible and generic​ representation of the syntactic​‌ structures, and for a​​ better account of modifiers​​​‌ (adverbs, adjectives).

7.1.2 Formal​ semantics of adnominal modification​‌

We have proposed a treatment of adnominal modification that parallels the treatment of adverbial modification in neo-Davidsonian event semantics 32. To this end, we introduced a notion of perspective that allows nouns to be interpreted as sets of sets of perspectives. The resulting theory provides a unified compositional treatment of intersective, subsective, modal, and privative adjectives, and avoids the intensional paradoxes caused by an extensional treatment of subsective adjectives. Building on this work, we have advocated for unifying the concepts of events, states, and perspectives. We then defined possible worlds as sets of such event-like concepts. This approach allows different semantic treatments proposed in the literature to be reconstructed within a unified framework. In particular, it provides a formal treatment of the ambiguity between the intersective and the subsective interpretation that some adjectives present 52. Finally, with the aim of giving an account of hyperintensional phenomena related to the interpretation of proper names that refer to the same individual but cannot be substituted one for the other, we came up with the radical idea of interpreting individuals as sets of perspectives 28.

7.1.3 Semantic treatment of plurals in textual mathematics

We reviewed issues related‌​‌ to the semantics of​​ plurals in natural language​​​‌ and demonstrated how these‌ issues arise in the‌​‌ case of mathematical texts.​​ In particular, we focused​​​‌ on the distinction between‌ collective and distributive predicates‌​‌ 26. We also​​ studied the conditions under​​​‌ which adjectives that denote‌ binary relations can be‌​‌ used as collective predicates.​​ This led us to​​​‌ propose a fine-grained semantic‌ interpretation of grammatical numbers‌​‌ and to introduce distributivity​​ operators that enable a​​​‌ compositional semantic treatment of‌ plurals in natural mathematics‌​‌ 27.
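For readers unfamiliar with distributivity operators, a textbook formulation in the style of Link's plural semantics is given below. It is only an illustration and is not necessarily the operator defined in 27; the symbols $\mathrm{Dist}$, $X$ and $\leq_{\mathrm{at}}$ (atomic part-of) are notational assumptions.

```latex
\[
  \mathrm{Dist}(P) \;=\; \lambda X.\, \forall x\, \bigl( x \leq_{\mathrm{at}} X \rightarrow P(x) \bigr)
\]
```

Under this reading, "2 and 3 are prime" holds if and only if each of 2 and 3 is prime, whereas a collective predicate such as "are coprime" applies to the plurality as a whole.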

7.1.4 Semantic​​ Representation

Siyana Pavlova defended her PhD thesis in June 2025 58, in which she presented YARN, a new semantic representation formalism that aims to combine the benefits of logic-based formalisms with direct interpretability, making it widely usable. YARN is rooted in the encoding of different semantic phenomena as separate layers. The thesis presents a formal definition of the mathematical structure that constitutes YARN and illustrates with concrete examples how this structure can be used in the context of semantic representation for encoding multiple phenomena (such as modality, negation and quantification) as layers built on top of a central predicate-argument structure. The benefit of YARN is that it allows for the independent annotation and analysis of different phenomena, as the corresponding layers are easy to “switch off”. Furthermore, the ability of YARN to encode simple interactions between phenomena is explored. The thesis concludes with a discussion of some of the interesting observations made during the development of YARN so far and outlines extensive future plans for this formalism.

In 40, Rémi De Vergnette, Maxime Amblard and Bruno Guillaume present modular evaluation metrics for Layered Meaning Representation, defined as YARN, a semantic formalism encoded using rich structures that generalize AMR graphs. While existing metrics like SMATCH evaluate graph-based semantic representations such as AMR, they cannot directly handle YARN's more complex structures. The modular nature of YARN is exploited to propose two families of metrics, depending on the linguistic features and type of semantic phenomenon targeted. The first one, SMATCHY, extends the AMR SMATCH metric. The second, YARNBLEU, is based on the SEMBLEU metric for AMR. Both families are evaluated on a small dataset of human-annotated YARN structures to which random modifications simulating annotation mistakes are added. The evaluation shows that SMATCHY provides a more consistent and reliable approach with respect to the type of modifications considered.
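As a rough illustration of what a SMATCH-style score measures, the sketch below computes precision, recall and F1 over sets of (source, relation, target) triples. It assumes that node names are already aligned between the two graphs, whereas SMATCH itself searches for the best variable mapping. The function name `triple_f1` and the toy graphs are hypothetical and are not taken from the SMATCHY implementation.

```python
def triple_f1(gold, predicted):
    """Return (precision, recall, F1) over two collections of triples.

    Assumption: node names are already aligned, so triples can be
    compared directly (no search over variable mappings).
    """
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy AMR-like graph for "the boy wants to go".
gold = {
    ("w", "instance", "want-01"),
    ("b", "instance", "boy"),
    ("g", "instance", "go-02"),
    ("w", "ARG0", "b"),
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),
}
# Simulated annotation mistake: one role label is changed.
noisy = (gold - {("g", "ARG0", "b")}) | {("g", "ARG1", "b")}

precision, recall, f1 = triple_f1(gold, noisy)
```

Restricting the triples to those belonging to one layer would give a per-phenomenon score, which is one simple way to picture a modular metric.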

Ivaylo Mitov and Tadzhat Marharian both conducted an M1 internship under the supervision of Bruno Guillaume and Maxime Amblard. Ivaylo Mitov worked on the development of AMR and YARN annotations for other languages and on the production of YARN from AMR, leveraging Universal Dependencies annotations. Tadzhat Marharian started the development of a new Graphical User Interface for managing YARN annotations.

In 43, Amandine Decker and Maxime Amblard discuss the limits of semantic representation formalisms, in particular when it comes to representing meaning in context and interaction. Detailed representations can be used as a basis for natural language understanding or generation. While these formalisms produce thorough analyses, they do not cover some crucial aspects of real language use. Most semantic representation formalisms like AMR, DRS or UMR operate out of context, which means they ignore a significant part of the content of the utterances they analyse. In this work, they discuss various aspects of language use left out by semantic representation formalisms and argue that future work in this field should include extending these formalisms so that they can cover the interactive aspect of language.

7.1.5​​ Syntax and semantics of​​​‌ questions

Natural language statements​ are composed not only​‌ of declarative sentences but​​ also of interrogative ones.​​​‌ Moreover, sentences cannot be​ categorized into purely declarative​‌ or purely interrogative sentences.​​ Typically, a declarative statement​​​‌ may contain a subordinated​ interrogative clause:

  • (a)
    I​‌ don't know where Mary​​ is.

We observe that​​​‌ noun phrases and declarative​ clauses can sometimes raise​‌ alternatives like hidden questions.​​ For example, in a​​​‌ dependence statement like (b),​ several scenarios are considered​‌ (sunny, rainy,..., going to​​ the beach, not going​​​‌ to the beach) and​ are related to each​‌ other implicitly. In 55​​, a compositional way​​​‌ to derive and link​ these alternatives is laid​‌ out.

  • (b)
    Depending on the weather, we might go to the beach.
  • (c)​
    Ça dépend (de) quel​‌ temps il fait.

In​​ French, similar sentences using​​​‌ the verb dépendre can​ embed an interrogative clause.​‌ However, it is unclear​​ what is more standard​​​‌ between keeping the de​ preposition or removing it​‌ in cases like (c).​​ The contribution 54 investigates​​​‌ this grammatical issue by​ establishing corpus statistics on​‌ the frequency of a​​ preposition between a verb​​​‌ and its embedded interrogative​ clause.

Like indefinites, interrogative words can be referred to by other expressions. For example, she_i in (d) refers to the person who was sitting there. This kind of anaphora has not been fully considered in anaphora-annotated corpora. The study 53 tries to evaluate this by making an inventory of the (missing) annotations of anaphora with a wh-word in the French corpus ANCOR.

  • (d)​
    Who_i was sitting there? She_i forgot her bag.

7.1.6 Use​​​‌ of semantics

Before the​ invention of the printing​‌ press, texts could only​​ be reproduced through manual​​ copying, a process prone​​​‌ to errors, accidents, and‌ intentional modifications. These changes‌​‌ altered each manuscript and​​ were subsequently propagated by​​​‌ other scribes. For philologists‌ reconstructing text history and‌​‌ genealogical relationships (stemma codicum),​​ analyzing these variants is​​​‌ crucial. Stemmatology methods aim‌ to objectively construct genealogical‌​‌ trees of textual transmission.​​

At the University of Lorraine, the Écritures laboratory and MSH have focused on uncovering the genealogical lineage of Hebrew manuscripts. A joint project with Maxime Amblard seeks to improve the manual work involved in critical editions of the Hebrew Bible by applying advanced methods from applied mathematics and natural language processing to reconstruct stemmas. With Iglika Zlatkova Nikolova-Stoupak, they design, train and test learning models to automatically tag scribal variants in manuscripts.

The current project 36 falls within the field of stemmatology, i.e., the study and/or reconstruction of textual transmission based on the relationship between the available witnesses of given texts. In particular, the variants (differences) at the word level in manuscripts written in Biblical Hebrew are addressed. A dataset based on the Book of Ben Sira is manually annotated for the following variant categories: ‘plus/minus’, ‘inversion’, ‘morphological’, ‘lexical’ or ‘unclassifiable’. A strong classifier (F1 value of 0.80) is then trained to predict these categories in collated (aligned) pairs of witnesses. The classifier is non-neural and makes use of the two words themselves as well as part-of-speech (POS) tags, hand-crafted rules per category, and additional synthetically derived data. Other models experimented with include neural ones based on the state-of-the-art model for Modern Hebrew, DictaBERT. Other features whose relevance is tested are different types of morphological information pertaining to the word pairs and the Levenshtein distance between the words within a pair. The strongest classifier as well as the data used are made publicly available. In addition, the correlation between two sets of morphological labels is investigated: professionally established as per the QumranDigital online library and automatically derived with the sub-model DictaBERT-morph.
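One of the features mentioned above, the Levenshtein distance between the two words of a pair, can be computed with the standard dynamic-programming recurrence. The sketch below is a generic implementation, not the project's code; the function name `levenshtein` and the English example words are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")
```

Because Python strings are sequences of Unicode code points, the same function applies unchanged to words written in Hebrew script, although combining marks such as vowel points would count as separate characters.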

Maxime Amblard pursues a collaboration with the French company Namkin. With Georgios Zervakis, they develop BEE: A First Assessment of Language Models for Business Event Extraction. Event Extraction (EE) is the task of automatically extracting relevant information about events in text. Business events in particular, such as corporate investments or product launches, can provide enterprises with insight into how to better position themselves in the market with respect to the competition. We benchmark existing EE systems in the business domain. To this end, we introduce BEE (Business Event Extraction), a manually curated corpus for end-to-end business event extraction. Empirical results of four different system architectures demonstrate the challenging nature of BEE, with Large Language Models (LLMs) underperforming compared to smaller models. Finally, we employ complementary evaluation metrics to understand the types of errors and reveal significant performance gains.

While modern semantic representations may contain vast quantities of information, they do not always (or necessarily) contain the information that is useful for the concrete application. For instance, significant challenges still persist in dealing with temporal relations and fine-grained negation interpretation.

Recent research has looked into the benefits of exploiting semantic representations, and in particular Abstract Meaning Representation, for low-resource scenarios and document-level event argument extraction. However, it appears that AMR has to be adapted in order to optimally support event extraction related tasks 95. One major limitation of AMR for document-level event extraction is that AMR works at the sentence level, and thus requires the aggregation of sentence-level representations. AMR is also limited in its expressive power for negation and universal quantification.

7.2 Distributional Semantics and​‌ Lexical Structures

Participants: Sylvain​​ Pogodalla.

Numerical and continuous representations of word semantics, in particular vector representations, and neural learning techniques gave rise to impressive results on a large number of natural language processing tasks. These representations, or embeddings, rely on the distributional hypothesis 79, 71: the meaning of a word is provided by the linguistic context in which it occurs, and semantically related words should be represented by related embeddings.

However, the very​ nature of semantic relatedness​‌ encoded in embeddings remains​​ somewhat unspecified, and can​​​‌ express different relations as​ classified by linguists (e.g.,​‌ synonymy, hyponymy, etc.) 90​​, 81, and​​​‌ may even depend on​ the chosen methods to​‌ compute the vector similarity​​ 88, the size​​​‌ of the context or​ its type 94,​‌ 89.
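The dependence on the similarity computation can be seen on a toy case: with the made-up two-dimensional vectors below, cosine similarity and Euclidean distance disagree on which candidate is closest to the query. The vectors and names are purely illustrative assumptions and do not come from any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (insensitive to vector length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between u and v (sensitive to vector length)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


query = [1.0, 1.0]
candidates = {
    "x": [10.0, 10.0],  # same direction as the query, much larger norm
    "y": [1.0, 0.5],    # slightly different direction, similar norm
}

nearest_by_cosine = max(
    candidates, key=lambda name: cosine_similarity(query, candidates[name])
)
nearest_by_euclidean = min(
    candidates, key=lambda name: euclidean_distance(query, candidates[name])
)
```

Here cosine similarity selects "x" while Euclidean distance selects "y", so the notion of "nearest neighbour" is not independent of the measure.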

We have been studying the vector representations provided by transformer and attention models 93, comparing them with linguistic knowledge as expressed by linguists. We rely more precisely on the theory of combinatorial explanatory lexicology, the lexicological part of the Meaning-Text Theory melcuk-polguere:2016,melcuk-polguere:2021, which hinges upon collocations to structure lexical knowledge as graphs. This theory provides a fine-grained description of lexical relations against which numerical models can be compared, as well as lexical resources (a lexicon for French 85, 65 and annotated examples 64). We focus on lexical structure, where previous works rather focused on morphosyntactic information and syntactic structures 82, 86. Data construction and statistical analysis are under way, and a publication is in preparation.

7.3​ Discourse Dynamics

Participants: Maxime​‌ Amblard, Philippe de​​ Groote, Amandine Decker​​​‌, Jacques Jayez,​ Michel Musiol, Emeric​‌ Licorni, Ines Hernandez​​.

7.3.1 Dialogue Modeling​​​‌

Dialogue encompasses a vast diversity of interactional forms which grows with technological and societal evolutions, such as the generalisation of video-mediated communication following the COVID-19 pandemic. As dialogue data becomes increasingly heterogeneous, modelling dialogue requires not only algorithmic advances but also a precise characterisation of the data on which these models are developed and evaluated. In order to better understand current practices in the field, Amandine Decker, Maxime Amblard and Ellen Breitholtz (Gothenburg University, Sweden) conducted a meta-review of papers on dialogue published in the ACL Anthology in 2024 30. This analysis provides an empirical overview of how dialogue data is actually described and used by the community. One of the main observations is that dialogue data is increasingly treated primarily as a resource for model training, rather than as an object of analysis in its own right. As a consequence, research overwhelmingly focuses on English-language datasets, with a strong preference for clean, high-quality dialogues, and often overlooks distinctions between task-oriented and open-domain interactions. These practices make it difficult to establish a principled framework for selecting appropriate dialogue resources for a given task and limit our ability to assess the scope and generalisability of reported results. This line of work aims to contribute to a more explicit reflection on dialogue data and its role in dialogue modelling research.

This‌​‌ work is complemented by​​ ongoing research by Amandine​​​‌ Decker, Maxime Amblard and‌ Ellen Breitholtz on topical‌​‌ structure analysis through the​​ collection of a corpus​​​‌ of chat-based interactions in‌ both English and French.‌​‌ The objective is to​​ develop a resource specifically​​​‌ designed to support the‌ study of topical organisation‌​‌ in dialogue, with a​​ particular focus on how​​​‌ participants interpret and accommodate‌ potentially incoherent contributions during‌​‌ interaction.

7.3.2 Discourse Markers​​

Jacques Jayez continues working with Mathilde Dargnat (ATILF), Paola Herreño (Ph.D. candidate ATILF-LLF) and Maeva Sillaire (Ph.D. candidate ATILF) on the semantic representation of D(iscourse) M(arkers). DMs are words/expressions like so or well in English which help structure discourse or communicate speakers' internal epistemic or affective states as well as interactional moves. The discourse structuring function is the hallmark of connective DMs, which correspond to a large variety of discourse relations (causal, explanatory, concessive, temporal, etc.). Other functions are realized by discourse particles which can express for instance surprise, attention modification or various interactional moves (backchannels, calls to attention, etc.) 80.

The semantic profile of DMs is investigated through three distinct but not quite independent subtasks. (1) Characterizing what DMs index (refer to, denote, etc.). The domain-based approach initiated in the 90s consists in defining different types (aka domains) of semantic objects, like states of affairs, beliefs or speech acts. Domains are instrumental in teasing apart subclasses of connective DMs 84. Discourse particles index internal states of speakers or interactional operations 69. (2) The second subtask consists in determining what the semantic contribution of a DM is (propositional content, presupposition, conventional implicature). The semantic contribution aspect interacts with the indexing behaviour of DMs for connectives 84 and, moreover, in the case of particles, raises the question of the semantic analysis of 'side effects' in terms of monads, as exemplified by Asudeh and Giorgolo 66, among others. (3) The intuitions about the lexical meaning of DMs are notoriously difficult to substantiate, in particular for particles. We are currently studying how different types of intuitions can be coded in the declarative format of Dialogue Game Boards of 72. Points (2) and (3) converge toward the problem of defining an ontology which extends that of Ginzburg by including commitment, intentions and side-effects, in order to take into account the distinctions introduced in 92.

In the context of​​​‌ the CODIM ANR project​, we have designed​‌ a workflow for annotating​​ the DMs in our​​​‌ set of French spoken​ and written corpora and​‌ analysing the statistical properties​​ of DM sequences. Given​​​‌ the overall poor performance​ of LLMs, we have​‌ kept the finite automata​​ approach previously developed in​​​‌ CODIM, constructing a final​ cascade of 622 automata​‌ with the help of​​ the Unitex-Gramlab software. The​​​‌ cascade extracts 900 DM​ types from the corpora​‌ for a total of​​ 8195046 DM occurrences. The​​​‌ annotation results are normalized​ and passed to a​‌ set of 10 association​​ measure functions, which estimate​​​‌ the strength of association​ between any two juxtaposed​‌ DMs in the corpora.​​ The resulting vectors are​​​‌ scaled and compared by​ various distance estimators, in​‌ order to create a​​ hierarchy of association for​​​‌ any two DMs sharing​ a common associate, for​‌ instance alors and bon​​ with respect to mais​​​‌ in the pairs mais​ alors and mais bon​‌.
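To make the notion of an association measure between juxtaposed DMs concrete, the sketch below computes pointwise mutual information (PMI) from adjacent-pair counts. PMI is only one common measure and the report does not say which ten functions are used; the function name `pmi_scores`, the absence of smoothing and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(sequences):
    """Pointwise mutual information (base 2) for each pair of adjacent markers.

    `sequences` is an iterable of lists of DM labels in textual order.
    Probabilities are plain relative frequencies, without smoothing.
    """
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))  # juxtaposed pairs only
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (first, second), count in bigrams.items():
        p_pair = count / n_bi
        p_first = unigrams[first] / n_uni
        p_second = unigrams[second] / n_uni
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Invented toy data, not drawn from the CODIM corpora.
toy_sequences = [
    ["mais", "alors"],
    ["mais", "bon"],
    ["mais", "alors"],
    ["bon", "alors"],
]
scores = pmi_scores(toy_sequences)
```

On this toy input, the pair mais alors receives a higher PMI than mais bon. Stacking several such measures for each pair yields the kind of vector that can then be scaled and compared with distance estimators.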

Jacques Jayez has​​ refined his work on​​​‌ the argumentative dimension of​ discourse, and the last​‌ version of his submission​​ for a book on​​​‌ implicit manipulation has been​ accepted by de Gruyter​‌ 83.

7.3.3 Pathological​​ Discourse Modeling

Also based on interviews between psychologists and schizophrenia patients, we began a study on the alignment between discourse descriptors and speech characteristics, in order to uncover potential links between what is said (discourse) and how it is said (speech characteristics). To do so, Vincent Martin supervised two M1 students (speech pathologists) who worked on pause characteristics across the different discourse structures; he then supervised two other interns (Zsofia Hauwk, M1 and Maé Dugoua-Jacques, L3) to work on the automation of diarization (speaker separation) and text transcription of these interviews. The low audio quality has been a significant challenge, which we are still trying to resolve at the time of writing.

Vincent Martin also proposed a new framework for analysing speech acoustic quality using network analyses of acoustic descriptors 18. This framework obtained relevant results on the SpeechWellness challenge, which addresses suicidality in adolescents using only speech; the results were presented at Interspeech 2025 37.

In parallel with this work, Vincent Martin pursued his work on refining sleep 13, 11, 12, 14 and psychiatric semiology 15, in order to improve the accuracy of digital psychiatry devices by refining their targets.

Michel Musiol has conducted theoretical, formal and empirical research in semantics and conversation analysis in order to relate the linguistic, cognitive and psycholinguistic aspects of semantic representations as they appear in discourse. For instance, with Maxime Amblard, we build a formal, computational and dynamic model likely to reveal the properties of pathological discourse, based on the modeling of violations of coherence. To this end, empirical studies were based on clinical interviews between psychologists or psychiatrists and schizophrenic patients 19 or between psychologists or psychiatrists and bipolar patients 10. In the first paper, our dialogue analysis model supplements existing methods, which often suffer from being ad hoc, lacking compatibility with manual analysis, or failing to produce variables that align with computational or algebraic analysis. In the second paper, we show that cognitive and conversational properties measured with clinical assessment or discourse analysis have led to the formulation of a hypothesis suggesting that the two pathologies might be situated on a continuum. We examined the hypothesis of such a continuum in the context of the pragmatic discontinuities that occur in dialogue between a psychologist and either a schizophrenic or a bipolar patient. Furthermore, the aim is to delineate the cognitive and psycholinguistic impairments observed in the schizophrenic group in comparison to the bipolar group.

This program is intended to subsequently propose computerized tools for diagnosis assistance, screening of people at risk, as well as psychotherapeutic and therapeutic evaluation and follow-up 56. For instance, we have investigated the socio-behavioral dynamics of Shwachman-Diamond Syndrome, focusing on how children with the condition navigate cooperative interactions. Using computational pragmatics, we aimed to identify the underlying principles guiding their social behavior 20.

In line with last year's project, Michel Musiol and Maxime Amblard continued working on the characterisation of pathological discourse. With Arthur Trognon, they published a book chapter.

For the PhD work of Vincent-Thomas Barrouillet, in 10 they compare two matched clinical interview corpora, conducted with bipolar patients and, under the same conditions, with schizophrenic patients. The interview is non-directive, which encourages the patient to speak freely. Both corpora contain the same number of words. They conduct an exhaustive search for "breaks" using an investigative model of discursive disorganization that is sensitive to the linguistic and illocutionary properties of speech acts. These "breaks" are then formally analyzed using hierarchical modeling, which reveals the defective relationships between speech acts in the dynamic structuring of conversational sequences. They conclude that hierarchical and dynamic discourse analysis methodology is a valuable tool for identifying certain bipolar disorders as well as for recognizing schizophrenic symptoms. It also makes it possible to clarify the psycholinguistic processes associated with the expression of bipolar and schizophrenic disorders in verbal interaction. Finally, it contributes to the hypothesis of a continuum between schizophrenia and bipolar disorder, supporting the high-level cognitive processes that underpin discursive competence.

7.4 Common Basic Resources‌​‌

Participants: Maxime Amblard,​​ Hee-Soo Choi, Philippe​​​‌ de Groote, Bruno‌ Guillaume, Sylvain Pogodalla‌​‌, Karën Fort.​​

7.4.1 Universal Dependencies and​​​‌ Surface Syntactic Universal Dependencies‌

The Universal Dependencies (UD)‌​‌ project aims to build​​​‌ a syntactic dependency scheme​ that enables similar analyses​‌ of several different languages.​​ Bruno Guillaume is an​​​‌ active member of the​ UD community and contributes​‌ to the development and​​ the improvement of the​​​‌ French data within this​ international initiative.

In 2025, he continued to work, in collaboration with Sylvain Kahane, Kim Gerdes and their teams, to promote the Surface Syntactic Universal Dependencies (SUD) framework. SUD is an annotation scheme for syntactic dependency treebanks that is almost isomorphic to UD (Universal Dependencies). Unlike UD, it is based on syntactic criteria (favouring functional heads) and the relations are defined on distributional and functional bases.

This work is mainly conducted in the ANR project Autogramm (Induction of descriptive grammar from annotated corpora), which started in January 2022. The project aims to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora for linguistic and typological studies. The project also promotes the development of treebanks for low-resourced languages, in order to extract quantitative descriptive grammars for these languages.

In 38​​, the authors present​​​‌ a new format of​ the Rhapsodie Treebank, which​‌ contains both syntactic and​​ prosodic annotations. This provides​​​‌ a comprehensive dataset for​ the study of spoken​‌ French. This integrated format​​ enables complex, multilevel queries​​​‌ and paves the way​ for intonosyntactic studies.

In 34, the authors proposed a study of the different statuses of the morphosyntactic features used in UD treebanks. While most of these features correspond to values of inflectional morphemes, some describe lexical subclasses or are just conventional names of (polysemic) morphemes. Syncretism is also a challenge, because exact values can only be deduced from contextual information. An attempt at clarification and an implementation in written and spoken French treebanks are then proposed.

Bruno Guillaume, in collaboration with Santiago Herrera, Ioana-Madalina Silai, Caio Corro and Sylvain Kahane 33, has developed a data-driven contrastive framework to extract common and distinctive linguistic descriptions from syntactic treebanks. The extracted contrastive rules are defined by a statistically significant difference in frequency and precision, and classified as common and distinctive rules across the set of treebanks. The method is illustrated by working on object word order using Universal Dependencies (UD) treebanks in 6 Romance languages: Brazilian Portuguese, Catalan, French, Italian, Romanian and Spanish. The paper discusses the limitations faced due to inconsistent annotation and the feasibility of conducting contrastive studies using the UD collection.

During his M2 internship, Luc Cheng applied the methodology used for contrastive studies to corpus correction. This study was conducted using written and spoken French, as well as two English corpora.

In 2025, two​​​‌ new versions of Universal​ Dependencies were released. Bruno​‌ Guillaume collaborated with field​​ linguists to produce or​​​‌ improve Surface Syntactic Universal​ Dependencies treebanks and to​‌ convert them to Universal​​ Dependencies:

  • Version 2.16 in May:
    • new treebank for​ Bokota (with Marie Benzerrak​‌ and Natalia Cáceres Arandia)​​
    • new treebank for Ika​​ (with Jana Bajorat and​​​‌ Natalia Cáceres Arandia)
    • new‌ treebank for Nenets (with‌​‌ Nikolett Mus)
    • enhanced mSUD​​ treebank for Gbaya (with​​​‌ Paulette Roulon)
  • Version 2.17 in November:
    • enhanced UD‌​‌ treebank for Old Egyptian​​ (with Roberto Antonio Díaz​​​‌ Hernández)
    • new treebank for‌ Western Hausa (with Bernard‌​‌ Caron)

In April and May 2025, Roberto Antonio Díaz Hernández undertook a three-week visit to LORIA, funded by a Short-Term Scientific Mission of the UniDive COST action. He collaborated with Bruno Guillaume to build a Grew-match instance dedicated to the annotations of the Ancient Egyptian hieroglyphic text from the pyramids: GrewPT.

In May 2025, Bruno Guillaume made a two-week visit to the University of Bologna (funded by a Short-Term Scientific Mission of the UniDive COST action). He collaborated with Ludovica Pannitto on a survey of the annotation of spoken data in the Universal Dependencies project.

In 35, Nikolett Mus, in collaboration with Bruno Guillaume, Sylvain Kahane and Daniel Zeman, presents the development of the Tundra Nenets Universal Dependencies (UD) Treebank, the first syntactically annotated resource for the Samoyedic branch of the Uralic family. The treebank integrates spoken-language data and adopts the morphologically enhanced Surface-Syntactic UD (mSUD) framework to capture inflectional morphology and morphology-based syntactic relations. It further incorporates Information Structure annotation. The methodological workflow includes data selection, transcription conventions, sentence and lexeme segmentation, annotation of spoken-language features, lemmatization, treatment of morpheme status, part-of-speech and morphological tagging, and syntactic annotation based on the functional and distributional properties of syntactic elements. The paper also outlines the principles guiding multilevel annotation and justifies the theoretical choices underlying the integration of prosodic, morphological, and syntactic information.

The work on the Gbaya treebank was published in 39. The paper presents the first treebank for Gbaya, a language from the under-resourced Niger-Congo family. The language has a rich system of tonal morphemes and virtually no affixes. The dependency analysis is based on a morpheme-based tokenisation and the treebank is also distributed in a word-based Universal Dependencies version. Several constructions are discussed in the paper: genitive construction, clause coordination, sentence particles, adverbial and relative clauses, serial verb constructions, reported speech, topicalization, and focalization.

7.4.2 Citizen​​​‌ Science

Karën Fort worked‌ with colleagues from Sorbonne‌​‌ on guidelines to develop​​ citizen science projects. These​​​‌ guidelines were finally published‌ in a journal article‌​‌ 21 and at a​​ TALN workshop 48.​​​‌

7.4.3 Synthetic clinical texts‌ generation

In the context‌​‌ of the CODEINE ANR​​ project and more specifically​​​‌ of Nicolas Hiebel's PhD‌ thesis, Karën Fort worked‌​‌ with Aurélie Névéol (LISN-CNRS)​​ and Olivier Ferret (CEA)​​​‌ on the generation of‌ synthetic clinical texts.

The key idea of the project is to use confidential corpora to automatically generate anonymous synthetic texts capable of emulating real documents from the perspective of their linguistic characteristics. Nicolas Hiebel worked on a survey of the state of the art in clinical text generation, which has been published in a journal 16.

Another part of the project consists in using a Game With A Purpose to validate and then annotate the synthesized clinical texts. This game, developed by Bertrand Remy, is called HostoMytho (see Section 6.1.3), and includes various mini-games for different annotation layers, such as negation, error typing, or plausibility rating. The game is multi-platform, and therefore intended to be used on the web (see: online HostoMytho), on Android and iOS.

7.5 Ethics​‌ and biases

Participants: Karën​​ Fort, Maxime Amblard​​​‌, Michel Musiol,​ Marc Anderson, Fanny​‌ Ducel, Clémentine Bleuze​​.

7.5.1 Ethics dissemination​​​‌ in scientific communities

Karën​ Fort and Fanny Ducel,​‌ together with other members​​ of the ACL Ethics​​​‌ committee and student volunteers​ to the committee, participated​‌ in the creation, organization,​​ and presentation of a​​​‌ tutorial on ethical challenges​ in NLP, which took​‌ place at the ACL​​ conference in July 2025​​​‌ 61 and attracted around​ 40 attendees.

Fanny Ducel, under the supervision of Karën Fort and Aurélie Névéol, authored a long abstract on the role that applied linguistics could play in working towards ethical NLP research, calling for more interdisciplinarity. This work was presented in French at NÉALA, a national applied linguistics conference 51.

7.5.2 Evaluating​‌ stereotypes in autoregressive language​​ models

Fanny Ducel, under​​​‌ the supervision of Karën​ Fort and Aurélie Névéol,​‌ and in collaboration with​​ Nicolas Hiebel, measured gender​​​‌ stereotypical biases in LLM-generated​ clinical cases, in French.​‌ This work has been​​ presented and published at​​​‌ NAACL in English 31​, and its translated​‌ French version at TALN​​ 45.

Jeffrey André,​​​‌ under the supervision of​ Fanny Ducel, Karën Fort​‌ and Aurélie Névéol, designed​​ a web interface (​​​‌Masculead) that allows​ users to contribute to​‌ an interactive leaderboard, which​​ is based on the​​​‌ previously published framework for​ gender bias detection 70​‌. This interface, as​​ well as arguments on​​​‌ the notion and flaws​ of leaderboards for language​‌ models, were presented at​​ the "Ethic and Alignment​​​‌ of (large) Language Models"​ workshop, at TALN 44​‌.

7.5.3 Biases in​​ the biomedical domain

Karën Fort is PI of a 4-year ANR project (2023-2027), InExtenso (Intrinsic and Extrinsic evaluation of biases in large language models), in collaboration with Rouen's hospital (CHU) and LISN-CNRS. The project aims at better identifying stereotyped biases in LLMs in French and, when possible, mitigating them. Within the framework of this project, Clémentine Bleuze supervised the internship of M2 student Hawawou Oumarou-Tchapchet, along with partners from Rouen's hospital. This internship aimed at evaluating the socio-demographic biases of a French LLM in a medical classification task.

Under the supervision of​‌ Karën Fort and Aurélie​​ Névéol, and in collaboration​​​‌ with Vincent Martin, Clémentine​ Bleuze conducted a literature​‌ review on the subject​​ of LLM-assisted mental health​​​‌ prediction tasks, which has​ been submitted to the​‌ Journal of Medical Internet​​ Research (JMIR).

7.5.4 NLP​​​‌ for NLP and Ethics​

Clémentine Bleuze continued the​‌ work initiated during her​​ M2 internship in collaboration​​ with Fanny Ducel and​​​‌ under the supervision of‌ Karën Fort and Maxime‌​‌ Amblard. This work explored​​ the notion of scientific​​​‌ overclaiming (when researchers inadequately‌ interpret or present elements‌​‌ of their research) in​​ NLP papers. It also​​​‌ led to the definition‌ of a taxonomy of‌​‌ relevant research claims, the​​ constitution of a corpus​​​‌ of NLP claims originating‌ from ArXiv and ACL‌​‌ papers (a subpart of​​ which has been human-annotated),​​​‌ and the training of‌ BERT-based models to predict‌​‌ claim types. This research,​​ along with new results​​​‌ about typical claim patterns‌ used in research papers,‌​‌ was presented at TALN​​ 2025 as a poster​​​‌ 41.

Karën Fort and Vincent Martin conducted two automatic lexical analyses of the words censored by the Trump administration in the scientific literature, respectively related to mental health 47 and sleep health 17. The results of these studies, combining lexical networks and temporal analyses, demonstrate the impossibility of producing scientific data (and consequently of producing global health policies based on these missing data) without the vocabulary censored by the Trump administration.

8 Bilateral contracts and​​​‌ grants with industry

8.1‌ Bilateral contracts with industry‌​‌

Maxime Amblard pursued a collaboration with the French company Namkin. The industry faces numerous challenges that necessitate the evolution of BtoB marketing tools, in order to develop a valuable offer and provide an enhanced customer experience. Namkin's BrainLab develops industrial marketing tools for digitalizing customer relations, evolving business models, and exploiting business and economic data for business development. One of the key challenges of marketing intelligence is to identify risks and opportunities so as to guide marketing strategies. Among the sources of information useful to detect risks and opportunities, Namkin has identified Business Events, that is, “textually reported real-world occurrences, actions, relations, and situations involving companies and firms”. A postdoctoral researcher at Namkin, Georgios Zervakis, and an engineer, Sullivan Benard, took part in the collaboration.

9 Partnerships and cooperations‌​‌

9.1 International research visitors​​

9.1.1 Visits of international​​​‌ scientists

Casey Kennington
  • Status‌
    Researcher
  • Institution of origin:‌​‌
    Boise State University
  • Country:​​
    USA
  • Dates:
    25-29 March 2025
  • Context of the‌ visit:
    invitation to give‌​‌ a seminar
  • Mobility program/type​​ of mobility:
    research stay​​​‌
Aarne Ranta
  • Status
    Professor‌
  • Institution of origin:
    University‌​‌ of Gothenburg
  • Country:
    Sweden​​
  • Dates:
    22-25 July 2025
  • Context of the visit:‌
    Collaboration in the context‌​‌ of the Malinca project​​
  • Mobility program/type of mobility:​​​‌
    Invitation
Díaz Hernández Roberto‌ Antonio
  • Status
    Researcher
  • Institution‌​‌ of origin:
    Universidad de​​ Jaén
  • Country:
    Spain
  • Dates:​​​‌
    28 April - 16 May 2025
  • Context of‌​‌ the visit:
    development of​​ NLP tools for Old​​​‌ Egyptian
  • Mobility program/type of‌ mobility:
    Short Term Scientific‌​‌ Mission (STSM) funded by​​ UniDive

9.2 European initiatives​​​‌

9.2.1 Horizon Europe

MALINCA‌

Participants: Philippe de Groote‌​‌.

MALINCA project on​​ cordis.europa.eu

  • Title:
    Mathematicae Lingua​​​‌ Franca: Bridging the Linguistic‌ Gap Between the Mathematician‌​‌ and the Machine
  • Duration:​​
    From March 1, 2025​​​‌ to February 28, 2031‌
  • Partners:
    • Institut National de‌​‌ Recherche en Informatique et​​​‌ Automatique (Inria), France
    • Universidad​ Pontificia Comillas (Comillas), Spain​‌
    • Université Paris Cité (UPCité),​​ France
    • Centre National de​​​‌ la Recherche Scientifique (CNRS),​ France
  • Inria contact:
    Hugo​‌ Herbelin
  • Summary:
    In recent years, proof assistants have shown their astounding ability to tackle the complete formalisation of large pieces of mathematics, with the celebrated certifications of the Feit-Thompson theorem, of the Kepler conjecture, and more recently, the resolution of Scholze's liquid tensor challenge. We believe that the time is ripe to demonstrate that they can tackle mathematics in the flexible and semi-formal way it is created and exchanged by mathematicians. To that purpose, we aim to develop proof assistant technologies of an entirely new nature, including a formal language and a foundational approach to mathematical meaning, with the versatility necessary to represent the dynamic linguistic structures to be found in the daily practice of mathematics. The result will be a linguistic front-end that will allow mathematicians, and scientists in general, to express in proof assistants their proofs and computations in the semi-formal way they think of them. Three research tracks stand out: the mathematical and linguistic foundations; formalisation of real-world vernacular mathematics into a high-level language of representation (Godement challenge); new techniques and software tools, based on natural language processing, to automate the formalisation process. The translation in the machine of semi-formal mathematics needs to go beyond the traditional view that reduces reasoning to logic, and requires understanding the dynamics of the discursive linguistic process which underlies mathematics. Building on advances in linguistics, mathematical logic, programming language semantics and machine learning, we will contribute significantly to the rise of a new generation of proof assistants, integrating at their heart a linguistic layer and automated guidance tools for mathematical proofs, theorems and definitions. The resulting high-level manipulation of concepts will lead to novel research outcomes supporting the daily activity of mathematical scientists.

9.2.2 Other european programs/initiatives​‌

9.3​​ National initiatives

9.3.1 ANR​​​‌ Project: InExtenso

Participants: Karën​ Fort, Maxime Amblard​‌, Michel Musiol,​​ Fanny Ducel.

  • Title:​​​‌
    Intrinsic and Extrinsic evaluation​ of biases in large​‌ language models
  • Duration:
    10​​ 2023–09 2027
  • Coordinator:
    Karën​​​‌ Fort
  • Partners:
    CHU Rouen,​ LISN, LORIA
  • Participants:
    Maxime​‌ Amblard, Fanny Ducel, Karën​​ Fort (coordinator), Michel Musiol,​​​‌ Miguel Couceiro
  • Abstract:
    Large Language Models (LLMs) are the Swiss Army knife of today’s Natural Language Processing (NLP). They often outperform the state of the art on benchmarks commonly used in the field for tasks such as part-of-speech tagging, text classification and named-entity recognition, thus paving the way to a myriad of end-user applications. However, it has been shown that LLMs exhibit major ethical issues including significant environmental impact, mirroring and amplification of stereotyped biases, which in turn have a disproportionate impact on historically disadvantaged social groups. It is urgent to address the social impact of NLP as the applications we develop, such as ChatGPT, are now directly made available to end users. The detection and mitigation of biases have therefore become an active area of research in the past few years, focusing mainly on Masked Language Models (MLMs) such as BERT in English and the North American social context. Several sources of bias were identified in the NLP pipeline. However, the interconnection between sources and the overall impact of each source on downstream applications remain unclear. In this project, we want to observe the entire pipeline, from the intrinsic point of view (within the model itself), to the pre-training task point of view (in the case of autoregressive LLMs, text generation), on to some real-world downstream applications. We chose to focus on two types of medical applications: mental illness diagnosis help and information extraction from clinical records for public health purposes such as patient enrollment into clinical trials. The project will provide corpora and methods for a global evaluation of bias in LLMs in French as well as studies to further the understanding of biases in clinical NLP pipelines and the environmental impact of the integration of these models in digital health.

9.3.2 ANR Project:​​​‌ CoDeinE

Participants: Karën Fort‌, Bruno Guillaume,‌​‌ Bertrand Remy.

  • Title:​​
    artificial text COrpus DEsIgNed​​​‌ Ethically automatic synthesis of‌ clinical documents
  • Duration:
    03‌​‌ 2021–02 2026
  • Coordinator:
    Aurélie​​ Névéol (Limsi)
  • Partners:
    CRC,​​​‌ CEA List, LISN, LORIA‌
  • Participants:
    Bruno Guillaume, Karën‌​‌ Fort (local coordinator), Bertrand​​ Remy
  • Abstract:
    Machine learning methods have become prevalent in language technologies. They rely on annotated corpora to train models and evaluate algorithms. The CoDeinE project proposes to address the lack of shareable corpora in sensitive domains such as health or banking. The key idea of the project is to use confidential corpora to automatically generate synthetic texts that mimic the linguistic properties of real documents while preserving confidentiality. We will use clinical documents in electronic patient records as a case study. Furthermore, the project will rely on Games With A Purpose and crowdsourcing to validate and annotate the synthesized texts.

9.3.3 ANR Project:‌ Autogramm

Participants: Bruno Guillaume‌​‌, Karën Fort,​​ Khensa Amani Daoudi.​​​‌

  • Title:
    Induction of descriptive‌ grammar from annotated corpora‌​‌
  • Duration:
    01 2022–12 2025​​
  • Coordinator:
    Sylvain Kahane (Université​​​‌ Paris Nanterre)
  • Partners:
    MoDyCo,‌ LACITO, LISN, Inria Nancy‌​‌ – Grand Est
  • Participants:​​
    Bruno Guillaume (local coordinator),​​​‌ Karën Fort
  • Abstract:
    The‌ goal of this project‌​‌ is to automate, as​​ far as possible, the​​​‌ extraction of descriptive grammars‌ and grammatical descriptions from‌​‌ annotated corpora for linguistic​​ and typological studies. The​​​‌ project also promotes the‌ development of treebanks for‌​‌ under-endowed languages, in order​​ to extract quantitative descriptive​​​‌ grammars for these languages.‌ The project uses the‌​‌ annotation scheme SUD (Surface-syntactic​​ Universal Dependencies), the​​​‌ query tool Grew-match and‌ the annotation tool ArboratorGrew‌​‌.

9.3.4 ANR Project:​​​‌ CODIM

Participants: Maxime Amblard​, Jacques Jayez.​‌

  • Title:
    Compositionality and discourse​​ markers
  • Duration:
    01 2023–12​​​‌ 2026
  • Coordinator:
    Mathilde Dargnat​ (Université de Lorraine and​‌ ATILF)
  • Partners:
    ATILF, LLF,​​ LORIA
  • Participants:
    Maxime Amblard,​​​‌ Jacques Jayez
  • Abstract:
    The CODIM project focuses on the two main linguistic resources for organizing monologues or conversations in human languages: D(iscourse) M(arkers) (therefore/donc, well/ben, bon, etc. in English/French) and prosody (in particular, intonation). It will evaluate their status with respect to two major views on communication: compositionality (the possibility of combining meaningful expressions into more complex meaningful expressions) and pattern or construction-based approaches (the idea that language users exploit partly 'frozen' strings of words). We will compare the semantic and prosodic properties of simple and complex French DMs (e.g. ah + bon) found in corpora for written and spoken French, using a variety of technical tools for DM identification (category-driven text mining), clustering (statistics and Machine Learning) and research in prosody (duration and intensity measures, contour representation). The project fosters a number of collaborations between linguists and computer scientists.

9.3.5 PEPR Project Digital​ Health: Autonom Health

Participants: Maxime Amblard, Michel Musiol, Vincent Martin.

  • Title:
    Autonom Health​
  • Duration:
    06 2023–12 2030​‌
  • Coordinator:
    Pierre Philip (Université​​ de Bordeaux)
  • Partners:
    LABRI,​​​‌ Sanpsy, LORIA, ISIR, CES,​ LIRIS
  • Participants:
    Maxime Amblard,​‌ Michel Musiol, Vincent Martin​​
  • Abstract:
    Western populations face an increase in longevity, which mechanically increases the number of chronic disease patients to manage. Current healthcare strategies will not make it possible to maintain a high level of care at a controlled cost in the future, and e-health can optimize the management and costs of our health care systems. Healthy behaviors contribute to the prevention and optimized management of chronic diseases, but their implementation is still a major challenge. Digital technologies could help their implementation through digital behavioral medicine programs, to be developed as a complement to (and not a substitute for) the existing care, in order to focus human interventions on the most severe cases demanding medical interventions.

10 Dissemination

10.1​​ Promoting scientific activities

10.1.1​​​‌ Scientific events: organisation

  • Vincent Martin moderated a session of the Société Médico-Psychologique entitled “La psychiatrie à ses frontières”, 09 2025, Bordeaux, France.
  • Vincent Tourneur organized the Loria PhD seminar (8 presentations during the year).

10.1.2​‌ Scientific events: selection

Member of the​ conference program committees
  • Vincent Martin: member of the program committee of the Journée d’étude sur les technologies linguistiques pour les langues peu dotées (AFIA/AFCP), 12 2025, Paris, France.

10.1.3 Journal

Member​​ of the editorial boards​​​‌
Reviewer -​​​‌ reviewing activities

10.1.4 Invited talks

Philippe de Groote gave an invited talk at the Conference on Mathematical and Computational Linguistics for Proofs 26.

Karën Fort‌​‌ was invited to give​​ a keynote speech at​​​‌ the Italian NLP conference‌ CLiC-it in Sept. 2025‌​‌ on the subject of​​ "Large Language Models: the​​​‌ challenge of evaluation" 22‌.

Karën Fort was invited to give a keynote speech at the Association française de linguistique appliquée (AFLA) conference: Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes – Neala 25, in Nancy, in July 2025, on the subject of "Les grands modèles de langue : des outils situés".

Karën Fort was invited​​ to give a speech​​​‌ at the Conseil Scientifique‌ of the Institut CNRS‌​‌ in Computer Science, in​​ Paris, in March 2025,​​​‌ on the subject of‌ "Les grands modèles de‌​‌ langue : les défis​​ de l'évaluation." 23.​​​‌

Fanny Ducel was invited‌ to give a presentation‌​‌ about her research on​​ stereotypical biases in LLMs​​​‌ to the work group‌ "Intelligence Artificielle Soutenable, Intelligible‌​‌ et Vérifiable" of Université​​ Paris-Saclay.

Vincent Martin was invited to give a talk at the French National Sleep Medicine Congress: "Enjeux des modélisations pour aborder la sémiologie du sommeil", 11 2025, Congrès du Sommeil, Strasbourg.

10.1.5​​ Leadership within the scientific​​​‌ community

  • Maxime Amblard is PI of the INSIGHT project (Initiative d'Excellence Lorraine - PIA).
  • Vincent Martin has been a member of the steering committee of the Collège Technologies du Langage Humain (TLH) of the Association française pour l’Intelligence Artificielle (AfIA) since 09 2024.
  • Karën Fort is PI​​ of the GDR LIFT​​​‌ 2.

10.1.6 Scientific expertise‌

10.1.7​ Research administration

  • Maxime Amblard:​‌
    • Member of CNU 27​​ (Computer Science)
    • Head of​​​‌ the master in Natural​ Language Processing
  • Karën Fort:​‌
    • Elected member of the​​ Conseil de Pôle AM2I​​​‌
    • Chair of the Ethics​ committee of the ENACT​‌ AI cluster
    • Member of​​ the Steering Committee of​​​‌ the INSIGHT project
  • Sylvain​ Pogodalla:
    • Elected member of​‌ the comité de centre​​ Inria Nancy – Grand​​​‌ Est.
    • In charge of​ the local commission IES​‌ (information et édition​​ scientifique) of the​​​‌ Inria Nancy – Grand​ Est and LORIA.
    • Member​‌ of the national commission​​ IES of Inria.

10.2​​​‌ Teaching - Supervision -​ Juries - Educational and​‌ pedagogical outreach

10.2.1 Teaching​​

  • Licence:
    • Maxime Amblard, AI​​​‌ Introduction, 14h, L1, Université​ de Lorraine, France.
    • Maxime​‌ Amblard, Ethical aspects of​​ NLP, 10h, L3, Université​​​‌ de Lorraine, France.
    • Maxime​ Amblard, Human in the​‌ loop, 10h, L3, Université​​ de Lorraine, France.
    • Karën Fort, De l'écrit à l'information, 20h, L1 MIASHS, IDMC, Université de Lorraine, France.
    • Karën Fort, Outils pour l'analyse linguistique, 25h, L3 MIASHS, IDMC, Université de Lorraine, France.
    • Hee-Soo Choi and Fanny Ducel, De l'écrit à l'information, 5h, L1 MIASHS, IDMC, Université de Lorraine, France.
    • Hee-Soo Choi, Langages​​ de Script, 20h, L1​​​‌ MIASHS, IDMC, Université de​ Lorraine, France.
    • Hee-Soo Choi,​‌ Initiation aux Bases de​​ Données, 24h, L1 MIASHS,​​​‌ IDMC, Université de Lorraine,​ France.
    • Hee-Soo Choi, Bases​‌ de Données Avancées, 28h,​​ L2 MIASHS, IDMC, Université​​​‌ de Lorraine, France.
    • Hee-Soo​ Choi, Suivi de stages,​‌ 4h, L3, IDMC, Université​​ de Lorraine, France.
    • Hee-Soo​​​‌ Choi, Algorithmique et Programmation​ Impérative, 30h, L1 Informatique,​‌ FST, Université de Lorraine,​​ France.
    • Hee-Soo Choi, Algorithmique​​​‌ et Programmation, 20h, L1​ Informatique, FST, Université de​‌ Lorraine, France.
    • Hee-Soo Choi, Programmation, 36.7h, L1 Mathématiques, FST, Université de Lorraine, France.
    • Hee-Soo Choi, Algorithmique et Programmation, 36.4h, L1 SPI, FST, Université de Lorraine, France.
    • Vincent Tourneur,​ Administration UNIX, 24h, L2,​‌ IUT Charlemagne, Université de​​ Lorraine, France.
    • Vincent Tourneur,​​​‌ Compilation, 40h, L3, IUT​ Charlemagne, Université de Lorraine,​‌ France.
    • Marie Cousin, Recherche​​ Opérationnelle, 4h, L3, École​​​‌ des Mines de Nancy,​ Université de Lorraine, France.​‌
    • Clémentine Bleuze, Ingénierie de​​ la langue, 15h, L3​​​‌ MIASHS option TAL, IDMC,​ Université de Lorraine, France.​‌
    • Maxime Amblard and Clémentine​​ Bleuze, Découverte du traitement​​​‌ des données langagières, 30h,​ L3 MIASHS option TAL,​‌ IDMC, Université de Lorraine,​​ France.
    • Clémentine Bleuze, Découverte​​​‌ du traitement des données​ langagières, 15h, L2 MIASHS​‌ option TAL, IDMC, Université​​ de Lorraine, France.
    • Iglika​​​‌ Zlatkova Nikolova-Stoupak, Découverte du​ traitement des données langagières,​‌ 15h, L2 MIASHS option​​ TAL, IDMC, Université de​​​‌ Lorraine, France.
  • Master:
    • Maxime​ Amblard and Amandine Decker,​‌ Methods for NLP, 20h,​​ M1 NLP (IDMC), Université​​​‌ de Lorraine, France.
    • Maxime​ Amblard, NLP project, 30h,​‌ M1 NLP (IDMC), Université​​ de Lorraine, France.
    • Maxime​​​‌ Amblard and Amandine Decker,​ Dialogue ChatBot and Question​‌ Answering, 28h, M2 NLP​​ (IDMC), Université de Lorraine,​​​‌ France.
    • Karën Fort, Written​ Corpora (English), 37.5h, M1​‌ NLP (IDMC), Université de​​ Lorraine, France.
    • Clémentine Bleuze,​​ Written corpora (English), 16h,​​​‌ Master M1 NLP (IDMC),‌ Université de Lorraine, France.‌​‌
    • Karën Fort, Software Projects​​ (English), 25h, M2 NLP​​​‌ (IDMC), Université de Lorraine,‌ France.
    • Karën Fort, Python‌​‌ Programming (English), 37.5h, M1​​ NLP (IDMC), Université de​​​‌ Lorraine, France.
    • Philippe de‌ Groote, Formal Logic, 22h,‌​‌ M1 NLP (IDMC), Université​​ de Lorraine, France.
    • Philippe​​​‌ de Groote, Formal languages,‌ 22h, M1 NLP (IDMC),‌​‌ Université de Lorraine, France.​​
    • Philippe de Groote, Semantics,​​​‌ 22h, M2 NLP (IDMC),‌ Université de Lorraine, France.‌​‌
    • Karën Fort, Clémentine Bleuze,​​ Ethics and NLP (English),​​​‌ 19h, M1 NLP (IDMC),‌ Université de Lorraine, France.‌​‌
    • Karën Fort, Ethics (English),​​ 25h, M2 NLP (IDMC),​​​‌ Université de Lorraine, France.‌
    • Karën Fort, Génie logiciel, 56.25h, M1 MIAGE, IDMC, Université de Lorraine, France.
    • Bruno​​​‌ Guillaume, Lexical Resources (English),‌ 15h, M2 NLP (IDMC),‌​‌ Université de Lorraine, France.​​
    • Vincent Martin, Speech processing (English), 14h, M2 NLP (IDMC), Université de Lorraine, France.
    • Vincent Martin, Signal processing (English), 12h, M2 NLP (IDMC), Université de Lorraine, France.
    • Vincent Martin, NLP projects (English), 3h, M1 NLP (IDMC), Université de Lorraine, France.
    • Vincent Martin, Critical analysis of artificial intelligence for health (English), 6h, Master 2 Health Engineering, Université Grenoble Alpes, France.
    • Vincent Martin, Back to the big wide world: how to integrate digital tools into clinical practice? (English), 6h, Master 2 Health Engineering, Université Grenoble Alpes, France.
    • Vincent Martin, Quelques éléments de STS, 2h, Licence-Master Science de la Santé, Université de Bordeaux, France.
    • Sylvain Pogodalla, Semantics, 10h, M1 NLP (IDMC), Université de Lorraine, France.
    • Sylvain Pogodalla and Amandine Decker, Syntactic Models, 20h, M2 NLP (IDMC), Université de Lorraine, France.
    • Fanny Ducel,​​ Software Projects (English), 10h,​​​‌ M2 NLP (IDMC), Université‌ de Lorraine, France.
    • Fanny Ducel, Python Programming (English), 14h, M1 NLP (IDMC), Université de Lorraine, France.
    • Fanny Ducel, Project Management‌​‌ Tools (English), 8h, M1​​ NLP (IDMC), Université de​​​‌ Lorraine, France.
    • Clémentine Bleuze,‌ NLP for low-resource language‌​‌ (English), 8h, Master M2​​ NLP (IDMC), Université de​​​‌ Lorraine, France.
    • Maxime Amblard‌ and Amandine Decker, Introduction‌​‌ to NLP, M1 NLP​​ (IDMC), Université de Lorraine,​​​‌ France.
    • Maxime Amblard and‌ Amandine Decker, Dialogue Engineering,‌​‌ 14h, M2 NLP (IDMC)​​ LI, Université de Lorraine,​​​‌ France.
    • Maxime Amblard and‌ Amandine Decker, Discourse, 14h,‌​‌ M2 NLP (IDMC), Université​​ de Lorraine, France.
    • Amandine​​​‌ Decker and Maxime Amblard,‌ Dialogue Engineering, 14h, M2‌​‌ NLP (IDMC), Université de​​ Lorraine, France.
    • Marie Cousin,​​​‌ Foundation of Computing, 14h,‌ M1, École des Mines‌​‌ de Nancy, Université de​​ Lorraine, France.
  • Doctorate:
    • Maxime Amblard, Introduction to AI, Doctoral School SLTC, Université de Lorraine, 2 x 7h.
  • Tutorials:
    • Karën Fort,​​​‌ Fanny Ducel, Navigating Ethical‌ Challenges in NLP: Hands-on‌​‌ strategies for students and​​ researchers 61

10.2.2 Supervision

PhD‌ defended in 2025
  • Maxime‌​‌ Guillaume, Structures de traits​​ pour les Grammaires Catégorielles​​​‌ Abstraites, since 07‌ 2021. Supervision: Philippe de‌​‌ Groote and Raphaël Salmon​​ (Yseop).
  • Santiago Herrera, Extraction​​​‌ de grammaires descriptives à‌ partir de corpus annotés‌​‌ en syntaxe, since​​ 09 2022. Supervision: Sylvain​​​‌ Kahane (MoDyCo, Université Paris‌ Nanterre) and Bruno Guillaume.‌​‌
  • Nicolas Hiebel, Création éthique​​​‌ de données textuelles artificielles​ : application au domaine​‌ biomédical, since 10​​ 2021. Supervision: Aurélie Névéol​​​‌ (LISN-CNRS), Karën Fort and​ Olivier Ferret (CEA).
  • Siyana​‌ Pavlova, Tools and Methods​​ for Semantic Annotation,​​​‌ since 11 2020. Supervision:​ Maxime Amblard and Bruno​‌ Guillaume.
PhD in progress​​
  • Vincent-Thomas Barrouillet, Le discours​​​‌ pathologique du sujet schizophrène,​ caractérisation psycholinguistique et computationnelle​‌ des déviations décisives à​​ la logicité dialogique en​​​‌ étude de corpus,​ since 10 2019. Supervision:​‌ Michel Musiol and Maxime​​ Amblard.
  • Clémentine Bleuze, Perception​​​‌ et évaluation des biais​ dans les applications des​‌ LLM au domaine biomédical​​, since 10 2024.​​​‌ Supervision: Karën Fort and​ Aurélie Névéol (LISN-CNRS).
  • Colleen​‌ Beaumard, Biomarqueurs vocaux collectés​​ par des agents conversationnels​​​‌ pour l'aide au diagnostic​ et le suivi des​‌ troubles du sommeil et​​ des troubles mentaux,​​​‌ since 10 2022. Supervision:​ Jean-Luc Rouas (Université de​‌ Bordeaux, LaBRI), Pierre Philip​​ (Université de Bordeaux, SANPSY)​​​‌ and Vincent Martin.
  • Elio Stasica, Diagnostic différentiel d'infarctus à partir de la parole, since 09 2025. Supervision: Emmanuel Vincent (Multispeech), Romain Serizel (Multispeech), and Vincent Martin.
  • Hee-Soo​​ Choi, Lier des ressources​​​‌ lexicales du français en​ vue d'une interopérabilité entre​‌ niveaux linguistiques, since​​ 10 2021. Supervision: Karën​​​‌ Fort and Mathieu Constant.​
  • Marie Cousin, Modélisation de​‌ paraphrase dans les grammaires​​ catégorielles abstraites, since​​​‌ 10 2022. Supervision: Philippe​ de Groote and Sylvain​‌ Pogodalla.
  • Amandine Decker, Modelling​​ Topic-level Interaction in Pathological​​​‌ Conversations, since 10​ 2022. Supervision: Maxime Amblard​‌ and Ellen Breitholtz (University​​ of Gothenburg, Sweden).
  • Fanny​​​‌ Ducel, Evaluating stereotyped biases​ in auto-regressive language models​‌, since 10 2023.​​ Supervision: Karën Fort and​​​‌ Aurélie Névéol (LISN-CNRS).
  • Amandine​ Lecomte, Analyse longitudinale de​‌ prise en charge psychothérapeutique​​ de patients psychiatriques et​​​‌ de patients atteints de​ maladies neurodégénératives : informatisation​‌ et modélisation dialogique des​​ indices comportementaux associés à​​​‌ l’efficacité (vs échec) des​ stratégies de prise en​‌ charge tentées par les​​ thérapeutes, since 10​​​‌ 2019. Supervision: Michel Musiol​ and Alexandra König.
  • Valentin​‌ Richard, Aspects dynamiques et​​ présuppositionnels des questions,​​​‌ since 09 2021. Supervision:​ Philippe de Groote, Floris​‌ Roelofsen and Reinhard Muskens​​ (Universiteit van Amsterdam, ILLC).​​​‌
  • Vincent Tourneur, Algorithmes d’analyse​ syntaxique pour les grammaires​‌ catégorielles abstraites, since​​ 10 2024. Supervision: Philippe​​​‌ de Groote.

10.2.3 Other​ supervisions

Karën Fort and​‌ Fanny Ducel supervised six​​ M1 students during their​​​‌ 2-month internship at LORIA.​ Four of these students​‌ worked on the stereotypes​​ present in benchmarks used​​​‌ for LLMs, while the​ two others developed a​‌ method to measure racist​​ biases in reaction to​​​‌ the presence of code-switching​ in LLM prompts. Karën​‌ Fort and Fanny Ducel​​ also supervised two L3​​​‌ interns, one of whom​ worked on the code-switching​‌ project, and the second​​ one developed an interface​​​‌ based on previous work​ on biases by Karën​‌ Fort and Fanny Ducel.​​ This work was published​​​‌ in a TALN workshop​ 44.

10.2.4 Juries​‌

  • Karën Fort, Maxime Amblard,​​ Bruno Guillaume: NLP Master​​​‌ 1 and 2 juries​ (IDMC)
  • Maxime Amblard was reviewer, president and member of the PhD jury of Zacchary Sadeddine, Meaning Representation Frameworks and Reasoning in the Era of LLMs, under the supervision of Fabian Suchanek (Telecom Paris), Institut polytechnique de Paris, 10 October 2025.
  • Maxime Amblard was reviewer of Jaromír Salamon, Influencing text generation by biological signal, Roman Mouček, University of West Bohemia, August 2025.
  • Maxime Amblard was president and member of the PhD jury of Aman Sinha, Evaluation of Medical Language Models, under the supervision of Marianne Clausel and Mathieu Constant, Université de Lorraine, 12 December 2025.
  • Maxime Amblard was president and member of the PhD jury of William Eduardo Soto Martinez, Multilingual Graph-to-Text Generation and Evaluation, under the supervision of Claire Gardent (DR CNRS) and Yannick Parmentier (Université de Lorraine), 7 October 2025.

10.2.5 Educational‌ and pedagogical outreach

  • Marie Cousin and Amandine Decker ran a MATh.en.JEANS workshop at the Edmond de Goncourt secondary school in Pulnoy.
  • Karën Fort presented her work on the ethics of AI to CPGE students from Lycée Poincaré (LORIA, Nancy, January 2025): Ethics of AI from an NLP point of view: the good, the bad and the evaluation.

10.3 Popularization​​​‌

10.3.1 Productions (articles, videos,‌ podcasts, serious games, ...)‌​‌

10.3.2 Participation in​​​‌ Live events

  • Maxime Amblard participated in the event Le Procès du Robot, at Lycée Loritz, 2025-02-28.
  • Fanny Ducel gave a‌ presentation about her projects‌​‌ on stereotypical biases in​​ LLMs at the Université​​​‌ Champagne-Ardenne, in the context‌ of its AI Week‌​‌ and of "Fête de​​ la Science".
  • Amandine Decker: 2025-02-07, participation in FIRST (Femmes Ingénieures, Réussir en Sciences et Technologies), presentation of research to female students in their first year of high school (Lycée Fabert, Metz, France).
  • Hee-Soo Choi and Fanny Ducel: 2025-02-27, participation in the Elles Bougent: Filles - Maths et Science day to promote scientific studies and careers to 150 female students from 40 middle schools.
  • Marie Cousin: 2025-02-27, participation in the Grand-Est edition of "Sciences, un métier de femmes": presentation of what research in computer science is and discussions with female high school students (FST, Nancy).
  • Marie Cousin: 2025-01-31, presentation to high school students in the context of the "Chiche !" initiative (Lycée des métiers du tertiaire Jean-Victor Poncelet, Saint-Avold, France).
  • Marie Cousin: 2025-09-20, participation‌ in “Journées européennes du‌​‌ Matrimoine” (Féru des Sciences,​​ Nancy, France).

11 Scientific​​​‌ production

11.1 Major publications‌

  • 1 inproceedings Mohamed Abdalla, Jan Philip Wahle, Terry Ruas, Aurélie Névéol, Fanny Ducel, Saif M. Mohammad and Karën Fort. The Elephant in the Room: Analyzing the Presence of Big Tech in Natural Language Processing Research. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, Association for Computational Linguistics, 2023, 13141-13160.
  • 2 article Marc Anderson and Karën Fort. Human Where? A New Scale Defining Human Involvement in Technology Communities from an Ethical Standpoint. International Review of Information Ethics, August 2022.
  • 3 article Guillaume Bonfante and Bruno Guillaume. Non-size increasing Graph Rewriting for Natural Language Processing. Mathematical Structures in Computer Science 28(08), 2018, 1451-1484.
  • 4 book Guillaume Bonfante, Bruno Guillaume and Guy Perrier. Application of Graph Rewriting to Natural Language Processing. Vol. 1, Logic, Linguistics and Computer Science Set, ISTE Wiley, 2018, 272.
  • 5 article Fanny Ducel, Aurélie Névéol and Karën Fort. "You'll be a nurse, my son!" Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, October 2024.
  • 6 article Philippe de Groote and Makoto Kanazawa. A Note on Intensionalization. Journal of Logic, Language and Information 22(2), 2013, 173-194.
  • 7 inproceedings Aurélie Névéol, Yoann Dupont, Julien Bezançon and Karën Fort. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, May 2022.
  • 8 article Sylvain Pogodalla. A syntax-semantics interface for Tree-Adjoining Grammars through Abstract Categorial Grammars. Journal of Language Modelling 5(3), 2017, 527-605.
  • 9 article Robert Reinecke, Tatjana A. Nazir, Sarah Carvallo and Jacques Jayez. Factives at hand: When presupposition mode affects motor response. Journal of Experimental Psychology, 2022.

11.2​​ Publications of the year​​​‌

International journals

Invited conferences

  • 22 inproceedings Karën Fort. Large Language Models: the challenge of evaluation. CLiC-it 2025 - Eleventh Conference on Computational Linguistics, Cagliari, Italy, September 2025.
  • 23 inproceedings Karën Fort. Large Language Models: the challenge of evaluation. 2025 - Séminaire IA Génératives: Promesses et Défis, Paris, France, March 2025.
  • 24 inproceedings Karën Fort. Les enjeux éthiques de l’IA vus depuis le traitement automatique des langues. Journée de lancement du projet Insight, Nancy, France, December 2025.
  • 25 inproceedings Karën Fort. Les grands modèles de langue : des outils situés. NéALA 2025 - Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes, Nancy, France, July 2025.
  • 26 inproceedings Philippe de Groote. Some observations about plurals in textual mathematics. MCLP 2025 - International Conference on Mathematical and Computational Linguistics for Proofs, Orsay, France, September 2025.

International​​​‌ peer-reviewed conferences

National peer-reviewed​​​‌ Conferences

  • 41 inproceedings Clémentine Bleuze, Fanny Ducel, Maxime Amblard and Karën Fort. "Nowadays, the focus is on results": creation and exploratory investigation of a corpus of claims from NLP articles. Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), TALN 2025, volume 1, Marseille, France, 2025.
  • 42 inproceedings Marie Cousin. Syntaxe en dépendance avec les grammaires catégorielles abstraites : une application à la théorie sens-texte. Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux, 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 715-728.
  • 43 inproceedings Amandine Decker and Maxime Amblard. L'essentiel est invisible pour les représentations sémantiques. Actes de l'atelier Avancement de l’AMR et de l’Analyse Sémantique 2025 (4AS), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 1-8.
  • 44 inproceedings Fanny Ducel, Jeffrey André, Aurélie Névéol and Karën Fort. Introducing MascuLead: the First Gender Bias Leaderboard. Actes de l’atelier Ethic and Alignment of (Large) Language Models 2025 (EALM), EALM 2025 - Ethic and Alignment of (Large) Language Models, Marseille, France, June 2025, 12-19.
  • 45 inproceedings Fanny Ducel, Nicolas Hiebel, Olivier Ferret, Karën Fort and Aurélie Névéol. "Women do not have heart attacks!" Gender Biases in Automatically Generated Clinical Cases in French. TALN 2025 - Actes de la 32ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 2, Marseille, France, July 2025, 1.
  • 46 inproceedings Abdelhak Kelious, Mathieu Constant and Christophe Coeur. Exploration de stratégies de prédiction de la complexité lexicale en contexte multilingue à l'aide de modèles de langage génératifs et d'approches supervisées. Actes de l'atelier Évaluation des modèles génératifs (LLM) et challenge 2025 (EvalLLM), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), CORIA-TALN2025, Marseille, France, ATALA & ARIA, 2025, 202-203.
  • 47 inproceedings Vincent P. Martin, Karën Fort and Jean-Arthur Micoulaud-Franchi. La trumplang, instrument de destruction de la pensée : analyse de l'impact de la censure trumpiste sur la recherche en santé mentale. Actes de TALN, TALN 2025 - 32ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1, Marseille, France, July 2025, 478-488.
  • 48 inproceedings Laure Turcati, Alice Millour, Renaud Debailly, Karën Fort, Asma Steinhausser, Corentin Biets and Anne Dozières. Citizen Science in Practice: How (not) to Fail? Actes de l'atelier Science Participative pour les Données et Corpus Linguistiques 2025 (ParCol), 20e Conférence en Recherche d’Information et Applications (CORIA), 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), 27ème Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RECITAL), Les 18e Rencontres Jeunes Chercheurs en RI (RJCRI), Marseille, France, ATALA & ARIA, 2025, 1-2.

Conferences without proceedings‌​‌

  • 49 inproceedings Colleen Beaumard, Vincent P. Martin, Charles Brazier, Yaru Wu and Jean-Luc Rouas. Détection de séquences de phonèmes en parole spontanée pour la caractérisation de la somnolence diurne excessive. 10e Journées de phonétique clinique (JPC), Sète, France, June 2025.
  • 50 inproceedings Marie Cousin. Adding Communicative Structure to the MTT into ACG Encoding. Congreso Internacional sobre Estudios Teóricos y Aplicados de Léxico, 2025 (CIETAL 2025), Madrid, Spain, May 2025.
  • 51 inproceedings Fanny Ducel, Karën Fort and Aurélie Névéol. La linguistique appliquée pour une IA plus éthique. NéALA 2025 - Colloque sur Naturel et Artificiel en Linguistique Appliquée : une époque de paradoxes, Nancy, France, July 2025.
  • 52 inproceedings Philippe de Groote and Timothée Bernard. Worlds, events and perspectives. CSSP 2025 - 16ème Colloque de Syntaxe et Sémantique de Paris, Paris, France, November 2025.
  • 53 inproceedings Valentin D. Richard. Evaluating Chains Containing an Interrogative Word in an Anaphorically Annotated Corpus. Journées scientifiques du réseau thématique LIFT2 - linguistique informatique, formelle et de terrain (Lift2-2025), Paris, France, CNRS, October 2025.
  • 54 inproceedings Valentin D. Richard. How to explain the divergence between normative discourses? The case of the French construction "verb (+ preposition) + interrogative". Nouveaux regards sur la norme, Fribourg, Switzerland, 2025, 75-77.
  • 55 inproceedings Valentin D. Richard. Raising Alternatives to Express Dependence: a compositional issue. The 16th Syntax and Semantics Conference in Paris (CSSP 2025), Paris, France, November 2025.

Scientific book chapters​​​‌

Edition (books,​‌ proceedings, special issue of​​ a journal)

  • 57 proceedings Proceedings of the 16th International Conference on Computational Semantics. International Conference on Computational Semantics (IWCS), Düsseldorf, Germany, Association for Computational Linguistics, 2025.

Doctoral​​ dissertations and habilitation theses​​​‌

  • 58 thesis Siyana Pavlova. Toward Scalable Semantic Annotation: Bridging Readability and a Wide Range of Phenomena into a Layered Meaning Representation. Université de Lorraine, June 2025.

Reports &​ preprints

Other scientific​‌ publications

Scientific​‌ popularization

  • 63 article Maxime Amblard and Nolwenn Le Jannic. Traitement automatique des langues : d’une lente progression à des bouleversements fulgurants. Interstices, January 2025.

11.3 Cited publications​‌

  • 64 misc ATILF. BEL-RL-fr. ORTOLANG (Open Resources and TOols for LANGuage) – www.ortolang.fr, 2025, URL: https://hdl.handle.net/11403/examples-ls-fr/
  • 65 misc ATILF. Réseau Lexical du Français (RL-fr). ORTOLANG (Open Resources and TOols for LANGuage) – www.ortolang.fr, 2025, URL: https://hdl.handle.net/11403/lexical-system-fr/
  • 66 book Ash Asudeh and Gianluca Giorgolo. Enriched Meanings. Natural Language Semantics with Category Theory. Vol. 1, Oxford Studies in Semantics and Pragmatics 13, Oxford, Oxford University Press, 2020.
  • 67 inproceedings Marie Cousin. Meaning-Text Theory within Abstract Categorial Grammars: Towards Paraphrase and Lexical Function Modeling for Text Generation. Proceedings of the 15th International Conference on Computational Semantics (IWCS), Nancy, France, Association for Computational Linguistics, June 2023.
  • 68 inproceedings Marie Cousin. Vers une implémentation de la théorie sens-texte avec les grammaires catégorielles abstraites. Actes de CORIA-TALN 2023. Actes des 16e Rencontres Jeunes Chercheurs en RI (RJCRI) et 25e Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL), Paris, France, ATALA, June 2023, 72-86.
  • 69 misc Mathilde Dargnat. Les particules énonciatives. September 2024.
  • 70 article Fanny Ducel, Aurélie Névéol and Karën Fort. "You'll be a nurse, my son!" Automatically Assessing Gender Biases in Autoregressive Language Models in French and Italian. Language Resources and Evaluation, October 2024, 1495-1523.
  • 71 inbook John Rupert Firth. A Synopsis of Linguistic Theory, 1930-1955. Studies in Linguistic Analysis. Special volume of the Philological Society, Oxford, Blackwell, 1957, 1-32. Reprinted in: Palmer, F. R. (ed.) (1968). Selected Papers of J. R. Firth 1952-59, pages 168-205. Longmans, London.
  • 72 book Jonatan Ginzburg. The Interactive Stance. Oxford, Oxford University Press, 2012.
  • 73 inproceedings Philippe de Groote. Deriving Formal Semantic Representations from Dependency Structures. Logic and Engineering of Natural Language Semantics: 19th International Conference, LENLS19, Tokyo, Japan, November 19-21, 2022, Revised Selected Papers, Lecture Notes in Computer Science 14213, Tokyo, Japan, Springer, November 2022, 157-172.
  • 74 inproceedings Philippe de Groote. On the semantics of dependencies: relative clauses and open clausal complements - extended abstract -. Logic and Engineering of Natural Language Semantics 20 (LENLS20), Osaka, Japan, November 2023.
  • 75 article Philippe de Groote and Sylvain Pogodalla. On the expressive power of Abstract Categorial Grammars: Representing context-free formalisms. 13(4), http://www.springerlink.com/content/1572-9583/, 2004, 421-438.
  • 76 inproceedings Philippe de Groote. Towards a Montagovian account of dynamics. Proceedings of the 16th Semantics and Linguistic Theory Conference (SALT 16), 2006.
  • 77 inproceedings Philippe de Groote. Towards abstract categorial grammars. Association for Computational Linguistics, 39th Annual Meeting and 10th Conference of the European Chapter (international peer-reviewed conference with proceedings), Toulouse, France, July 2001, 148-155.
  • 78 article Bruno Guillaume and Guy Perrier. Interaction Grammars. 7(2-4), 2009, 171-208.
  • 79 article Zellig S. Harris. Distributional Structure. Word 10(2-3), 1954, 146-162.
  • 80 inproceedings Paola Herreño Castañeda, Jonathan Ginzburg and Mathilde Dargnat. Discourse Markers for Topic Change. TrentoLogue: SemDial workshop, Università Di Trento, Rovereto, Italy, SEMDIAL, September 2024, 1-3.
  • 81 inproceedings Kris Heylen, Yves Peirsman, Dirk Geeraerts and Dirk Speelman. Modelling Word Similarity: an Evaluation of Automatic Synonymy Extraction Algorithms. Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, European Language Resources Association (ELRA), May 2008, URL: http://www.lrec-conf.org/proceedings/lrec2008/pdf/818_paper.pdf
  • 82 inproceedings Ganesh Jawahar, Benoît Sagot and Djamé Seddah. What does BERT learn about the structure of language? ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 2019.
  • 83 unpublished Jacques Jayez. (Innocent ?) Bias in argumentation. The view from language. January 2026, working paper or preprint.
  • 84 inproceedings Jacques Jayez. Discourse markers are not special (but they can be complicated). Empirical Issues in Syntax and Semantics. Selected papers from CSSP 2023, Paris, France, 2025.
  • 85 inproceedings Veronika Lux-Pogodalla and Alain Polguère. Construction of a French Lexical Network: Methodological Issues. Proceedings of the First International Workshop on Lexical Resources, WoLeR 2011. An ESSLLI 2011 Workshop, Ljubljana, Slovenia, August 2011, 54-61, URL: https://hal.inria.fr/hal-00686467
  • 86 article Christopher D. Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences 117(48), 2020, 30046-30054.
  • 87 book Igor Mel'čuk. Semantics: From Meaning to Text. Vol. 1, Studies in Language Companion Series 129, Amsterdam/Philadelphia, John Benjamins Publishing Company, 2012.
  • 88 article Sebastian Padó and Mirella Lapata. Dependency-Based Construction of Semantic Space Models. Computational Linguistics 33(2), 2007, 161-199, URL: https://www.aclweb.org/anthology/J07-2002
  • 89 inproceedings Muntsa Padró, Marco Idiart, Aline Villavicencio and Carlos Ramisch. Comparing Similarity Measures for Distributional Thesauri. Proceedings of LREC 2014, 2014, URL: https://www.aclweb.org/anthology/L14-1496/
  • 90 inproceedings Yves Peirsman, Kris Heylen and Dirk Speelman. Finding semantically related words in Dutch: co-occurrences versus syntactic contexts. Proceedings of the 2007 Workshop on Contextual Information in Semantic Space Models: Beyond Words and Documents, 2007, 9-16, URL: https://bibliotek.dk/eng/moreinfo/netarchive/870970-basis:28214510
  • 91 inproceedings Guy Perrier. A French Interaction Grammar. RANLP 2007 - International Conference on Recent Advances in Natural Language Processing, IPP & BAS & ACL-Bulgaria, Borovets, Bulgaria, INCOMA Ltd, Shoumen, Bulgaria, September 2007, 463-467.
  • 92 article Thom Scott-Phillips and Christophe Heintz. Great ape interaction: Ladyginian but not Gricean. 120(42), 2023.
  • 93 inproceedings Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin. Attention is All You Need. Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, Long Beach, California, USA, Curran Associates Inc., Red Hook, NY, USA, 2017, 6000-6010, URL: https://dl.acm.org/doi/pdf/10.5555/3295222.3295349
  • 94 inproceedings Julie Weeds, David Weir and Diana McCarthy. Characterising Measures of Lexical Distributional Similarity. COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland, COLING, 2004, 1015-1021, URL: https://www.aclweb.org/anthology/C04-1146
  • 95 inproceedings Yuqing Yang, Qipeng Guo, Xiangkun Hu, Yue Zhang, Xipeng Qiu and Zheng Zhang. An AMR-based Link Prediction Approach for Document-level Event Argument Extraction. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada, Association for Computational Linguistics, July 2023, 12876-12889, URL: https://aclanthology.org/2023.acl-long.720/